* xfs trace in 4.4.2
From: Stefan Priebe @ 2016-02-20 8:02 UTC
To: xfs@oss.sgi.com; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com

Hi,

got this one today. Not sure if this is a bug.

[67674.907736] ------------[ cut here ]------------
[67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage+0xa9/0xe0()
[67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas scsi_transport_sas pps_core
[67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
[67675.277120] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
[67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1 0000000000000001
[67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587 ffff88007950fae8
[67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0 ffffea00208834a0
[67675.506112] Call Trace:
[67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
[67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
[67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
[67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
[67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
[67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
[67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
[67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
[67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
[67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
[67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
[67676.172075] [<ffffffffa3166160>] ? mem_cgroup_shrink_node_zone+0x150/0x150
[67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
[67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
[67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
[67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
[67676.433499] ---[ end trace cb1827fe308f7f6b ]---

Greets Stefan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs trace in 4.4.2
From: Brian Foster @ 2016-02-20 14:45 UTC
To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com

On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> Hi,
>
> got this one today. Not sure if this is a bug.
>

That looks like the releasepage() delayed allocation block warning. I'm
not sure we've had any fixes for (or reports of) that issue since the
v4.2 timeframe.

What is the xfs_info of the associated filesystem? Also, do you have any
insight as to the possible reproducer application or workload? Is this
reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
won't fire again regardless until after a reboot.

Brian

> [...]
* Re: xfs trace in 4.4.2
From: Stefan Priebe - Profihost AG @ 2016-02-20 18:02 UTC
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com

> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>
>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>> Hi,
>>
>> got this one today. Not sure if this is a bug.
>
> That looks like the releasepage() delayed allocation block warning. I'm
> not sure we've had any fixes for (or reports of) that issue since the
> v4.2 timeframe.
>
> What is the xfs_info of the associated filesystem? Also, do you have any
> insight as to the possible reproducer application or workload? Is this
> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> won't fire again regardless until after a reboot.

Sorry, no reproducer and also no xfs_info, as I didn't know which fs this
was. But the job that was running does:

mount /dev/loop0p3 /mpt
xfs_repair -n /mpt
umount /mpt

Stefan

> [...]
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
From: Stefan Priebe @ 2016-03-04 18:47 UTC
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com

On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>
>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>
>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>> Hi,
>>>
>>> got this one today. Not sure if this is a bug.
>>
>> That looks like the releasepage() delayed allocation block warning. I'm
>> not sure we've had any fixes for (or reports of) that issue since the
>> v4.2 timeframe.
>>
>> What is the xfs_info of the associated filesystem? Also, do you have any
>> insight as to the possible reproducer application or workload? Is this
>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>> won't fire again regardless until after a reboot.

Today I got this one running 4.3.3.

[154152.949610] ------------[ cut here ]------------
[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage+0xc3/0xf0()
[154152.952596] Modules linked in: netconsole mpt3sas raid_class nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas megaraid_sas pps_core
[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
[154152.964625] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a 03/06/2012
[154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f 0000000000000000
[154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757 0000000000000000
[154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0 ffffea0001e7bfe0
[154152.972447] Call Trace:
[154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
[154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
[154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
[154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
[154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
[154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
[154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
[154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
[154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
[154152.984380] [<ffffffffa7166ea0>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
[154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
[154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
[154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---

This time with an xfs_info:

# xfs_info /
meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256 agcount=4, agsize=58224256 blks
         =                       sectsz=512   attr=2, projid32bit=0
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=232897024, imaxpct=25
         =                       sunit=64     swidth=384 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=113728, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

> [...]
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
From: Brian Foster @ 2016-03-04 19:13 UTC
To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com

On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> Today I got this one running 4.3.3.
>
> [...]
>
> This time with an xfs_info:
>
> [...]

Can you describe the workload to the filesystem?

Brian

> [...]
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
From: Stefan Priebe @ 2016-03-04 20:02 UTC
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com

On 04.03.2016 at 20:13, Brian Foster wrote:
> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>> Today I got this one running 4.3.3.
>>
>> [...]
>
> Can you describe the workload to the filesystem?

At the time of this trace the rsync backup of the fs had just started, so
the workload went from nearly idle to a peak of 4000 read IOPS at 60 MB/s.

Stefan

> [...]
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
From: Brian Foster @ 2016-03-04 21:03 UTC
To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com

On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> At the time of this trace the rsync backup of the fs had just started, so
> the workload went from nearly idle to a peak of 4000 read IOPS at 60 MB/s.
>

Interesting. The warning is associated with releasing a page that has a
delayed allocation when it shouldn't. That means something had written to a
file to cause the delalloc in the first place. Any idea what could have
been writing at the time, or shortly before the rsync read workload kicked
in?

Brian

> [...]
anon_vma_prepare+0x150/0x150 > >>>>>[67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50 > >>>>>[67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0 > >>>>>[67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0 > >>>>>[67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0 > >>>>>[67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930 > >>>>>[67676.172075] [<ffffffffa3166160>] ? > >>>>>mem_cgroup_shrink_node_zone+0x150/0x150 > >>>>>[67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0 > >>>>>[67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0 > >>>>>[67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70 > >>>>>[67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0 > >>>>>[67676.433499] ---[ end trace cb1827fe308f7f6b ]--- > >>>>> > >>>>>Greets Stefan > >>>>> > >>>>>_______________________________________________ > >>>>>xfs mailing list > >>>>>xfs@oss.sgi.com > >>>>>http://oss.sgi.com/mailman/listinfo/xfs > >> > >>_______________________________________________ > >>xfs mailing list > >>xfs@oss.sgi.com > >>http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-04 21:03 ` Brian Foster @ 2016-03-04 21:15 ` Stefan Priebe 2016-03-05 22:48 ` Dave Chinner 1 sibling, 0 replies; 49+ messages in thread From: Stefan Priebe @ 2016-03-04 21:15 UTC (permalink / raw) To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 04.03.2016 um 22:03 schrieb Brian Foster: > On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >> >> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>> >>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>> >>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>> Hi, >>>>>>> >>>>>>> got this one today. Not sure if this is a bug. >>>>>> >>>>>> That looks like the releasepage() delayed allocation block warning. I'm >>>>>> not sure we've had any fixes for (or reports of) that issue since the >>>>>> v4.2 timeframe. >>>>>> >>>>>> What is the xfs_info of the associated filesystem? Also, do you have any >>>>>> insight as to the possible reproducer application or workload? Is this >>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning >>>>>> won't fire again regardless until after a reboot. >>>> >>>> Toda i got this one running 4.3.3. 
>>>> >>>> [154152.949610] ------------[ cut here ]------------ >>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232 >>>> xfs_vm_releasepage+0xc3/0xf0() >>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class >>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT >>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q >>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core >>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod >>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas >>>> megaraid_sas pps_core >>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1 >>>> [154152.964625] Hardware name: Supermicro >>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a >>>> 03/06/2012 >>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f >>>> 0000000000000000 >>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757 >>>> 0000000000000000 >>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0 >>>> ffffea0001e7bfe0 >>>> [154152.972447] Call Trace: >>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84 >>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0 >>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20 >>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0 >>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50 >>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0 >>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0 >>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0 >>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970 >>>> [154152.984380] [<ffffffffa7166ea0>] ? 
>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0 >>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70 >>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]--- >>>> >>>> This time with an xfs info: >>>> # xfs_info / >>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256 >>>> agcount=4, agsize=58224256 blks >>>> = sectsz=512 attr=2, projid32bit=0 >>>> = crc=0 finobt=0 >>>> data = bsize=4096 blocks=232897024, imaxpct=25 >>>> = sunit=64 swidth=384 blks >>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0 >>>> log =internal bsize=4096 blocks=113728, version=2 >>>> = sectsz=512 sunit=64 blks, lazy-count=1 >>>> realtime =none extsz=4096 blocks=0, rtextents=0 >>>> >>> >>> Can you describe the workload to the filesystem? >> >> At the time of this trace the rsync backup of the fs has started. So the >> workload was going from nearly idle to 4000 iop/s read at 60 MB/s peak. >> > > Interesting. The warning is associated with releasing a page that has a > delayed allocation when it shouldn't. That means something had written > to a file to cause the delalloc in the first place. Any idea what could > have been writing at the time or shortly before the rsync read workload > had kicked in? The system itself is a LAMP system, so PHP and MySQL are running and may write data to files, but at the time the trace happened the system was nearly idle, though not completely. It was 3am. 
Stefan > > Brian > >> Stefan >> >>> Brian >>> >>>>> >>>>>> >>>>>> Brian >>>>>> >>>>>>> [67674.907736] ------------[ cut here ]------------ >>>>>>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232 >>>>>>> xfs_vm_releasepage+0xa9/0xe0() >>>>>>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT >>>>>>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tabl >>>>>>> es x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd >>>>>>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msg >>>>>>> handler button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit >>>>>>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_ >>>>>>> sas scsi_transport_sas pps_core >>>>>>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1 >>>>>>> [67675.277120] Hardware name: Supermicro >>>>>>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015 >>>>>>> [67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1 >>>>>>> 0000000000000001 >>>>>>> [67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587 >>>>>>> ffff88007950fae8 >>>>>>> [67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0 >>>>>>> ffffea00208834a0 >>>>>>> [67675.506112] Call Trace: >>>>>>> [67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64 >>>>>>> [67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0 >>>>>>> [67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20 >>>>>>> [67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0 >>>>>>> [67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0 >>>>>>> [67675.844216] [<ffffffffa318b1d0>] ? 
anon_vma_prepare+0x150/0x150 >>>>>>> [67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50 >>>>>>> [67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0 >>>>>>> [67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0 >>>>>>> [67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0 >>>>>>> [67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930 >>>>>>> [67676.172075] [<ffffffffa3166160>] ? >>>>>>> mem_cgroup_shrink_node_zone+0x150/0x150 >>>>>>> [67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0 >>>>>>> [67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0 >>>>>>> [67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70 >>>>>>> [67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0 >>>>>>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]--- >>>>>>> >>>>>>> Greets Stefan >>>>>>> >>>>>>> _______________________________________________ >>>>>>> xfs mailing list >>>>>>> xfs@oss.sgi.com >>>>>>> http://oss.sgi.com/mailman/listinfo/xfs >>>> >>>> _______________________________________________ >>>> xfs mailing list >>>> xfs@oss.sgi.com >>>> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-04 21:03 ` Brian Foster 2016-03-04 21:15 ` Stefan Priebe @ 2016-03-05 22:48 ` Dave Chinner 2016-03-05 22:58 ` Stefan Priebe ` (2 more replies) 1 sibling, 3 replies; 49+ messages in thread From: Dave Chinner @ 2016-03-05 22:48 UTC (permalink / raw) To: Brian Foster Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com, Stefan Priebe On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > > Am 04.03.2016 um 20:13 schrieb Brian Foster: > > >On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > > >>Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > > >>> > > >>>>Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > > >>>> > > >>>>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > > >>>>>Hi, > > >>>>> > > >>>>>got this one today. Not sure if this is a bug. > > >>>> > > >>>>That looks like the releasepage() delayed allocation block warning. I'm > > >>>>not sure we've had any fixes for (or reports of) that issue since the > > >>>>v4.2 timeframe. > > >>>> > > >>>>What is the xfs_info of the associated filesystem? Also, do you have any > > >>>>insight as to the possible reproducer application or workload? Is this > > >>>>reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning > > >>>>won't fire again regardless until after a reboot. > > >> > > >>Toda i got this one running 4.3.3. 
> > >> > > >>[154152.949610] ------------[ cut here ]------------ > > >>[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232 > > >>xfs_vm_releasepage+0xc3/0xf0() > > >>[154152.952596] Modules linked in: netconsole mpt3sas raid_class > > >>nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT > > >>nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q > > >>garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core > > >>ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod > > >>i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas > > >>megaraid_sas pps_core > > >>[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1 > > >>[154152.964625] Hardware name: Supermicro > > >>X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a > > >>03/06/2012 > > >>[154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f > > >>0000000000000000 > > >>[154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757 > > >>0000000000000000 > > >>[154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0 > > >>ffffea0001e7bfe0 > > >>[154152.972447] Call Trace: > > >>[154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84 > > >>[154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0 > > >>[154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20 > > >>[154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0 > > >>[154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50 > > >>[154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0 > > >>[154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0 > > >>[154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0 > > >>[154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970 > > >>[154152.984380] [<ffffffffa7166ea0>] ? 
> > >>mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > > >>[154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0 > > >>[154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 > > >>[154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70 > > >>[154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 > > >>[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]--- > > >> > > >>This time with an xfs info: > > >># xfs_info / > > >>meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256 > > >>agcount=4, agsize=58224256 blks > > >> = sectsz=512 attr=2, projid32bit=0 > > >> = crc=0 finobt=0 > > >>data = bsize=4096 blocks=232897024, imaxpct=25 > > >> = sunit=64 swidth=384 blks > > >>naming =version 2 bsize=4096 ascii-ci=0 ftype=0 > > >>log =internal bsize=4096 blocks=113728, version=2 > > >> = sectsz=512 sunit=64 blks, lazy-count=1 > > >>realtime =none extsz=4096 blocks=0, rtextents=0 > > >> > > > > > >Can you describe the workload to the filesystem? > > > > At the time of this trace the rsync backup of the fs has started. So the > > workload was going from nearly idle to 4000 iop/s read at 60 MB/s peak. > > > > Interesting. The warning is associated with releasing a page that has a > delayed allocation when it shouldn't. That means something had written > to a file to cause the delalloc in the first place. Any idea what could > have been writing at the time or shortly before the rsync read workload > had kicked in? It's memory reclaim that tripped over it, so the cause is long gone - could have been anything in the previous 24 hours that caused the issue. i.e. rsync has triggered memory reclaim which triggered the warning, but I don't think rsync has anything to do with causing the page to be in a state that caused the warning. I'd be interested to know if there are any other warnings in the logs - stuff like IO errors, page discards, ENOSPC issues, etc that could trigger less travelled write error paths... -Dave. 
-- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
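[Editorial note: Dave's checklist (IO errors, page discards, ENOSPC) can be scanned for mechanically. A minimal sketch follows; the sample log lines and the /tmp/sample-kern.log path are fabricated for illustration, and the pattern list is a starting point rather than exhaustive.]

```shell
# Build a tiny fabricated kernel-log sample, then scan it for the event
# classes that can push XFS through less-travelled write error paths.
cat > /tmp/sample-kern.log <<'EOF'
Mar  5 03:00:01 host kernel: XFS (sda1): metadata I/O error: block 0x1234
Mar  5 03:00:02 host kernel: usb 1-1: new high-speed USB device number 2
Mar  5 03:00:03 host kernel: XFS (sda1): page discard on page ffffea0001e7bfc0, inode 0x85, offset 0
EOF

# -E: extended-regex alternation; -i: case-insensitive to catch variants.
grep -Ei 'I/O error|page discard|ENOSPC|No space left' /tmp/sample-kern.log
```

On a real host the same pattern would be pointed at the persisted kernel log (e.g. /var/log/kern.log) or at `dmesg` output, covering the whole window since the last boot rather than just the moment the WARN fired.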
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-05 22:48 ` Dave Chinner @ 2016-03-05 22:58 ` Stefan Priebe 2016-03-23 13:26 ` Stefan Priebe - Profihost AG 2016-03-23 13:28 ` Stefan Priebe - Profihost AG 2 siblings, 0 replies; 49+ messages in thread From: Stefan Priebe @ 2016-03-05 22:58 UTC (permalink / raw) To: Dave Chinner, Brian Foster Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 05.03.2016 um 23:48 schrieb Dave Chinner: > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>> >>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>> >>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> got this one today. Not sure if this is a bug. >>>>>>> >>>>>>> That looks like the releasepage() delayed allocation block warning. I'm >>>>>>> not sure we've had any fixes for (or reports of) that issue since the >>>>>>> v4.2 timeframe. >>>>>>> >>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any >>>>>>> insight as to the possible reproducer application or workload? Is this >>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning >>>>>>> won't fire again regardless until after a reboot. >>>>> >>>>> Toda i got this one running 4.3.3. 
>>>>> >>>>> [154152.949610] ------------[ cut here ]------------ >>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232 >>>>> xfs_vm_releasepage+0xc3/0xf0() >>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class >>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT >>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q >>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core >>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod >>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas >>>>> megaraid_sas pps_core >>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1 >>>>> [154152.964625] Hardware name: Supermicro >>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a >>>>> 03/06/2012 >>>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f >>>>> 0000000000000000 >>>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757 >>>>> 0000000000000000 >>>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0 >>>>> ffffea0001e7bfe0 >>>>> [154152.972447] Call Trace: >>>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84 >>>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0 >>>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20 >>>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0 >>>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50 >>>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0 >>>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0 >>>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0 >>>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970 >>>>> [154152.984380] [<ffffffffa7166ea0>] ? 
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >>>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0 >>>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70 >>>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]--- >>>>> >>>>> This time with an xfs info: >>>>> # xfs_info / >>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256 >>>>> agcount=4, agsize=58224256 blks >>>>> = sectsz=512 attr=2, projid32bit=0 >>>>> = crc=0 finobt=0 >>>>> data = bsize=4096 blocks=232897024, imaxpct=25 >>>>> = sunit=64 swidth=384 blks >>>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0 >>>>> log =internal bsize=4096 blocks=113728, version=2 >>>>> = sectsz=512 sunit=64 blks, lazy-count=1 >>>>> realtime =none extsz=4096 blocks=0, rtextents=0 >>>>> >>>> >>>> Can you describe the workload to the filesystem? >>> >>> At the time of this trace the rsync backup of the fs has started. So the >>> workload was going from nearly idle to 4000 iop/s read at 60 MB/s peak. >>> >> >> Interesting. The warning is associated with releasing a page that has a >> delayed allocation when it shouldn't. That means something had written >> to a file to cause the delalloc in the first place. Any idea what could >> have been writing at the time or shortly before the rsync read workload >> had kicked in? > > It's memory reclaim that tripped over it, so the cause is long gone > - couple have been anything in the previous 24 hours that caused the > issue. i.e. rsync has triggered memory reclaim which triggered the > warning, but I don't think rsync has anything to do with causing the > page to be in a state that caused the warning. > > I'd be interested to know if there are any other warnings in the > logs - stuff like IO errors, page discards, ENOSPC issues, etc that > could trigger less travelled write error paths... 
No, dmesg is absolutely clean. This never happened with 4.1.18; it started after the upgrade from 4.1 to 4.4. Stefan > > -Dave. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-05 22:48 ` Dave Chinner 2016-03-05 22:58 ` Stefan Priebe @ 2016-03-23 13:26 ` Stefan Priebe - Profihost AG 2016-03-23 13:28 ` Stefan Priebe - Profihost AG 2 siblings, 0 replies; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-03-23 13:26 UTC (permalink / raw) To: Dave Chinner, Brian Foster Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 05.03.2016 um 23:48 schrieb Dave Chinner: > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>> >>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>> >>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> got this one today. Not sure if this is a bug. >>>>>>> >>>>>>> That looks like the releasepage() delayed allocation block warning. I'm >>>>>>> not sure we've had any fixes for (or reports of) that issue since the >>>>>>> v4.2 timeframe. >>>>>>> >>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any >>>>>>> insight as to the possible reproducer application or workload? Is this >>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning >>>>>>> won't fire again regardless until after a reboot. >>>>> >>>>> Toda i got this one running 4.3.3. 
>>>>> >>>>> [154152.949610] ------------[ cut here ]------------ >>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232 >>>>> xfs_vm_releasepage+0xc3/0xf0() >>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class >>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT >>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q >>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core >>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod >>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas >>>>> megaraid_sas pps_core >>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1 >>>>> [154152.964625] Hardware name: Supermicro >>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a >>>>> 03/06/2012 >>>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f >>>>> 0000000000000000 >>>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757 >>>>> 0000000000000000 >>>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0 >>>>> ffffea0001e7bfe0 >>>>> [154152.972447] Call Trace: >>>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84 >>>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0 >>>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20 >>>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0 >>>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50 >>>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0 >>>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0 >>>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0 >>>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970 >>>>> [154152.984380] [<ffffffffa7166ea0>] ? 
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >>>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0 >>>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70 >>>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]--- >>>>> >>>>> This time with an xfs info: >>>>> # xfs_info / >>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256 >>>>> agcount=4, agsize=58224256 blks >>>>> = sectsz=512 attr=2, projid32bit=0 >>>>> = crc=0 finobt=0 >>>>> data = bsize=4096 blocks=232897024, imaxpct=25 >>>>> = sunit=64 swidth=384 blks >>>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0 >>>>> log =internal bsize=4096 blocks=113728, version=2 >>>>> = sectsz=512 sunit=64 blks, lazy-count=1 >>>>> realtime =none extsz=4096 blocks=0, rtextents=0 >>>>> >>>> >>>> Can you describe the workload to the filesystem? >>> >>> At the time of this trace the rsync backup of the fs has started. So the >>> workload was going from nearly idle to 4000 iop/s read at 60 MB/s peak. >>> >> >> Interesting. The warning is associated with releasing a page that has a >> delayed allocation when it shouldn't. That means something had written >> to a file to cause the delalloc in the first place. Any idea what could >> have been writing at the time or shortly before the rsync read workload >> had kicked in? 
> > It's memory reclaim that tripped over it, so the cause is long gone > - couple have been anything in the previous 24 hours that caused the > issue. i.e. rsync has triggered memory reclaim which triggered the > warning, but I don't think rsync has anything to do with causing the > page to be in a state that caused the warning. > > I'd be interested to know if there are any other warnings in the > logs - stuff like IO errors, page discards, ENOSPC issues, etc that > could trigger less travelled write error paths... This has happened again on 8 different hosts in the last 24 hours running 4.4.6. All of those are KVM / Qemu hosts and are doing NO I/O except the normal OS stuff as the VMs have remote storage. So no database, no rsync on those hosts - just the OS doing nearly nothing. All those show: [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 xfs_vm_releasepage+0xe2/0xf0() Stefan > > -Dave. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-05 22:48 ` Dave Chinner 2016-03-05 22:58 ` Stefan Priebe 2016-03-23 13:26 ` Stefan Priebe - Profihost AG @ 2016-03-23 13:28 ` Stefan Priebe - Profihost AG 2016-03-23 14:07 ` Brian Foster 2 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-03-23 13:28 UTC (permalink / raw) To: Dave Chinner, Brian Foster Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com sorry new one the last one got mangled. Comments inside. Am 05.03.2016 um 23:48 schrieb Dave Chinner: > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>> >>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>> >>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> got this one today. Not sure if this is a bug. >>>>>>> >>>>>>> That looks like the releasepage() delayed allocation block warning. I'm >>>>>>> not sure we've had any fixes for (or reports of) that issue since the >>>>>>> v4.2 timeframe. >>>>>>> >>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any >>>>>>> insight as to the possible reproducer application or workload? Is this >>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning >>>>>>> won't fire again regardless until after a reboot. >>>>> >>>>> Toda i got this one running 4.3.3. 
>>>>> >>>>> [154152.949610] ------------[ cut here ]------------ >>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232 >>>>> xfs_vm_releasepage+0xc3/0xf0() >>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class >>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT >>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q >>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core >>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod >>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas >>>>> megaraid_sas pps_core >>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1 >>>>> [154152.964625] Hardware name: Supermicro >>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a >>>>> 03/06/2012 >>>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f >>>>> 0000000000000000 >>>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757 >>>>> 0000000000000000 >>>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0 >>>>> ffffea0001e7bfe0 >>>>> [154152.972447] Call Trace: >>>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84 >>>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0 >>>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20 >>>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0 >>>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50 >>>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0 >>>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0 >>>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0 >>>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970 >>>>> [154152.984380] [<ffffffffa7166ea0>] ? 
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >>>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0 >>>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70 >>>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100 >>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]--- >>>>> >>>>> This time with an xfs_info: >>>>> # xfs_info / >>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256 >>>>> agcount=4, agsize=58224256 blks >>>>> = sectsz=512 attr=2, projid32bit=0 >>>>> = crc=0 finobt=0 >>>>> data = bsize=4096 blocks=232897024, imaxpct=25 >>>>> = sunit=64 swidth=384 blks >>>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0 >>>>> log =internal bsize=4096 blocks=113728, version=2 >>>>> = sectsz=512 sunit=64 blks, lazy-count=1 >>>>> realtime =none extsz=4096 blocks=0, rtextents=0 >>>>> >>>> >>>> Can you describe the workload to the filesystem? >>> >>> At the time of this trace the rsync backup of the fs had started. So the >>> workload was going from nearly idle to 4000 iop/s read at 60 MB/s peak. >>> >> >> Interesting. The warning is associated with releasing a page that has a >> delayed allocation when it shouldn't. That means something had written >> to a file to cause the delalloc in the first place. Any idea what could >> have been writing at the time or shortly before the rsync read workload >> had kicked in? > > It's memory reclaim that tripped over it, so the cause is long gone > - could have been anything in the previous 24 hours that caused the > issue. i.e. rsync has triggered memory reclaim which triggered the > warning, but I don't think rsync has anything to do with causing the > page to be in a state that caused the warning. > > I'd be interested to know if there are any other warnings in the > logs - stuff like IO errors, page discards, ENOSPC issues, etc that > could trigger less travelled write error paths...
This has happened again on 8 different hosts in the last 24 hours running 4.4.6. All of those are KVM / Qemu hosts and are doing NO I/O except the normal OS stuff as the VMs have remote storage. So no database, no rsync on those hosts - just the OS doing nearly nothing. All those show: [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 xfs_vm_releasepage+0xe2/0xf0() Stefan > > -Dave. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-23 13:28 ` Stefan Priebe - Profihost AG @ 2016-03-23 14:07 ` Brian Foster 2016-03-24 8:10 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 49+ messages in thread From: Brian Foster @ 2016-03-23 14:07 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: > sorry new one the last one got mangled. Comments inside. > > Am 05.03.2016 um 23:48 schrieb Dave Chinner: > > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > >> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > >>> Am 04.03.2016 um 20:13 schrieb Brian Foster: > >>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > >>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > >>>>>> > >>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > >>>>>>> > >>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: ... > > This has happened again on 8 different hosts in the last 24 hours > running 4.4.6. > > All of those are KVM / Qemu hosts and are doing NO I/O except the normal > OS stuff as the VMs have remote storage. So no database, no rsync on > those hosts - just the OS doing nearly nothing. > > All those show: > [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 > xfs_vm_releasepage+0xe2/0xf0() > Ok, well at this point the warning isn't telling us anything beyond you're reproducing the problem. We can't really make progress without more information. We don't necessarily know what application or operations caused this by the time it occurs, but perhaps knowing what file is affected could give us a hint. We have the xfs_releasepage tracepoint, but that's unconditional and so might generate a lot of noise by default. 
Could you enable the xfs_releasepage tracepoint and hunt for instances where delalloc != 0? E.g., we could leave a long running 'trace-cmd record -e "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the problem to occur. Alternatively (and maybe easier), run 'trace-cmd start -e "xfs:xfs_releasepage"' and leave something like 'cat /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > ~/trace.out' running to capture instances. If we can get a tracepoint hit, it will include the inode number and something like 'find / -inum <ino>' can point us at the file. Brian > Stefan > > > > > -Dave. > > > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
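Brian's capture pipeline can be sketched as a small shell snippet. The live trace-cmd/tracefs commands are shown as comments because they need root and a running kernel; the filter step itself is demonstrated on two canned sample lines whose exact format is an assumption modeled on the xfs_releasepage output shown later in the thread.

```shell
# Live capture, per the suggestion above (needs root and tracefs mounted):
#   trace-cmd start -e "xfs:xfs_releasepage"
#   cat /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > ~/trace.out
#
# grep -v drops the common, harmless "delalloc 0" entries, so only the
# suspicious ones (delalloc != 0) end up in trace.out.
filter_releasepage() {
    grep -v "delalloc 0"
}

# Demonstrated on made-up sample lines; the second one survives the filter.
printf '%s\n' \
  'kswapd0-109 [000] 100.1: xfs_releasepage: dev 8:1 ino 0x69 delalloc 0 unwritten 0' \
  'kswapd0-109 [000] 100.2: xfs_releasepage: dev 8:1 ino 0x7a delalloc 3 unwritten 0' \
  | filter_releasepage
```

Note that an empty trace.out can mean either "no bad releasepage events yet" or "tracing not actually enabled", which is the ambiguity discussed further down in the thread.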
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-23 14:07 ` Brian Foster @ 2016-03-24 8:10 ` Stefan Priebe - Profihost AG 2016-03-24 8:15 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-03-24 8:10 UTC (permalink / raw) To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 23.03.2016 um 15:07 schrieb Brian Foster: > On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >> sorry new one the last one got mangled. Comments inside. >> >> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>> >>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>> >>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > ... >> >> This has happened again on 8 different hosts in the last 24 hours >> running 4.4.6. >> >> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >> OS stuff as the VMs have remote storage. So no database, no rsync on >> those hosts - just the OS doing nearly nothing. >> >> All those show: >> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >> xfs_vm_releasepage+0xe2/0xf0() >> > > Ok, well at this point the warning isn't telling us anything beyond > you're reproducing the problem. We can't really make progress without > more information. We don't necessarily know what application or > operations caused this by the time it occurs, but perhaps knowing what > file is affected could give us a hint. 
> > We have the xfs_releasepage tracepoint, but that's unconditional and so > might generate a lot of noise by default. Could you enable the > xfs_releasepage tracepoint and hunt for instances where delalloc != 0? > E.g., we could leave a long running 'trace-cmd record -e > "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the > problem to occur. Alternatively (and maybe easier), run 'trace-cmd start > -e "xfs:xfs_releasepage"' and leave something like 'cat > /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > > ~/trace.out' running to capture instances. > > If we can get a tracepoint hit, it will include the inode number and > something like 'find / -inum <ino>' can point us at the file. thanks - need to compile trace-cmd first. Do you know if and how it influences performance? Stefan > > Brian > >> Stefan >> >>> >>> -Dave. >>> >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-24 8:10 ` Stefan Priebe - Profihost AG @ 2016-03-24 8:15 ` Stefan Priebe - Profihost AG 2016-03-24 11:17 ` Brian Foster 0 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-03-24 8:15 UTC (permalink / raw) To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: > > Am 23.03.2016 um 15:07 schrieb Brian Foster: >> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>> sorry new one the last one got mangled. Comments inside. >>> >>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>> >>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>> >>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >> ... >>> >>> This has happened again on 8 different hosts in the last 24 hours >>> running 4.4.6. >>> >>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>> OS stuff as the VMs have remote storage. So no database, no rsync on >>> those hosts - just the OS doing nearly nothing. >>> >>> All those show: >>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>> xfs_vm_releasepage+0xe2/0xf0() >>> >> >> Ok, well at this point the warning isn't telling us anything beyond >> you're reproducing the problem. We can't really make progress without >> more information. We don't necessarily know what application or >> operations caused this by the time it occurs, but perhaps knowing what >> file is affected could give us a hint. 
>> >> We have the xfs_releasepage tracepoint, but that's unconditional and so >> might generate a lot of noise by default. Could you enable the >> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >> E.g., we could leave a long running 'trace-cmd record -e >> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start >> -e "xfs:xfs_releasepage"' and leave something like 'cat >> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >> ~/trace.out' running to capture instances. Isn't the warning a WARN_ONCE? So it does not reoccur, or can I still check it in trace.out even though the WARN_ONCE has already triggered? Stefan > > Stefan > >> >> Brian >> >>> Stefan >>> >>>> >>>> -Dave. >>>> >>> >>> _______________________________________________ >>> xfs mailing list >>> xfs@oss.sgi.com >>> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-24 8:15 ` Stefan Priebe - Profihost AG @ 2016-03-24 11:17 ` Brian Foster 2016-03-24 12:17 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 49+ messages in thread From: Brian Foster @ 2016-03-24 11:17 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: > > Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: > > > > Am 23.03.2016 um 15:07 schrieb Brian Foster: > >> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: > >>> sorry new one the last one got mangled. Comments inside. > >>> > >>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: > >>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > >>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > >>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: > >>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > >>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > >>>>>>>>> > >>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > >>>>>>>>>> > >>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > >> ... > >>> > >>> This has happened again on 8 different hosts in the last 24 hours > >>> running 4.4.6. > >>> > >>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal > >>> OS stuff as the VMs have remote storage. So no database, no rsync on > >>> those hosts - just the OS doing nearly nothing. > >>> > >>> All those show: > >>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 > >>> xfs_vm_releasepage+0xe2/0xf0() > >>> > >> > >> Ok, well at this point the warning isn't telling us anything beyond > >> you're reproducing the problem. We can't really make progress without > >> more information. 
We don't necessarily know what application or > >> operations caused this by the time it occurs, but perhaps knowing what > >> file is affected could give us a hint. > >> > >> We have the xfs_releasepage tracepoint, but that's unconditional and so > >> might generate a lot of noise by default. Could you enable the > >> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? > >> E.g., we could leave a long running 'trace-cmd record -e > >> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the > >> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start > >> -e "xfs:xfs_releasepage"' and leave something like 'cat > >> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > > >> ~/trace.out' running to capture instances. > > Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the > it in the trace.out even the WARN_ONCE was already triggered? > The tracepoint is independent from the warning (see xfs_vm_releasepage()), so the tracepoint will fire every invocation of the function regardless of whether delalloc blocks still exist at that point. That creates the need to filter the entries. With regard to performance, I believe the tracepoints are intended to be pretty lightweight. I don't think it should hurt to try it on a box, observe for a bit and make sure there isn't a huge impact. Note that the 'trace-cmd record' approach will save everything to file, so that's something to consider I suppose. Brian > Stefan > > > > > > Stefan > > > >> > >> Brian > >> > >>> Stefan > >>> > >>>> > >>>> -Dave. 
> >>>> > >>> > >>> _______________________________________________ > >>> xfs mailing list > >>> xfs@oss.sgi.com > >>> http://oss.sgi.com/mailman/listinfo/xfs > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-24 11:17 ` Brian Foster @ 2016-03-24 12:17 ` Stefan Priebe - Profihost AG 2016-03-24 12:24 ` Brian Foster 0 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-03-24 12:17 UTC (permalink / raw) To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 24.03.2016 um 12:17 schrieb Brian Foster: > On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >> >> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>> >>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>> sorry new one the last one got mangled. Comments inside. >>>>> >>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>> >>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>> ... >>>>> >>>>> This has happened again on 8 different hosts in the last 24 hours >>>>> running 4.4.6. >>>>> >>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>> those hosts - just the OS doing nearly nothing. >>>>> >>>>> All those show: >>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>> >>>> >>>> Ok, well at this point the warning isn't telling us anything beyond >>>> you're reproducing the problem. 
We can't really make progress without >>>> more information. We don't necessarily know what application or >>>> operations caused this by the time it occurs, but perhaps knowing what >>>> file is affected could give us a hint. >>>> >>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>> might generate a lot of noise by default. Could you enable the >>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >>>> E.g., we could leave a long running 'trace-cmd record -e >>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start >>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>> ~/trace.out' running to capture instances. >> >> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >> it in the trace.out even the WARN_ONCE was already triggered? >> > > The tracepoint is independent from the warning (see > xfs_vm_releasepage()), so the tracepoint will fire every invocation of > the function regardless of whether delalloc blocks still exist at that > point. That creates the need to filter the entries. > > With regard to performance, I believe the tracepoints are intended to be > pretty lightweight. I don't think it should hurt to try it on a box, > observe for a bit and make sure there isn't a huge impact. Note that the > 'trace-cmd record' approach will save everything to file, so that's > something to consider I suppose. The test / cat is now running. Is there any way to verify that it works? Or is it enough that cat prints entries from time to time, even though none of them get past the grep -v "delalloc 0" filter? Stefan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-24 12:17 ` Stefan Priebe - Profihost AG @ 2016-03-24 12:24 ` Brian Foster 2016-04-04 6:12 ` Stefan Priebe - Profihost AG 2016-05-11 12:26 ` Stefan Priebe - Profihost AG 0 siblings, 2 replies; 49+ messages in thread From: Brian Foster @ 2016-03-24 12:24 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: > > Am 24.03.2016 um 12:17 schrieb Brian Foster: > > On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: > >> > >> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: > >>> > >>> Am 23.03.2016 um 15:07 schrieb Brian Foster: > >>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: > >>>>> sorry new one the last one got mangled. Comments inside. > >>>>> > >>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: > >>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > >>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > >>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: > >>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > >>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > >>>>>>>>>>> > >>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > >>>>>>>>>>>> > >>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > >>>> ... > >>>>> > >>>>> This has happened again on 8 different hosts in the last 24 hours > >>>>> running 4.4.6. > >>>>> > >>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal > >>>>> OS stuff as the VMs have remote storage. So no database, no rsync on > >>>>> those hosts - just the OS doing nearly nothing. 
> >>>>> > >>>>> All those show: > >>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 > >>>>> xfs_vm_releasepage+0xe2/0xf0() > >>>>> > >>>> > >>>> Ok, well at this point the warning isn't telling us anything beyond > >>>> you're reproducing the problem. We can't really make progress without > >>>> more information. We don't necessarily know what application or > >>>> operations caused this by the time it occurs, but perhaps knowing what > >>>> file is affected could give us a hint. > >>>> > >>>> We have the xfs_releasepage tracepoint, but that's unconditional and so > >>>> might generate a lot of noise by default. Could you enable the > >>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? > >>>> E.g., we could leave a long running 'trace-cmd record -e > >>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the > >>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start > >>>> -e "xfs:xfs_releasepage"' and leave something like 'cat > >>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > > >>>> ~/trace.out' running to capture instances. > >> > >> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the > >> it in the trace.out even the WARN_ONCE was already triggered? > >> > > > > The tracepoint is independent from the warning (see > > xfs_vm_releasepage()), so the tracepoint will fire every invocation of > > the function regardless of whether delalloc blocks still exist at that > > point. That creates the need to filter the entries. > > > > With regard to performance, I believe the tracepoints are intended to be > > pretty lightweight. I don't think it should hurt to try it on a box, > > observe for a bit and make sure there isn't a huge impact. Note that the > > 'trace-cmd record' approach will save everything to file, so that's > > something to consider I suppose. > > Tests / cat is running. Is there any way to test if it works? 
Or is it > enough that cat prints stuff from time to time but does not match -v > delalloc 0 > What is it printing where delalloc != 0? You could always just cat trace_pipe and make sure the event is firing, it's just that I suspect most entries will have delalloc == unwritten == 0. Also, while the tracepoint fires independent of the warning, it might not be a bad idea to restart a system that has already seen the warning since boot, just to provide some correlation or additional notification when the problem occurs. Brian > Stefan > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-24 12:24 ` Brian Foster @ 2016-04-04 6:12 ` Stefan Priebe - Profihost AG 2016-05-11 12:26 ` Stefan Priebe - Profihost AG 1 sibling, 0 replies; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-04-04 6:12 UTC (permalink / raw) To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 24.03.2016 um 13:24 schrieb Brian Foster: > On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: >> >> Am 24.03.2016 um 12:17 schrieb Brian Foster: >>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>>>> >>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>> sorry new one the last one got mangled. Comments inside. >>>>>>> >>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>> >>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>> ... >>>>>>> >>>>>>> This has happened again on 8 different hosts in the last 24 hours >>>>>>> running 4.4.6. >>>>>>> >>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>>>> those hosts - just the OS doing nearly nothing. 
>>>>>>> >>>>>>> All those show: >>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>>>> >>>>>> >>>>>> Ok, well at this point the warning isn't telling us anything beyond >>>>>> you're reproducing the problem. We can't really make progress without >>>>>> more information. We don't necessarily know what application or >>>>>> operations caused this by the time it occurs, but perhaps knowing what >>>>>> file is affected could give us a hint. >>>>>> >>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>>>> might generate a lot of noise by default. Could you enable the >>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >>>>>> E.g., we could leave a long running 'trace-cmd record -e >>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start >>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>>>> ~/trace.out' running to capture instances. >>>> >>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >>>> it in the trace.out even the WARN_ONCE was already triggered? >>>> >>> >>> The tracepoint is independent from the warning (see >>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of >>> the function regardless of whether delalloc blocks still exist at that >>> point. That creates the need to filter the entries. >>> >>> With regard to performance, I believe the tracepoints are intended to be >>> pretty lightweight. I don't think it should hurt to try it on a box, >>> observe for a bit and make sure there isn't a huge impact. Note that the >>> 'trace-cmd record' approach will save everything to file, so that's >>> something to consider I suppose. >> >> Tests / cat is running. Is there any way to test if it works? 
>> Or is it >> enough that cat prints stuff from time to time but does not match -v >> delalloc 0 >> > > What is it printing where delalloc != 0? You could always just cat > trace_pipe and make sure the event is firing, it's just that I suspect > most entries will have delalloc == unwritten == 0. > > Also, while the tracepoint fires independent of the warning, it might > not be a bad idea to restart a system that has already seen the warning > since boot, just to provide some correlation or additional notification > when the problem occurs. I still wasn't able to catch one with trace-cmd. But I notice that it happens mostly in the first 48 hours after a reboot. All systems have now been running for some days, but none of them triggers this again. All systems that had triggered this bug were rebooted. Stefan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-03-24 12:24 ` Brian Foster 2016-04-04 6:12 ` Stefan Priebe - Profihost AG @ 2016-05-11 12:26 ` Stefan Priebe - Profihost AG 2016-05-11 13:34 ` Brian Foster 1 sibling, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-05-11 12:26 UTC (permalink / raw) To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Hi Brian, I'm still unable to grab anything into the trace file. Is there any way to check whether it's working at all? This still happens in the first 48 hours after a fresh reboot. Stefan Am 24.03.2016 um 13:24 schrieb Brian Foster: > On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: >> >> Am 24.03.2016 um 12:17 schrieb Brian Foster: >>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>>>> >>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>> sorry new one the last one got mangled. Comments inside. >>>>>>> >>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>> >>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>> ... >>>>>>> >>>>>>> This has happened again on 8 different hosts in the last 24 hours >>>>>>> running 4.4.6.
>>>>>>> >>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>>>> those hosts - just the OS doing nearly nothing. >>>>>>> >>>>>>> All those show: >>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>>>> >>>>>> >>>>>> Ok, well at this point the warning isn't telling us anything beyond >>>>>> you're reproducing the problem. We can't really make progress without >>>>>> more information. We don't necessarily know what application or >>>>>> operations caused this by the time it occurs, but perhaps knowing what >>>>>> file is affected could give us a hint. >>>>>> >>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>>>> might generate a lot of noise by default. Could you enable the >>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >>>>>> E.g., we could leave a long running 'trace-cmd record -e >>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start >>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>>>> ~/trace.out' running to capture instances. >>>> >>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >>>> it in the trace.out even the WARN_ONCE was already triggered? >>>> >>> >>> The tracepoint is independent from the warning (see >>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of >>> the function regardless of whether delalloc blocks still exist at that >>> point. That creates the need to filter the entries. >>> >>> With regard to performance, I believe the tracepoints are intended to be >>> pretty lightweight. 
I don't think it should hurt to try it on a box, >>> observe for a bit and make sure there isn't a huge impact. Note that the >>> 'trace-cmd record' approach will save everything to file, so that's >>> something to consider I suppose. >> >> Tests / cat is running. Is there any way to test if it works? Or is it >> enough that cat prints stuff from time to time but does not match -v >> delalloc 0 >> > > What is it printing where delalloc != 0? You could always just cat > trace_pipe and make sure the event is firing, it's just that I suspect > most entries will have delalloc == unwritten == 0. > > Also, while the tracepoint fires independent of the warning, it might > not be a bad idea to restart a system that has already seen the warning > since boot, just to provide some correlation or additional notification > when the problem occurs. > > Brian > >> Stefan >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-11 12:26 ` Stefan Priebe - Profihost AG @ 2016-05-11 13:34 ` Brian Foster 2016-05-11 14:03 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 49+ messages in thread From: Brian Foster @ 2016-05-11 13:34 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: > Hi Brian, > > i'm still unable to grab anything to the trace file? Is there anything > to check if it's working at all? > See my previous mail: http://oss.sgi.com/pipermail/xfs/2016-March/047793.html E.g., something like this should work after writing to and removing a new file: # trace-cmd start -e "xfs:xfs_releasepage" # cat /sys/kernel/debug/tracing/trace_pipe ... rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 Once that is working, add the grep command to filter out "delalloc 0" instances, etc. For example: cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out Brian > This still happens in the first 48 hours after a fresh reboot. > > Stefan > > Am 24.03.2016 um 13:24 schrieb Brian Foster: > > On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: > >> > >> Am 24.03.2016 um 12:17 schrieb Brian Foster: > >>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: > >>>> > >>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: > >>>>> > >>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: > >>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: > >>>>>>> sorry new one the last one got mangled. Comments inside. 
> >>>>>>> > >>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: > >>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > >>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > >>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: > >>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > >>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > >>>>>> ... > >>>>>>> > >>>>>>> This has happened again on 8 different hosts in the last 24 hours > >>>>>>> running 4.4.6. > >>>>>>> > >>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal > >>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on > >>>>>>> those hosts - just the OS doing nearly nothing. > >>>>>>> > >>>>>>> All those show: > >>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 > >>>>>>> xfs_vm_releasepage+0xe2/0xf0() > >>>>>>> > >>>>>> > >>>>>> Ok, well at this point the warning isn't telling us anything beyond > >>>>>> you're reproducing the problem. We can't really make progress without > >>>>>> more information. We don't necessarily know what application or > >>>>>> operations caused this by the time it occurs, but perhaps knowing what > >>>>>> file is affected could give us a hint. > >>>>>> > >>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so > >>>>>> might generate a lot of noise by default. Could you enable the > >>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? > >>>>>> E.g., we could leave a long running 'trace-cmd record -e > >>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the > >>>>>> problem to occur. 
Alternatively (and maybe easier), run 'trace-cmd start > >>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat > >>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > > >>>>>> ~/trace.out' running to capture instances. > >>>> > >>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the > >>>> it in the trace.out even the WARN_ONCE was already triggered? > >>>> > >>> > >>> The tracepoint is independent from the warning (see > >>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of > >>> the function regardless of whether delalloc blocks still exist at that > >>> point. That creates the need to filter the entries. > >>> > >>> With regard to performance, I believe the tracepoints are intended to be > >>> pretty lightweight. I don't think it should hurt to try it on a box, > >>> observe for a bit and make sure there isn't a huge impact. Note that the > >>> 'trace-cmd record' approach will save everything to file, so that's > >>> something to consider I suppose. > >> > >> Tests / cat is running. Is there any way to test if it works? Or is it > >> enough that cat prints stuff from time to time but does not match -v > >> delalloc 0 > >> > > > > What is it printing where delalloc != 0? You could always just cat > > trace_pipe and make sure the event is firing, it's just that I suspect > > most entries will have delalloc == unwritten == 0. > > > > Also, while the tracepoint fires independent of the warning, it might > > not be a bad idea to restart a system that has already seen the warning > > since boot, just to provide some correlation or additional notification > > when the problem occurs. 
> > > > Brian > > > >> Stefan > >> > >> _______________________________________________ > >> xfs mailing list > >> xfs@oss.sgi.com > >> http://oss.sgi.com/mailman/listinfo/xfs > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
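The capture pipeline Brian describes can be sanity-checked offline, without a live XFS trace. The sketch below uses fabricated lines shaped like the xfs_releasepage sample above (inode numbers and timestamps are made up); it only demonstrates that `grep -v "delalloc 0"` drops the common all-zero entries while keeping any line with a nonzero delalloc count.

```shell
# Fabricated tracepoint-style lines; only the trailing counters matter here.
printf '%s\n' \
  'rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 delalloc 0 unwritten 0' \
  'kswapd0-108 [003] .... 9501.112233: xfs_releasepage: dev 8:81 ino 0x1a2b delalloc 3 unwritten 0' \
  | grep -v "delalloc 0"
# Only the second line survives the filter. Against a live system the same
# filter runs as:
#   cat /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > ~/trace.out
```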
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-11 13:34 ` Brian Foster @ 2016-05-11 14:03 ` Stefan Priebe - Profihost AG 2016-05-11 15:59 ` Brian Foster 0 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-05-11 14:03 UTC (permalink / raw) To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 11.05.2016 um 15:34 schrieb Brian Foster: > On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: >> Hi Brian, >> >> i'm still unable to grab anything to the trace file? Is there anything >> to check if it's working at all? >> > > See my previous mail: > > http://oss.sgi.com/pipermail/xfs/2016-March/047793.html > > E.g., something like this should work after writing to and removing a > new file: > > # trace-cmd start -e "xfs:xfs_releasepage" > # cat /sys/kernel/debug/tracing/trace_pipe > ... > rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 arg sorry yes that's working but delalloc is always 0. May be i have to hook that into my initramfs to be fast enough? Stefan > Once that is working, add the grep command to filter out "delalloc 0" > instances, etc. For example: > > cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out > > Brian > >> This still happens in the first 48 hours after a fresh reboot. >> >> Stefan >> >> Am 24.03.2016 um 13:24 schrieb Brian Foster: >>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 24.03.2016 um 12:17 schrieb Brian Foster: >>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >>>>>> >>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>>>>>> >>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>> sorry new one the last one got mangled. 
Comments inside. >>>>>>>>> >>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>> ... >>>>>>>>> >>>>>>>>> This has happened again on 8 different hosts in the last 24 hours >>>>>>>>> running 4.4.6. >>>>>>>>> >>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>>>>>> those hosts - just the OS doing nearly nothing. >>>>>>>>> >>>>>>>>> All those show: >>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>>>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>>>>>> >>>>>>>> >>>>>>>> Ok, well at this point the warning isn't telling us anything beyond >>>>>>>> you're reproducing the problem. We can't really make progress without >>>>>>>> more information. We don't necessarily know what application or >>>>>>>> operations caused this by the time it occurs, but perhaps knowing what >>>>>>>> file is affected could give us a hint. >>>>>>>> >>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>>>>>> might generate a lot of noise by default. Could you enable the >>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >>>>>>>> E.g., we could leave a long running 'trace-cmd record -e >>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>>>>>> problem to occur. 
Alternatively (and maybe easier), run 'trace-cmd start >>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>>>>>> ~/trace.out' running to capture instances. >>>>>> >>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >>>>>> it in the trace.out even the WARN_ONCE was already triggered? >>>>>> >>>>> >>>>> The tracepoint is independent from the warning (see >>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of >>>>> the function regardless of whether delalloc blocks still exist at that >>>>> point. That creates the need to filter the entries. >>>>> >>>>> With regard to performance, I believe the tracepoints are intended to be >>>>> pretty lightweight. I don't think it should hurt to try it on a box, >>>>> observe for a bit and make sure there isn't a huge impact. Note that the >>>>> 'trace-cmd record' approach will save everything to file, so that's >>>>> something to consider I suppose. >>>> >>>> Tests / cat is running. Is there any way to test if it works? Or is it >>>> enough that cat prints stuff from time to time but does not match -v >>>> delalloc 0 >>>> >>> >>> What is it printing where delalloc != 0? You could always just cat >>> trace_pipe and make sure the event is firing, it's just that I suspect >>> most entries will have delalloc == unwritten == 0. >>> >>> Also, while the tracepoint fires independent of the warning, it might >>> not be a bad idea to restart a system that has already seen the warning >>> since boot, just to provide some correlation or additional notification >>> when the problem occurs. 
>>> >>> Brian >>> >>>> Stefan >>>> >>>> _______________________________________________ >>>> xfs mailing list >>>> xfs@oss.sgi.com >>>> http://oss.sgi.com/mailman/listinfo/xfs >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-11 14:03 ` Stefan Priebe - Profihost AG @ 2016-05-11 15:59 ` Brian Foster 2016-05-11 19:20 ` Stefan Priebe 2016-05-15 11:03 ` Stefan Priebe 0 siblings, 2 replies; 49+ messages in thread From: Brian Foster @ 2016-05-11 15:59 UTC (permalink / raw) To: Stefan Priebe - Profihost AG; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com Dropped non-XFS cc's, probably no need to spam other lists at this point... On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote: > > Am 11.05.2016 um 15:34 schrieb Brian Foster: > > On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: > >> Hi Brian, > >> > >> i'm still unable to grab anything to the trace file? Is there anything > >> to check if it's working at all? > >> > > > > See my previous mail: > > > > http://oss.sgi.com/pipermail/xfs/2016-March/047793.html > > > > E.g., something like this should work after writing to and removing a > > new file: > > > > # trace-cmd start -e "xfs:xfs_releasepage" > > # cat /sys/kernel/debug/tracing/trace_pipe > > ... > > rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 > > arg sorry yes that's working but delalloc is always 0. > Hrm, Ok. That is strange. > May be i have to hook that into my initramfs to be fast enough? > Not sure that would matter.. you said it occurs within 48 hours? I take that to mean it doesn't occur immediately on boot. You should be able to tell from the logs or dmesg if it happens before you get a chance to start the tracing. Well, the options I can think of are: - Perhaps I botched matching up the line number to the warning, in which case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch any delalloc or unwritten blocks at releasepage() time. - Perhaps there's a race that the tracepoint doesn't catch. 
The warnings are based on local vars, so we could instrument the code to print a warning[1] to try and get the inode number.

Brian

[1] - compile tested diff:

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 40645a4..94738ea 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
 	gfp_t			gfp_mask)
 {
 	int			delalloc, unwritten;
+	struct xfs_inode	*ip = XFS_I(page->mapping->host);
 
 	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
 
 	xfs_count_page_state(page, &delalloc, &unwritten);
 
+	if (delalloc || unwritten)
+		xfs_warn(ip->i_mount,
+			"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
+			ip->i_ino, delalloc, unwritten, page_offset(page),
+			i_size_read(page->mapping->host));
+
 	if (WARN_ON_ONCE(delalloc))
 		return 0;
 	if (WARN_ON_ONCE(unwritten))

> Stefan
>
> > Once that is working, add the grep command to filter out "delalloc 0"
> > instances, etc. For example:
> >
> > cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
> >
> > Brian
> >
> >> This still happens in the first 48 hours after a fresh reboot.
> >>
> >> Stefan
> >>
> >> Am 24.03.2016 um 13:24 schrieb Brian Foster:
> >>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>
> >>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
> >>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>
> >>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>
> >>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>> sorry new one the last one got mangled. Comments inside. 
> >>>>>>>>> > >>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: > >>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > >>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > >>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: > >>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > >>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > >>>>>>>> ... > >>>>>>>>> > >>>>>>>>> This has happened again on 8 different hosts in the last 24 hours > >>>>>>>>> running 4.4.6. > >>>>>>>>> > >>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal > >>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on > >>>>>>>>> those hosts - just the OS doing nearly nothing. > >>>>>>>>> > >>>>>>>>> All those show: > >>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 > >>>>>>>>> xfs_vm_releasepage+0xe2/0xf0() > >>>>>>>>> > >>>>>>>> > >>>>>>>> Ok, well at this point the warning isn't telling us anything beyond > >>>>>>>> you're reproducing the problem. We can't really make progress without > >>>>>>>> more information. We don't necessarily know what application or > >>>>>>>> operations caused this by the time it occurs, but perhaps knowing what > >>>>>>>> file is affected could give us a hint. > >>>>>>>> > >>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so > >>>>>>>> might generate a lot of noise by default. Could you enable the > >>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? > >>>>>>>> E.g., we could leave a long running 'trace-cmd record -e > >>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the > >>>>>>>> problem to occur. 
Alternatively (and maybe easier), run 'trace-cmd start > >>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat > >>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > > >>>>>>>> ~/trace.out' running to capture instances. > >>>>>> > >>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the > >>>>>> it in the trace.out even the WARN_ONCE was already triggered? > >>>>>> > >>>>> > >>>>> The tracepoint is independent from the warning (see > >>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of > >>>>> the function regardless of whether delalloc blocks still exist at that > >>>>> point. That creates the need to filter the entries. > >>>>> > >>>>> With regard to performance, I believe the tracepoints are intended to be > >>>>> pretty lightweight. I don't think it should hurt to try it on a box, > >>>>> observe for a bit and make sure there isn't a huge impact. Note that the > >>>>> 'trace-cmd record' approach will save everything to file, so that's > >>>>> something to consider I suppose. > >>>> > >>>> Tests / cat is running. Is there any way to test if it works? Or is it > >>>> enough that cat prints stuff from time to time but does not match -v > >>>> delalloc 0 > >>>> > >>> > >>> What is it printing where delalloc != 0? You could always just cat > >>> trace_pipe and make sure the event is firing, it's just that I suspect > >>> most entries will have delalloc == unwritten == 0. > >>> > >>> Also, while the tracepoint fires independent of the warning, it might > >>> not be a bad idea to restart a system that has already seen the warning > >>> since boot, just to provide some correlation or additional notification > >>> when the problem occurs. 
> >>> > >>> Brian > >>> > >>>> Stefan > >>>> > >>>> _______________________________________________ > >>>> xfs mailing list > >>>> xfs@oss.sgi.com > >>>> http://oss.sgi.com/mailman/listinfo/xfs > >> > >> _______________________________________________ > >> xfs mailing list > >> xfs@oss.sgi.com > >> http://oss.sgi.com/mailman/listinfo/xfs > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 49+ messages in thread
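Brian's first option above is worth pausing on: a plain `grep -v "delalloc 0"` also throws away lines such as `delalloc 0 unwritten 1` (pages with only unwritten extents), so matching the full `"delalloc 0 unwritten 0"` string is the safer filter. A small offline sketch with fabricated lines (not real trace output) shows the difference:

```shell
# Three fabricated entries: all-zero, delalloc-only, unwritten-only.
printf '%s\n' \
  'ino 0x69 delalloc 0 unwritten 0' \
  'ino 0x123 delalloc 2 unwritten 0' \
  'ino 0x456 delalloc 0 unwritten 1' \
  | grep -v "delalloc 0 unwritten 0"
# Keeps both nonzero lines. The broader pattern loses the unwritten-only one,
# because "delalloc 0" is a substring of that line too:
printf 'ino 0x456 delalloc 0 unwritten 1\n' | grep -v "delalloc 0" | wc -l
# Prints 0: the line is discarded.
```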
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-11 15:59 ` Brian Foster @ 2016-05-11 19:20 ` Stefan Priebe 2016-05-15 11:03 ` Stefan Priebe 1 sibling, 0 replies; 49+ messages in thread From: Stefan Priebe @ 2016-05-11 19:20 UTC (permalink / raw) To: Brian Foster; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com Am 11.05.2016 um 17:59 schrieb Brian Foster: > Dropped non-XFS cc's, probably no need to spam other lists at this > point... > > On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote: >> >> Am 11.05.2016 um 15:34 schrieb Brian Foster: >>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: >>>> Hi Brian, >>>> >>>> i'm still unable to grab anything to the trace file? Is there anything >>>> to check if it's working at all? >>>> >>> >>> See my previous mail: >>> >>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html >>> >>> E.g., something like this should work after writing to and removing a >>> new file: >>> >>> # trace-cmd start -e "xfs:xfs_releasepage" >>> # cat /sys/kernel/debug/tracing/trace_pipe >>> ... >>> rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 >> >> arg sorry yes that's working but delalloc is always 0. >> > > Hrm, Ok. That is strange. > >> May be i have to hook that into my initramfs to be fast enough? >> > > Not sure that would matter.. you said it occurs within 48 hours? I take > that to mean it doesn't occur immediately on boot. You should be able to > tell from the logs or dmesg if it happens before you get a chance to > start the tracing. > > Well, the options I can think of are: > > - Perhaps I botched matching up the line number to the warning, in which > case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch > any delalloc or unwritten blocks at releasepage() time. OK i changed the grep command. 
> > - Perhaps there's a race that the tracepoint doesn't catch. The warnings > are based on local vars, so we could instrument the code to print a > warning[1] to try and get the inode number. Thx i also added your patch. So we need to wait another 48h. Greets, Stefan > Brian > > [1] - compile tested diff: > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 40645a4..94738ea 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -1038,11 +1038,18 @@ xfs_vm_releasepage( > gfp_t gfp_mask) > { > int delalloc, unwritten; > + struct xfs_inode *ip = XFS_I(page->mapping->host); > > trace_xfs_releasepage(page->mapping->host, page, 0, 0); > > xfs_count_page_state(page, &delalloc, &unwritten); > > + if (delalloc || unwritten) > + xfs_warn(ip->i_mount, > + "ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx", > + ip->i_ino, delalloc, unwritten, page_offset(page), > + i_size_read(page->mapping->host)); > + > if (WARN_ON_ONCE(delalloc)) > return 0; > if (WARN_ON_ONCE(unwritten)) > >> Stefan >> >>> Once that is working, add the grep command to filter out "delalloc 0" >>> instances, etc. For example: >>> >>> cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out >>> >>> Brian >>> >>>> This still happens in the first 48 hours after a fresh reboot. >>>> >>>> Stefan >>>> >>>> Am 24.03.2016 um 13:24 schrieb Brian Foster: >>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>> >>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster: >>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>> >>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>>>>>>>> >>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> sorry new one the last one got mangled. Comments inside. 
>>>>>>>>>>> >>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours >>>>>>>>>>> running 4.4.6. >>>>>>>>>>> >>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>>>>>>>> those hosts - just the OS doing nearly nothing. >>>>>>>>>>> >>>>>>>>>>> All those show: >>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond >>>>>>>>>> you're reproducing the problem. We can't really make progress without >>>>>>>>>> more information. We don't necessarily know what application or >>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what >>>>>>>>>> file is affected could give us a hint. >>>>>>>>>> >>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>>>>>>>> might generate a lot of noise by default. Could you enable the >>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e >>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>>>>>>>> problem to occur. 
Alternatively (and maybe easier), run 'trace-cmd start >>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>>>>>>>> ~/trace.out' running to capture instances. >>>>>>>> >>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >>>>>>>> it in the trace.out even the WARN_ONCE was already triggered? >>>>>>>> >>>>>>> >>>>>>> The tracepoint is independent from the warning (see >>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of >>>>>>> the function regardless of whether delalloc blocks still exist at that >>>>>>> point. That creates the need to filter the entries. >>>>>>> >>>>>>> With regard to performance, I believe the tracepoints are intended to be >>>>>>> pretty lightweight. I don't think it should hurt to try it on a box, >>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the >>>>>>> 'trace-cmd record' approach will save everything to file, so that's >>>>>>> something to consider I suppose. >>>>>> >>>>>> Tests / cat is running. Is there any way to test if it works? Or is it >>>>>> enough that cat prints stuff from time to time but does not match -v >>>>>> delalloc 0 >>>>>> >>>>> >>>>> What is it printing where delalloc != 0? You could always just cat >>>>> trace_pipe and make sure the event is firing, it's just that I suspect >>>>> most entries will have delalloc == unwritten == 0. >>>>> >>>>> Also, while the tracepoint fires independent of the warning, it might >>>>> not be a bad idea to restart a system that has already seen the warning >>>>> since boot, just to provide some correlation or additional notification >>>>> when the problem occurs. 
>>>>> >>>>> Brian >>>>> >>>>>> Stefan >>>>>> >>>>>> _______________________________________________ >>>>>> xfs mailing list >>>>>> xfs@oss.sgi.com >>>>>> http://oss.sgi.com/mailman/listinfo/xfs >>>> >>>> _______________________________________________ >>>> xfs mailing list >>>> xfs@oss.sgi.com >>>> http://oss.sgi.com/mailman/listinfo/xfs >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-11 15:59 ` Brian Foster 2016-05-11 19:20 ` Stefan Priebe @ 2016-05-15 11:03 ` Stefan Priebe 2016-05-15 11:50 ` Brian Foster 1 sibling, 1 reply; 49+ messages in thread From: Stefan Priebe @ 2016-05-15 11:03 UTC (permalink / raw) To: Brian Foster; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com

Hi Brian,

here's the new trace:

[310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff 0x19f000 size 0x1a0000
[310740.407265] ------------[ cut here ]------------
[310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241 xfs_vm_releasepage+0x12e/0x140()
[310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
[310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G O 4.4.10+25-ph #1
[310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015
[310740.407291] 0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f 0000000000000000
[310740.407292] ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7 ffff880c4da1fae8
[310740.407293] 0000000000000000 ffffea0000e38140 ffff8807e20bfd10 ffffea0000e38160
[310740.407295] Call Trace:
[310740.407299] [<ffffffffa13c6d0f>] dump_stack+0x63/0x84
[310740.407301] [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0
[310740.407302] [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20
[310740.407303] [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140
[310740.407305] [<ffffffffa11520c2>] try_to_release_page+0x32/0x50
[310740.407308] [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0
[310740.407309] [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0
[310740.407311] [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0
[310740.407312] [<ffffffffa1168499>] kswapd+0x4f9/0x970
[310740.407314] [<ffffffffa1167fa0>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[310740.407316] [<ffffffffa10a0d99>] kthread+0xc9/0xe0
[310740.407318] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
[310740.407320] [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70
[310740.407321] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
[310740.407322] ---[ end trace bf76ad5e8a4d863e ]---

Stefan

Am 11.05.2016 um 17:59 schrieb Brian Foster:
> Dropped non-XFS cc's, probably no need to spam other lists at this
> point...
>
> On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:
>>
>> Am 11.05.2016 um 15:34 schrieb Brian Foster:
>>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Hi Brian,
>>>>
>>>> i'm still unable to grab anything to the trace file? Is there anything
>>>> to check if it's working at all?
>>>>
>>>
>>> See my previous mail:
>>>
>>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
>>>
>>> E.g., something like this should work after writing to and removing a
>>> new file:
>>>
>>> # trace-cmd start -e "xfs:xfs_releasepage"
>>> # cat /sys/kernel/debug/tracing/trace_pipe
>>> ...
>>> rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
>>
>> arg sorry yes that's working but delalloc is always 0.
>>
>
> Hrm, Ok. That is strange.
>
>> May be i have to hook that into my initramfs to be fast enough?
>>
>
> Not sure that would matter.. you said it occurs within 48 hours? I take
> that to mean it doesn't occur immediately on boot. You should be able to
> tell from the logs or dmesg if it happens before you get a chance to
> start the tracing. 
> > Well, the options I can think of are: > > - Perhaps I botched matching up the line number to the warning, in which > case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch > any delalloc or unwritten blocks at releasepage() time. > > - Perhaps there's a race that the tracepoint doesn't catch. The warnings > are based on local vars, so we could instrument the code to print a > warning[1] to try and get the inode number. > > Brian > > [1] - compile tested diff: > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 40645a4..94738ea 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -1038,11 +1038,18 @@ xfs_vm_releasepage( > gfp_t gfp_mask) > { > int delalloc, unwritten; > + struct xfs_inode *ip = XFS_I(page->mapping->host); > > trace_xfs_releasepage(page->mapping->host, page, 0, 0); > > xfs_count_page_state(page, &delalloc, &unwritten); > > + if (delalloc || unwritten) > + xfs_warn(ip->i_mount, > + "ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx", > + ip->i_ino, delalloc, unwritten, page_offset(page), > + i_size_read(page->mapping->host)); > + > if (WARN_ON_ONCE(delalloc)) > return 0; > if (WARN_ON_ONCE(unwritten)) > >> Stefan >> >>> Once that is working, add the grep command to filter out "delalloc 0" >>> instances, etc. For example: >>> >>> cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out >>> >>> Brian >>> >>>> This still happens in the first 48 hours after a fresh reboot. 
>>>> >>>> Stefan >>>> >>>> Am 24.03.2016 um 13:24 schrieb Brian Foster: >>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>> >>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster: >>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>> >>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>>>>>>>> >>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>> sorry new one the last one got mangled. Comments inside. >>>>>>>>>>> >>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours >>>>>>>>>>> running 4.4.6. >>>>>>>>>>> >>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>>>>>>>> those hosts - just the OS doing nearly nothing. >>>>>>>>>>> >>>>>>>>>>> All those show: >>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond >>>>>>>>>> you're reproducing the problem. We can't really make progress without >>>>>>>>>> more information. 
We don't necessarily know what application or >>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what >>>>>>>>>> file is affected could give us a hint. >>>>>>>>>> >>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>>>>>>>> might generate a lot of noise by default. Could you enable the >>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e >>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start >>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>>>>>>>> ~/trace.out' running to capture instances. >>>>>>>> >>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >>>>>>>> it in the trace.out even the WARN_ONCE was already triggered? >>>>>>>> >>>>>>> >>>>>>> The tracepoint is independent from the warning (see >>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of >>>>>>> the function regardless of whether delalloc blocks still exist at that >>>>>>> point. That creates the need to filter the entries. >>>>>>> >>>>>>> With regard to performance, I believe the tracepoints are intended to be >>>>>>> pretty lightweight. I don't think it should hurt to try it on a box, >>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the >>>>>>> 'trace-cmd record' approach will save everything to file, so that's >>>>>>> something to consider I suppose. >>>>>> >>>>>> Tests / cat is running. Is there any way to test if it works? Or is it >>>>>> enough that cat prints stuff from time to time but does not match -v >>>>>> delalloc 0 >>>>>> >>>>> >>>>> What is it printing where delalloc != 0? 
You could always just cat >>>>> trace_pipe and make sure the event is firing, it's just that I suspect >>>>> most entries will have delalloc == unwritten == 0. >>>>> >>>>> Also, while the tracepoint fires independent of the warning, it might >>>>> not be a bad idea to restart a system that has already seen the warning >>>>> since boot, just to provide some correlation or additional notification >>>>> when the problem occurs. >>>>> >>>>> Brian >>>>> >>>>>> Stefan >>>>>> >>>>>> _______________________________________________ >>>>>> xfs mailing list >>>>>> xfs@oss.sgi.com >>>>>> http://oss.sgi.com/mailman/listinfo/xfs >>>> >>>> _______________________________________________ >>>> xfs mailing list >>>> xfs@oss.sgi.com >>>> http://oss.sgi.com/mailman/listinfo/xfs >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
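A side note on the trace filters quoted above: the original 'grep -v "delalloc 0"' pipeline also discards lines where only the unwritten count is nonzero, which is exactly the case the later warning reports, while the revised 'grep -v "delalloc 0 unwritten 0"' keeps them. A runnable sketch of the difference, using hypothetical sample lines in the trace_pipe format shown above:

```shell
#!/bin/sh
# Two sample xfs_releasepage tracepoint lines (hypothetical values):
# the first has an unwritten buffer, the second is the common clean case.
cat > /tmp/releasepage.sample <<'EOF'
rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 1
rm-8199 [000] .... 9445.774071: xfs_releasepage: dev 253:4 ino 0x6a pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
EOF

# Original filter: "delalloc 0" matches BOTH lines, so both are
# dropped and the unwritten event is silently lost.
grep -v "delalloc 0" /tmp/releasepage.sample > /tmp/filtered.orig || true

# Revised filter: only drops the fully clean line, keeping "unwritten 1".
grep -v "delalloc 0 unwritten 0" /tmp/releasepage.sample > /tmp/filtered.rev
```

On a live system the same filter would sit after 'trace-cmd start -e "xfs:xfs_releasepage"' and read from /sys/kernel/debug/tracing/trace_pipe, as in the commands quoted above.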
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-15 11:03 ` Stefan Priebe @ 2016-05-15 11:50 ` Brian Foster 2016-05-15 12:41 ` Stefan Priebe 0 siblings, 1 reply; 49+ messages in thread From: Brian Foster @ 2016-05-15 11:50 UTC (permalink / raw) To: Stefan Priebe; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote: > Hi Brian, > > here's the new trace: > [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff > 0x19f000 size 0x1a0000 So it is actually an unwritten buffer, on what appears to be the last page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers on write failure") that went into 4.6, but that was reproducible on sub-4k block size filesystems and depends on some kind of write error. Are either of those applicable here? Are you close to ENOSPC, for example? Otherwise, have you determined what file is associated with that inode (e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some insight on what actually preallocates/writes the file and perhaps that helps us identify something we can trace. Also, if you think the file has not been modified since the error, an 'xfs_bmap -v <file>' might be interesting as well... 
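One practical wrinkle with the suggested find command: at least GNU find expects the -inum argument in decimal, while the warning prints the inode number in hex, so a conversion step is needed. A self-contained sketch of the technique, exercised on a scratch file instead of the real filesystem (on the affected host you would substitute the actual mount point and follow up with xfs_bmap):

```shell
#!/bin/sh
# Resolve a hex inode number (as printed by the instrumented xfs_warn)
# back to a path. Shell arithmetic does the hex-to-decimal conversion
# that find's -inum test needs.
set -e

DIR=$(mktemp -d)        # scratch stand-in for the affected XFS mount
touch "$DIR/victim"
INO_HEX=$(printf '0x%x' "$(stat -c %i "$DIR/victim")")
echo "warning reported ino $INO_HEX"

# $((...)) parses the 0x prefix and yields the decimal inode number:
FILE=$(find "$DIR" -xdev -inum "$((INO_HEX))" -print)
echo "resolved to $FILE"

# On the real system the follow-up would be the extent dump:
#   xfs_bmap -v "$FILE"
```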
Brian > [310740.407265] ------------[ cut here ]------------ > [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241 > xfs_vm_releasepage+0x12e/0x140() > [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 > xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp > fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan > ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci > ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp > mpt3sas pps_core raid_class scsi_transport_sas > [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G O > 4.4.10+25-ph #1 > [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b > 05/18/2015 > [310740.407291] 0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f > 0000000000000000 > [310740.407292] ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7 > ffff880c4da1fae8 > [310740.407293] 0000000000000000 ffffea0000e38140 ffff8807e20bfd10 > ffffea0000e38160 > [310740.407295] Call Trace: > [310740.407299] [<ffffffffa13c6d0f>] dump_stack+0x63/0x84 > [310740.407301] [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0 > [310740.407302] [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20 > [310740.407303] [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140 > [310740.407305] [<ffffffffa11520c2>] try_to_release_page+0x32/0x50 > [310740.407308] [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0 > [310740.407309] [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0 > [310740.407311] [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0 > [310740.407312] [<ffffffffa1168499>] kswapd+0x4f9/0x970 > [310740.407314] [<ffffffffa1167fa0>] ? > mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > [310740.407316] [<ffffffffa10a0d99>] kthread+0xc9/0xe0 > [310740.407318] [<ffffffffa10a0cd0>] ? 
kthread_stop+0x100/0x100 > [310740.407320] [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70 > [310740.407321] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100 > [310740.407322] ---[ end trace bf76ad5e8a4d863e ]--- > > > Stefan > > Am 11.05.2016 um 17:59 schrieb Brian Foster: > >Dropped non-XFS cc's, probably no need to spam other lists at this > >point... > > > >On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote: > >> > >>Am 11.05.2016 um 15:34 schrieb Brian Foster: > >>>On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: > >>>>Hi Brian, > >>>> > >>>>i'm still unable to grab anything to the trace file? Is there anything > >>>>to check if it's working at all? > >>>> > >>> > >>>See my previous mail: > >>> > >>>http://oss.sgi.com/pipermail/xfs/2016-March/047793.html > >>> > >>>E.g., something like this should work after writing to and removing a > >>>new file: > >>> > >>># trace-cmd start -e "xfs:xfs_releasepage" > >>># cat /sys/kernel/debug/tracing/trace_pipe > >>>... > >>>rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 > >> > >>arg sorry yes that's working but delalloc is always 0. > >> > > > >Hrm, Ok. That is strange. > > > >>May be i have to hook that into my initramfs to be fast enough? > >> > > > >Not sure that would matter.. you said it occurs within 48 hours? I take > >that to mean it doesn't occur immediately on boot. You should be able to > >tell from the logs or dmesg if it happens before you get a chance to > >start the tracing. > > > >Well, the options I can think of are: > > > >- Perhaps I botched matching up the line number to the warning, in which > > case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch > > any delalloc or unwritten blocks at releasepage() time. > > > >- Perhaps there's a race that the tracepoint doesn't catch. 
The warnings > > are based on local vars, so we could instrument the code to print a > > warning[1] to try and get the inode number. > > > >Brian > > > >[1] - compile tested diff: > > > >diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > >index 40645a4..94738ea 100644 > >--- a/fs/xfs/xfs_aops.c > >+++ b/fs/xfs/xfs_aops.c > >@@ -1038,11 +1038,18 @@ xfs_vm_releasepage( > > gfp_t gfp_mask) > > { > > int delalloc, unwritten; > >+ struct xfs_inode *ip = XFS_I(page->mapping->host); > > > > trace_xfs_releasepage(page->mapping->host, page, 0, 0); > > > > xfs_count_page_state(page, &delalloc, &unwritten); > > > >+ if (delalloc || unwritten) > >+ xfs_warn(ip->i_mount, > >+ "ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx", > >+ ip->i_ino, delalloc, unwritten, page_offset(page), > >+ i_size_read(page->mapping->host)); > >+ > > if (WARN_ON_ONCE(delalloc)) > > return 0; > > if (WARN_ON_ONCE(unwritten)) > > > >>Stefan > >> > >>>Once that is working, add the grep command to filter out "delalloc 0" > >>>instances, etc. For example: > >>> > >>> cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out > >>> > >>>Brian > >>> > >>>>This still happens in the first 48 hours after a fresh reboot. > >>>> > >>>>Stefan > >>>> > >>>>Am 24.03.2016 um 13:24 schrieb Brian Foster: > >>>>>On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: > >>>>>> > >>>>>>Am 24.03.2016 um 12:17 schrieb Brian Foster: > >>>>>>>On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: > >>>>>>>> > >>>>>>>>Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: > >>>>>>>>> > >>>>>>>>>Am 23.03.2016 um 15:07 schrieb Brian Foster: > >>>>>>>>>>On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: > >>>>>>>>>>>sorry new one the last one got mangled. Comments inside. 
> >>>>>>>>>>> > >>>>>>>>>>>Am 05.03.2016 um 23:48 schrieb Dave Chinner: > >>>>>>>>>>>>On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > >>>>>>>>>>>>>On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > >>>>>>>>>>>>>>Am 04.03.2016 um 20:13 schrieb Brian Foster: > >>>>>>>>>>>>>>>On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > >>>>>>>>>>>>>>>>Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > >>>>>>>>>>... > >>>>>>>>>>> > >>>>>>>>>>>This has happened again on 8 different hosts in the last 24 hours > >>>>>>>>>>>running 4.4.6. > >>>>>>>>>>> > >>>>>>>>>>>All of those are KVM / Qemu hosts and are doing NO I/O except the normal > >>>>>>>>>>>OS stuff as the VMs have remote storage. So no database, no rsync on > >>>>>>>>>>>those hosts - just the OS doing nearly nothing. > >>>>>>>>>>> > >>>>>>>>>>>All those show: > >>>>>>>>>>>[153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 > >>>>>>>>>>>xfs_vm_releasepage+0xe2/0xf0() > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>Ok, well at this point the warning isn't telling us anything beyond > >>>>>>>>>>you're reproducing the problem. We can't really make progress without > >>>>>>>>>>more information. We don't necessarily know what application or > >>>>>>>>>>operations caused this by the time it occurs, but perhaps knowing what > >>>>>>>>>>file is affected could give us a hint. > >>>>>>>>>> > >>>>>>>>>>We have the xfs_releasepage tracepoint, but that's unconditional and so > >>>>>>>>>>might generate a lot of noise by default. Could you enable the > >>>>>>>>>>xfs_releasepage tracepoint and hunt for instances where delalloc != 0? 
> >>>>>>>>>>E.g., we could leave a long running 'trace-cmd record -e > >>>>>>>>>>"xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the > >>>>>>>>>>problem to occur. Alternatively (and maybe easier), run 'trace-cmd start > >>>>>>>>>>-e "xfs:xfs_releasepage"' and leave something like 'cat > >>>>>>>>>>/sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > > >>>>>>>>>>~/trace.out' running to capture instances. > >>>>>>>> > >>>>>>>>Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the > >>>>>>>>it in the trace.out even the WARN_ONCE was already triggered? > >>>>>>>> > >>>>>>> > >>>>>>>The tracepoint is independent from the warning (see > >>>>>>>xfs_vm_releasepage()), so the tracepoint will fire every invocation of > >>>>>>>the function regardless of whether delalloc blocks still exist at that > >>>>>>>point. That creates the need to filter the entries. > >>>>>>> > >>>>>>>With regard to performance, I believe the tracepoints are intended to be > >>>>>>>pretty lightweight. I don't think it should hurt to try it on a box, > >>>>>>>observe for a bit and make sure there isn't a huge impact. Note that the > >>>>>>>'trace-cmd record' approach will save everything to file, so that's > >>>>>>>something to consider I suppose. > >>>>>> > >>>>>>Tests / cat is running. Is there any way to test if it works? Or is it > >>>>>>enough that cat prints stuff from time to time but does not match -v > >>>>>>delalloc 0 > >>>>>> > >>>>> > >>>>>What is it printing where delalloc != 0? You could always just cat > >>>>>trace_pipe and make sure the event is firing, it's just that I suspect > >>>>>most entries will have delalloc == unwritten == 0. > >>>>> > >>>>>Also, while the tracepoint fires independent of the warning, it might > >>>>>not be a bad idea to restart a system that has already seen the warning > >>>>>since boot, just to provide some correlation or additional notification > >>>>>when the problem occurs. 
> >>>>> > >>>>>Brian > >>>>> > >>>>>>Stefan > >>>>>> > >>>>>>_______________________________________________ > >>>>>>xfs mailing list > >>>>>>xfs@oss.sgi.com > >>>>>>http://oss.sgi.com/mailman/listinfo/xfs > >>>> > >>>>_______________________________________________ > >>>>xfs mailing list > >>>>xfs@oss.sgi.com > >>>>http://oss.sgi.com/mailman/listinfo/xfs > >> > >>_______________________________________________ > >>xfs mailing list > >>xfs@oss.sgi.com > >>http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
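When the instrumented xfs_warn fires on many pages of the same file it emits one line per page, which can flood dmesg. A small sketch that condenses such output to one count per inode, assuming the 'ino 0x... delalloc N unwritten N pgoff 0x...' format above (the sample lines below are hypothetical):

```shell
#!/bin/sh
# Collapse repeated per-page warnings into a single row per inode.
cat > /tmp/xfs_warns.txt <<'EOF'
XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x50000 size 0x13d1c8
XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x51000 size 0x13d1c8
XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff 0x19f000 size 0x1a0000
EOF

# Find the token after "ino" on each warning line and count occurrences.
awk '
/delalloc/ {
    for (i = 1; i < NF; i++)
        if ($i == "ino") count[$(i + 1)]++
}
END {
    for (ino in count)
        printf "%s: %d pages\n", ino, count[ino]
}' /tmp/xfs_warns.txt > /tmp/xfs_summary.txt

cat /tmp/xfs_summary.txt
```

Against a real host you would feed the dmesg (or netconsole) log to the awk program instead of the sample file.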
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-15 11:50 ` Brian Foster @ 2016-05-15 12:41 ` Stefan Priebe 2016-05-16 1:06 ` Brian Foster 0 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe @ 2016-05-15 12:41 UTC (permalink / raw) To: Brian Foster; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com Hi, find shows a ceph object file: /var/lib/ceph/osd/ceph-13/current/3.29f_head/DIR_F/DIR_9/DIR_2/DIR_D/rbd\udata.904a406b8b4567.00000000000052d6__head_143BD29F__3 The file was modified again since then. On another system I've got different output. [Sun May 15 07:00:44 2016] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x50000 size 0x13d1c8 [Sun May 15 07:00:44 2016] ------------[ cut here ]------------ [Sun May 15 07:00:44 2016] WARNING: CPU: 2 PID: 108 at fs/xfs/xfs_aops.c:1239 xfs_vm_releasepage+0x10f/0x140() [Sun May 15 07:00:44 2016] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse xhci_pci xhci_hcd sb_edac edac_core i2c_i801 i40e(O) shpchp vxlan ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O 4.4.10+25-ph #1 [Sun May 15 07:00:44 2016] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 [Sun May 15 07:00:44 2016] 0000000000000000 ffff880c4da37a88 ffffffff9c3c6d0f 0000000000000000 [Sun May 15 07:00:44 2016] ffffffff9ca51a1c ffff880c4da37ac8 ffffffff9c0837a7 ffff880c4da37ae8 [Sun May 15 07:00:44 2016] 0000000000000001 ffffea0001053080 ffff8801429ef490 ffffea00010530a0 [Sun May 15 07:00:44 2016] Call Trace: [Sun May 15 07:00:44 2016] [<ffffffff9c3c6d0f>] dump_stack+0x63/0x84 [Sun May 15 07:00:44 2016] [<ffffffff9c0837a7>]
warn_slowpath_common+0x97/0xe0 [Sun May 15 07:00:44 2016] [<ffffffff9c08380a>] warn_slowpath_null+0x1a/0x20 [Sun May 15 07:00:44 2016] [<ffffffff9c326f4f>] xfs_vm_releasepage+0x10f/0x140 [Sun May 15 07:00:44 2016] [<ffffffff9c1520c2>] try_to_release_page+0x32/0x50 [Sun May 15 07:00:44 2016] [<ffffffff9c166a8e>] shrink_active_list+0x3ce/0x3e0 [Sun May 15 07:00:44 2016] [<ffffffff9c167127>] shrink_lruvec+0x687/0x7d0 [Sun May 15 07:00:44 2016] [<ffffffff9c16734c>] shrink_zone+0xdc/0x2c0 [Sun May 15 07:00:44 2016] [<ffffffff9c168499>] kswapd+0x4f9/0x970 [Sun May 15 07:00:44 2016] [<ffffffff9c167fa0>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 [Sun May 15 07:00:44 2016] [<ffffffff9c0a0d99>] kthread+0xc9/0xe0 [Sun May 15 07:00:44 2016] [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100 [Sun May 15 07:00:44 2016] [<ffffffff9c6b58cf>] ret_from_fork+0x3f/0x70 [Sun May 15 07:00:44 2016] [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100 [Sun May 15 07:00:44 2016] ---[ end trace 9497d464aafe5b88 ]--- [295086.353469] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x51000 size 0x13d1c8 [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x52000 size 0x13d1c8 [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x53000 size 0x13d1c8 [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x54000 size 0x13d1c8 [295086.353480] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x55000 size 0x13d1c8 [295086.353482] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x56000 size 0x13d1c8 [295086.353489] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x57000 size 0x13d1c8 [295086.353491] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x58000 size 0x13d1c8 [295086.353494] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x59000 size 0x13d1c8 [295086.353496] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x5a000 size 0x13d1c8 [295086.353498] XFS (md127p3): ino 
0x600204f delalloc 1 unwritten 0 pgoff 0x5b000 size 0x13d1c8 [295086.353500] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x5c000 size 0x13d1c8 [295086.353503] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x5d000 size 0x13d1c8 [295086.353505] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x5e000 size 0x13d1c8 [295086.353513] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x5f000 size 0x13d1c8 [295086.353515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x60000 size 0x13d1c8 [295086.353517] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x61000 size 0x13d1c8 [295086.353521] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x62000 size 0x13d1c8 [295086.353523] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x63000 size 0x13d1c8 [295086.353525] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x64000 size 0x13d1c8 [295086.353528] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x65000 size 0x13d1c8 [295086.353530] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x66000 size 0x13d1c8 [295086.353536] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x67000 size 0x13d1c8 [295086.353538] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x68000 size 0x13d1c8 [295086.353541] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x69000 size 0x13d1c8 [295086.353543] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x6a000 size 0x13d1c8 [295086.353545] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x6b000 size 0x13d1c8 [295086.353548] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x6c000 size 0x13d1c8 [295086.353550] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x6d000 size 0x13d1c8 [295086.353553] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x6e000 size 0x13d1c8 [295086.567308] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x6f000 size 0x13d1c8 [295086.567313] 
XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x70000 size 0x13d1c8 [295086.567317] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x71000 size 0x13d1c8 [295086.567319] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x72000 size 0x13d1c8 [295086.567321] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x73000 size 0x13d1c8 [295086.567326] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x74000 size 0x13d1c8 [295086.567328] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x75000 size 0x13d1c8 [295086.567331] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x76000 size 0x13d1c8 [295086.567341] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x77000 size 0x13d1c8 [295086.567343] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x78000 size 0x13d1c8 [295086.567346] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x79000 size 0x13d1c8 [295086.567348] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x7a000 size 0x13d1c8 [295086.567350] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x7b000 size 0x13d1c8 [295086.567353] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x7c000 size 0x13d1c8 [295086.567355] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x7d000 size 0x13d1c8 [295086.567357] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x7e000 size 0x13d1c8 [295086.567367] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x7f000 size 0x13d1c8 [295086.567369] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x80000 size 0x13d1c8 [295086.567372] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x81000 size 0x13d1c8 [295086.567374] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x82000 size 0x13d1c8 [295086.567376] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x83000 size 0x13d1c8 [295086.567380] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x84000 size 0x13d1c8 
[295086.567382] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x85000 size 0x13d1c8 [295086.567385] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x86000 size 0x13d1c8 [295086.567394] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x87000 size 0x13d1c8 [295086.567396] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x88000 size 0x13d1c8 [295086.567399] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x89000 size 0x13d1c8 [295086.567401] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x8a000 size 0x13d1c8 [295086.567403] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x8b000 size 0x13d1c8 [295086.567405] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x8c000 size 0x13d1c8 [295086.567408] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x8d000 size 0x13d1c8 [295086.567410] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x8e000 size 0x13d1c8 [295086.567416] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x8f000 size 0x13d1c8 [295086.567419] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x90000 size 0x13d1c8 [295086.567421] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x91000 size 0x13d1c8 [295086.567423] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x92000 size 0x13d1c8 [295086.567427] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x93000 size 0x13d1c8 [295086.567429] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x94000 size 0x13d1c8 [295086.567431] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x95000 size 0x13d1c8 [295086.567434] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x96000 size 0x13d1c8 [295086.567447] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x97000 size 0x13d1c8 [295086.567450] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x98000 size 0x13d1c8 [295086.567452] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 
0x99000 size 0x13d1c8 [295086.567454] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x9a000 size 0x13d1c8 [295086.567456] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x9b000 size 0x13d1c8 [295086.567458] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x9c000 size 0x13d1c8 [295086.567461] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x9d000 size 0x13d1c8 [295086.567463] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x9e000 size 0x13d1c8 [295086.567471] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0x9f000 size 0x13d1c8 [295086.567474] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa0000 size 0x13d1c8 [295086.567476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa1000 size 0x13d1c8 [295086.567479] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa2000 size 0x13d1c8 [295086.567483] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa3000 size 0x13d1c8 [295086.567485] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa4000 size 0x13d1c8 [295086.567488] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa5000 size 0x13d1c8 [295086.567490] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa6000 size 0x13d1c8 [295086.567499] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa7000 size 0x13d1c8 [295086.567501] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa8000 size 0x13d1c8 [295086.567503] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xa9000 size 0x13d1c8 [295086.567505] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xaa000 size 0x13d1c8 [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xab000 size 0x13d1c8 [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xac000 size 0x13d1c8 [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xad000 size 0x13d1c8 The file to the inode number is: 
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en The dmesg output / trace was at 7 am today, and the last modification of the file was yesterday at 11 pm. Stefan Am 15.05.2016 um 13:50 schrieb Brian Foster: > On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote: >> Hi Brian, >> >> here's the new trace: >> [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff >> 0x19f000 size 0x1a0000 > > So it is actually an unwritten buffer, on what appears to be the last > page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers > on write failure") that went into 4.6, but that was reproducible on > sub-4k block size filesystems and depends on some kind of write error. > Are either of those applicable here? Are you close to ENOSPC, for > example? > > Otherwise, have you determined what file is associated with that inode > (e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some > insight on what actually preallocates/writes the file and perhaps that > helps us identify something we can trace. Also, if you think the file > has not been modified since the error, an 'xfs_bmap -v <file>' might be > interesting as well... 
> > Brian > >> [310740.407265] ------------[ cut here ]------------ >> [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241 >> xfs_vm_releasepage+0x12e/0x140() >> [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 >> xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp >> fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan >> ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor >> raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci >> ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp >> mpt3sas pps_core raid_class scsi_transport_sas >> [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G O >> 4.4.10+25-ph #1 >> [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b >> 05/18/2015 >> [310740.407291] 0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f >> 0000000000000000 >> [310740.407292] ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7 >> ffff880c4da1fae8 >> [310740.407293] 0000000000000000 ffffea0000e38140 ffff8807e20bfd10 >> ffffea0000e38160 >> [310740.407295] Call Trace: >> [310740.407299] [<ffffffffa13c6d0f>] dump_stack+0x63/0x84 >> [310740.407301] [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0 >> [310740.407302] [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20 >> [310740.407303] [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140 >> [310740.407305] [<ffffffffa11520c2>] try_to_release_page+0x32/0x50 >> [310740.407308] [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0 >> [310740.407309] [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0 >> [310740.407311] [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0 >> [310740.407312] [<ffffffffa1168499>] kswapd+0x4f9/0x970 >> [310740.407314] [<ffffffffa1167fa0>] ? >> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >> [310740.407316] [<ffffffffa10a0d99>] kthread+0xc9/0xe0 >> [310740.407318] [<ffffffffa10a0cd0>] ? 
kthread_stop+0x100/0x100 >> [310740.407320] [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70 >> [310740.407321] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100 >> [310740.407322] ---[ end trace bf76ad5e8a4d863e ]--- >> >> >> Stefan >> >> Am 11.05.2016 um 17:59 schrieb Brian Foster: >>> Dropped non-XFS cc's, probably no need to spam other lists at this >>> point... >>> >>> On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 11.05.2016 um 15:34 schrieb Brian Foster: >>>>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: >>>>>> Hi Brian, >>>>>> >>>>>> i'm still unable to grab anything to the trace file? Is there anything >>>>>> to check if it's working at all? >>>>>> >>>>> >>>>> See my previous mail: >>>>> >>>>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html >>>>> >>>>> E.g., something like this should work after writing to and removing a >>>>> new file: >>>>> >>>>> # trace-cmd start -e "xfs:xfs_releasepage" >>>>> # cat /sys/kernel/debug/tracing/trace_pipe >>>>> ... >>>>> rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 >>>> >>>> arg sorry yes that's working but delalloc is always 0. >>>> >>> >>> Hrm, Ok. That is strange. >>> >>>> May be i have to hook that into my initramfs to be fast enough? >>>> >>> >>> Not sure that would matter.. you said it occurs within 48 hours? I take >>> that to mean it doesn't occur immediately on boot. You should be able to >>> tell from the logs or dmesg if it happens before you get a chance to >>> start the tracing. >>> >>> Well, the options I can think of are: >>> >>> - Perhaps I botched matching up the line number to the warning, in which >>> case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch >>> any delalloc or unwritten blocks at releasepage() time. >>> >>> - Perhaps there's a race that the tracepoint doesn't catch. 
The warnings >>> are based on local vars, so we could instrument the code to print a >>> warning[1] to try and get the inode number. >>> >>> Brian >>> >>> [1] - compile tested diff: >>> >>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c >>> index 40645a4..94738ea 100644 >>> --- a/fs/xfs/xfs_aops.c >>> +++ b/fs/xfs/xfs_aops.c >>> @@ -1038,11 +1038,18 @@ xfs_vm_releasepage( >>> gfp_t gfp_mask) >>> { >>> int delalloc, unwritten; >>> + struct xfs_inode *ip = XFS_I(page->mapping->host); >>> >>> trace_xfs_releasepage(page->mapping->host, page, 0, 0); >>> >>> xfs_count_page_state(page, &delalloc, &unwritten); >>> >>> + if (delalloc || unwritten) >>> + xfs_warn(ip->i_mount, >>> + "ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx", >>> + ip->i_ino, delalloc, unwritten, page_offset(page), >>> + i_size_read(page->mapping->host)); >>> + >>> if (WARN_ON_ONCE(delalloc)) >>> return 0; >>> if (WARN_ON_ONCE(unwritten)) >>> >>>> Stefan >>>> >>>>> Once that is working, add the grep command to filter out "delalloc 0" >>>>> instances, etc. For example: >>>>> >>>>> cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out >>>>> >>>>> Brian >>>>> >>>>>> This still happens in the first 48 hours after a fresh reboot. >>>>>> >>>>>> Stefan >>>>>> >>>>>> Am 24.03.2016 um 13:24 schrieb Brian Foster: >>>>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>> >>>>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster: >>>>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> >>>>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>> >>>>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>> sorry new one the last one got mangled. Comments inside. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>>>>>> ... >>>>>>>>>>>>> >>>>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours >>>>>>>>>>>>> running 4.4.6. >>>>>>>>>>>>> >>>>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>>>>>>>>>> those hosts - just the OS doing nearly nothing. >>>>>>>>>>>>> >>>>>>>>>>>>> All those show: >>>>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond >>>>>>>>>>>> you're reproducing the problem. We can't really make progress without >>>>>>>>>>>> more information. We don't necessarily know what application or >>>>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what >>>>>>>>>>>> file is affected could give us a hint. >>>>>>>>>>>> >>>>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>>>>>>>>>> might generate a lot of noise by default. Could you enable the >>>>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? 
>>>>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e >>>>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>>>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start >>>>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>>>>>>>>>> ~/trace.out' running to capture instances. >>>>>>>>>> >>>>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >>>>>>>>>> it in the trace.out even the WARN_ONCE was already triggered? >>>>>>>>>> >>>>>>>>> >>>>>>>>> The tracepoint is independent from the warning (see >>>>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of >>>>>>>>> the function regardless of whether delalloc blocks still exist at that >>>>>>>>> point. That creates the need to filter the entries. >>>>>>>>> >>>>>>>>> With regard to performance, I believe the tracepoints are intended to be >>>>>>>>> pretty lightweight. I don't think it should hurt to try it on a box, >>>>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the >>>>>>>>> 'trace-cmd record' approach will save everything to file, so that's >>>>>>>>> something to consider I suppose. >>>>>>>> >>>>>>>> Tests / cat is running. Is there any way to test if it works? Or is it >>>>>>>> enough that cat prints stuff from time to time but does not match -v >>>>>>>> delalloc 0 >>>>>>>> >>>>>>> >>>>>>> What is it printing where delalloc != 0? You could always just cat >>>>>>> trace_pipe and make sure the event is firing, it's just that I suspect >>>>>>> most entries will have delalloc == unwritten == 0. >>>>>>> >>>>>>> Also, while the tracepoint fires independent of the warning, it might >>>>>>> not be a bad idea to restart a system that has already seen the warning >>>>>>> since boot, just to provide some correlation or additional notification >>>>>>> when the problem occurs. 
>>>>>>> > Brian > >>>>>>>> Stefan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
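Before leaving the tracepoint filter above running for days, it can be sanity-checked against fabricated input. This is only a sketch: the two sample lines below are made up for illustration (real output comes from /sys/kernel/debug/tracing/trace_pipe after 'trace-cmd start -e "xfs:xfs_releasepage"'), but the grep filter is exactly the one discussed in the thread.

```shell
# Two fabricated xfs_releasepage trace lines: a harmless one and one
# that would indicate the bug (delalloc != 0 at releasepage time).
printf '%s\n' \
  'rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0' \
  'kswapd0-108 [003] .... 9450.123456: xfs_releasepage: dev 9:3 ino 0x600204f pgoff 0x50000 size 0x13d1c8 offset 0 length 0 delalloc 1 unwritten 0' \
  > /tmp/releasepage.sample

# The filter: drop only lines where BOTH counts are zero, so lingering
# delalloc as well as unwritten buffers are kept for inspection.
grep -v 'delalloc 0 unwritten 0' /tmp/releasepage.sample
```

With real tracing the same filter is applied to the live pipe, i.e. cat /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0 unwritten 0" > ~/trace.out.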
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-15 12:41 ` Stefan Priebe @ 2016-05-16 1:06 ` Brian Foster 2016-05-22 19:36 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 49+ messages in thread From: Brian Foster @ 2016-05-16 1:06 UTC (permalink / raw) To: Stefan Priebe; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com On Sun, May 15, 2016 at 02:41:40PM +0200, Stefan Priebe wrote: > Hi, > > find shows a ceph object file: > /var/lib/ceph/osd/ceph-13/current/3.29f_head/DIR_F/DIR_9/DIR_2/DIR_D/rbd\udata.904a406b8b4567.00000000000052d6__head_143BD29F__3 > Any idea what this file is? Does it represent user data, Ceph metadata? How was it created? Can you create others like it (I'm assuming via some file/block operation through Ceph) and/or reproduce the error? (Also, this thread is 20+ mails strong at this point, why is this the first reference to Ceph? :/) > File was again modified since than. > xfs_bmap -v might still be interesting. > > At another system i've different output. > [Sun May 15 07:00:44 2016] XFS (md127p3): ino 0x600204f delalloc 1 unwritten > 0 pgoff 0x50000 size 0x13d1c8 > [Sun May 15 07:00:44 2016] ------------[ cut here ]------------ > [Sun May 15 07:00:44 2016] WARNING: CPU: 2 PID: 108 at > fs/xfs/xfs_aops.c:1239 xfs_vm_releasepage+0x10f/0x140() This one is different, being a lingering delalloc block in this case. 
> [Sun May 15 07:00:44 2016] Modules linked in: netconsole ipt_REJECT > nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding > coretemp 8021q garp fuse xhci_pci xhci_hcd sb_edac edac_core i2c_i801 > i40e(O) shpchp vxlan ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler > button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg > sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas > [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O > 4.4.10+25-ph #1 How close is this to an upstream kernel? Upstream XFS? Have you tried to reproduce this on an upstream kernel? > [Sun May 15 07:00:44 2016] Hardware name: Supermicro Super Server/X10SRH-CF, > BIOS 1.0b 05/18/2015 > [Sun May 15 07:00:44 2016] 0000000000000000 ffff880c4da37a88 > ffffffff9c3c6d0f 0000000000000000 > [Sun May 15 07:00:44 2016] ffffffff9ca51a1c ffff880c4da37ac8 > ffffffff9c0837a7 ffff880c4da37ae8 > [Sun May 15 07:00:44 2016] 0000000000000001 ffffea0001053080 > ffff8801429ef490 ffffea00010530a0 > [Sun May 15 07:00:44 2016] Call Trace: > [Sun May 15 07:00:44 2016] [<ffffffff9c3c6d0f>] dump_stack+0x63/0x84 > [Sun May 15 07:00:44 2016] [<ffffffff9c0837a7>] > warn_slowpath_common+0x97/0xe0 > [Sun May 15 07:00:44 2016] [<ffffffff9c08380a>] > warn_slowpath_null+0x1a/0x20 > [Sun May 15 07:00:44 2016] [<ffffffff9c326f4f>] > xfs_vm_releasepage+0x10f/0x140 > [Sun May 15 07:00:44 2016] [<ffffffff9c1520c2>] > try_to_release_page+0x32/0x50 > [Sun May 15 07:00:44 2016] [<ffffffff9c166a8e>] > shrink_active_list+0x3ce/0x3e0 > [Sun May 15 07:00:44 2016] [<ffffffff9c167127>] shrink_lruvec+0x687/0x7d0 > [Sun May 15 07:00:44 2016] [<ffffffff9c16734c>] shrink_zone+0xdc/0x2c0 > [Sun May 15 07:00:44 2016] [<ffffffff9c168499>] kswapd+0x4f9/0x970 > [Sun May 15 07:00:44 2016] [<ffffffff9c167fa0>] ? 
> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > [Sun May 15 07:00:44 2016] [<ffffffff9c0a0d99>] kthread+0xc9/0xe0 > [Sun May 15 07:00:44 2016] [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100 > [Sun May 15 07:00:44 2016] [<ffffffff9c6b58cf>] ret_from_fork+0x3f/0x70 > [Sun May 15 07:00:44 2016] [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100 > [Sun May 15 07:00:44 2016] ---[ end trace 9497d464aafe5b88 ]--- > [295086.353469] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > 0x51000 size 0x13d1c8 What is md127p3, is the root fs on some kind of raid device? Can you provide xfs_info for this filesystem? > [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > 0x52000 size 0x13d1c8 > [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > 0x53000 size 0x13d1c8 > [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > 0x54000 size 0x13d1c8 ... > [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > 0xab000 size 0x13d1c8 > [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > 0xac000 size 0x13d1c8 > [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > 0xad000 size 0x13d1c8 > > The file to the inode number is: > /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en > xfs_bmap -v might be interesting here as well. This certainly seems like it is more repeatable. According to google, the content of /var/lib/apt/lists/ can be removed and repopulated safely with 'apt-get update' (please verify before trying). Does that reproduce this variant of the problem? Note that the apt command might not directly cause the error message, but rather only create the conditions for it to occur sometime later via memory reclaim. E.g., you might need to run 'sync; echo 3 > /proc/sys/vm/drop_caches' after, or possibly run a dummy workload of some kind (e.g., dd if=/dev/zero of=tmpfile bs=1M ...) 
to cause memory pressure and reclaim the pagecache of the package list file. Brian > dmesg output / trace was at 7 am today and last modify of the file was > yesterday 11 pm. > > Stefan > > Am 15.05.2016 um 13:50 schrieb Brian Foster: > > On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote: > > > Hi Brian, > > > > > > here's the new trace: > > > [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff > > > 0x19f000 size 0x1a0000 > > > > So it is actually an unwritten buffer, on what appears to be the last > > page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers > > on write failure") that went into 4.6, but that was reproducible on > > sub-4k block size filesystems and depends on some kind of write error. > > Are either of those applicable here? Are you close to ENOSPC, for > > example? > > > > Otherwise, have you determined what file is associated with that inode > > (e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some > > insight on what actually preallocates/writes the file and perhaps that > > helps us identify something we can trace. Also, if you think the file > > has not been modified since the error, an 'xfs_bmap -v <file>' might be > > interesting as well... 
> > > > Brian > > > > > [310740.407265] ------------[ cut here ]------------ > > > [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241 > > > xfs_vm_releasepage+0x12e/0x140() > > > [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 > > > xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp > > > fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan > > > ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci > > > ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp > > > mpt3sas pps_core raid_class scsi_transport_sas > > > [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G O > > > 4.4.10+25-ph #1 > > > [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b > > > 05/18/2015 > > > [310740.407291] 0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f > > > 0000000000000000 > > > [310740.407292] ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7 > > > ffff880c4da1fae8 > > > [310740.407293] 0000000000000000 ffffea0000e38140 ffff8807e20bfd10 > > > ffffea0000e38160 > > > [310740.407295] Call Trace: > > > [310740.407299] [<ffffffffa13c6d0f>] dump_stack+0x63/0x84 > > > [310740.407301] [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0 > > > [310740.407302] [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20 > > > [310740.407303] [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140 > > > [310740.407305] [<ffffffffa11520c2>] try_to_release_page+0x32/0x50 > > > [310740.407308] [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0 > > > [310740.407309] [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0 > > > [310740.407311] [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0 > > > [310740.407312] [<ffffffffa1168499>] kswapd+0x4f9/0x970 > > > [310740.407314] [<ffffffffa1167fa0>] ? 
> > > mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > > > [310740.407316] [<ffffffffa10a0d99>] kthread+0xc9/0xe0 > > > [310740.407318] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100 > > > [310740.407320] [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70 > > > [310740.407321] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100 > > > [310740.407322] ---[ end trace bf76ad5e8a4d863e ]--- > > > > > > > > > Stefan > > > > > > Am 11.05.2016 um 17:59 schrieb Brian Foster: > > > > Dropped non-XFS cc's, probably no need to spam other lists at this > > > > point... > > > > > > > > On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote: > > > > > > > > > > Am 11.05.2016 um 15:34 schrieb Brian Foster: > > > > > > On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: > > > > > > > Hi Brian, > > > > > > > > > > > > > > i'm still unable to grab anything to the trace file? Is there anything > > > > > > > to check if it's working at all? > > > > > > > > > > > > > > > > > > > See my previous mail: > > > > > > > > > > > > http://oss.sgi.com/pipermail/xfs/2016-March/047793.html > > > > > > > > > > > > E.g., something like this should work after writing to and removing a > > > > > > new file: > > > > > > > > > > > > # trace-cmd start -e "xfs:xfs_releasepage" > > > > > > # cat /sys/kernel/debug/tracing/trace_pipe > > > > > > ... > > > > > > rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 > > > > > > > > > > arg sorry yes that's working but delalloc is always 0. > > > > > > > > > > > > > Hrm, Ok. That is strange. > > > > > > > > > May be i have to hook that into my initramfs to be fast enough? > > > > > > > > > > > > > Not sure that would matter.. you said it occurs within 48 hours? I take > > > > that to mean it doesn't occur immediately on boot. 
You should be able to > > > > tell from the logs or dmesg if it happens before you get a chance to > > > > start the tracing. > > > > > > > > Well, the options I can think of are: > > > > > > > > - Perhaps I botched matching up the line number to the warning, in which > > > > case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch > > > > any delalloc or unwritten blocks at releasepage() time. > > > > > > > > - Perhaps there's a race that the tracepoint doesn't catch. The warnings > > > > are based on local vars, so we could instrument the code to print a > > > > warning[1] to try and get the inode number. > > > > > > > > Brian > > > > > > > > [1] - compile tested diff: > > > > > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > > > index 40645a4..94738ea 100644 > > > > --- a/fs/xfs/xfs_aops.c > > > > +++ b/fs/xfs/xfs_aops.c > > > > @@ -1038,11 +1038,18 @@ xfs_vm_releasepage( > > > > gfp_t gfp_mask) > > > > { > > > > int delalloc, unwritten; > > > > + struct xfs_inode *ip = XFS_I(page->mapping->host); > > > > > > > > trace_xfs_releasepage(page->mapping->host, page, 0, 0); > > > > > > > > xfs_count_page_state(page, &delalloc, &unwritten); > > > > > > > > + if (delalloc || unwritten) > > > > + xfs_warn(ip->i_mount, > > > > + "ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx", > > > > + ip->i_ino, delalloc, unwritten, page_offset(page), > > > > + i_size_read(page->mapping->host)); > > > > + > > > > if (WARN_ON_ONCE(delalloc)) > > > > return 0; > > > > if (WARN_ON_ONCE(unwritten)) > > > > > > > > > Stefan > > > > > > > > > > > Once that is working, add the grep command to filter out "delalloc 0" > > > > > > instances, etc. For example: > > > > > > > > > > > > cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out > > > > > > > > > > > > Brian > > > > > > > > > > > > > This still happens in the first 48 hours after a fresh reboot. 
> > > > > > > > > > > > > > Stefan > > > > > > > > > > > > > > Am 24.03.2016 um 13:24 schrieb Brian Foster: > > > > > > > > On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: > > > > > > > > > > > > > > > > > > Am 24.03.2016 um 12:17 schrieb Brian Foster: > > > > > > > > > > On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: > > > > > > > > > > > > > > > > > > > > > > Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: > > > > > > > > > > > > > > > > > > > > > > > > Am 23.03.2016 um 15:07 schrieb Brian Foster: > > > > > > > > > > > > > On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: > > > > > > > > > > > > > > sorry new one the last one got mangled. Comments inside. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am 05.03.2016 um 23:48 schrieb Dave Chinner: > > > > > > > > > > > > > > > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: > > > > > > > > > > > > > > > > On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: > > > > > > > > > > > > > > > > > Am 04.03.2016 um 20:13 schrieb Brian Foster: > > > > > > > > > > > > > > > > > > On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: > > > > > > > > > > > > > > > > > > > Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > > > > > > > > > > This has happened again on 8 different hosts in the last 24 hours > > > > > > > > > > > > > > running 4.4.6. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > All of those are KVM / Qemu hosts and are doing NO I/O except the normal > > > > > > > > > > > > > > OS stuff as the VMs have remote storage. So no database, no rsync on > > > > > > > > > > > > > > those hosts - just the OS doing nearly nothing. > > > > > > > > > > > > > > > > > > > > > > > > > > > > All those show: > > > > > > > > > > > > > > [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 > > > > > > > > > > > > > > xfs_vm_releasepage+0xe2/0xf0() > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ok, well at this point the warning isn't telling us anything beyond > > > > > > > > > > > > > you're reproducing the problem. We can't really make progress without > > > > > > > > > > > > > more information. We don't necessarily know what application or > > > > > > > > > > > > > operations caused this by the time it occurs, but perhaps knowing what > > > > > > > > > > > > > file is affected could give us a hint. > > > > > > > > > > > > > > > > > > > > > > > > > > We have the xfs_releasepage tracepoint, but that's unconditional and so > > > > > > > > > > > > > might generate a lot of noise by default. Could you enable the > > > > > > > > > > > > > xfs_releasepage tracepoint and hunt for instances where delalloc != 0? > > > > > > > > > > > > > E.g., we could leave a long running 'trace-cmd record -e > > > > > > > > > > > > > "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the > > > > > > > > > > > > > problem to occur. Alternatively (and maybe easier), run 'trace-cmd start > > > > > > > > > > > > > -e "xfs:xfs_releasepage"' and leave something like 'cat > > > > > > > > > > > > > /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > > > > > > > > > > > > > > ~/trace.out' running to capture instances. > > > > > > > > > > > > > > > > > > > > > > Isn't the trace a WARN_ONCE? 
So it does not reoccur or can i check the > > > > > > > > > > > it in the trace.out even the WARN_ONCE was already triggered? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The tracepoint is independent from the warning (see > > > > > > > > > > xfs_vm_releasepage()), so the tracepoint will fire every invocation of > > > > > > > > > > the function regardless of whether delalloc blocks still exist at that > > > > > > > > > > point. That creates the need to filter the entries. > > > > > > > > > > > > > > > > > > > > With regard to performance, I believe the tracepoints are intended to be > > > > > > > > > > pretty lightweight. I don't think it should hurt to try it on a box, > > > > > > > > > > observe for a bit and make sure there isn't a huge impact. Note that the > > > > > > > > > > 'trace-cmd record' approach will save everything to file, so that's > > > > > > > > > > something to consider I suppose. > > > > > > > > > > > > > > > > > > Tests / cat is running. Is there any way to test if it works? Or is it > > > > > > > > > enough that cat prints stuff from time to time but does not match -v > > > > > > > > > delalloc 0 > > > > > > > > > > > > > > > > > > > > > > > > > What is it printing where delalloc != 0? You could always just cat > > > > > > > > trace_pipe and make sure the event is firing, it's just that I suspect > > > > > > > > most entries will have delalloc == unwritten == 0. > > > > > > > > > > > > > > > > Also, while the tracepoint fires independent of the warning, it might > > > > > > > > not be a bad idea to restart a system that has already seen the warning > > > > > > > > since boot, just to provide some correlation or additional notification > > > > > > > > when the problem occurs. 
> > > > > > > > > > > > > > > > Brian > > > > > > > > > Stefan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
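The inode-to-file lookup discussed in this thread needs one detail spelled out: the XFS warning prints the inode number in hex, while find(1)'s -inum test compares decimal values. A minimal sketch, using the inode from the trace above (the mount point in the comments is only an example, not from the report):

```shell
# Inode number as printed by the instrumented xfs_warn() (hex).
ino_hex=0x27c69cd

# find(1) matches against decimal inode numbers, so convert first.
ino_dec=$(printf '%d' "$ino_hex")
echo "$ino_dec"

# Then, on the affected filesystem (mount point is an assumption here;
# -xdev keeps find from crossing into other filesystems, where inode
# numbers may collide):
#   find /var/lib/ceph -xdev -inum "$ino_dec" -print
# and, if the file has not been rewritten since the warning fired:
#   xfs_bmap -v <path returned by find>
```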
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-16 1:06 ` Brian Foster @ 2016-05-22 19:36 ` Stefan Priebe - Profihost AG 2016-05-22 21:38 ` Dave Chinner 0 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-05-22 19:36 UTC (permalink / raw) To: Brian Foster; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com On 16.05.2016 at 03:06 Brian Foster wrote: > On Sun, May 15, 2016 at 02:41:40PM +0200, Stefan Priebe wrote: >> Hi, >> >> find shows a ceph object file: >> /var/lib/ceph/osd/ceph-13/current/3.29f_head/DIR_F/DIR_9/DIR_2/DIR_D/rbd\udata.904a406b8b4567.00000000000052d6__head_143BD29F__3 >> > > Any idea what this file is? Does it represent user data, Ceph metadata? It's user data. > How was it created? Can you create others like it (I'm assuming via some > file/block operation through Ceph) and/or reproduce the error? It's the Ceph OSD daemon creating those files. It works with normal file operations. I'm not able to force this. > (Also, this thread is 20+ mails strong at this point, why is this the > first reference to Ceph? :/) Because I still see no reference to Ceph. It also happens on non-Ceph systems. >> File was again modified since then. >> > xfs_bmap -v might still be interesting. I was on holiday the last few days; the file got deleted by Ceph. Should I collect everything again from a new trace? >> At another system I've got different output. >> [Sun May 15 07:00:44 2016] XFS (md127p3): ino 0x600204f delalloc 1 unwritten >> 0 pgoff 0x50000 size 0x13d1c8 >> [Sun May 15 07:00:44 2016] ------------[ cut here ]------------ >> [Sun May 15 07:00:44 2016] WARNING: CPU: 2 PID: 108 at >> fs/xfs/xfs_aops.c:1239 xfs_vm_releasepage+0x10f/0x140() > > This one is different, being a lingering delalloc block in this case. 
> >> [Sun May 15 07:00:44 2016] Modules linked in: netconsole ipt_REJECT >> nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding >> coretemp 8021q garp fuse xhci_pci xhci_hcd sb_edac edac_core i2c_i801 >> i40e(O) shpchp vxlan ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler >> button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg >> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci >> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas >> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O >> 4.4.10+25-ph #1 > > How close is this to an upstream kernel? Upstream XFS? Have you tried to > reproduce this on an upstream kernel? It's a vanilla 4.4.10 plus a new Adaptec driver and some sched and wq patches from 4.5 and 4.6, but I can try to replace the kernel on one machine with a 100% vanilla one if that helps. >> [Sun May 15 07:00:44 2016] Hardware name: Supermicro Super Server/X10SRH-CF, >> BIOS 1.0b 05/18/2015 >> [Sun May 15 07:00:44 2016] 0000000000000000 ffff880c4da37a88 >> ffffffff9c3c6d0f 0000000000000000 >> [Sun May 15 07:00:44 2016] ffffffff9ca51a1c ffff880c4da37ac8 >> ffffffff9c0837a7 ffff880c4da37ae8 >> [Sun May 15 07:00:44 2016] 0000000000000001 ffffea0001053080 >> ffff8801429ef490 ffffea00010530a0 >> [Sun May 15 07:00:44 2016] Call Trace: >> [Sun May 15 07:00:44 2016] [<ffffffff9c3c6d0f>] dump_stack+0x63/0x84 >> [Sun May 15 07:00:44 2016] [<ffffffff9c0837a7>] >> warn_slowpath_common+0x97/0xe0 >> [Sun May 15 07:00:44 2016] [<ffffffff9c08380a>] >> warn_slowpath_null+0x1a/0x20 >> [Sun May 15 07:00:44 2016] [<ffffffff9c326f4f>] >> xfs_vm_releasepage+0x10f/0x140 >> [Sun May 15 07:00:44 2016] [<ffffffff9c1520c2>] >> try_to_release_page+0x32/0x50 >> [Sun May 15 07:00:44 2016] [<ffffffff9c166a8e>] >> shrink_active_list+0x3ce/0x3e0 >> [Sun May 15 07:00:44 2016] [<ffffffff9c167127>] shrink_lruvec+0x687/0x7d0 >> [Sun May 15 07:00:44 2016] [<ffffffff9c16734c>] shrink_zone+0xdc/0x2c0 
>> [Sun May 15 07:00:44 2016] [<ffffffff9c168499>] kswapd+0x4f9/0x970 >> [Sun May 15 07:00:44 2016] [<ffffffff9c167fa0>] ? >> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >> [Sun May 15 07:00:44 2016] [<ffffffff9c0a0d99>] kthread+0xc9/0xe0 >> [Sun May 15 07:00:44 2016] [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100 >> [Sun May 15 07:00:44 2016] [<ffffffff9c6b58cf>] ret_from_fork+0x3f/0x70 >> [Sun May 15 07:00:44 2016] [<ffffffff9c0a0cd0>] ? kthread_stop+0x100/0x100 >> [Sun May 15 07:00:44 2016] ---[ end trace 9497d464aafe5b88 ]--- >> [295086.353469] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >> 0x51000 size 0x13d1c8 > > What is md127p3, is the root fs on some kind of raid device? Can you > provide xfs_info for this filesystem? It's an mdadm RAID 1 and the root fs.
# xfs_info /
meta-data=/dev/disk/by-uuid/afffa232-0025-4222-9952-adb31482fe4a isize=256 agcount=4, agsize=1703936 blks
         =              sectsz=512   attr=2, projid32bit=1
         =              crc=0        finobt=0 spinodes=0
data     =              bsize=4096   blocks=6815744, imaxpct=25
         =              sunit=0      swidth=0 blks
naming   =version 2     bsize=4096   ascii-ci=0 ftype=0
log      =internal      bsize=4096   blocks=3328, version=2
         =              sectsz=512   sunit=0 blks, lazy-count=1
realtime =none          extsz=4096   blocks=0, rtextents=0
>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >> 0x52000 size 0x13d1c8 >> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >> 0x53000 size 0x13d1c8 >> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >> 0x54000 size 0x13d1c8 > ... 
>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xab000 size 0x13d1c8
>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xac000 size 0x13d1c8
>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xad000 size 0x13d1c8
>>
>> The file to the inode number is:
>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
>>
>
> xfs_bmap -v might be interesting here as well.

# xfs_bmap -v /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
/var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en:
 EXT: FILE-OFFSET      BLOCK-RANGE          AG AG-OFFSET          TOTAL
   0: [0..2567]:       41268928..41271495    3 (374464..377031)    2568

> This certainly seems like it is more repeatable. According to google,
> the content of /var/lib/apt/lists/ can be removed and repopulated safely
> with 'apt-get update' (please verify before trying). Does that reproduce
> this variant of the problem?
>
> Note that the apt command might not directly cause the error message,
> but rather only create the conditions for it to occur sometime later via
> memory reclaim. E.g., you might need to run 'sync; echo 3 >
> /proc/sys/vm/drop_caches' after, or possibly run a dummy workload of
> some kind (e.g., dd if=/dev/zero of=tmpfile bs=1M ...) to cause memory
> pressure and reclaim the pagecache of the package list file.
OK - this is what I did, but no trace:

 106 22.05.2016 - 21:31:03 reboot
 108 22.05.2016 - 21:33:25 dmesg -c
 109 22.05.2016 - 21:33:51 mv /var/lib/apt/lists /var/lib/apt/lists.backup
 110 22.05.2016 - 21:33:54 apt-get update
 111 22.05.2016 - 21:34:09 ls -la /var/lib/apt/lists
 112 22.05.2016 - 21:34:58 dmesg
 113 22.05.2016 - 21:35:14 sync; echo 3 >/proc/sys/vm/drop_caches
 114 22.05.2016 - 21:35:17 dmesg
 115 22.05.2016 - 21:35:50 dd if=/dev/zero of=tmpfile bs=1M count=4096; rm -v tmpfile
 116 22.05.2016 - 21:35:55 dmesg

Greets,
Stefan

> Brian
>
>> dmesg output / trace was at 7 am today and last modify of the file was
>> yesterday 11 pm.
>>
>> Stefan
>>
>> Am 15.05.2016 um 13:50 schrieb Brian Foster:
>>> On Sun, May 15, 2016 at 01:03:07PM +0200, Stefan Priebe wrote:
>>>> Hi Brian,
>>>>
>>>> here's the new trace:
>>>> [310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff 0x19f000 size 0x1a0000
>>>
>>> So it is actually an unwritten buffer, on what appears to be the last
>>> page of the file. Well, we had 60630fe ("xfs: clean up unwritten buffers
>>> on write failure") that went into 4.6, but that was reproducible on
>>> sub-4k block size filesystems and depends on some kind of write error.
>>> Are either of those applicable here? Are you close to ENOSPC, for
>>> example?
>>>
>>> Otherwise, have you determined what file is associated with that inode
>>> (e.g., 'find <mnt> -inum 0x27c69cd -print')? I'm hoping that gives some
>>> insight on what actually preallocates/writes the file and perhaps that
>>> helps us identify something we can trace. Also, if you think the file
>>> has not been modified since the error, an 'xfs_bmap -v <file>' might be
>>> interesting as well...
>>> >>> Brian >>> >>>> [310740.407265] ------------[ cut here ]------------ >>>> [310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241 >>>> xfs_vm_releasepage+0x12e/0x140() >>>> [310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 >>>> xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp >>>> fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan >>>> ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs xor >>>> raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci >>>> ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit i2c_core ptp >>>> mpt3sas pps_core raid_class scsi_transport_sas >>>> [310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G O >>>> 4.4.10+25-ph #1 >>>> [310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b >>>> 05/18/2015 >>>> [310740.407291] 0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f >>>> 0000000000000000 >>>> [310740.407292] ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7 >>>> ffff880c4da1fae8 >>>> [310740.407293] 0000000000000000 ffffea0000e38140 ffff8807e20bfd10 >>>> ffffea0000e38160 >>>> [310740.407295] Call Trace: >>>> [310740.407299] [<ffffffffa13c6d0f>] dump_stack+0x63/0x84 >>>> [310740.407301] [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0 >>>> [310740.407302] [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20 >>>> [310740.407303] [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140 >>>> [310740.407305] [<ffffffffa11520c2>] try_to_release_page+0x32/0x50 >>>> [310740.407308] [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0 >>>> [310740.407309] [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0 >>>> [310740.407311] [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0 >>>> [310740.407312] [<ffffffffa1168499>] kswapd+0x4f9/0x970 >>>> [310740.407314] [<ffffffffa1167fa0>] ? 
>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >>>> [310740.407316] [<ffffffffa10a0d99>] kthread+0xc9/0xe0 >>>> [310740.407318] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100 >>>> [310740.407320] [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70 >>>> [310740.407321] [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100 >>>> [310740.407322] ---[ end trace bf76ad5e8a4d863e ]--- >>>> >>>> >>>> Stefan >>>> >>>> Am 11.05.2016 um 17:59 schrieb Brian Foster: >>>>> Dropped non-XFS cc's, probably no need to spam other lists at this >>>>> point... >>>>> >>>>> On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote: >>>>>> >>>>>> Am 11.05.2016 um 15:34 schrieb Brian Foster: >>>>>>> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hi Brian, >>>>>>>> >>>>>>>> i'm still unable to grab anything to the trace file? Is there anything >>>>>>>> to check if it's working at all? >>>>>>>> >>>>>>> >>>>>>> See my previous mail: >>>>>>> >>>>>>> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html >>>>>>> >>>>>>> E.g., something like this should work after writing to and removing a >>>>>>> new file: >>>>>>> >>>>>>> # trace-cmd start -e "xfs:xfs_releasepage" >>>>>>> # cat /sys/kernel/debug/tracing/trace_pipe >>>>>>> ... >>>>>>> rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0 >>>>>> >>>>>> arg sorry yes that's working but delalloc is always 0. >>>>>> >>>>> >>>>> Hrm, Ok. That is strange. >>>>> >>>>>> May be i have to hook that into my initramfs to be fast enough? >>>>>> >>>>> >>>>> Not sure that would matter.. you said it occurs within 48 hours? I take >>>>> that to mean it doesn't occur immediately on boot. You should be able to >>>>> tell from the logs or dmesg if it happens before you get a chance to >>>>> start the tracing. 
>>>>> >>>>> Well, the options I can think of are: >>>>> >>>>> - Perhaps I botched matching up the line number to the warning, in which >>>>> case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch >>>>> any delalloc or unwritten blocks at releasepage() time. >>>>> >>>>> - Perhaps there's a race that the tracepoint doesn't catch. The warnings >>>>> are based on local vars, so we could instrument the code to print a >>>>> warning[1] to try and get the inode number. >>>>> >>>>> Brian >>>>> >>>>> [1] - compile tested diff: >>>>> >>>>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c >>>>> index 40645a4..94738ea 100644 >>>>> --- a/fs/xfs/xfs_aops.c >>>>> +++ b/fs/xfs/xfs_aops.c >>>>> @@ -1038,11 +1038,18 @@ xfs_vm_releasepage( >>>>> gfp_t gfp_mask) >>>>> { >>>>> int delalloc, unwritten; >>>>> + struct xfs_inode *ip = XFS_I(page->mapping->host); >>>>> >>>>> trace_xfs_releasepage(page->mapping->host, page, 0, 0); >>>>> >>>>> xfs_count_page_state(page, &delalloc, &unwritten); >>>>> >>>>> + if (delalloc || unwritten) >>>>> + xfs_warn(ip->i_mount, >>>>> + "ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx", >>>>> + ip->i_ino, delalloc, unwritten, page_offset(page), >>>>> + i_size_read(page->mapping->host)); >>>>> + >>>>> if (WARN_ON_ONCE(delalloc)) >>>>> return 0; >>>>> if (WARN_ON_ONCE(unwritten)) >>>>> >>>>>> Stefan >>>>>> >>>>>>> Once that is working, add the grep command to filter out "delalloc 0" >>>>>>> instances, etc. For example: >>>>>>> >>>>>>> cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out >>>>>>> >>>>>>> Brian >>>>>>> >>>>>>>> This still happens in the first 48 hours after a fresh reboot. 
>>>>>>>> >>>>>>>> Stefan >>>>>>>> >>>>>>>> Am 24.03.2016 um 13:24 schrieb Brian Foster: >>>>>>>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>>> >>>>>>>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster: >>>>>>>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>> >>>>>>>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>> >>>>>>>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster: >>>>>>>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote: >>>>>>>>>>>>>>> sorry new one the last one got mangled. Comments inside. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner: >>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote: >>>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster: >>>>>>>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote: >>>>>>>>>>>>>> ... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours >>>>>>>>>>>>>>> running 4.4.6. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal >>>>>>>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on >>>>>>>>>>>>>>> those hosts - just the OS doing nearly nothing. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> All those show: >>>>>>>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234 >>>>>>>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0() >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond >>>>>>>>>>>>>> you're reproducing the problem. We can't really make progress without >>>>>>>>>>>>>> more information. We don't necessarily know what application or >>>>>>>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what >>>>>>>>>>>>>> file is affected could give us a hint. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so >>>>>>>>>>>>>> might generate a lot of noise by default. Could you enable the >>>>>>>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0? >>>>>>>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e >>>>>>>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the >>>>>>>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start >>>>>>>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat >>>>>>>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > >>>>>>>>>>>>>> ~/trace.out' running to capture instances. >>>>>>>>>>>> >>>>>>>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the >>>>>>>>>>>> it in the trace.out even the WARN_ONCE was already triggered? >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The tracepoint is independent from the warning (see >>>>>>>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of >>>>>>>>>>> the function regardless of whether delalloc blocks still exist at that >>>>>>>>>>> point. That creates the need to filter the entries. >>>>>>>>>>> >>>>>>>>>>> With regard to performance, I believe the tracepoints are intended to be >>>>>>>>>>> pretty lightweight. 
I don't think it should hurt to try it on a box, >>>>>>>>>>> observe for a bit and make sure there isn't a huge impact. Note that the >>>>>>>>>>> 'trace-cmd record' approach will save everything to file, so that's >>>>>>>>>>> something to consider I suppose. >>>>>>>>>> >>>>>>>>>> Tests / cat is running. Is there any way to test if it works? Or is it >>>>>>>>>> enough that cat prints stuff from time to time but does not match -v >>>>>>>>>> delalloc 0 >>>>>>>>>> >>>>>>>>> >>>>>>>>> What is it printing where delalloc != 0? You could always just cat >>>>>>>>> trace_pipe and make sure the event is firing, it's just that I suspect >>>>>>>>> most entries will have delalloc == unwritten == 0. >>>>>>>>> >>>>>>>>> Also, while the tracepoint fires independent of the warning, it might >>>>>>>>> not be a bad idea to restart a system that has already seen the warning >>>>>>>>> since boot, just to provide some correlation or additional notification >>>>>>>>> when the problem occurs. >>>>>>>>> >>>>>>>>> Brian >>>>>>>>> >>>>>>>>>> Stefan >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> xfs mailing list >>>>>>>>>> xfs@oss.sgi.com >>>>>>>>>> http://oss.sgi.com/mailman/listinfo/xfs >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> xfs mailing list >>>>>>>> xfs@oss.sgi.com >>>>>>>> http://oss.sgi.com/mailman/listinfo/xfs >>>>>> >>>>>> _______________________________________________ >>>>>> xfs mailing list >>>>>> xfs@oss.sgi.com >>>>>> http://oss.sgi.com/mailman/listinfo/xfs >> >> _______________________________________________ >> xfs mailing list >> xfs@oss.sgi.com >> http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-22 19:36 ` Stefan Priebe - Profihost AG @ 2016-05-22 21:38 ` Dave Chinner 2016-05-30 7:23 ` Stefan Priebe - Profihost AG 2016-06-03 17:56 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG 0 siblings, 2 replies; 49+ messages in thread From: Dave Chinner @ 2016-05-22 21:38 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: xfs-masters@oss.sgi.com, Brian Foster, xfs@oss.sgi.com On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote: > Am 16.05.2016 um 03:06 schrieb Brian Foster: > >> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > >> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas > >> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O > >> 4.4.10+25-ph #1 > > > > How close is this to an upstream kernel? Upstream XFS? Have you tried to > > reproduce this on an upstream kernel? > > It's a vanilla 4.4.10 + a new adaptec driver and some sched and wq > patches from 4.5 and 4.6 but i can try to replace the kernel on one > machine with a 100% vanilla one if this helps. Please do. > >> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > >> 0x52000 size 0x13d1c8 > >> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > >> 0x53000 size 0x13d1c8 > >> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff > >> 0x54000 size 0x13d1c8 > > ... 
> >> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xab000 size 0x13d1c8
> >> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xac000 size 0x13d1c8
> >> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff 0xad000 size 0x13d1c8
> >>
> >> The file to the inode number is:
> >> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
> >>
> >
> > xfs_bmap -v might be interesting here as well.
>
> # xfs_bmap -v /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en
> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en:
>  EXT: FILE-OFFSET      BLOCK-RANGE          AG AG-OFFSET          TOTAL
>    0: [0..2567]:       41268928..41271495    3 (374464..377031)    2568

So the last file offset with a block is 0x140e00. This means the
file is fully allocated. However, the pages inside the file range
are still marked delayed allocation. That implies that we've failed
to write the pages over a delayed allocation region after we've
allocated the space.

That, in turn, tends to indicate a problem in page writeback - the
first page to be written has triggered delayed allocation of the
entire range, but then the subsequent pages have not been written
(for some as yet unknown reason). When a page is written, we map it
to the current block via xfs_map_at_offset(), and that clears both
the buffer delay and unwritten flags.

This clearly isn't happening, which means either the VFS doesn't
think the inode is dirty anymore, writeback is never asking for
these pages to be written, or XFS is screwing something up in
->writepage. The XFS writepage code changed significantly in 4.6, so
it might be worth seeing if a 4.6 kernel reproduces this same
problem....

Cheers,

Dave.
-- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-22 21:38 ` Dave Chinner @ 2016-05-30 7:23 ` Stefan Priebe - Profihost AG 2016-05-30 22:36 ` shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) Dave Chinner 2016-06-03 17:56 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG 1 sibling, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-05-30 7:23 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs-masters@oss.sgi.com, Brian Foster, xfs@oss.sgi.com Hi Dave, Hi Brian, below are the results with a vanilla 4.4.11 kernel. Am 22.05.2016 um 23:38 schrieb Dave Chinner: > On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote: >> Am 16.05.2016 um 03:06 schrieb Brian Foster: >>>> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci >>>> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas >>>> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O >>>> 4.4.10+25-ph #1 >>> >>> How close is this to an upstream kernel? Upstream XFS? Have you tried to >>> reproduce this on an upstream kernel? >> >> It's a vanilla 4.4.10 + a new adaptec driver and some sched and wq >> patches from 4.5 and 4.6 but i can try to replace the kernel on one >> machine with a 100% vanilla one if this helps. > > Please do. > >>>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0x52000 size 0x13d1c8 >>>> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0x53000 size 0x13d1c8 >>>> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0x54000 size 0x13d1c8 >>> ... 
>>>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0xab000 size 0x13d1c8 >>>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0xac000 size 0x13d1c8 >>>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0xad000 size 0x13d1c8 >>>> >>>> The file to the inode number is: >>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en >>>> >>> >>> xfs_bmap -v might be interesting here as well. >> >> # xfs_bmap -v >> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en >> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL >> 0: [0..2567]: 41268928..41271495 3 (374464..377031) 2568 > > So the last file offset with a block is 0x140e00. This means the > file is fully allocated. However, the pages inside the file range > are still marked delayed allocation. That implies that we've failed > to write the pages over a delayed allocation region after we've > allocated the space. > > That, in turn, tends to indicate a problem in page writeback - the > first page to be written has triggered delayed allocation of the > entire range, but then the subsequent pages have not been written > (for some as yet unknown reason). When a page is written, we map it > to the current block via xfs_map_at_offset(), and that clears both > the buffer delay and unwritten flags. > > This clearly isn't happening which means either the VFS doesn't > think the inode is dirty anymore, writeback is never asking for > these pages to be written, or XFs is screwing something up in > ->writepage. The XFS writepage code changed significantly in 4.6, so > it might be worth seeing if a 4.6 kernel reproduces this same > problem.... i've now used a vanilla 4.4.11 Kernel and the issue remains. 
After a fresh reboot it has happened again on the root FS for a debian apt file: XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 ------------[ cut here ]------------ WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 xfs_vm_releasepage+0x10f/0x140() Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000 ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 Call Trace: [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 [<ffffffffa2168539>] kswapd+0x4f9/0x970 [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 [<ffffffffa20a0d99>] kthread+0xc9/0xe0 [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 [<ffffffffa20a0cd0>] ? 
kthread_stop+0x100/0x100 ---[ end trace c9d679f8ed4d7610 ]--- XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size 0x12b990 XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size 0x12b990 XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x3000 size 0x12b990 XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x4000 size 0x12b990 XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x5000 size 0x12b990 XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x6000 size 0x12b990 XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x7000 size 0x12b990 XFS (md127p3): ino 0x400de4c delalloc 1 unwritten 0 pgoff 0x12000 size 0x2cc69 # find / -inum $(printf "%d" 0x41221d1) -print /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_source_Sources # xfs_bmap -v /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_source_Sources /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_source_Sources: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..2399]: 27851552..27853951 2 (588576..590975) 2400 So you mean the next step would be to test 4.6? I hope this is stable enough for production usage. Greets, Stefan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-30 7:23 ` Stefan Priebe - Profihost AG @ 2016-05-30 22:36 ` Dave Chinner 2016-05-31 1:07 ` Minchan Kim 0 siblings, 1 reply; 49+ messages in thread From: Dave Chinner @ 2016-05-30 22:36 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, Brian Foster, linux-kernel, xfs@oss.sgi.com [adding lkml and linux-mm to the cc list] On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote: > Hi Dave, > Hi Brian, > > below are the results with a vanilla 4.4.11 kernel. Thanks for persisting with the testing, Stefan. .... > i've now used a vanilla 4.4.11 Kernel and the issue remains. After a > fresh reboot it has happened again on the root FS for a debian apt file: > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 > xfs_vm_releasepage+0x10f/0x140() > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 > 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000 > ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 > 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 > Call Trace: > [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 > [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 > 
[<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20
> [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140
> [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0
> [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150
> [<ffffffffa21521c2>] try_to_release_page+0x32/0x50
> [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0
> [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0
> [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0
> [<ffffffffa2168539>] kswapd+0x4f9/0x970
> [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> [<ffffffffa20a0d99>] kthread+0xc9/0xe0
> [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
> [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70
> [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100
> ---[ end trace c9d679f8ed4d7610 ]---
> XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size 0x12b990
> XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size
.....

Ok, I suspect this may be a VM bug. I've been looking at the 4.6
code (so please try to reproduce on that kernel!) but it looks to me
like the only way we can get from shrink_active_list() directly to
try_to_release_page() is if we are over the maximum bufferhead
threshold (i.e. buffer_heads_over_limit = true) and we are trying to
reclaim pages directly from the active list.

Because we are called from kswapd()->balance_pgdat(), we have:

	struct scan_control sc = {
		.gfp_mask = GFP_KERNEL,
		.order = order,
		.priority = DEF_PRIORITY,
		.may_writepage = !laptop_mode,
		.may_unmap = 1,
		.may_swap = 1,
	};

The key point here is that reclaim is being run with .may_writepage
= true for default configuration kernels. When we get to
shrink_active_list():

	if (!sc->may_writepage)
		isolate_mode |= ISOLATE_CLEAN;

But sc->may_writepage = true, and this allows isolate_lru_pages() to
isolate dirty pages from the active list. Normally this isn't a
problem, because the isolated active list pages are rotated to the
inactive list, and nothing else happens to them. *Except when
buffer_heads_over_limit = true*. This special condition would
explain why I have never seen apt/dpkg cause this problem on any of
my (many) Debian systems that all use XFS....

In that case, shrink_active_list() runs:

	if (unlikely(buffer_heads_over_limit)) {
		if (page_has_private(page) && trylock_page(page)) {
			if (page_has_private(page))
				try_to_release_page(page, 0);
			unlock_page(page);
		}
	}

i.e. it locks the page, and if it has buffer heads it tries to get
the bufferheads freed from the page.

But this is a dirty page, which means it may have delalloc or
unwritten state on its buffers, both of which indicate that there
is dirty data in the page that hasn't been written. XFS issues a
warning on this because neither shrink_active_list() nor
try_to_release_page() checks whether the page is dirty or not.

Hence it seems to me that shrink_active_list() is calling
try_to_release_page() inappropriately, and XFS is just the
messenger. If you turn laptop mode on, it is likely the problem will
go away, as kswapd will run with .may_writepage = false, but that
will also cause other behavioural changes relating to writeback and
memory reclaim. It might be worth trying as a workaround for now.

MM-folk - is this analysis correct? If so, why is
shrink_active_list() calling try_to_release_page() on dirty pages?
Is this just an oversight, or is there some problem that this is
trying to work around? It seems trivial to fix to me (add a
!PageDirty check), but I don't know why the check is there in the
first place...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-30 22:36 ` shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) Dave Chinner @ 2016-05-31 1:07 ` Minchan Kim 2016-05-31 2:55 ` Dave Chinner 2016-05-31 9:50 ` Jan Kara 0 siblings, 2 replies; 49+ messages in thread From: Minchan Kim @ 2016-05-31 1:07 UTC (permalink / raw) To: Dave Chinner Cc: linux-mm, Brian Foster, xfs@oss.sgi.com, linux-kernel, Stefan Priebe - Profihost AG On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > [adding lkml and linux-mm to the cc list] > > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote: > > Hi Dave, > > Hi Brian, > > > > below are the results with a vanilla 4.4.11 kernel. > > Thanks for persisting with the testing, Stefan. > > .... > > > i've now used a vanilla 4.4.11 Kernel and the issue remains. 
After a > > fresh reboot it has happened again on the root FS for a debian apt file: > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 > > ------------[ cut here ]------------ > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 > > xfs_vm_releasepage+0x10f/0x140() > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 > > 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000 > > ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 > > 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 > > Call Trace: > > [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 > > [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 > > [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 > > [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 > > [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 > > [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 > > [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 > > [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 > > [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 > > [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 > > [<ffffffffa2168539>] kswapd+0x4f9/0x970 > > [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > > [<ffffffffa20a0d99>] kthread+0xc9/0xe0 > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 > > [<ffffffffa20a0cd0>] ? 
kthread_stop+0x100/0x100
> > ---[ end trace c9d679f8ed4d7610 ]---
> > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size
> > 0x12b990
> > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size
> .....
>
> Ok, I suspect this may be a VM bug. I've been looking at the 4.6
> code (so please try to reproduce on that kernel!) but it looks to me
> like the only way we can get from shrink_active_list() direct to
> try_to_release_page() is if we are over the maximum bufferhead
> threshold (i.e. buffer_heads_over_limit = true) and we are trying to
> reclaim pages direct from the active list.
>
> Because we are called from kswapd()->balance_pgdat(), we have:
>
> 	struct scan_control sc = {
> 		.gfp_mask = GFP_KERNEL,
> 		.order = order,
> 		.priority = DEF_PRIORITY,
> 		.may_writepage = !laptop_mode,
> 		.may_unmap = 1,
> 		.may_swap = 1,
> 	};
>
> The key point here is that reclaim is being run with .may_writepage =
> true for default configuration kernels. When we get to
> shrink_active_list():
>
> 	if (!sc->may_writepage)
> 		isolate_mode |= ISOLATE_CLEAN;
>
> But sc->may_writepage = true, and this allows isolate_lru_pages() to
> isolate dirty pages from the active list. Normally this isn't a
> problem, because the isolated active list pages are rotated to the
> inactive list, and nothing else happens to them. *Except when
> buffer_heads_over_limit = true*. This special condition would
> explain why I have never seen apt/dpkg cause this problem on any of
> my (many) Debian systems that all use XFS....
>
> In that case, shrink_active_list() runs:
>
> 	if (unlikely(buffer_heads_over_limit)) {
> 		if (page_has_private(page) && trylock_page(page)) {
> 			if (page_has_private(page))
> 				try_to_release_page(page, 0);
> 			unlock_page(page);
> 		}
> 	}
>
> i.e. it locks the page, and if it has buffer heads it tries to get
> the bufferheads freed from the page.
>
> But this is a dirty page, which means it may have delalloc or
> unwritten state on its buffers, both of which indicate that there
> is dirty data in the page that hasn't been written. XFS issues a
> warning on this because neither shrink_active_list() nor
> try_to_release_page() checks whether the page is dirty or not.
>
> Hence it seems to me that shrink_active_list() is calling
> try_to_release_page() inappropriately, and XFS is just the
> messenger. If you turn laptop mode on, it is likely the problem will
> go away as kswapd will run with .may_writepage = false, but that
> will also cause other behavioural changes relating to writeback and
> memory reclaim. It might be worth trying as a workaround for now.
>
> MM-folk - is this analysis correct? If so, why is
> shrink_active_list() calling try_to_release_page() on dirty pages?
> Is this just an oversight or is there some problem that this is
> trying to work around? It seems trivial to fix to me (add a
> !PageDirty check), but I don't know why the check is there in the
> first place...

It seems to be the latter. The commit below seems to be related:
[ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]

At that time, even shrink_page_list() worked like this:

shrink_page_list
	while (!list_empty(page_list)) {
		..
		..
		if (PageDirty(page)) {
			..
		}
		/*
		 * If the page has buffers, try to free the buffer mappings
		 * associated with this page. If we succeed we try to free
		 * the page as well.
		 *
		 * We do this even if the page is PageDirty().
		 * try_to_release_page() does not perform I/O, but it is
		 * possible for a page to have PageDirty set, but it is actually
		 * clean (all its buffers are clean). This happens if the
		 * buffers were written out directly, with submit_bh(). ext3
		 * will do this, as well as the blockdev mapping.
		 * try_to_release_page() will discover that cleanness and will
		 * drop the buffers and mark the page clean - it can be freed.
		 *
		 * ..
		 */
		if (PagePrivate(page)) {
			if (!try_to_release_page(page, sc->gfp_mask))
				goto activate_locked;
			if (!mapping && page_count(page) == 1)
				goto free_it;
		}
		..
	}

I wonder whether it's still valid or not on ext4. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 1:07 ` Minchan Kim @ 2016-05-31 2:55 ` Dave Chinner 2016-05-31 3:59 ` Minchan Kim 2016-05-31 9:50 ` Jan Kara 1 sibling, 1 reply; 49+ messages in thread From: Dave Chinner @ 2016-05-31 2:55 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm, Brian Foster, xfs@oss.sgi.com, linux-kernel, Stefan Priebe - Profihost AG On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote: > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > > [adding lkml and linux-mm to the cc list] > > > > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote: > > > Hi Dave, > > > Hi Brian, > > > > > > below are the results with a vanilla 4.4.11 kernel. > > > > Thanks for persisting with the testing, Stefan. > > > > .... > > > > > i've now used a vanilla 4.4.11 Kernel and the issue remains. After a > > > fresh reboot it has happened again on the root FS for a debian apt file: > > > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 > > > ------------[ cut here ]------------ > > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 > > > xfs_vm_releasepage+0x10f/0x140() > > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport > > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse > > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan > > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod > > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas > > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 > > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 > > > 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000 > > > 
ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 > > > 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 > > > Call Trace: > > > [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 > > > [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 > > > [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 > > > [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 > > > [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 > > > [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 > > > [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 > > > [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 > > > [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 > > > [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 > > > [<ffffffffa2168539>] kswapd+0x4f9/0x970 > > > [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > > > [<ffffffffa20a0d99>] kthread+0xc9/0xe0 > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > ---[ end trace c9d679f8ed4d7610 ]--- > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size > > > 0x12b990 > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size > > ..... > > > > Ok, I suspect this may be a VM bug. I've been looking at the 4.6 > > code (so please try to reproduce on that kernel!) but it looks to me > > like the only way we can get from shrink_active_list() direct to > > try_to_release_page() is if we are over the maximum bufferhead > > threshold (i.e buffer_heads_over_limit = true) and we are trying to > > reclaim pages direct from the active list. 
> > > > Because we are called from kswapd()->balance_pgdat(), we have: > > > > struct scan_control sc = { > > .gfp_mask = GFP_KERNEL, > > .order = order, > > .priority = DEF_PRIORITY, > > .may_writepage = !laptop_mode, > > .may_unmap = 1, > > .may_swap = 1, > > }; > > > > The key point here is reclaim is being run with .may_writepage = > > true for default configuration kernels. when we get to > > shrink_active_list(): > > > > if (!sc->may_writepage) > > isolate_mode |= ISOLATE_CLEAN; > > > > But sc->may_writepage = true and this allows isolate_lru_pages() to > > isolate dirty pages from the active list. Normally this isn't a > > problem, because the isolated active list pages are rotated to the > > inactive list, and nothing else happens to them. *Except when > > buffer_heads_over_limit = true*. This special condition would > > explain why I have never seen apt/dpkg cause this problem on any of > > my (many) Debian systems that all use XFS.... > > > > In that case, shrink_active_list() runs: > > > > if (unlikely(buffer_heads_over_limit)) { > > if (page_has_private(page) && trylock_page(page)) { > > if (page_has_private(page)) > > try_to_release_page(page, 0); > > unlock_page(page); > > } > > } > > > > i.e. it locks the page, and if it has buffer heads it trys to get > > the bufferheads freed from the page. > > > > But this is a dirty page, which means it may have delalloc or > > unwritten state on it's buffers, both of which indicate that there > > is dirty data in teh page that hasn't been written. XFS issues a > > warning on this because neither shrink_active_list nor > > try_to_release_page() check for whether the page is dirty or not. > > > > Hence it seems to me that shrink_active_list() is calling > > try_to_release_page() inappropriately, and XFS is just the > > messenger. 
If you turn laptop mode on, it is likely the problem will > > go away as kswapd will run with .may_writepage = false, but that > > will also cause other behavioural changes relating to writeback and > > memory reclaim. It might be worth trying as a workaround for now. > > > > MM-folk - is this analysis correct? If so, why is > > shrink_active_list() calling try_to_release_page() on dirty pages? > > Is this just an oversight or is there some problem that this is > > trying to work around? It seems trivial to fix to me (add a > > !PageDirty check), but I don't know why the check is there in the > > first place... > > It seems to be latter. > Below commit seems to be related. > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.] Okay, that's been there a long, long time (2007), and it covers a case where the filesystem cleans pages without the VM knowing about it (i.e. it marks bufferheads clean without clearing the PageDirty state). That does not explain the code in shrink_active_list(). > At that time, even shrink_page_list works like this. The current code in shrink_page_list still works this way - the PageDirty code will *jump over the PagePrivate case* if the page is to remain dirty or pageout() fails to make it clean. Hence it never gets to try_to_release_page() on a dirty page. Seems like this really needs a dirty check in shrink_active_list() and to leave the stripping of bufferheads from dirty pages in the ext3 corner case to shrink_inactive_list() once the dirty pages have been rotated off the active list... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 2:55 ` Dave Chinner @ 2016-05-31 3:59 ` Minchan Kim 2016-05-31 6:07 ` Dave Chinner 0 siblings, 1 reply; 49+ messages in thread From: Minchan Kim @ 2016-05-31 3:59 UTC (permalink / raw) To: Dave Chinner Cc: linux-mm, Brian Foster, xfs@oss.sgi.com, linux-kernel, Stefan Priebe - Profihost AG On Tue, May 31, 2016 at 12:55:09PM +1000, Dave Chinner wrote: > On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote: > > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > > > [adding lkml and linux-mm to the cc list] > > > > > > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote: > > > > Hi Dave, > > > > Hi Brian, > > > > > > > > below are the results with a vanilla 4.4.11 kernel. > > > > > > Thanks for persisting with the testing, Stefan. > > > > > > .... > > > > > > > i've now used a vanilla 4.4.11 Kernel and the issue remains. 
After a > > > > fresh reboot it has happened again on the root FS for a debian apt file: > > > > > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 > > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 > > > > xfs_vm_releasepage+0x10f/0x140() > > > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport > > > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse > > > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan > > > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > > > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod > > > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > > > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas > > > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 > > > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 > > > > 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000 > > > > ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 > > > > 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 > > > > Call Trace: > > > > [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 > > > > [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 > > > > [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 > > > > [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 > > > > [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 > > > > [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 > > > > [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 > > > > [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 > > > > [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 > > > > [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 > > > > [<ffffffffa2168539>] kswapd+0x4f9/0x970 > > > > [<ffffffffa2168040>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > > > > [<ffffffffa20a0d99>] kthread+0xc9/0xe0 > > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > > [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 > > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > > ---[ end trace c9d679f8ed4d7610 ]--- > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size > > > > 0x12b990 > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size > > > ..... > > > > > > Ok, I suspect this may be a VM bug. I've been looking at the 4.6 > > > code (so please try to reproduce on that kernel!) but it looks to me > > > like the only way we can get from shrink_active_list() direct to > > > try_to_release_page() is if we are over the maximum bufferhead > > > threshold (i.e buffer_heads_over_limit = true) and we are trying to > > > reclaim pages direct from the active list. > > > > > > Because we are called from kswapd()->balance_pgdat(), we have: > > > > > > struct scan_control sc = { > > > .gfp_mask = GFP_KERNEL, > > > .order = order, > > > .priority = DEF_PRIORITY, > > > .may_writepage = !laptop_mode, > > > .may_unmap = 1, > > > .may_swap = 1, > > > }; > > > > > > The key point here is reclaim is being run with .may_writepage = > > > true for default configuration kernels. when we get to > > > shrink_active_list(): > > > > > > if (!sc->may_writepage) > > > isolate_mode |= ISOLATE_CLEAN; > > > > > > But sc->may_writepage = true and this allows isolate_lru_pages() to > > > isolate dirty pages from the active list. Normally this isn't a > > > problem, because the isolated active list pages are rotated to the > > > inactive list, and nothing else happens to them. *Except when > > > buffer_heads_over_limit = true*. This special condition would > > > explain why I have never seen apt/dpkg cause this problem on any of > > > my (many) Debian systems that all use XFS.... 
> > > > > > In that case, shrink_active_list() runs: > > > > > > if (unlikely(buffer_heads_over_limit)) { > > > if (page_has_private(page) && trylock_page(page)) { > > > if (page_has_private(page)) > > > try_to_release_page(page, 0); > > > unlock_page(page); > > > } > > > } > > > > > > i.e. it locks the page, and if it has buffer heads it trys to get > > > the bufferheads freed from the page. > > > > > > But this is a dirty page, which means it may have delalloc or > > > unwritten state on it's buffers, both of which indicate that there > > > is dirty data in teh page that hasn't been written. XFS issues a > > > warning on this because neither shrink_active_list nor > > > try_to_release_page() check for whether the page is dirty or not. > > > > > > Hence it seems to me that shrink_active_list() is calling > > > try_to_release_page() inappropriately, and XFS is just the > > > messenger. If you turn laptop mode on, it is likely the problem will > > > go away as kswapd will run with .may_writepage = false, but that > > > will also cause other behavioural changes relating to writeback and > > > memory reclaim. It might be worth trying as a workaround for now. > > > > > > MM-folk - is this analysis correct? If so, why is > > > shrink_active_list() calling try_to_release_page() on dirty pages? > > > Is this just an oversight or is there some problem that this is > > > trying to work around? It seems trivial to fix to me (add a > > > !PageDirty check), but I don't know why the check is there in the > > > first place... > > > > It seems to be latter. > > Below commit seems to be related. > > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.] > > Okay, that's been there a long, long time (2007), and it covers a > case where the filesystem cleans pages without the VM knowing about > it (i.e. it marks bufferheads clean without clearing the PageDirty > state). > > That does not explain the code in shrink_active_list(). 
Yep, my point was that the patch removed the PageDirty check in
try_to_free_buffers. If I read the description correctly, at that time
we wanted to check PageDirty in try_to_free_buffers but couldn't,
because of the ext3 corner case above.

diff --git a/fs/buffer.c b/fs/buffer.c
index 3b116078b4c3..460f1c43238e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageDirty(page) || PageWriteback(page))
+	if (PageWriteback(page))
 		return 0;

And I found a culprit:
e182d61263b7d5, [PATCH] buffer_head takedown for bighighmem machines

It introduced pagevec_strip, which calls try_to_release_page without a
PageDirty check in refill_inactive_zone, which is shrink_active_list
now.

Quote from it:
"
In refill_inactive(): if the number of buffer_heads is excessive then
strip buffers from pages as they move onto the inactive list. This
change is useful for all filesystems. This approach is good because
pages which are being repeatedly overwritten will remain on the active
list and will retain their buffers, whereas pages which are not being
overwritten will be stripped.
"

> > At that time, even shrink_page_list works like this.
>
> The current code in shrink_page_list still works this way - the
> PageDirty code will *jump over the PagePrivate case* if the page is
> to remain dirty or pageout() fails to make it clean. Hence it never
> gets to try_to_release_page() on a dirty page.
>
> Seems like this really needs a dirty check in shrink_active_list()
> and to leave the stripping of bufferheads from dirty pages in the
> ext3 corner case to shrink_inactive_list() once the dirty pages have
> been rotated off the active list...

Another topic: I don't know filesystems well, so I might be missing
something. IMHO, if we should prohibit passing dirty pages to the
filesystem's ->releasepage, isn't it better to move the PageDirty
warning check into try_to_release_page and clean up all filesystems'
releasepage implementations?
diff --git a/mm/filemap.c b/mm/filemap.c
index 00ae878b2a38..7c8b375c3475 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2821,8 +2821,10 @@ int try_to_release_page(struct page *page, gfp_t gfp_mask)
 	if (PageWriteback(page))
 		return 0;
 
-	if (mapping && mapping->a_ops->releasepage)
+	if (mapping && mapping->a_ops->releasepage) {
+		WARN_ON(PageDirty(page));
 		return mapping->a_ops->releasepage(page, gfp_mask);
+	}
 
 	return try_to_free_buffers(page);
 }

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9a8bbc1fb1fa..89b432a90f59 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1795,10 +1795,6 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset,
 
 int f2fs_release_page(struct page *page, gfp_t wait)
 {
-	/* If this is dirty page, keep PagePrivate */
-	if (PageDirty(page))
-		return 0;
-
 	/* This is atomic written page, keep Private */
 	if (IS_ATOMIC_WRITTEN_PAGE(page))
 		return 0;

Otherwise, we can simply return 0 in try_to_release_page if it finds a
dirty page. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 3:59 ` Minchan Kim @ 2016-05-31 6:07 ` Dave Chinner 2016-05-31 6:11 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 49+ messages in thread From: Dave Chinner @ 2016-05-31 6:07 UTC (permalink / raw) To: Minchan Kim Cc: linux-mm, Brian Foster, xfs@oss.sgi.com, linux-kernel, Stefan Priebe - Profihost AG On Tue, May 31, 2016 at 12:59:04PM +0900, Minchan Kim wrote: > On Tue, May 31, 2016 at 12:55:09PM +1000, Dave Chinner wrote: > > On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote: > > > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > > > > But this is a dirty page, which means it may have delalloc or > > > > unwritten state on it's buffers, both of which indicate that there > > > > is dirty data in teh page that hasn't been written. XFS issues a > > > > warning on this because neither shrink_active_list nor > > > > try_to_release_page() check for whether the page is dirty or not. > > > > > > > > Hence it seems to me that shrink_active_list() is calling > > > > try_to_release_page() inappropriately, and XFS is just the > > > > messenger. If you turn laptop mode on, it is likely the problem will > > > > go away as kswapd will run with .may_writepage = false, but that > > > > will also cause other behavioural changes relating to writeback and > > > > memory reclaim. It might be worth trying as a workaround for now. > > > > > > > > MM-folk - is this analysis correct? If so, why is > > > > shrink_active_list() calling try_to_release_page() on dirty pages? > > > > Is this just an oversight or is there some problem that this is > > > > trying to work around? It seems trivial to fix to me (add a > > > > !PageDirty check), but I don't know why the check is there in the > > > > first place... > > > > > > It seems to be latter. > > > Below commit seems to be related. 
> > > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.]
> >
> > Okay, that's been there a long, long time (2007), and it covers a
> > case where the filesystem cleans pages without the VM knowing about
> > it (i.e. it marks bufferheads clean without clearing the PageDirty
> > state).
> >
> > That does not explain the code in shrink_active_list().
>
> Yeb, My point was the patch removed the PageDirty check in
> try_to_free_buffers.

*nod*

[...]

> And I found a culprit.
> e182d61263b7d5, [PATCH] buffer_head takedown for bighighmem machines

Heh. You have the combined historic tree sitting around for code
archeology, just like I do :)

> It introduced pagevec_strip wich calls try_to_release_page without
> PageDirty check in refill_inactive_zone which is shrink_active_list
> now.

<sigh>

It was merged 2 days before XFS was merged. Merging XFS made the
code Andrew wrote incorrect:

> Quote from
> "
> In refill_inactive(): if the number of buffer_heads is excessive then
> strip buffers from pages as they move onto the inactive list. This
> change is useful for all filesystems. [....]

Except for those that carry state necessary for writeback to be done
correctly on the dirty page bufferheads. At the time, nobody working
on the mm/writeback code cared about delayed allocation. So we've
carried this behaviour for 14 years without realising that it's
probably the source of all the unexplainable warnings we've got from
XFS over all that time.

I'm half tempted at this point to mostly ignore this mm/ behaviour
because we are moving down the path of removing buffer heads from
XFS. That will require us to do different things in ->releasepage
and so just skipping dirty pages in the XFS code is the best thing
to do....

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 6:07 ` Dave Chinner @ 2016-05-31 6:11 ` Stefan Priebe - Profihost AG 2016-05-31 7:31 ` Dave Chinner 0 siblings, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-05-31 6:11 UTC (permalink / raw) To: Dave Chinner, Minchan Kim Cc: linux-mm, Brian Foster, linux-kernel, xfs@oss.sgi.com Hi Dave, Am 31.05.2016 um 08:07 schrieb Dave Chinner: > On Tue, May 31, 2016 at 12:59:04PM +0900, Minchan Kim wrote: >> On Tue, May 31, 2016 at 12:55:09PM +1000, Dave Chinner wrote: >>> On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote: >>>> On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: >>>>> But this is a dirty page, which means it may have delalloc or >>>>> unwritten state on it's buffers, both of which indicate that there >>>>> is dirty data in teh page that hasn't been written. XFS issues a >>>>> warning on this because neither shrink_active_list nor >>>>> try_to_release_page() check for whether the page is dirty or not. >>>>> >>>>> Hence it seems to me that shrink_active_list() is calling >>>>> try_to_release_page() inappropriately, and XFS is just the >>>>> messenger. If you turn laptop mode on, it is likely the problem will >>>>> go away as kswapd will run with .may_writepage = false, but that >>>>> will also cause other behavioural changes relating to writeback and >>>>> memory reclaim. It might be worth trying as a workaround for now. >>>>> >>>>> MM-folk - is this analysis correct? If so, why is >>>>> shrink_active_list() calling try_to_release_page() on dirty pages? >>>>> Is this just an oversight or is there some problem that this is >>>>> trying to work around? It seems trivial to fix to me (add a >>>>> !PageDirty check), but I don't know why the check is there in the >>>>> first place... >>>> >>>> It seems to be latter. >>>> Below commit seems to be related. 
>>>> [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.] >>> >>> Okay, that's been there a long, long time (2007), and it covers a >>> case where the filesystem cleans pages without the VM knowing about >>> it (i.e. it marks bufferheads clean without clearing the PageDirty >>> state). >>> >>> That does not explain the code in shrink_active_list(). >> >> Yeb, My point was the patch removed the PageDirty check in >> try_to_free_buffers. > > *nod* > > [...] > >> And I found a culprit. >> e182d61263b7d5, [PATCH] buffer_head takedown for bighighmem machines > > Heh. You have the combined historic tree sitting around for code > archeology, just like I do :) > >> It introduced pagevec_strip wich calls try_to_release_page without >> PageDirty check in refill_inactive_zone which is shrink_active_list >> now. > > <sigh> > > It was merged 2 days before XFS was merged. Merging XFS made the > code Andrew wrote incorrect: > >> Quote from >> " >> In refill_inactive(): if the number of buffer_heads is excessive then >> strip buffers from pages as they move onto the inactive list. This >> change is useful for all filesystems. [....] > > Except for those that carry state necessary for writeback to be done > correctly on the dirty page bufferheads. At the time, nobody doing > work the mm/writeback code cared about delayed allocation. So we've > carried this behaviour for 14 years without realising that it's > probably the source of all the unexplainable warnings we've got from > XFS over all that time. > > I'm half tempted at this point to mostly ignore this mm/ behavour > because we are moving down the path of removing buffer heads from > XFS. That will require us to do different things in ->releasepage > and so just skipping dirty pages in the XFS code is the best thing > to do.... does this change anything i should test? Or is 4.6 still the way to go? 
Greets, Stefan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 6:11 ` Stefan Priebe - Profihost AG @ 2016-05-31 7:31 ` Dave Chinner 2016-05-31 8:03 ` Stefan Priebe - Profihost AG 2016-06-02 12:13 ` Stefan Priebe - Profihost AG 0 siblings, 2 replies; 49+ messages in thread From: Dave Chinner @ 2016-05-31 7:31 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, Minchan Kim, Brian Foster, linux-kernel, xfs@oss.sgi.com On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote:
> > I'm half tempted at this point to mostly ignore this mm/ behavour
> > because we are moving down the path of removing buffer heads from
> > XFS. That will require us to do different things in ->releasepage
> > and so just skipping dirty pages in the XFS code is the best thing
> > to do....
>
> does this change anything i should test? Or is 4.6 still the way to go?

Doesn't matter now - the warning will still be there on 4.6. I think
you can simply ignore it as the XFS code appears to be handling the
dirty page that is being passed to it correctly. We'll work out what
needs to be done to get rid of the warning for this case, whether it
be a mm/ change or an XFS change.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 7:31 ` Dave Chinner @ 2016-05-31 8:03 ` Stefan Priebe - Profihost AG 2016-06-02 12:13 ` Stefan Priebe - Profihost AG 1 sibling, 0 replies; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-05-31 8:03 UTC (permalink / raw) To: Dave Chinner Cc: linux-mm, Minchan Kim, Brian Foster, linux-kernel, xfs@oss.sgi.com Am 31.05.2016 um 09:31 schrieb Dave Chinner: > On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote: >>> I'm half tempted at this point to mostly ignore this mm/ behavour >>> because we are moving down the path of removing buffer heads from >>> XFS. That will require us to do different things in ->releasepage >>> and so just skipping dirty pages in the XFS code is the best thing >>> to do.... >> >> does this change anything i should test? Or is 4.6 still the way to go? > > Doesn't matter now - the warning will still be there on 4.6. I think > you can simply ignore it as the XFS code appears to be handling the > dirty page that is being passed to it correctly. We'll work out what > needs to be done to get rid of the warning for this case, wether it > be a mm/ change or an XFS change. So is it OK to remove the WARN_ONCE in kernel code? So i don't get alarms from our monitoring systems for the trace. Stefan > > Cheers, > > Dave. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 7:31 ` Dave Chinner 2016-05-31 8:03 ` Stefan Priebe - Profihost AG @ 2016-06-02 12:13 ` Stefan Priebe - Profihost AG 2016-06-02 12:44 ` Holger Hoffstätte 1 sibling, 1 reply; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-06-02 12:13 UTC (permalink / raw) To: Dave Chinner Cc: linux-mm, Minchan Kim, Brian Foster, linux-kernel, xfs@oss.sgi.com Am 31.05.2016 um 09:31 schrieb Dave Chinner: > On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote: >>> I'm half tempted at this point to mostly ignore this mm/ behaviour >>> because we are moving down the path of removing buffer heads from >>> XFS. That will require us to do different things in ->releasepage >>> and so just skipping dirty pages in the XFS code is the best thing >>> to do.... >> >> does this change anything i should test? Or is 4.6 still the way to go? > > Doesn't matter now - the warning will still be there on 4.6. I think > you can simply ignore it as the XFS code appears to be handling the > dirty page that is being passed to it correctly. We'll work out what > needs to be done to get rid of the warning for this case, whether it > be a mm/ change or an XFS change. Any idea what I could do with 4.4.X? Can I safely remove the WARN_ONCE statement? Stefan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-06-02 12:13 ` Stefan Priebe - Profihost AG @ 2016-06-02 12:44 ` Holger Hoffstätte 2016-06-02 23:08 ` Dave Chinner 0 siblings, 1 reply; 49+ messages in thread From: Holger Hoffstätte @ 2016-06-02 12:44 UTC (permalink / raw) To: Stefan Priebe - Profihost AG, Dave Chinner Cc: linux-mm, Brian Foster, xfs@oss.sgi.com, linux-kernel, Minchan Kim On 06/02/16 14:13, Stefan Priebe - Profihost AG wrote: > > Am 31.05.2016 um 09:31 schrieb Dave Chinner: >> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote: >>>> I'm half tempted at this point to mostly ignore this mm/ behaviour >>>> because we are moving down the path of removing buffer heads from >>>> XFS. That will require us to do different things in ->releasepage >>>> and so just skipping dirty pages in the XFS code is the best thing >>>> to do.... >>> >>> does this change anything i should test? Or is 4.6 still the way to go? >> >> Doesn't matter now - the warning will still be there on 4.6. I think >> you can simply ignore it as the XFS code appears to be handling the >> dirty page that is being passed to it correctly. We'll work out what >> needs to be done to get rid of the warning for this case, whether it >> be a mm/ change or an XFS change. > > Any idea what i could do with 4.4.X? Can i safely remove the WARN_ONCE > statement? By definition it won't break anything since it's just a heads-up message, so yes, it should be "safe". However, if my understanding of the situation is correct, mainline commit f0281a00fe "mm: workingset: only do workingset activations on reads" (+ friends) in 4.7 should effectively prevent this from happening. Can someone confirm or deny this? -h PS: Stefan: I backported that commit (and friends) to my 4.4.x patch queue, so if you want to try that for today's 4.4.12 the warning should be gone.
No guarantees though :) _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-06-02 12:44 ` Holger Hoffstätte @ 2016-06-02 23:08 ` Dave Chinner 0 siblings, 0 replies; 49+ messages in thread From: Dave Chinner @ 2016-06-02 23:08 UTC (permalink / raw) To: Holger Hoffstätte Cc: Minchan Kim, Brian Foster, Stefan Priebe - Profihost AG, linux-kernel, xfs@oss.sgi.com, linux-mm On Thu, Jun 02, 2016 at 02:44:30PM +0200, Holger Hoffstätte wrote: > On 06/02/16 14:13, Stefan Priebe - Profihost AG wrote: > > > > Am 31.05.2016 um 09:31 schrieb Dave Chinner: > >> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG wrote: > >>>> I'm half tempted at this point to mostly ignore this mm/ behavour > >>>> because we are moving down the path of removing buffer heads from > >>>> XFS. That will require us to do different things in ->releasepage > >>>> and so just skipping dirty pages in the XFS code is the best thing > >>>> to do.... > >>> > >>> does this change anything i should test? Or is 4.6 still the way to go? > >> > >> Doesn't matter now - the warning will still be there on 4.6. I think > >> you can simply ignore it as the XFS code appears to be handling the > >> dirty page that is being passed to it correctly. We'll work out what > >> needs to be done to get rid of the warning for this case, wether it > >> be a mm/ change or an XFS change. > > > > Any idea what i could do with 4.4.X? Can i safely remove the WARN_ONCE > > statement? > > By definition it won't break anything since it's just a heads-up message, > so yes, it should be "safe". However if my understanding of the situation > is correct, mainline commit f0281a00fe "mm: workingset: only do workingset > activations on reads" (+ friends) in 4.7 should effectively prevent this > from happenning. Can someone confirm or deny this? I don't think it will. 
The above commits will avoid putting /write-only/ dirty pages on the active list from the write() syscall vector, but it won't prevent pages that are read first then dirtied from ending up on the active list. e.g. a mmap write will first read the page from disk to populate the page (hence it ends up on the active list), then the page gets dirtied and ->page_mkwrite is called to tell the filesystem.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 1:07 ` Minchan Kim 2016-05-31 2:55 ` Dave Chinner @ 2016-05-31 9:50 ` Jan Kara 2016-06-01 1:38 ` Minchan Kim 2016-08-17 15:37 ` Andreas Grünbacher 1 sibling, 2 replies; 49+ messages in thread From: Jan Kara @ 2016-05-31 9:50 UTC (permalink / raw) To: Minchan Kim Cc: Stefan Priebe - Profihost AG, Brian Foster, linux-kernel, xfs@oss.sgi.com, linux-mm On Tue 31-05-16 10:07:24, Minchan Kim wrote: > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > > [adding lkml and linux-mm to the cc list] > > > > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote: > > > Hi Dave, > > > Hi Brian, > > > > > > below are the results with a vanilla 4.4.11 kernel. > > > > Thanks for persisting with the testing, Stefan. > > > > .... > > > > > i've now used a vanilla 4.4.11 Kernel and the issue remains. After a > > > fresh reboot it has happened again on the root FS for a debian apt file: > > > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 > > > ------------[ cut here ]------------ > > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 > > > xfs_vm_releasepage+0x10f/0x140() > > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport > > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse > > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan > > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod > > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas > > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 > > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 > > > 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 
0000000000000000 > > > ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 > > > 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 > > > Call Trace: > > > [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 > > > [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 > > > [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 > > > [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 > > > [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 > > > [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 > > > [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 > > > [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 > > > [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 > > > [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 > > > [<ffffffffa2168539>] kswapd+0x4f9/0x970 > > > [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > > > [<ffffffffa20a0d99>] kthread+0xc9/0xe0 > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > ---[ end trace c9d679f8ed4d7610 ]--- > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size > > > 0x12b990 > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size > > ..... > > > > Ok, I suspect this may be a VM bug. I've been looking at the 4.6 > > code (so please try to reproduce on that kernel!) but it looks to me > > like the only way we can get from shrink_active_list() direct to > > try_to_release_page() is if we are over the maximum bufferhead > > threshold (i.e buffer_heads_over_limit = true) and we are trying to > > reclaim pages direct from the active list. 
> > > > Because we are called from kswapd()->balance_pgdat(), we have: > > > > struct scan_control sc = { > > .gfp_mask = GFP_KERNEL, > > .order = order, > > .priority = DEF_PRIORITY, > > .may_writepage = !laptop_mode, > > .may_unmap = 1, > > .may_swap = 1, > > }; > > > > The key point here is reclaim is being run with .may_writepage = > > true for default configuration kernels. when we get to > > shrink_active_list(): > > > > if (!sc->may_writepage) > > isolate_mode |= ISOLATE_CLEAN; > > > > But sc->may_writepage = true and this allows isolate_lru_pages() to > > isolate dirty pages from the active list. Normally this isn't a > > problem, because the isolated active list pages are rotated to the > > inactive list, and nothing else happens to them. *Except when > > buffer_heads_over_limit = true*. This special condition would > > explain why I have never seen apt/dpkg cause this problem on any of > > my (many) Debian systems that all use XFS.... > > > > In that case, shrink_active_list() runs: > > > > if (unlikely(buffer_heads_over_limit)) { > > if (page_has_private(page) && trylock_page(page)) { > > if (page_has_private(page)) > > try_to_release_page(page, 0); > > unlock_page(page); > > } > > } > > > > i.e. it locks the page, and if it has buffer heads it trys to get > > the bufferheads freed from the page. > > > > But this is a dirty page, which means it may have delalloc or > > unwritten state on it's buffers, both of which indicate that there > > is dirty data in teh page that hasn't been written. XFS issues a > > warning on this because neither shrink_active_list nor > > try_to_release_page() check for whether the page is dirty or not. > > > > Hence it seems to me that shrink_active_list() is calling > > try_to_release_page() inappropriately, and XFS is just the > > messenger. 
If you turn laptop mode on, it is likely the problem will > > go away as kswapd will run with .may_writepage = false, but that > > will also cause other behavioural changes relating to writeback and > > memory reclaim. It might be worth trying as a workaround for now. > > > > MM-folk - is this analysis correct? If so, why is > > shrink_active_list() calling try_to_release_page() on dirty pages? > > Is this just an oversight or is there some problem that this is > > trying to work around? It seems trivial to fix to me (add a > > !PageDirty check), but I don't know why the check is there in the > > first place... > > It seems to be latter. > Below commit seems to be related. > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.] > > At that time, even shrink_page_list works like this. > > shrink_page_list > while (!list_empty(page_list)) { > .. > .. > if (PageDirty(page)) { > .. > } > > /* > * If the page has buffers, try to free the buffer mappings > * associated with this page. If we succeed we try to free > * the page as well. > * > * We do this even if the page is PageDirty(). > * try_to_release_page() does not perform I/O, but it is > * possible for a page to have PageDirty set, but it is actually > * clean (all its buffers are clean). This happens if the > * buffers were written out directly, with submit_bh(). ext3 > * will do this, as well as the blockdev mapping. > * try_to_release_page() will discover that cleanness and will > * drop the buffers and mark the page clean - it can be freed. > * .. > */ > if (PagePrivate(page)) { > if (!try_to_release_page(page, sc->gfp_mask)) > goto activate_locked; > if (!mapping && page_count(page) == 1) > goto free_it; > } > .. > } > > I wonder whether it's valid or not with on ext4. Actually, we've already discussed this about an year ago: http://oss.sgi.com/archives/xfs/2015-06/msg00119.html And it was the last drop that made me remove ext3 from the tree. 
ext4 can also clean dirty buffers while keeping pages dirty, but that is limited to metadata (and data in data=journal mode), so the scope of the problem is much smaller. So just avoiding calling ->releasepage for dirty pages may work fine these days. Also it is possible to change the ext4 checkpointing code to completely avoid doing this, but I never got to rewriting that code. Probably I should give it higher priority on my todo list... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 9:50 ` Jan Kara @ 2016-06-01 1:38 ` Minchan Kim 2016-08-17 15:37 ` Andreas Grünbacher 1 sibling, 0 replies; 49+ messages in thread From: Minchan Kim @ 2016-06-01 1:38 UTC (permalink / raw) To: Jan Kara Cc: Stefan Priebe - Profihost AG, Brian Foster, linux-kernel, xfs@oss.sgi.com, linux-mm On Tue, May 31, 2016 at 11:50:31AM +0200, Jan Kara wrote: > On Tue 31-05-16 10:07:24, Minchan Kim wrote: > > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > > > [adding lkml and linux-mm to the cc list] > > > > > > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote: > > > > Hi Dave, > > > > Hi Brian, > > > > > > > > below are the results with a vanilla 4.4.11 kernel. > > > > > > Thanks for persisting with the testing, Stefan. > > > > > > .... > > > > > > > i've now used a vanilla 4.4.11 Kernel and the issue remains. 
After a > > > > fresh reboot it has happened again on the root FS for a debian apt file: > > > > > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 > > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 > > > > xfs_vm_releasepage+0x10f/0x140() > > > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport > > > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse > > > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan > > > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor > > > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod > > > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci > > > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas > > > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 > > > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 > > > > 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000 > > > > ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 > > > > 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 > > > > Call Trace: > > > > [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 > > > > [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 > > > > [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 > > > > [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 > > > > [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 > > > > [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 > > > > [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 > > > > [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 > > > > [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 > > > > [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 > > > > [<ffffffffa2168539>] kswapd+0x4f9/0x970 > > > > [<ffffffffa2168040>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0 > > > > [<ffffffffa20a0d99>] kthread+0xc9/0xe0 > > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > > [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 > > > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 > > > > ---[ end trace c9d679f8ed4d7610 ]--- > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size > > > > 0x12b990 > > > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size > > > ..... > > > > > > Ok, I suspect this may be a VM bug. I've been looking at the 4.6 > > > code (so please try to reproduce on that kernel!) but it looks to me > > > like the only way we can get from shrink_active_list() direct to > > > try_to_release_page() is if we are over the maximum bufferhead > > > threshold (i.e buffer_heads_over_limit = true) and we are trying to > > > reclaim pages direct from the active list. > > > > > > Because we are called from kswapd()->balance_pgdat(), we have: > > > > > > struct scan_control sc = { > > > .gfp_mask = GFP_KERNEL, > > > .order = order, > > > .priority = DEF_PRIORITY, > > > .may_writepage = !laptop_mode, > > > .may_unmap = 1, > > > .may_swap = 1, > > > }; > > > > > > The key point here is reclaim is being run with .may_writepage = > > > true for default configuration kernels. when we get to > > > shrink_active_list(): > > > > > > if (!sc->may_writepage) > > > isolate_mode |= ISOLATE_CLEAN; > > > > > > But sc->may_writepage = true and this allows isolate_lru_pages() to > > > isolate dirty pages from the active list. Normally this isn't a > > > problem, because the isolated active list pages are rotated to the > > > inactive list, and nothing else happens to them. *Except when > > > buffer_heads_over_limit = true*. This special condition would > > > explain why I have never seen apt/dpkg cause this problem on any of > > > my (many) Debian systems that all use XFS.... 
> > > > > > In that case, shrink_active_list() runs: > > > > > > if (unlikely(buffer_heads_over_limit)) { > > > if (page_has_private(page) && trylock_page(page)) { > > > if (page_has_private(page)) > > > try_to_release_page(page, 0); > > > unlock_page(page); > > > } > > > } > > > > > > i.e. it locks the page, and if it has buffer heads it trys to get > > > the bufferheads freed from the page. > > > > > > But this is a dirty page, which means it may have delalloc or > > > unwritten state on it's buffers, both of which indicate that there > > > is dirty data in teh page that hasn't been written. XFS issues a > > > warning on this because neither shrink_active_list nor > > > try_to_release_page() check for whether the page is dirty or not. > > > > > > Hence it seems to me that shrink_active_list() is calling > > > try_to_release_page() inappropriately, and XFS is just the > > > messenger. If you turn laptop mode on, it is likely the problem will > > > go away as kswapd will run with .may_writepage = false, but that > > > will also cause other behavioural changes relating to writeback and > > > memory reclaim. It might be worth trying as a workaround for now. > > > > > > MM-folk - is this analysis correct? If so, why is > > > shrink_active_list() calling try_to_release_page() on dirty pages? > > > Is this just an oversight or is there some problem that this is > > > trying to work around? It seems trivial to fix to me (add a > > > !PageDirty check), but I don't know why the check is there in the > > > first place... > > > > It seems to be latter. > > Below commit seems to be related. > > [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.] > > > > At that time, even shrink_page_list works like this. > > > > shrink_page_list > > while (!list_empty(page_list)) { > > .. > > .. > > if (PageDirty(page)) { > > .. > > } > > > > /* > > * If the page has buffers, try to free the buffer mappings > > * associated with this page. 
If we succeed we try to free > > * the page as well. > > * > > * We do this even if the page is PageDirty(). > > * try_to_release_page() does not perform I/O, but it is > > * possible for a page to have PageDirty set, but it is actually > > * clean (all its buffers are clean). This happens if the > > * buffers were written out directly, with submit_bh(). ext3 > > * will do this, as well as the blockdev mapping. > > * try_to_release_page() will discover that cleanness and will > > * drop the buffers and mark the page clean - it can be freed. > > * .. > > */ > > if (PagePrivate(page)) { > > if (!try_to_release_page(page, sc->gfp_mask)) > > goto activate_locked; > > if (!mapping && page_count(page) == 1) > > goto free_it; > > } > > .. > > } > > > > I wonder whether it's valid or not with on ext4. > > Actually, we've already discussed this about an year ago: > http://oss.sgi.com/archives/xfs/2015-06/msg00119.html > > And it was the last drop that made me remove ext3 from the tree. ext4 can > also clean dirty buffers while keeping pages dirty but it is limited only > to metadata (and data in data=journal mode) so the scope of the problem is > much smaller. So just avoiding calling ->releasepage for dirty pages may > work fine these days. > > Also it is possible to change ext4 checkpointing code to completely avoid > doing this but I never got to rewriting that code. Probably I should give > it higher priority on my todo list... Hah, you already noticed. Thanks for the information. At a first glance, it seems to fix it in /mm with checking PageDirty but it might be risky for other out-of-tree FSes without full understanding of internal and block_invalidatepage users can make such clean buffers but dirty page although there is no one in mainline now so I will leave the fix to FS guys. Thanks. > > Honza > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. 
For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) 2016-05-31 9:50 ` Jan Kara 2016-06-01 1:38 ` Minchan Kim @ 2016-08-17 15:37 ` Andreas Grünbacher 1 sibling, 0 replies; 49+ messages in thread From: Andreas Grünbacher @ 2016-08-17 15:37 UTC (permalink / raw) To: Jan Kara Cc: Minchan Kim, Brian Foster, Stefan Priebe - Profihost AG, Linux Kernel Mailing List, xfs@oss.sgi.com, linux-mm, Lukas Czerner, Steven Whitehouse Hi Jan, 2016-05-31 11:50 GMT+02:00 Jan Kara <jack@suse.cz>: > On Tue 31-05-16 10:07:24, Minchan Kim wrote: >> On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: >> > [adding lkml and linux-mm to the cc list] >> > >> > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG wrote: >> > > Hi Dave, >> > > Hi Brian, >> > > >> > > below are the results with a vanilla 4.4.11 kernel. >> > >> > Thanks for persisting with the testing, Stefan. >> > >> > .... >> > >> > > i've now used a vanilla 4.4.11 Kernel and the issue remains. 
After a >> > > fresh reboot it has happened again on the root FS for a debian apt file: >> > > >> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 0x12b990 >> > > ------------[ cut here ]------------ >> > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239 >> > > xfs_vm_releasepage+0x10f/0x140() >> > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport >> > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse >> > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan >> > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor >> > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod >> > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci >> > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas >> > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G O 4.4.11 #1 >> > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015 >> > > 0000000000000000 ffff880c4dacfa88 ffffffffa23c5b8f 0000000000000000 >> > > ffffffffa2a51ab4 ffff880c4dacfac8 ffffffffa20837a7 ffff880c4dacfae8 >> > > 0000000000000001 ffffea00010c3640 ffff8802176b49d0 ffffea00010c3660 >> > > Call Trace: >> > > [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 >> > > [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 >> > > [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 >> > > [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 >> > > [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 >> > > [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 >> > > [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 >> > > [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 >> > > [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 >> > > [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 >> > > [<ffffffffa2168539>] kswapd+0x4f9/0x970 >> > > [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 >> > > [<ffffffffa20a0d99>] kthread+0xc9/0xe0 >> > > [<ffffffffa20a0cd0>] ? 
kthread_stop+0x100/0x100 >> > > [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 >> > > [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 >> > > ---[ end trace c9d679f8ed4d7610 ]--- >> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size >> > > 0x12b990 >> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size >> > ..... >> > >> > Ok, I suspect this may be a VM bug. I've been looking at the 4.6 >> > code (so please try to reproduce on that kernel!) but it looks to me >> > like the only way we can get from shrink_active_list() direct to >> > try_to_release_page() is if we are over the maximum bufferhead >> > threshold (i.e buffer_heads_over_limit = true) and we are trying to >> > reclaim pages direct from the active list. >> > >> > Because we are called from kswapd()->balance_pgdat(), we have: >> > >> > struct scan_control sc = { >> > .gfp_mask = GFP_KERNEL, >> > .order = order, >> > .priority = DEF_PRIORITY, >> > .may_writepage = !laptop_mode, >> > .may_unmap = 1, >> > .may_swap = 1, >> > }; >> > >> > The key point here is reclaim is being run with .may_writepage = >> > true for default configuration kernels. when we get to >> > shrink_active_list(): >> > >> > if (!sc->may_writepage) >> > isolate_mode |= ISOLATE_CLEAN; >> > >> > But sc->may_writepage = true and this allows isolate_lru_pages() to >> > isolate dirty pages from the active list. Normally this isn't a >> > problem, because the isolated active list pages are rotated to the >> > inactive list, and nothing else happens to them. *Except when >> > buffer_heads_over_limit = true*. This special condition would >> > explain why I have never seen apt/dpkg cause this problem on any of >> > my (many) Debian systems that all use XFS.... 
>> > >> > In that case, shrink_active_list() runs: >> > >> > if (unlikely(buffer_heads_over_limit)) { >> > if (page_has_private(page) && trylock_page(page)) { >> > if (page_has_private(page)) >> > try_to_release_page(page, 0); >> > unlock_page(page); >> > } >> > } >> > >> > i.e. it locks the page, and if it has buffer heads it trys to get >> > the bufferheads freed from the page. >> > >> > But this is a dirty page, which means it may have delalloc or >> > unwritten state on it's buffers, both of which indicate that there >> > is dirty data in teh page that hasn't been written. XFS issues a >> > warning on this because neither shrink_active_list nor >> > try_to_release_page() check for whether the page is dirty or not. >> > >> > Hence it seems to me that shrink_active_list() is calling >> > try_to_release_page() inappropriately, and XFS is just the >> > messenger. If you turn laptop mode on, it is likely the problem will >> > go away as kswapd will run with .may_writepage = false, but that >> > will also cause other behavioural changes relating to writeback and >> > memory reclaim. It might be worth trying as a workaround for now. >> > >> > MM-folk - is this analysis correct? If so, why is >> > shrink_active_list() calling try_to_release_page() on dirty pages? >> > Is this just an oversight or is there some problem that this is >> > trying to work around? It seems trivial to fix to me (add a >> > !PageDirty check), but I don't know why the check is there in the >> > first place... >> >> It seems to be latter. >> Below commit seems to be related. >> [ecdfc9787fe527, Resurrect 'try_to_free_buffers()' VM hackery.] >> >> At that time, even shrink_page_list works like this. >> >> shrink_page_list >> while (!list_empty(page_list)) { >> .. >> .. >> if (PageDirty(page)) { >> .. >> } >> >> /* >> * If the page has buffers, try to free the buffer mappings >> * associated with this page. If we succeed we try to free >> * the page as well. 
>> * >> * We do this even if the page is PageDirty(). >> * try_to_release_page() does not perform I/O, but it is >> * possible for a page to have PageDirty set, but it is actually >> * clean (all its buffers are clean). This happens if the >> * buffers were written out directly, with submit_bh(). ext3 >> * will do this, as well as the blockdev mapping. >> * try_to_release_page() will discover that cleanness and will >> * drop the buffers and mark the page clean - it can be freed. >> * .. >> */ >> if (PagePrivate(page)) { >> if (!try_to_release_page(page, sc->gfp_mask)) >> goto activate_locked; >> if (!mapping && page_count(page) == 1) >> goto free_it; >> } >> .. >> } >> >> I wonder whether it's valid or not with on ext4. > > Actually, we've already discussed this about an year ago: > http://oss.sgi.com/archives/xfs/2015-06/msg00119.html > > And it was the last drop that made me remove ext3 from the tree. ext4 can > also clean dirty buffers while keeping pages dirty but it is limited only > to metadata (and data in data=journal mode) so the scope of the problem is > much smaller. So just avoiding calling ->releasepage for dirty pages may > work fine these days. > > Also it is possible to change ext4 checkpointing code to completely avoid > doing this but I never got to rewriting that code. Probably I should give > it higher priority on my todo list... we're seeing the same (releasepage being called for dirty pages) on GFS2 as well. Right now, GFS2 warns about this case, but we'll remove that warning and wait for ext4 and releasepage to be fixed so that we can re-add the warning. Maybe this will help as an argument for fixing ext4 soon :) Thanks, Andreas _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-05-22 21:38 ` Dave Chinner 2016-05-30 7:23 ` Stefan Priebe - Profihost AG @ 2016-06-03 17:56 ` Stefan Priebe - Profihost AG 2016-06-03 19:35 ` Holger Hoffstätte ` (2 more replies) 1 sibling, 3 replies; 49+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-06-03 17:56 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs-masters@oss.sgi.com, Brian Foster, xfs@oss.sgi.com Hi, should I remove the complete if conditions incl. the return 0, or should I convert them to plain ifs without WARN_ON_ONCE? Like below? if (WARN_ON_ONCE(delalloc)) return 0; if (WARN_ON_ONCE(unwritten)) return 0; => if (delalloc) return 0; if (unwritten) return 0; On 22.05.2016 at 23:38, Dave Chinner wrote: > On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote: >> On 16.05.2016 at 03:06, Brian Foster wrote: >>>> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci >>>> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas >>>> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O >>>> 4.4.10+25-ph #1 >>> >>> How close is this to an upstream kernel? Upstream XFS? Have you tried to >>> reproduce this on an upstream kernel? >> >> It's a vanilla 4.4.10 + a new Adaptec driver and some sched and wq >> patches from 4.5 and 4.6, but I can try to replace the kernel on one >> machine with a 100% vanilla one if this helps. > > Please do. > >>>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0x52000 size 0x13d1c8 >>>> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0x53000 size 0x13d1c8 >>>> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0x54000 size 0x13d1c8 >>> ... 
>>>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0xab000 size 0x13d1c8 >>>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0xac000 size 0x13d1c8 >>>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>> 0xad000 size 0x13d1c8 >>>> >>>> The file to the inode number is: >>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en >>>> >>> >>> xfs_bmap -v might be interesting here as well. >> >> # xfs_bmap -v >> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en >> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL >> 0: [0..2567]: 41268928..41271495 3 (374464..377031) 2568 > > So the last file offset with a block is 0x140e00. This means the > file is fully allocated. However, the pages inside the file range > are still marked delayed allocation. That implies that we've failed > to write the pages over a delayed allocation region after we've > allocated the space. > > That, in turn, tends to indicate a problem in page writeback - the > first page to be written has triggered delayed allocation of the > entire range, but then the subsequent pages have not been written > (for some as yet unknown reason). When a page is written, we map it > to the current block via xfs_map_at_offset(), and that clears both > the buffer delay and unwritten flags. > > This clearly isn't happening, which means either the VFS doesn't > think the inode is dirty anymore, writeback is never asking for > these pages to be written, or XFS is screwing something up in > ->writepage. The XFS writepage code changed significantly in 4.6, so > it might be worth seeing if a 4.6 kernel reproduces this same > problem.... > > Cheers, > > Dave. 
> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
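For reference, the flag-clearing Dave describes lives in xfs_map_at_offset() in fs/xfs/xfs_aops.c. In the 4.4-era code it looks roughly like the following (an abridged sketch from memory, with the leading ASSERTs omitted, not an exact quote of the tree under discussion):

```c
static void
xfs_map_at_offset(
	struct inode		*inode,
	struct buffer_head	*bh,
	struct xfs_bmbt_irec	*imap,
	xfs_off_t		offset)
{
	xfs_map_buffer(inode, bh, imap, offset);

	set_buffer_mapped(bh);
	/* writing the page maps it to a real block, so the
	 * delalloc/unwritten state is cleared here - the state
	 * the stuck pages above never lose */
	clear_buffer_delay(bh);
	clear_buffer_unwritten(bh);
}
```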
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-06-03 17:56 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG @ 2016-06-03 19:35 ` Holger Hoffstätte 2016-06-04 0:04 ` Dave Chinner 2016-06-26 5:45 ` Stefan Priebe 2 siblings, 0 replies; 49+ messages in thread From: Holger Hoffstätte @ 2016-06-03 19:35 UTC (permalink / raw) To: xfs@oss.sgi.com On 06/03/16 19:56, Stefan Priebe - Profihost AG wrote: > Hi, > > should i remove the complete if conditions incl. the return 0 or should > id convert it to if without WARN_ONCE? like below? > > if (WARN_ON_ONCE(delalloc)) > return 0; > if (WARN_ON_ONCE(unwritten)) > return 0; > > => > > if (delalloc) > return 0; > if (unwritten) > return 0; Good thing you asked; I forgot about the returns. Until the bigger picture has been figured out with -mm I'd probably keep the returns. -h ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-06-03 17:56 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG 2016-06-03 19:35 ` Holger Hoffstätte @ 2016-06-04 0:04 ` Dave Chinner 2016-06-26 5:45 ` Stefan Priebe 2 siblings, 0 replies; 49+ messages in thread From: Dave Chinner @ 2016-06-04 0:04 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: xfs-masters@oss.sgi.com, Brian Foster, xfs@oss.sgi.com On Fri, Jun 03, 2016 at 07:56:08PM +0200, Stefan Priebe - Profihost AG wrote: > Hi, > > should i remove the complete if conditions incl. the return 0 or should > id convert it to if without WARN_ONCE? like below? > > if (WARN_ON_ONCE(delalloc)) > return 0; > if (WARN_ON_ONCE(unwritten)) > return 0; > > => > > if (delalloc) > return 0; > if (unwritten) > return 0; Yes, you need to keep the checks and returns. That's what I meant when I said that "XFS handles the dirty page case correctly in this case". If the page is dirty, we should not be attempting to release the buffers, and that is what the code does. It's just noisy about it... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 49+ messages in thread
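Concretely, the change Dave endorses here amounts to the following in xfs_vm_releasepage() in fs/xfs/xfs_aops.c: keep the early returns, drop only the WARN_ON_ONCE() wrappers. A sketch of the local workaround under discussion, not a commit from this thread:

```diff
 	xfs_count_page_state(page, &delalloc, &unwritten);
 
-	if (WARN_ON_ONCE(delalloc))
+	/* dirty delalloc/unwritten buffers must not be torn down
+	 * here; refuse to release, but do so quietly */
+	if (delalloc)
 		return 0;
-	if (WARN_ON_ONCE(unwritten))
+	if (unwritten)
 		return 0;
 
 	return try_to_free_buffers(page);
```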
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage 2016-06-03 17:56 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG 2016-06-03 19:35 ` Holger Hoffstätte 2016-06-04 0:04 ` Dave Chinner @ 2016-06-26 5:45 ` Stefan Priebe 2 siblings, 0 replies; 49+ messages in thread From: Stefan Priebe @ 2016-06-26 5:45 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs-masters@oss.sgi.com, Brian Foster, xfs@oss.sgi.com Hi Dave, today i got this XFS trace while running 4.4.13. I'm not sure if it is related. [282732.262739] ------------[ cut here ]------------ [282732.264093] kernel BUG at fs/xfs/xfs_aops.c:1054! [282732.265459] invalid opcode: 0000 [#1] SMP [282732.266753] Modules linked in: netconsole xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse sb_edac edac_core xhci_pci i40e(O) xhci_hcd i2c_i801 vxlan shpchp ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci ehci_hcd usbcore usb_common ahci libahci igb i2c_algo_bit mpt3sas i2c_core raid_class ptp pps_core scsi_transport_sas [282732.280494] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O 4.4.13+36-ph #1 [282732.282707] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 2.0 12/17/2015 [282732.284873] task: ffff880c4d9ba500 ti: ffff880c4da28000 task.ti: ffff880c4da28000 [282732.287038] RIP: 0010:[<ffffffff943267f1>] [<ffffffff943267f1>] xfs_vm_writepage+0x561/0x5c0 [282732.289554] RSP: 0018:ffff880c4da2b8e8 EFLAGS: 00010246 [282732.291095] RAX: 001fffff80020009 RBX: ffffea000186de80 RCX: 000000000000000c [282732.293161] RDX: 0000000000001800 RSI: ffff880c4da2b9b8 RDI: ffffea000186de80 [282732.295255] RBP: ffff880c4da2b9a8 R08: 0000000000000003 R09: 7fffffffffffffff [282732.297340] R10: ffff880c7ffdc6c0 R11: 0000000000000000 R12: ffffea000186de80 [282732.299405] R13: ffff88001ea855d0 R14: ffff880c4da2bad8 
R15: ffffea000186dea0 [282732.301472] FS: 0000000000000000(0000) GS:ffff880c7fc40000(0000) knlGS:0000000000000000 [282732.303811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [282732.305480] CR2: ffffffffff600400 CR3: 0000000014c0a000 CR4: 00000000001406e0 [282732.307545] Stack: [282732.308142] ffff8806c1e980e0 ffff880c442dc800 ffff880100000001 0000000100042000 [282732.310482] ffffea00016db240 ffff880c4da2b968 0000000001800000 ffff880c4da2b9b8 [282732.312822] 0000000000001000 0000000000297000 0000000000000000 0000000000000246 [282732.315161] Call Trace: [282732.315890] [<ffffffff9415c72e>] ? clear_page_dirty_for_io+0xee/0x1b0 [282732.317782] [<ffffffff94163974>] pageout.isra.43+0x164/0x280 [282732.319449] [<ffffffff94165f4a>] shrink_page_list+0x5ba/0x760 [282732.321143] [<ffffffff941667ce>] shrink_inactive_list+0x1ee/0x500 [282732.322934] [<ffffffff941674e1>] shrink_lruvec+0x621/0x7d0 [282732.324554] [<ffffffff9416776c>] shrink_zone+0xdc/0x2c0 [282732.326096] [<ffffffff941688b9>] kswapd+0x4f9/0x970 [282732.327541] [<ffffffff941683c0>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 [282732.329561] [<ffffffff940a0dc9>] kthread+0xc9/0xe0 [282732.330979] [<ffffffff940a0d00>] ? kthread_stop+0x100/0x100 [282732.332623] [<ffffffff946b470f>] ret_from_fork+0x3f/0x70 [282732.334191] [<ffffffff940a0d00>] ? 
kthread_stop+0x100/0x100 [282732.404901] Code: f8 f5 74 a3 4c 89 e7 89 85 50 ff ff ff e8 38 e8 ff ff f0 41 80 24 24 f7 4c 89 e7 e8 3a bf e2 ff 8b 85 50 ff ff ff e9 39 fd ff ff <0f> 0b 80 3d 04 fc 9c 00 00 0f 85 6d ff ff ff be d6 03 00 00 48 [282732.556398] RIP [<ffffffff943267f1>] xfs_vm_writepage+0x561/0x5c0 [282732.630207] RSP <ffff880c4da2b8e8> [282732.703062] ---[ end trace 9ea1afce9e126cdc ]--- [282732.842462] ------------[ cut here ]------------ [282732.914729] WARNING: CPU: 2 PID: 108 at kernel/exit.c:661 do_exit+0x50/0xab0() [282732.989039] Modules linked in: netconsole xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse sb_edac edac_core xhci_pci i40e(O) xhci_hcd i2c_i801 vxlan shpchp ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod ehci_pci ehci_hcd usbcore usb_common ahci libahci igb i2c_algo_bit mpt3sas i2c_core raid_class ptp pps_core scsi_transport_sas [282733.306619] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G D O 4.4.13+36-ph #1 [282733.386805] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 2.0 12/17/2015 [282733.467585] 0000000000000000 ffff880c4da2b5d8 ffffffff943c60ff 0000000000000000 [282733.547861] ffffffff94a330a8 ffff880c4da2b618 ffffffff940837a7 ffff880c4da2b838 [282733.626144] 000000000000000b ffff880c4da2b838 0000000000000246 ffff880c4d9ba500 [282733.702723] Call Trace: [282733.776298] [<ffffffff943c60ff>] dump_stack+0x63/0x84 [282733.849877] [<ffffffff940837a7>] warn_slowpath_common+0x97/0xe0 [282733.922917] [<ffffffff9408380a>] warn_slowpath_null+0x1a/0x20 [282733.994640] [<ffffffff94085a90>] do_exit+0x50/0xab0 [282734.064858] [<ffffffff94008a02>] oops_end+0xa2/0xe0 [282734.133974] [<ffffffff94008b88>] die+0x58/0x80 [282734.202270] [<ffffffff94005ba9>] do_trap+0x69/0x150 [282734.269913] [<ffffffff940a1bc2>] ? 
__atomic_notifier_call_chain+0x12/0x20 [282734.337974] [<ffffffff94005d5d>] do_error_trap+0xcd/0xf0 [282734.406008] [<ffffffff943267f1>] ? xfs_vm_writepage+0x561/0x5c0 [282734.474472] [<ffffffff9439c334>] ? generic_make_request+0x104/0x190 [282734.542216] [<ffffffff94006000>] do_invalid_op+0x20/0x30 [282734.609276] [<ffffffff946b5e8e>] invalid_op+0x1e/0x30 [282734.675516] [<ffffffff943267f1>] ? xfs_vm_writepage+0x561/0x5c0 [282734.741307] [<ffffffff94326528>] ? xfs_vm_writepage+0x298/0x5c0 [282734.805722] [<ffffffff9415c72e>] ? clear_page_dirty_for_io+0xee/0x1b0 [282734.870589] [<ffffffff94163974>] pageout.isra.43+0x164/0x280 [282734.934901] [<ffffffff94165f4a>] shrink_page_list+0x5ba/0x760 [282734.998565] [<ffffffff941667ce>] shrink_inactive_list+0x1ee/0x500 [282735.061845] [<ffffffff941674e1>] shrink_lruvec+0x621/0x7d0 [282735.124441] [<ffffffff9416776c>] shrink_zone+0xdc/0x2c0 [282735.186752] [<ffffffff941688b9>] kswapd+0x4f9/0x970 [282735.249021] [<ffffffff941683c0>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 [282735.312215] [<ffffffff940a0dc9>] kthread+0xc9/0xe0 [282735.375420] [<ffffffff940a0d00>] ? kthread_stop+0x100/0x100 [282735.439012] [<ffffffff946b470f>] ret_from_fork+0x3f/0x70 [282735.502368] [<ffffffff940a0d00>] ? kthread_stop+0x100/0x100 [282735.565534] ---[ end trace 9ea1afce9e126cdd ]--- Stefan Am 03.06.2016 um 19:56 schrieb Stefan Priebe - Profihost AG: > Hi, > > should i remove the complete if conditions incl. the return 0 or should > id convert it to if without WARN_ONCE? like below? 
> > if (WARN_ON_ONCE(delalloc)) > return 0; > if (WARN_ON_ONCE(unwritten)) > return 0; > > => > > if (delalloc) > return 0; > if (unwritten) > return 0; > > > > Am 22.05.2016 um 23:38 schrieb Dave Chinner: >> On Sun, May 22, 2016 at 09:36:39PM +0200, Stefan Priebe - Profihost AG wrote: >>> Am 16.05.2016 um 03:06 schrieb Brian Foster: >>>>> sd_mod ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci >>>>> i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas >>>>> [Sun May 15 07:00:44 2016] CPU: 2 PID: 108 Comm: kswapd0 Tainted: G O >>>>> 4.4.10+25-ph #1 >>>> >>>> How close is this to an upstream kernel? Upstream XFS? Have you tried to >>>> reproduce this on an upstream kernel? >>> >>> It's a vanilla 4.4.10 + a new adaptec driver and some sched and wq >>> patches from 4.5 and 4.6 but i can try to replace the kernel on one >>> machine with a 100% vanilla one if this helps. >> >> Please do. >> >>>>> [295086.353473] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>>> 0x52000 size 0x13d1c8 >>>>> [295086.353476] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>>> 0x53000 size 0x13d1c8 >>>>> [295086.353478] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>>> 0x54000 size 0x13d1c8 >>>> ... >>>>> [295086.567508] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>>> 0xab000 size 0x13d1c8 >>>>> [295086.567510] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>>> 0xac000 size 0x13d1c8 >>>>> [295086.567515] XFS (md127p3): ino 0x600204f delalloc 1 unwritten 0 pgoff >>>>> 0xad000 size 0x13d1c8 >>>>> >>>>> The file to the inode number is: >>>>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en >>>>> >>>> >>>> xfs_bmap -v might be interesting here as well. 
>>> >>> # xfs_bmap -v >>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en >>> /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_i18n_Translation-en: >>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL >>> 0: [0..2567]: 41268928..41271495 3 (374464..377031) 2568 >> >> So the last file offset with a block is 0x140e00. This means the >> file is fully allocated. However, the pages inside the file range >> are still marked delayed allocation. That implies that we've failed >> to write the pages over a delayed allocation region after we've >> allocated the space. >> >> That, in turn, tends to indicate a problem in page writeback - the >> first page to be written has triggered delayed allocation of the >> entire range, but then the subsequent pages have not been written >> (for some as yet unknown reason). When a page is written, we map it >> to the current block via xfs_map_at_offset(), and that clears both >> the buffer delay and unwritten flags. >> >> This clearly isn't happening which means either the VFS doesn't >> think the inode is dirty anymore, writeback is never asking for >> these pages to be written, or XFs is screwing something up in >> ->writepage. The XFS writepage code changed significantly in 4.6, so >> it might be worth seeing if a 4.6 kernel reproduces this same >> problem.... >> >> Cheers, >> >> Dave. >> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2016-08-17 15:37 UTC | newest] Thread overview: 49+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-02-20 8:02 xfs trace in 4.4.2 Stefan Priebe 2016-02-20 14:45 ` Brian Foster 2016-02-20 18:02 ` Stefan Priebe - Profihost AG 2016-03-04 18:47 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe 2016-03-04 19:13 ` Brian Foster 2016-03-04 20:02 ` Stefan Priebe 2016-03-04 21:03 ` Brian Foster 2016-03-04 21:15 ` Stefan Priebe 2016-03-05 22:48 ` Dave Chinner 2016-03-05 22:58 ` Stefan Priebe 2016-03-23 13:26 ` Stefan Priebe - Profihost AG 2016-03-23 13:28 ` Stefan Priebe - Profihost AG 2016-03-23 14:07 ` Brian Foster 2016-03-24 8:10 ` Stefan Priebe - Profihost AG 2016-03-24 8:15 ` Stefan Priebe - Profihost AG 2016-03-24 11:17 ` Brian Foster 2016-03-24 12:17 ` Stefan Priebe - Profihost AG 2016-03-24 12:24 ` Brian Foster 2016-04-04 6:12 ` Stefan Priebe - Profihost AG 2016-05-11 12:26 ` Stefan Priebe - Profihost AG 2016-05-11 13:34 ` Brian Foster 2016-05-11 14:03 ` Stefan Priebe - Profihost AG 2016-05-11 15:59 ` Brian Foster 2016-05-11 19:20 ` Stefan Priebe 2016-05-15 11:03 ` Stefan Priebe 2016-05-15 11:50 ` Brian Foster 2016-05-15 12:41 ` Stefan Priebe 2016-05-16 1:06 ` Brian Foster 2016-05-22 19:36 ` Stefan Priebe - Profihost AG 2016-05-22 21:38 ` Dave Chinner 2016-05-30 7:23 ` Stefan Priebe - Profihost AG 2016-05-30 22:36 ` shrink_active_list/try_to_release_page bug? 
(was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage) Dave Chinner 2016-05-31 1:07 ` Minchan Kim 2016-05-31 2:55 ` Dave Chinner 2016-05-31 3:59 ` Minchan Kim 2016-05-31 6:07 ` Dave Chinner 2016-05-31 6:11 ` Stefan Priebe - Profihost AG 2016-05-31 7:31 ` Dave Chinner 2016-05-31 8:03 ` Stefan Priebe - Profihost AG 2016-06-02 12:13 ` Stefan Priebe - Profihost AG 2016-06-02 12:44 ` Holger Hoffstätte 2016-06-02 23:08 ` Dave Chinner 2016-05-31 9:50 ` Jan Kara 2016-06-01 1:38 ` Minchan Kim 2016-08-17 15:37 ` Andreas Grünbacher 2016-06-03 17:56 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe - Profihost AG 2016-06-03 19:35 ` Holger Hoffstätte 2016-06-04 0:04 ` Dave Chinner 2016-06-26 5:45 ` Stefan Priebe