* Re: XFS causing stack overflow
From: Christoph Hellwig @ 2011-12-09 11:55 UTC
To: Ryan C. England; +Cc: xfs, linux-mm

On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:
> I am looking for assistance on XFS, which is why I have joined this
> mailing list. I'm receiving a stack overflow on our file server. The
> server is running Scientific Linux 6.1 with the following kernel:
> 2.6.32-131.21.1.el6.x86_64.
>
> This is causing random reboots, which is more annoying than anything.
> I found a couple of links in the archives but wasn't quite sure how to
> apply the patch. I can provide whatever information is necessary to
> help with troubleshooting.

It's really mostly an issue with the VM page reclaim and writeback
code. The kernel still has the old balance dirty pages code, which
calls into the writeback code from the stack of the write system call,
which already comes from NFSD with massive amounts of stack used. The
writeback code then calls into XFS to write data out, which brings in
the full XFS btree code and finally ends up in kmalloc and memory
reclaim.

You probably have only a third of the stack actually used by XFS; the
rest is from the NFSD/writeback code and page reclaim. I don't think
any of this is easily fixable in a 2.6.32 codebase. Current mainline
3.2-rc now has the I/O-less balance dirty pages, which will basically
split the stack footprint in half, but it's an invasive change to the
writeback code that isn't easily backportable.

> Dec 6 20:27:55 localhost kernel: ------------[ cut here ]------------
> Dec 6 20:27:55 localhost kernel: WARNING: at arch/x86/kernel/irq_64.c:47 handle_irq+0x8f/0xa0() (Not tainted)
> Dec 6 20:27:55 localhost kernel: Hardware name: X8DTH-i/6/iF/6F
> Dec 6 20:27:55 localhost kernel: do_IRQ: nfsd near stack overflow (cur:ffff880622208000,sp:ffff880622208160)
> Dec 6 20:27:55 localhost kernel: Modules linked in: mpt2sas scsi_transport_sas raid_class mptctl mptbase nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs dm_mirror dm_region_hash dm_log ses enclosure ixgbe mdio microcode igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 megaraid_sas(U) sd_mod crc_t10dif ahci dm_mod [last unloaded: scsi_wait_scan]
> Dec 6 20:27:55 localhost kernel: Pid: 2898, comm: nfsd Not tainted 2.6.32-131.21.1.el6.x86_64 #1
> Dec 6 20:27:55 localhost kernel: Call Trace:
> Dec 6 20:27:55 localhost kernel: <IRQ>  [<ffffffff81067097>] ? warn_slowpath_common+0x87/0xc0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8106f6da>] ? __do_softirq+0x11a/0x1d0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81067186>] ? warn_slowpath_fmt+0x46/0x50
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100dfcf>] ? handle_irq+0x8f/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff814e310c>] ? do_IRQ+0x6c/0xf0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
> Dec 6 20:27:55 localhost kernel: <EOI>  [<ffffffff8115b80f>] ? kmem_cache_free+0xbf/0x2b0
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a2542>] ? free_buffer_head+0x22/0x50
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a2919>] ? try_to_free_buffers+0x79/0xc0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259a9c>] ? xfs_vm_releasepage+0xbc/0x130 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110c6c0>] ? try_to_release_page+0x30/0x60
> Dec 6 20:27:55 localhost kernel: [<ffffffff811262c1>] ? shrink_page_list.clone.0+0x4f1/0x5c0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81126688>] ? shrink_inactive_list+0x2f8/0x740
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111f7f6>] ? free_pcppages_bulk+0x2b6/0x390
> Dec 6 20:27:55 localhost kernel: [<ffffffff811278df>] ? shrink_zone+0x38f/0x520
> Dec 6 20:27:55 localhost kernel: [<ffffffff811646f8>] ? __mem_cgroup_uncharge_common+0x198/0x270
> Dec 6 20:27:55 localhost kernel: [<ffffffff81128684>] ? zone_reclaim+0x354/0x410
> Dec 6 20:27:55 localhost kernel: [<ffffffff811292c0>] ? isolate_pages_global+0x0/0x380
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111ebf4>] ? get_page_from_freelist+0x694/0x820
> Dec 6 20:27:55 localhost kernel: [<ffffffff81126882>] ? shrink_inactive_list+0x4f2/0x740
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111fb01>] ? __alloc_pages_nodemask+0x111/0x8b0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110d17e>] ? find_get_page+0x1e/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110e307>] ? find_lock_page+0x37/0x80
> Dec 6 20:27:55 localhost kernel: [<ffffffff811546da>] ? alloc_pages_current+0xaa/0x110
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110d6b7>] ? __page_cache_alloc+0x87/0x90
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110e45f>] ? find_or_create_page+0x4f/0xb0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025b945>] ? _xfs_buf_lookup_pages+0x145/0x360 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025b2ab>] ? _xfs_buf_initialize+0xcb/0x140 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025cb57>] ? xfs_buf_get+0x77/0x1b0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025ccbc>] ? xfs_buf_read+0x2c/0x100 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0250e39>] ? xfs_trans_read_buf+0x219/0x440 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021efde>] ? xfs_btree_read_buf_block+0x5e/0xc0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021f6d4>] ? xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021d64c>] ? xfs_btree_ptr_offset+0x4c/0x90 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021fd5f>] ? xfs_btree_lookup+0xbf/0x470 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0209cfa>] ? xfs_alloc_ag_vextent_near+0x98a/0xb70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0250afd>] ? xfs_trans_log_buf+0x9d/0xe0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021348f>] ? xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021a2e4>] ? xfs_bmap_add_extent_delay_real+0xe54/0x18d0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025737a>] ? kmem_zone_alloc+0x9a/0xe0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa01ff009>] ? xfs_trans_mod_dquot_byino+0x79/0xd0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021b15f>] ? xfs_bmap_add_extent+0x3ff/0x420 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021ce7a>] ? xfs_bmbt_init_cursor+0x4a/0x150 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021bc94>] ? xfs_bmapi+0xb14/0x11a0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff814dc986>] ? down_write+0x16/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023ddd5>] ? xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81248a9e>] ? generic_make_request+0x21e/0x5b0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023eb19>] ? xfs_iomap+0x389/0x440 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119b6ac>] ? __mark_inode_dirty+0x6c/0x160
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0257f4d>] ? xfs_map_blocks+0x2d/0x40 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259588>] ? xfs_page_state_convert+0x2f8/0x750 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81268505>] ? radix_tree_gang_lookup_tag_slot+0x95/0xe0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259b96>] ? xfs_vm_writepage+0x86/0x170 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81120d67>] ? __writepage+0x17/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffff811220f9>] ? write_cache_pages+0x1c9/0x4a0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81120d50>] ? __writepage+0x0/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023ab93>] ? xfs_iflush+0x203/0x210 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025af9f>] ? xfs_bdwrite+0x5f/0xa0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa024fe99>] ? xfs_trans_unlocked_item+0x39/0x60 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff811223f4>] ? generic_writepages+0x24/0x30
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025898e>] ? xfs_vm_writepages+0x5e/0x80 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81122421>] ? do_writepages+0x21/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119bc8d>] ? writeback_single_inode+0xdd/0x2c0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119c08e>] ? writeback_sb_inodes+0xce/0x180
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119c1eb>] ? writeback_inodes_wb+0xab/0x1b0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8112181e>] ? balance_dirty_pages+0x21e/0x4d0
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a3851>] ? mark_buffer_dirty+0x61/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81121b34>] ? balance_dirty_pages_ratelimited_nr+0x64/0x70
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110dd23>] ? generic_file_buffered_write+0x1c3/0x2a0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8106dcb7>] ? current_fs_time+0x27/0x30
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0261e4f>] ? xfs_write+0x76f/0xb70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff814174b5>] ? memcpy_toiovec+0x55/0x80
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025d800>] ? xfs_file_aio_write+0x0/0x70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025d861>] ? xfs_file_aio_write+0x61/0x70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff811723bb>] ? do_sync_readv_writev+0xfb/0x140
> Dec 6 20:27:55 localhost kernel: [<ffffffff8118ae9d>] ? d_obtain_alias+0x4d/0x160
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108e120>] ? autoremove_wake_function+0x0/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffff812056b6>] ? security_task_setgroups+0x16/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffff81205356>] ? security_file_permission+0x16/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffff8117347f>] ? do_readv_writev+0xcf/0x1f0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047f852>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffff811735e6>] ? vfs_writev+0x46/0x60
> Dec 6 20:27:55 localhost kernel: [<ffffffffa04813d7>] ? nfsd_vfs_write+0x107/0x430 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8116fe22>] ? dentry_open+0x52/0xc0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa04839fe>] ? nfsd_open+0x13e/0x210 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0483e87>] ? nfsd_write+0xe7/0x100 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa048b7df>] ? nfsd3_proc_write+0xaf/0x140 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047c43e>] ? nfsd_dispatch+0xfe/0x240 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa03f24d4>] ? svc_process_common+0x344/0x640 [sunrpc]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8105dbc0>] ? default_wake_function+0x0/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffffa03f2b10>] ? svc_process+0x110/0x160 [sunrpc]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047cb62>] ? nfsd+0xc2/0x160 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047caa0>] ? nfsd+0x0/0x160 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108ddb6>] ? kthread+0x96/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108dd20>] ? kthread+0x0/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
> Dec 6 20:27:55 localhost kernel: ---[ end trace e8b62253d4084e2b ]---
>
> --
> Ryan C. England
> Corvid Technologies <http://www.corvidtec.com/>
> office: 704-799-6944 x158
> cell: 980-521-2297

---end quoted text---

* Re: XFS causing stack overflow
From: Ryan C. England @ 2011-12-09 15:56 UTC
To: Christoph Hellwig; +Cc: xfs, linux-mm

Chris,

Thanks for the reply. You have been a great help. Do you know if these
changes were implemented any further back than 3.2? I wouldn't feel
comfortable running a release-candidate kernel in a production
environment.

Thanks again

On Fri, Dec 9, 2011 at 6:55 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:
> > [... original report snipped ...]
>
> It's really mostly an issue with the VM page reclaim and writeback
> code. The kernel still has the old balance dirty pages code, which
> calls into the writeback code from the stack of the write system call,
> which already comes from NFSD with massive amounts of stack used. The
> writeback code then calls into XFS to write data out, which brings in
> the full XFS btree code and finally ends up in kmalloc and memory
> reclaim.
>
> You probably have only a third of the stack actually used by XFS; the
> rest is from the NFSD/writeback code and page reclaim. I don't think
> any of this is easily fixable in a 2.6.32 codebase. Current mainline
> 3.2-rc now has the I/O-less balance dirty pages, which will basically
> split the stack footprint in half, but it's an invasive change to the
> writeback code that isn't easily backportable.
>
> > [... quoted stack trace snipped ...]
--
Ryan C. England
Corvid Technologies <http://www.corvidtec.com/>
office: 704-799-6944 x158
cell: 980-521-2297

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-09 22:19 UTC
To: Christoph Hellwig; +Cc: Ryan C. England, linux-mm, xfs

On Fri, Dec 09, 2011 at 06:55:13AM -0500, Christoph Hellwig wrote:
> On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:
> > [... original report snipped ...]
>
> It's really mostly an issue with the VM page reclaim and writeback
> code. The kernel still has the old balance dirty pages code, which
> calls into the writeback code from the stack of the write system call,
> which already comes from NFSD with massive amounts of stack used. The
> writeback code then calls into XFS to write data out, which brings in
> the full XFS btree code and finally ends up in kmalloc and memory
> reclaim.

You forgot about interrupt stacking - that trace shows the system
took an interrupt at the point of highest stack usage in the
writeback call chain.... :/

> You probably have only a third of the stack actually used by XFS; the
> rest is from the NFSD/writeback code and page reclaim. I don't think
> any of this is easily fixable in a 2.6.32 codebase. Current mainline
> 3.2-rc now has the I/O-less balance dirty pages, which will basically
> split the stack footprint in half, but it's an invasive change to the
> writeback code that isn't easily backportable.

It also doesn't solve the problem, because we can get pretty much
the same stack from the COMMIT operation starting writeback....

The backport of the patches that separate the allocation onto a
separate workqueue is not straightforward, because all the workqueue
code is different. I'll go back and update the TOT patch to make this
separation first before backporting...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-10 19:52 UTC
To: Dave Chinner; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

Dave Chinner <david@fromorbit.com> writes:
>
> You forgot about interrupt stacking - that trace shows the system
> took an interrupt at the point of highest stack usage in the
> writeback call chain.... :/

The interrupts are always running on other stacks these days
(even 32bit got switched over).

-Andi

--
ak@linux.intel.com -- Speaking for myself only

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-10 22:13 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Sat, Dec 10, 2011 at 11:52:51AM -0800, Andi Kleen wrote:
> Dave Chinner <david@fromorbit.com> writes:
> >
> > You forgot about interrupt stacking - that trace shows the system
> > took an interrupt at the point of highest stack usage in the
> > writeback call chain.... :/
>
> The interrupts are always running on other stacks these days
> (even 32bit got switched over).

Where does the x86-64 do the interrupt stack switch?

I know the x86 32 bit interrupt handler switches to an irq/softirq
context stack, but the 64 bit one doesn't appear to. Indeed,
arch/x86/kernel/irq_{32,64}.c are very different, and only the 32 bit
irq handler switches to another stack to process the interrupts...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-11 0:00 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs, Ryan C. England

> Where does the x86-64 do the interrupt stack switch?

In entry_64.S.

> I know the x86 32 bit interrupt handler switches to an irq/softirq
> context stack, but the 64 bit one doesn't appear to. Indeed,
> arch/x86/kernel/irq_{32,64}.c are very different, and only the 32 bit
> irq handler switches to another stack to process the interrupts...

x86-64 always used interrupt stacks and has used softirq stacks for a
long time. 32bit got to it much later (the only good thing left from
that 4k stack "experiment").

-Andi

--
ak@linux.intel.com -- Speaking for myself only

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-11 23:05 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Sun, Dec 11, 2011 at 01:00:36AM +0100, Andi Kleen wrote:
> > Where does the x86-64 do the interrupt stack switch?
>
> In entry_64.S.
>
> > I know the x86 32 bit interrupt handler switches to an irq/softirq
> > context stack, but the 64 bit one doesn't appear to. Indeed,
> > arch/x86/kernel/irq_{32,64}.c are very different, and only the 32 bit
> > irq handler switches to another stack to process the interrupts...
>
> x86-64 always used interrupt stacks and has used softirq stacks for a
> long time. 32bit got to it much later (the only good thing left from
> that 4k stack "experiment").

Oh, it's hidden in the SAVE_ARGS_IRQ macro. But that happens before
do_IRQ is called, so what is the do_IRQ call chain doing on this stack,
given that we're already supposed to have switched to the interrupt
stack before do_IRQ is called?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-12 2:31 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs, Ryan C. England

> But that happens before do_IRQ is called, so what is the do_IRQ call
> chain doing on this stack, given that we're already supposed to have
> switched to the interrupt stack before do_IRQ is called?

Not sure I understand the question.

The pt_regs are on the original stack (but they are quite small), all
the rest is on the new stack. ISTs are not used for interrupts, only
for some special exceptions.

do_IRQ doesn't switch any stacks on 64bit.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-12 4:36 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Mon, Dec 12, 2011 at 03:31:30AM +0100, Andi Kleen wrote:
> > But that happens before do_IRQ is called, so what is the do_IRQ call
> > chain doing on this stack, given that we're already supposed to have
> > switched to the interrupt stack before do_IRQ is called?
>
> Not sure I understand the question.
>
> The pt_regs are on the original stack (but they are quite small), all
> the rest

It's ~180 bytes, so it's not really that small.

> is on the new stack. ISTs are not used for interrupts, only for
> some special exceptions.

IST = ???

> do_IRQ doesn't switch any stacks on 64bit.

No, but its caller appears to:

	/* 0(%rsp): ~(interrupt number) */
	.macro interrupt func
	/* reserve pt_regs for scratch regs and rbp */
	subq $ORIG_RAX-RBP, %rsp
	CFI_ADJUST_CFA_OFFSET ORIG_RAX-RBP
	SAVE_ARGS_IRQ
	call \func
	.endm

and the SAVE_ARGS_IRQ macro switches to the per-cpu interrupt stack.
The only caller does this:

common_interrupt:
	XCPT_FRAME
	addq $-0x80,(%rsp)		/* Adjust vector to [-256,-1] range */
	interrupt do_IRQ

So, why do we get this:

Dec 6 20:27:55 localhost kernel: <IRQ>  [<ffffffff81067097>] ? warn_slowpath_common+0x87/0xc0
Dec 6 20:27:55 localhost kernel: [<ffffffff8106f6da>] ? __do_softirq+0x11a/0x1d0
Dec 6 20:27:55 localhost kernel: [<ffffffff81067186>] ? warn_slowpath_fmt+0x46/0x50
Dec 6 20:27:55 localhost kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Dec 6 20:27:55 localhost kernel: [<ffffffff8100dfcf>] ? handle_irq+0x8f/0xa0
Dec 6 20:27:55 localhost kernel: [<ffffffff814e310c>] ? do_IRQ+0x6c/0xf0
Dec 6 20:27:55 localhost kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
Dec 6 20:27:55 localhost kernel: <EOI>  [<ffffffff8115b80f>] ? kmem_cache_free+0xbf/0x2b0

at the top of the stack frame? Is the stack unwinder walking back
across the interrupt stack to the previous task stack?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-12 5:13 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs, Ryan C. England

> It's ~180 bytes, so it's not really that small.

Quite small compared to what real code uses. And also fixed size.

> > is on the new stack. ISTs are not used for interrupts, only for
> > some special exceptions.
>
> IST = ???

That's a hardware mechanism on x86-64 to switch stacks (Interrupt
Stack Table or somesuch).

With ISTs it would have been possible to move the pt_regs too, but the
software mechanism is somewhat simpler.

> at the top of the stack frame? Is the stack unwinder walking back
> across the interrupt stack to the previous task stack?

Yes, the unwinder knows about all the extra stacks (interrupt and
exception stacks) and crosses them as needed.

BTW, I suppose it wouldn't be all that hard to add more stacks and
switch to them too, similar to what the 32bit do_IRQ does. Perhaps XFS
could just allocate its own stack per thread (or maybe only if it
detects some specific configuration that is known to need much stack).

It would need to be per thread if you could sleep inside them.

-Andi

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-12 9:00 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:
> > It's ~180 bytes, so it's not really that small.
>
> Quite small compared to what real code uses. And also fixed size.
>
> > > is on the new stack. ISTs are not used for interrupts, only for
> > > some special exceptions.
> >
> > IST = ???
>
> That's a hardware mechanism on x86-64 to switch stacks (Interrupt
> Stack Table or somesuch).
>
> With ISTs it would have been possible to move the pt_regs too, but the
> software mechanism is somewhat simpler.
>
> > at the top of the stack frame? Is the stack unwinder walking back
> > across the interrupt stack to the previous task stack?
>
> Yes, the unwinder knows about all the extra stacks (interrupt and
> exception stacks) and crosses them as needed.
>
> BTW, I suppose it wouldn't be all that hard to add more stacks and
> switch to them too, similar to what the 32bit do_IRQ does. Perhaps XFS
> could just allocate its own stack per thread (or maybe only if it
> detects some specific configuration that is known to need much stack).

That's possible, but rather complex, I think.

> It would need to be per thread if you could sleep inside them.

Yes, we'd need to sleep, do IO, possibly operate within a transaction
context, etc., and a workqueue handles all these cases without having
to do anything special. Splitting the stack at a logical point is
probably better, such as this patch:

http://oss.sgi.com/archives/xfs/2011-07/msg00443.html

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

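[Note: the stack split referred to above hands the deepest part of the
call chain to a workqueue worker, which starts with an almost empty
stack, while the submitting context sleeps until the work completes.
The following is a minimal sketch of that general pattern using stock
kernel workqueue/completion APIs, not the actual XFS patch; alloc_wq
and do_deep_allocation are illustrative names, not real identifiers.]

	#include <linux/kernel.h>
	#include <linux/workqueue.h>
	#include <linux/completion.h>

	/* Dedicated workqueue, created at init time, e.g. with
	 * create_workqueue() (queue name is illustrative). */
	static struct workqueue_struct *alloc_wq;

	struct alloc_args {
		struct work_struct	work;
		struct completion	done;
		int			result;
		/* ... parameters and results of the allocation ... */
	};

	/* Hypothetical stand-in for the deep allocation call chain
	 * (btree lookups, buffer reads, transaction work). */
	static int do_deep_allocation(struct alloc_args *args);

	/* Runs on a worker thread, i.e. on a nearly empty stack. */
	static void alloc_worker(struct work_struct *work)
	{
		struct alloc_args *args =
			container_of(work, struct alloc_args, work);

		args->result = do_deep_allocation(args);
		complete(&args->done);
	}

	/* Called from the deep context: queue the work, then sleep. */
	static int alloc_via_workqueue(struct alloc_args *args)
	{
		INIT_WORK(&args->work, alloc_worker);
		init_completion(&args->done);
		queue_work(alloc_wq, &args->work);
		wait_for_completion(&args->done);
		return args->result;
	}

The caller must be allowed to sleep in wait_for_completion(), which is
the point made above: the worker can sleep, do IO and run inside a
transaction context like any other thread, so no special stack
machinery is needed.
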
* Re: XFS causing stack overflow
From: Ryan C. England @ 2011-12-12 13:43 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs

Is it possible to apply this patch to my current installation? We use
this box in production, and the reboots we're experiencing are an
inconvenience.

Is there a walkthrough on how to apply this patch? If not, could you
provide the steps necessary to apply it successfully? I would greatly
appreciate it.

Thank you

On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:
> > [...]
> > It would need to be per thread if you could sleep inside them.
>
> Yes, we'd need to sleep, do IO, possibly operate within a transaction
> context, etc., and a workqueue handles all these cases without having
> to do anything special. Splitting the stack at a logical point is
> probably better, such as this patch:
>
> http://oss.sgi.com/archives/xfs/2011-07/msg00443.html

--
Ryan C. England
Corvid Technologies <http://www.corvidtec.com/>
office: 704-799-6944 x158
cell: 980-521-2297

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-12 22:47 UTC
To: Ryan C. England; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs

On Mon, Dec 12, 2011 at 08:43:57AM -0500, Ryan C. England wrote:
> On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner <david@fromorbit.com> wrote:
> > [...]
> > Splitting the stack at a logical point is probably better, such as
> > this patch:
> >
> > http://oss.sgi.com/archives/xfs/2011-07/msg00443.html
>
> Is it possible to apply this patch to my current installation? We use
> this box in production, and the reboots we're experiencing are an
> inconvenience.

Not easily. The problem with a backport is that the workqueue
infrastructure changed around 2.6.36, allowing workqueues to act like
an (almost) infinite pool of worker threads, so by using a workqueue we
can have an effectively unlimited number of concurrent allocations in
progress at once. The workqueue implementation in 2.6.32 only allows a
single work instance per workqueue thread, so even with per-CPU worker
threads it would only allow one allocation at a time per CPU. This adds
additional serialisation within a filesystem and between filesystems,
and potentially adds new deadlock conditions as well. So it's not
exactly obvious whether it can be backported in a sane manner or not.

> Is there a walkthrough on how to apply this patch? If not, could you
> provide the steps necessary to apply it successfully? I would greatly
> appreciate it.

It would probably need redesigning and re-implementing from scratch for
the reasons above. It'd then need a lot of testing and review.

As a workaround, you might be better off doing what Andi first
suggested: recompiling your kernel to use 16k stacks.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

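[Note: on a 2.6.32-era x86_64 tree, the kernel stack size is derived
from THREAD_ORDER in arch/x86/include/asm/page_64_types.h, so the
16k-stack rebuild suggested here should amount to a one-line change
along the following lines. This is a sketch under that assumption;
verify the definition in the exact source tree before building.]

	--- a/arch/x86/include/asm/page_64_types.h
	+++ b/arch/x86/include/asm/page_64_types.h
	-#define THREAD_ORDER	1	/* 2 pages: 8k stacks */
	+#define THREAD_ORDER	2	/* 4 pages: 16k stacks */
	 #define THREAD_SIZE	(PAGE_SIZE << THREAD_ORDER)

This doubles the per-thread stack allocation and requires an order-2
physically contiguous allocation for every thread, which is part of
why it was never the default; it is a rebuild-only workaround, not a
code fix.
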
* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-09 19:53 UTC
To: Christoph Hellwig; +Cc: Ryan C. England, linux-mm, xfs

Christoph Hellwig <hch@infradead.org> writes:
>
> You probably have only a third of the stack actually used by XFS; the
> rest is from the NFSD/writeback code and page reclaim. I don't think
> any of this is easily fixable in a 2.6.32 codebase. Current mainline
> 3.2-rc now has the I/O-less balance dirty pages, which will basically
> split the stack footprint in half, but it's an invasive change to the
> writeback code that isn't easily backportable.

An easy fix would be 16k stacks. Don't think they're that difficult to
do, but it would need a special binary.

-Andi

--
ak@linux.intel.com -- Speaking for myself only