* Re: XFS causing stack overflow
From: Christoph Hellwig @ 2011-12-09 11:55 UTC
To: Ryan C. England; +Cc: xfs, linux-mm

On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:
> I am looking for assistance on XFS, which is why I have joined this
> mailing list. I'm receiving a stack overflow on our file server. The
> server is running Scientific Linux 6.1 with the following kernel:
> 2.6.32-131.21.1.el6.x86_64.
>
> This is causing random reboots, which is more annoying than anything.
> I found a couple of links in the archives but wasn't quite sure how to
> apply the patch. I can provide whatever information is necessary to
> help with troubleshooting.

It's really mostly an issue with the VM page reclaim and writeback
code. The kernel still has the old balance dirty pages code, which
calls into the writeback code from the stack of the write system call,
which already comes from NFSD with massive amounts of stack used. The
writeback code then calls into XFS to write data out, which brings in
the full XFS btree code and finally ends up in kmalloc and memory
reclaim.

You probably have only a third of the stack actually used by XFS; the
rest is from the NFSD/writeback code and page reclaim. I don't think
any of this is easily fixable in a 2.6.32 codebase. Current mainline
3.2-rc now has the I/O-less balance dirty pages, which will basically
split the stack footprint in half, but it's an invasive change to the
writeback code that isn't easily backportable.

> Dec 6 20:27:55 localhost kernel: ------------[ cut here ]------------
> Dec 6 20:27:55 localhost kernel: WARNING: at arch/x86/kernel/irq_64.c:47 handle_irq+0x8f/0xa0() (Not tainted)
> Dec 6 20:27:55 localhost kernel: Hardware name: X8DTH-i/6/iF/6F
> Dec 6 20:27:55 localhost kernel: do_IRQ: nfsd near stack overflow (cur:ffff880622208000,sp:ffff880622208160)
> Dec 6 20:27:55 localhost kernel: Modules linked in: mpt2sas scsi_transport_sas raid_class mptctl mptbase nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs dm_mirror dm_region_hash dm_log ses enclosure ixgbe mdio microcode igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 megaraid_sas(U) sd_mod crc_t10dif ahci dm_mod [last unloaded: scsi_wait_scan]
> Dec 6 20:27:55 localhost kernel: Pid: 2898, comm: nfsd Not tainted 2.6.32-131.21.1.el6.x86_64 #1
> Dec 6 20:27:55 localhost kernel: Call Trace:
> Dec 6 20:27:55 localhost kernel: <IRQ>  [<ffffffff81067097>] ? warn_slowpath_common+0x87/0xc0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8106f6da>] ? __do_softirq+0x11a/0x1d0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81067186>] ? warn_slowpath_fmt+0x46/0x50
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100dfcf>] ? handle_irq+0x8f/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff814e310c>] ? do_IRQ+0x6c/0xf0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
> Dec 6 20:27:55 localhost kernel: <EOI>  [<ffffffff8115b80f>] ? kmem_cache_free+0xbf/0x2b0
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a2542>] ? free_buffer_head+0x22/0x50
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a2919>] ? try_to_free_buffers+0x79/0xc0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259a9c>] ? xfs_vm_releasepage+0xbc/0x130 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110c6c0>] ? try_to_release_page+0x30/0x60
> Dec 6 20:27:55 localhost kernel: [<ffffffff811262c1>] ? shrink_page_list.clone.0+0x4f1/0x5c0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81126688>] ? shrink_inactive_list+0x2f8/0x740
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111f7f6>] ? free_pcppages_bulk+0x2b6/0x390
> Dec 6 20:27:55 localhost kernel: [<ffffffff811278df>] ? shrink_zone+0x38f/0x520
> Dec 6 20:27:55 localhost kernel: [<ffffffff811646f8>] ? __mem_cgroup_uncharge_common+0x198/0x270
> Dec 6 20:27:55 localhost kernel: [<ffffffff81128684>] ? zone_reclaim+0x354/0x410
> Dec 6 20:27:55 localhost kernel: [<ffffffff811292c0>] ? isolate_pages_global+0x0/0x380
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111ebf4>] ? get_page_from_freelist+0x694/0x820
> Dec 6 20:27:55 localhost kernel: [<ffffffff81126882>] ? shrink_inactive_list+0x4f2/0x740
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111fb01>] ? __alloc_pages_nodemask+0x111/0x8b0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110d17e>] ? find_get_page+0x1e/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110e307>] ? find_lock_page+0x37/0x80
> Dec 6 20:27:55 localhost kernel: [<ffffffff811546da>] ? alloc_pages_current+0xaa/0x110
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110d6b7>] ? __page_cache_alloc+0x87/0x90
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110e45f>] ? find_or_create_page+0x4f/0xb0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025b945>] ? _xfs_buf_lookup_pages+0x145/0x360 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025b2ab>] ? _xfs_buf_initialize+0xcb/0x140 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025cb57>] ? xfs_buf_get+0x77/0x1b0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025ccbc>] ? xfs_buf_read+0x2c/0x100 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0250e39>] ? xfs_trans_read_buf+0x219/0x440 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021efde>] ? xfs_btree_read_buf_block+0x5e/0xc0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021f6d4>] ? xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021d64c>] ? xfs_btree_ptr_offset+0x4c/0x90 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021fd5f>] ? xfs_btree_lookup+0xbf/0x470 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0209cfa>] ? xfs_alloc_ag_vextent_near+0x98a/0xb70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0250afd>] ? xfs_trans_log_buf+0x9d/0xe0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021348f>] ? xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021a2e4>] ? xfs_bmap_add_extent_delay_real+0xe54/0x18d0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025737a>] ? kmem_zone_alloc+0x9a/0xe0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa01ff009>] ? xfs_trans_mod_dquot_byino+0x79/0xd0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021b15f>] ? xfs_bmap_add_extent+0x3ff/0x420 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021ce7a>] ? xfs_bmbt_init_cursor+0x4a/0x150 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021bc94>] ? xfs_bmapi+0xb14/0x11a0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff814dc986>] ? down_write+0x16/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023ddd5>] ? xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81248a9e>] ? generic_make_request+0x21e/0x5b0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023eb19>] ? xfs_iomap+0x389/0x440 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119b6ac>] ? __mark_inode_dirty+0x6c/0x160
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0257f4d>] ? xfs_map_blocks+0x2d/0x40 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259588>] ? xfs_page_state_convert+0x2f8/0x750 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81268505>] ? radix_tree_gang_lookup_tag_slot+0x95/0xe0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259b96>] ? xfs_vm_writepage+0x86/0x170 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81120d67>] ? __writepage+0x17/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffff811220f9>] ? write_cache_pages+0x1c9/0x4a0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81120d50>] ? __writepage+0x0/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023ab93>] ? xfs_iflush+0x203/0x210 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025af9f>] ? xfs_bdwrite+0x5f/0xa0 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa024fe99>] ? xfs_trans_unlocked_item+0x39/0x60 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff811223f4>] ? generic_writepages+0x24/0x30
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025898e>] ? xfs_vm_writepages+0x5e/0x80 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff81122421>] ? do_writepages+0x21/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119bc8d>] ? writeback_single_inode+0xdd/0x2c0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119c08e>] ? writeback_sb_inodes+0xce/0x180
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119c1eb>] ? writeback_inodes_wb+0xab/0x1b0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8112181e>] ? balance_dirty_pages+0x21e/0x4d0
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a3851>] ? mark_buffer_dirty+0x61/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff81121b34>] ? balance_dirty_pages_ratelimited_nr+0x64/0x70
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110dd23>] ? generic_file_buffered_write+0x1c3/0x2a0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8106dcb7>] ? current_fs_time+0x27/0x30
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0261e4f>] ? xfs_write+0x76f/0xb70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff814174b5>] ? memcpy_toiovec+0x55/0x80
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025d800>] ? xfs_file_aio_write+0x0/0x70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025d861>] ? xfs_file_aio_write+0x61/0x70 [xfs]
> Dec 6 20:27:55 localhost kernel: [<ffffffff811723bb>] ? do_sync_readv_writev+0xfb/0x140
> Dec 6 20:27:55 localhost kernel: [<ffffffff8118ae9d>] ? d_obtain_alias+0x4d/0x160
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108e120>] ? autoremove_wake_function+0x0/0x40
> Dec 6 20:27:55 localhost kernel: [<ffffffff812056b6>] ? security_task_setgroups+0x16/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffff81205356>] ? security_file_permission+0x16/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffff8117347f>] ? do_readv_writev+0xcf/0x1f0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047f852>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffff811735e6>] ? vfs_writev+0x46/0x60
> Dec 6 20:27:55 localhost kernel: [<ffffffffa04813d7>] ? nfsd_vfs_write+0x107/0x430 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8116fe22>] ? dentry_open+0x52/0xc0
> Dec 6 20:27:55 localhost kernel: [<ffffffffa04839fe>] ? nfsd_open+0x13e/0x210 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0483e87>] ? nfsd_write+0xe7/0x100 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa048b7df>] ? nfsd3_proc_write+0xaf/0x140 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047c43e>] ? nfsd_dispatch+0xfe/0x240 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa03f24d4>] ? svc_process_common+0x344/0x640 [sunrpc]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8105dbc0>] ? default_wake_function+0x0/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffffa03f2b10>] ? svc_process+0x110/0x160 [sunrpc]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047cb62>] ? nfsd+0xc2/0x160 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047caa0>] ? nfsd+0x0/0x160 [nfsd]
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108ddb6>] ? kthread+0x96/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108dd20>] ? kthread+0x0/0xa0
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
> Dec 6 20:27:55 localhost kernel: ---[ end trace e8b62253d4084e2b ]---
>
> --
> Ryan C. England
> Corvid Technologies <http://www.corvidtec.com/>
> office: 704-799-6944 x158
> cell: 980-521-2297

---end quoted text---

* Re: XFS causing stack overflow
From: Ryan C. England @ 2011-12-09 15:56 UTC
To: Christoph Hellwig; +Cc: xfs, linux-mm

Chris,

Thanks for the reply. You have been a great help. Do you know if these
changes were implemented any further back than 3.2? I wouldn't feel
comfortable running a release-candidate kernel in a production
environment.

Thanks again

On Fri, Dec 9, 2011 at 6:55 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:
> > [... original report snipped ...]
>
> It's really mostly an issue with the VM page reclaim and writeback
> code. The kernel still has the old balance dirty pages code, which
> calls into the writeback code from the stack of the write system call,
> which already comes from NFSD with massive amounts of stack used. The
> writeback code then calls into XFS to write data out, which brings in
> the full XFS btree code and finally ends up in kmalloc and memory
> reclaim.
>
> You probably have only a third of the stack actually used by XFS; the
> rest is from the NFSD/writeback code and page reclaim. I don't think
> any of this is easily fixable in a 2.6.32 codebase. Current mainline
> 3.2-rc now has the I/O-less balance dirty pages, which will basically
> split the stack footprint in half, but it's an invasive change to the
> writeback code that isn't easily backportable.
>
> > [... quoted stack trace snipped ...]
--
Ryan C. England
Corvid Technologies <http://www.corvidtec.com/>
office: 704-799-6944 x158
cell: 980-521-2297

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-09 22:19 UTC
To: Christoph Hellwig; +Cc: Ryan C. England, linux-mm, xfs

On Fri, Dec 09, 2011 at 06:55:13AM -0500, Christoph Hellwig wrote:
> On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:
> > [... original report snipped ...]
>
> It's really mostly an issue with the VM page reclaim and writeback
> code. The kernel still has the old balance dirty pages code, which
> calls into the writeback code from the stack of the write system call,
> which already comes from NFSD with massive amounts of stack used. The
> writeback code then calls into XFS to write data out, which brings in
> the full XFS btree code and finally ends up in kmalloc and memory
> reclaim.

You forgot about interrupt stacking - that trace shows the system
took an interrupt at the point of highest stack usage in the
writeback call chain.... :/

> You probably have only a third of the stack actually used by XFS; the
> rest is from the NFSD/writeback code and page reclaim. I don't think
> any of this is easily fixable in a 2.6.32 codebase. Current mainline
> 3.2-rc now has the I/O-less balance dirty pages, which will basically
> split the stack footprint in half, but it's an invasive change to the
> writeback code that isn't easily backportable.

It also doesn't solve the problem, because we can get pretty much
the same stack from the COMMIT operation starting writeback....

The backport of the patches that separate the allocation onto a
separate workqueue is not straightforward, because all the workqueue
code is different. I'll go back and update the TOT patch to make this
separation first before backporting...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-10 19:52 UTC
To: Dave Chinner; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

Dave Chinner <david@fromorbit.com> writes:
>
> You forgot about interrupt stacking - that trace shows the system
> took an interrupt at the point of highest stack usage in the
> writeback call chain.... :/

The interrupts are always running on other stacks these days
(even 32bit got switched over).

-Andi

--
ak@linux.intel.com -- Speaking for myself only

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-10 22:13 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Sat, Dec 10, 2011 at 11:52:51AM -0800, Andi Kleen wrote:
> Dave Chinner <david@fromorbit.com> writes:
> >
> > You forgot about interrupt stacking - that trace shows the system
> > took an interrupt at the point of highest stack usage in the
> > writeback call chain.... :/
>
> The interrupts are always running on other stacks these days
> (even 32bit got switched over).

Where does the x86-64 do the interrupt stack switch?

I know the x86 32 bit interrupt handler switches to an irq/softirq
context stack, but the 64 bit one doesn't appear to. Indeed,
arch/x86/kernel/irq_{32,64}.c are very different, and only the 32 bit
irq handler switches to another stack to process the interrupts...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-11 0:00 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs, Ryan C. England

> Where does the x86-64 do the interrupt stack switch?

In entry_64.S.

> I know the x86 32 bit interrupt handler switches to an irq/softirq
> context stack, but the 64 bit one doesn't appear to. Indeed,
> arch/x86/kernel/irq_{32,64}.c are very different, and only the 32 bit
> irq handler switches to another stack to process the interrupts...

x86-64 always used interrupt stacks and has used softirq stacks for a
long time. 32bit got to it much later (the only good thing left from
that 4k stack "experiment").

-Andi

--
ak@linux.intel.com -- Speaking for myself only

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-11 23:05 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Sun, Dec 11, 2011 at 01:00:36AM +0100, Andi Kleen wrote:
> > Where does the x86-64 do the interrupt stack switch?
>
> In entry_64.S.
>
> > I know the x86 32 bit interrupt handler switches to an irq/softirq
> > context stack, but the 64 bit one doesn't appear to. Indeed,
> > arch/x86/kernel/irq_{32,64}.c are very different, and only the 32 bit
> > irq handler switches to another stack to process the interrupts...
>
> x86-64 always used interrupt stacks and has used softirq stacks for a
> long time. 32bit got to it much later (the only good thing left from
> that 4k stack "experiment").

Oh, it's hidden in the SAVE_ARGS_IRQ macro. But that happens before
do_IRQ is called, so what is the do_IRQ call chain doing on this stack,
given that we're already supposed to have switched to the interrupt
stack before do_IRQ is called?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-12 2:31 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs, Ryan C. England

> But that happens before do_IRQ is called, so what is the do_IRQ call
> chain doing on this stack, given that we're already supposed to have
> switched to the interrupt stack before do_IRQ is called?

Not sure I understand the question.

The pt_regs are on the original stack (but they are quite small), all
the rest is on the new stack. ISTs are not used for interrupts, only
for some special exceptions.

do_IRQ doesn't switch any stacks on 64bit.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-12 4:36 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Mon, Dec 12, 2011 at 03:31:30AM +0100, Andi Kleen wrote:
> > But that happens before do_IRQ is called, so what is the do_IRQ call
> > chain doing on this stack, given that we're already supposed to have
> > switched to the interrupt stack before do_IRQ is called?
>
> Not sure I understand the question.
>
> The pt_regs are on the original stack (but they are quite small), all
> the rest

It's ~180 bytes, so it's not really that small.

> is on the new stack. ISTs are not used for interrupts, only for
> some special exceptions.

IST = ???

> do_IRQ doesn't switch any stacks on 64bit.

No, but its caller appears to:

	/* 0(%rsp): ~(interrupt number) */
	.macro interrupt func
	/* reserve pt_regs for scratch regs and rbp */
	subq $ORIG_RAX-RBP, %rsp
	CFI_ADJUST_CFA_OFFSET ORIG_RAX-RBP
	SAVE_ARGS_IRQ
	call \func
	.endm

and the SAVE_ARGS_IRQ macro switches to the per-cpu interrupt stack.
The only caller does this:

common_interrupt:
	XCPT_FRAME
	addq $-0x80,(%rsp)		/* Adjust vector to [-256,-1] range */
	interrupt do_IRQ

So, why do we get this:

Dec 6 20:27:55 localhost kernel: <IRQ>  [<ffffffff81067097>] ? warn_slowpath_common+0x87/0xc0
Dec 6 20:27:55 localhost kernel: [<ffffffff8106f6da>] ? __do_softirq+0x11a/0x1d0
Dec 6 20:27:55 localhost kernel: [<ffffffff81067186>] ? warn_slowpath_fmt+0x46/0x50
Dec 6 20:27:55 localhost kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
Dec 6 20:27:55 localhost kernel: [<ffffffff8100dfcf>] ? handle_irq+0x8f/0xa0
Dec 6 20:27:55 localhost kernel: [<ffffffff814e310c>] ? do_IRQ+0x6c/0xf0
Dec 6 20:27:55 localhost kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
Dec 6 20:27:55 localhost kernel: <EOI>  [<ffffffff8115b80f>] ? kmem_cache_free+0xbf/0x2b0

at the top of the stack frame? Is the stack unwinder walking back
across the interrupt stack to the previous task stack?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-12 5:13 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs, Ryan C. England

> It's ~180 bytes, so it's not really that small.

Quite small compared to what real code uses. And also fixed size.

> > is on the new stack. ISTs are not used for interrupts, only for
> > some special exceptions.
>
> IST = ???

That's a hardware mechanism on x86-64 to switch stacks (Interrupt
Stack Table or somesuch).

With ISTs it would have been possible to move the pt_regs too, but the
software mechanism is somewhat simpler.

> at the top of the stack frame? Is the stack unwinder walking back
> across the interrupt stack to the previous task stack?

Yes, the unwinder knows about all the extra stacks (interrupt and
exception stacks) and crosses them as needed.

BTW, I suppose it wouldn't be all that hard to add more stacks and
switch to them too, similar to what the 32bit do_IRQ does. Perhaps XFS
could just allocate its own stack per thread (or maybe only if it
detects some specific configuration that is known to need much stack).

It would need to be per thread if you could sleep inside them.

-Andi

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-12 9:00 UTC
To: Andi Kleen; +Cc: Christoph Hellwig, linux-mm, xfs, Ryan C. England

On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:
> > It's ~180 bytes, so it's not really that small.
>
> Quite small compared to what real code uses. And also fixed size.
>
> > > is on the new stack. ISTs are not used for interrupts, only for
> > > some special exceptions.
> >
> > IST = ???
>
> That's a hardware mechanism on x86-64 to switch stacks (Interrupt
> Stack Table or somesuch).
>
> With ISTs it would have been possible to move the pt_regs too, but the
> software mechanism is somewhat simpler.
>
> > at the top of the stack frame? Is the stack unwinder walking back
> > across the interrupt stack to the previous task stack?
>
> Yes, the unwinder knows about all the extra stacks (interrupt and
> exception stacks) and crosses them as needed.
>
> BTW, I suppose it wouldn't be all that hard to add more stacks and
> switch to them too, similar to what the 32bit do_IRQ does. Perhaps XFS
> could just allocate its own stack per thread (or maybe only if it
> detects some specific configuration that is known to need much stack).

That's possible, but rather complex, I think.

> It would need to be per thread if you could sleep inside them.

Yes, we'd need to sleep, do IO, possibly operate within a transaction
context, etc., and a workqueue handles all these cases without having
to do anything special. Splitting the stack at a logical point is
probably better, such as this patch:

http://oss.sgi.com/archives/xfs/2011-07/msg00443.html

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

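[Note: the stack split referred to above hands the deepest part of the
call chain to a workqueue worker, which starts with an almost empty
stack, while the submitting context sleeps until the work completes.
The following is a minimal sketch of that general pattern using stock
kernel workqueue/completion APIs, not the actual XFS patch; alloc_wq
and do_deep_allocation are illustrative names, not real identifiers.]

	#include <linux/kernel.h>
	#include <linux/workqueue.h>
	#include <linux/completion.h>

	/* Dedicated workqueue, created at init time, e.g. with
	 * create_workqueue() (queue name is illustrative). */
	static struct workqueue_struct *alloc_wq;

	struct alloc_args {
		struct work_struct	work;
		struct completion	done;
		int			result;
		/* ... parameters and results of the allocation ... */
	};

	/* Hypothetical stand-in for the deep allocation call chain
	 * (btree lookups, buffer reads, transaction work). */
	static int do_deep_allocation(struct alloc_args *args);

	/* Runs on a worker thread, i.e. on a nearly empty stack. */
	static void alloc_worker(struct work_struct *work)
	{
		struct alloc_args *args =
			container_of(work, struct alloc_args, work);

		args->result = do_deep_allocation(args);
		complete(&args->done);
	}

	/* Called from the deep context: queue the work, then sleep. */
	static int alloc_via_workqueue(struct alloc_args *args)
	{
		INIT_WORK(&args->work, alloc_worker);
		init_completion(&args->done);
		queue_work(alloc_wq, &args->work);
		wait_for_completion(&args->done);
		return args->result;
	}

The caller must be allowed to sleep in wait_for_completion(), which is
the point made above: the worker can sleep, do IO and run inside a
transaction context like any other thread, so no special stack
machinery is needed.
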
* Re: XFS causing stack overflow
From: Ryan C. England @ 2011-12-12 13:43 UTC
To: Dave Chinner; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs

Is it possible to apply this patch to my current installation? We use
this box in production, and the reboots we're experiencing are an
inconvenience.

Is there a walkthrough on how to apply this patch? If not, could you
provide the steps necessary to apply it successfully? I would greatly
appreciate it.

Thank you

On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Dec 12, 2011 at 06:13:11AM +0100, Andi Kleen wrote:
> > [...]
> > It would need to be per thread if you could sleep inside them.
>
> Yes, we'd need to sleep, do IO, possibly operate within a transaction
> context, etc., and a workqueue handles all these cases without having
> to do anything special. Splitting the stack at a logical point is
> probably better, such as this patch:
>
> http://oss.sgi.com/archives/xfs/2011-07/msg00443.html

--
Ryan C. England
Corvid Technologies <http://www.corvidtec.com/>
office: 704-799-6944 x158
cell: 980-521-2297

* Re: XFS causing stack overflow
From: Dave Chinner @ 2011-12-12 22:47 UTC
To: Ryan C. England; +Cc: Andi Kleen, Christoph Hellwig, linux-mm, xfs

On Mon, Dec 12, 2011 at 08:43:57AM -0500, Ryan C. England wrote:
> On Mon, Dec 12, 2011 at 4:00 AM, Dave Chinner <david@fromorbit.com> wrote:
> > [...]
> > Splitting the stack at a logical point is probably better, such as
> > this patch:
> >
> > http://oss.sgi.com/archives/xfs/2011-07/msg00443.html
>
> Is it possible to apply this patch to my current installation? We use
> this box in production, and the reboots we're experiencing are an
> inconvenience.

Not easily. The problem with a backport is that the workqueue
infrastructure changed around 2.6.36, allowing workqueues to act like
an (almost) infinite pool of worker threads, so by using a workqueue we
can have an effectively unlimited number of concurrent allocations in
progress at once. The workqueue implementation in 2.6.32 only allows a
single work instance per workqueue thread, so even with per-CPU worker
threads it would only allow one allocation at a time per CPU. This adds
additional serialisation within a filesystem and between filesystems,
and potentially adds new deadlock conditions as well. So it's not
exactly obvious whether it can be backported in a sane manner or not.

> Is there a walkthrough on how to apply this patch? If not, could you
> provide the steps necessary to apply it successfully? I would greatly
> appreciate it.

It would probably need redesigning and re-implementing from scratch for
the reasons above. It'd then need a lot of testing and review.

As a workaround, you might be better off doing what Andi first
suggested: recompiling your kernel to use 16k stacks.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

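[Note: on a 2.6.32-era x86_64 tree, the kernel stack size is derived
from THREAD_ORDER in arch/x86/include/asm/page_64_types.h, so the
16k-stack rebuild suggested here should amount to a one-line change
along the following lines. This is a sketch under that assumption;
verify the definition in the exact source tree before building.]

	--- a/arch/x86/include/asm/page_64_types.h
	+++ b/arch/x86/include/asm/page_64_types.h
	-#define THREAD_ORDER	1	/* 2 pages: 8k stacks */
	+#define THREAD_ORDER	2	/* 4 pages: 16k stacks */
	 #define THREAD_SIZE	(PAGE_SIZE << THREAD_ORDER)

This doubles the per-thread stack allocation and requires an order-2
physically contiguous allocation for every thread, which is part of
why it was never the default; it is a rebuild-only workaround, not a
code fix.
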
* Re: XFS causing stack overflow
From: Andi Kleen @ 2011-12-09 19:53 UTC
To: Christoph Hellwig; +Cc: Ryan C. England, linux-mm, xfs

Christoph Hellwig <hch@infradead.org> writes:
>
> You probably have only a third of the stack actually used by XFS; the
> rest is from the NFSD/writeback code and page reclaim. I don't think
> any of this is easily fixable in a 2.6.32 codebase. Current mainline
> 3.2-rc now has the I/O-less balance dirty pages, which will basically
> split the stack footprint in half, but it's an invasive change to the
> writeback code that isn't easily backportable.

An easy fix would be 16k stacks. Don't think they're that difficult to
do, but it would need a special binary.

-Andi

--
ak@linux.intel.com -- Speaking for myself only