* ext4 deep stack with mark_page_dirty reclaim
@ 2011-03-14 19:20 Hugh Dickins
  2011-03-14 20:46 ` Ted Ts'o
  2011-03-14 22:46 ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Hugh Dickins @ 2011-03-14 19:20 UTC
To: Theodore Ts'o; +Cc: linux-ext4, linux-kernel, linux-mm

When testing something else on 2.6.38-rc8 last night,
I hit this x86_64 stack overflow.  I've never had one before,
it seems worth reporting.  kdb was in, I jotted it down by hand
(the notifier part of it will be notifying kdb of the fault).
CONFIG_DEBUG_STACK_OVERFLOW and DEBUG_STACK_USAGE were not set.

I should disclose that I have a hack in which may make my stack
frames slightly larger than they should be: check against yours.
So it may not be an overflow for anyone else, but still a trace
to worry about.

The faulting address has a stray 2 in it: presumably from when
the stack descended into the thread_info and got corrupted.

BUG: unable to handle kernel paging request at ffff88027a704060
IP: vprintk+0x100
Thread overran stack, or stack corrupted
ffff88007a7040a8 notifier_call_chain+0x40
ffff88007a704108 __atomic_notifier_call_chain+0x48
ffff88007a704148 __die+0x48
ffff88007a7041a8 die+0x50
ffff88007a7041e8 do_general_protection+0x168
ffff88007a704248 general_protection+0x1f
ffff88007a704338 schedule+0x25a
ffff88007a7044a8 io_schedule+0x35
ffff88007a7044c8 get_request_wait+0xc6
ffff88007a704568 __make_request+0x36d
ffff88007a7045d8 generic_make_request+0x2f2
ffff88007a7046a8 submit_bio+0xe1
ffff88007a704738 swap_writepage+0xa3
ffff88007a704788 pageout+0x151
ffff88007a704808 shrink_page_list+0x2db
ffff88007a7048b8 shrink_inactive_list+0x2d3
ffff88007a7049b8 shrink_zone+0x17d
ffff88007a704a98 shrink_zones+0x0xa3
ffff88007a704b18 do_try_to_free_pages+0x87
ffff88007a704ba8 try_to_free_mem_cgroup_pages+0x8e
ffff88007a704c18 mem_cgroup_hierarchical_reclaim+0x220
ffff88007a704cc8 mem_cgroup_do_charge+0xdc
ffff88007a704d48 __mem_cgroup_try_charge+0x19c
ffff88007a704dc8 mem_cgroup_charge_common+0xa8
ffff88007a704e48 mem_cgroup_cache_charge+0x19a
ffff88007a704ec8 add_to_page_cache_locked+0x57
ffff88007a704f28 add_to_page_cache_lru+0x3e
ffff88007a704f78 find_or_create_page+0x69
ffff88007a704fe8 grow_dev_page+0x4a
ffff88007a705048 grow_buffers+0x41
ffff88007a705088 __getblk_slow+0xd7
ffff88007a7050d8 __getblk+0x44
ffff88007a705128 __ext4_get_inode_loc+0x12c
ffff88007a7051d8 ext4_get_inode_loc+0x30
ffff88007a705208 ext4_reserve_inode_write+0x21
ffff88007a705258 ext4_mark_inode_dirty+0x3b
ffff88007a7052f8 ext4_dirty_inode+0x3e
ffff88007a705338 __mark_inode_dirty+0x32
linux/fs.h       mark_inode_dirty
linux/quotaops.h dquot_alloc_space
linux/quotaops.h dquot_alloc_block
ffff88007a705388 ext4_mb_new_blocks+0xc2
ffff88007a705418 ext4_alloc_blocks+0x189
ffff88007a7054e8 ext4_alloc_branch+0x73
ffff88007a7055b8 ext4_ind_map_blocks+0x148
ffff88007a7056c8 ext4_map_blocks+0x148
ffff88007a705738 ext4_getblk+0x5f
ffff88007a7057c8 ext4_bread+0x36
ffff88007a705828 ext4_append+0x52
ffff88007a705888 do_split+0x5b
ffff88007a705968 ext4_dx_add_entry+0x4b4
ffff88007a705a98 ext4_add_entry+0x7c
ffff88007a705b48 ext4_add_nondir+0x2e
ffff88007a705b98 ext4_create+0xf5
ffff88007a705c28 vfs_create+0x83
ffff88007a705c88 __open_namei_create+0x59
ffff88007a705ce8 do_last+0x13b
ffff88007a705d58 do_filp_open+0x2ae
ffff88007a705ed8 do_sys_open+0x72
ffff88007a705f58 sys_open+0x27

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
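The "stray 2" remark can be checked with a little bit arithmetic. This is a minimal sketch, not something from the report itself: it assumes 8 KiB, size-aligned kernel stacks (the names `THREAD_SIZE` and `stack_base` are illustrative), and takes the addresses straight from the trace above.

```python
# Assumption: kernel stacks are 8 KiB and aligned to their size.
THREAD_SIZE = 8 * 1024

FAULT_ADDR = 0xffff88027a704060  # from the "BUG: unable to handle" line
DEEPEST = 0xffff88007a7040a8     # notifier_call_chain, first trace entry
SHALLOWEST = 0xffff88007a705f58  # sys_open, last trace entry


def stack_base(addr, size=THREAD_SIZE):
    """Round an address down to the start of its size-aligned stack area."""
    return addr & ~(size - 1)


base = stack_base(DEEPEST)

# Bits in which the faulting address differs from the stack area,
# ignoring the offset within the area.
stray = (FAULT_ADDR ^ base) & ~(THREAD_SIZE - 1)

# The faulting address with the stray bit flipped back.
repaired = FAULT_ADDR ^ stray

print(f"stack area starts at {base:#x}")
print(f"stray bits: {stray:#x} (bit {stray.bit_length() - 1})")
print(f"repaired address is {repaired - base:#x} bytes above the stack base")
print(f"deepest trace entry is {DEEPEST - base:#x} bytes above the stack base")
```

Under that assumption both ends of the trace fall in the same stack area, the faulting address differs from it by a single bit, and the repaired address lies below the deepest recorded entry, near the bottom of the area where Hugh expects the thread_info to be.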
* Re: ext4 deep stack with mark_page_dirty reclaim
  2011-03-14 19:20 ext4 deep stack with mark_page_dirty reclaim Hugh Dickins
@ 2011-03-14 20:46 ` Ted Ts'o
  2011-03-15 2:25   ` Andreas Dilger
  2011-03-14 22:46 ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Ted Ts'o @ 2011-03-14 20:46 UTC
To: Hugh Dickins; +Cc: linux-ext4, linux-kernel, linux-mm

On Mon, Mar 14, 2011 at 12:20:52PM -0700, Hugh Dickins wrote:
> When testing something else on 2.6.38-rc8 last night,
> I hit this x86_64 stack overflow.  I've never had one before,
> it seems worth reporting.  kdb was in, I jotted it down by hand
> (the notifier part of it will be notifying kdb of the fault).
> CONFIG_DEBUG_STACK_OVERFLOW and DEBUG_STACK_USAGE were not set.
>
> I should disclose that I have a hack in which may make my stack
> frames slightly larger than they should be: check against yours.
> So it may not be an overflow for anyone else, but still a trace
> to worry about.

Here's the trace translated to the stack space used by each function.
There are a few piggy ext4 functions that we can try to shrink, but
the real problem is just how deep the whole stack is getting.

From the syscall to the lowest-level ext4 function is 3712 bytes, and
everything from there to the schedule() which then triggered the GPF
was another 3728 of stack space....

					- Ted

 240 schedule+0x25a
 368 io_schedule+0x35
  32 get_request_wait+0xc6
 160 __make_request+0x36d
 112 generic_make_request+0x2f2
 208 submit_bio+0xe1
 144 swap_writepage+0xa3
  80 pageout+0x151
 128 shrink_page_list+0x2db
 176 shrink_inactive_list+0x2d3
 256 shrink_zone+0x17d
 224 shrink_zones+0x0xa3
 128 do_try_to_free_pages+0x87
 144 try_to_free_mem_cgroup_pages+0x8e
 112 mem_cgroup_hierarchical_reclaim+0x220
 176 mem_cgroup_do_charge+0xdc
 128 __mem_cgroup_try_charge+0x19c
 128 mem_cgroup_charge_common+0xa8
 128 mem_cgroup_cache_charge+0x19a
 128 add_to_page_cache_locked+0x57
  96 add_to_page_cache_lru+0x3e
  80 find_or_create_page+0x69
 112 grow_dev_page+0x4a
  96 grow_buffers+0x41
  64 __getblk_slow+0xd7
  80 __getblk+0x44
  80 __ext4_get_inode_loc+0x12c
 176 ext4_get_inode_loc+0x30
  48 ext4_reserve_inode_write+0x21
  80 ext4_mark_inode_dirty+0x3b
 160 ext4_dirty_inode+0x3e
  64 __mark_inode_dirty+0x32
  80 linux/fs.h       mark_inode_dirty
   0 linux/quotaops.h dquot_alloc_space
   0 linux/quotaops.h dquot_alloc_block
   0 ext4_mb_new_blocks+0xc2
 144 ext4_alloc_blocks+0x189
 208 ext4_alloc_branch+0x73
 208 ext4_ind_map_blocks+0x148
 272 ext4_map_blocks+0x148
 112 ext4_getblk+0x5f
 144 ext4_bread+0x36
  96 ext4_append+0x52
  96 do_split+0x5b
 224 ext4_dx_add_entry+0x4b4
 304 ext4_add_entry+0x7c
 176 ext4_add_nondir+0x2e
  80 ext4_create+0xf5
 144 vfs_create+0x83
  96 __open_namei_create+0x59
  96 do_last+0x13b
 112 do_filp_open+0x2ae
 384 do_sys_open+0x72
 128 sys_open+0x27
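The two totals Ted quotes can be reproduced by summing the per-function sizes in his table. A small sketch follows; the list names and the split point (between `__getblk` and `__ext4_get_inode_loc`) are my reading of "the lowest-level ext4 function", not something the message spells out.

```python
# Sizes copied from the table, in the order listed.

# schedule() through __getblk(): block I/O, reclaim, memcg charge, buffer cache.
BELOW_EXT4 = [
    240, 368, 32, 160, 112, 208, 144, 80, 128, 176, 256, 224, 128,
    144, 112, 176, 128, 128, 128, 128, 96, 80, 112, 96, 64, 80,
]

# __ext4_get_inode_loc() through sys_open(): ext4 and the VFS open path.
# The three zeros are the inlined helpers and ext4_mb_new_blocks rows.
EXT4_AND_ABOVE = [
    80, 176, 48, 80, 160, 64, 80, 0, 0, 0, 144, 208, 208, 272,
    112, 144, 96, 96, 224, 304, 176, 80, 144, 96, 96, 112, 384, 128,
]

below = sum(BELOW_EXT4)
above = sum(EXT4_AND_ABOVE)

print(f"syscall to lowest-level ext4 function: {above} bytes")
print(f"from there down to schedule():         {below} bytes")
print(f"total for the listed frames:           {above + below} bytes")
```

The total covers only the frames in the table; the fault-handling frames at the top of Hugh's trace (general_protection and the notifier chain) come on top of it.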
* Re: ext4 deep stack with mark_page_dirty reclaim
  2011-03-14 20:46 ` Ted Ts'o
@ 2011-03-15 2:25   ` Andreas Dilger
  2011-03-15 15:22     ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Dilger @ 2011-03-15 2:25 UTC
To: Ted Ts'o; +Cc: Hugh Dickins, linux-ext4, linux-kernel, linux-mm

On 2011-03-14, at 1:46 PM, Ted Ts'o wrote:
> On Mon, Mar 14, 2011 at 12:20:52PM -0700, Hugh Dickins wrote:
>> When testing something else on 2.6.38-rc8 last night,
>> I hit this x86_64 stack overflow.  I've never had one before,
>> it seems worth reporting.  kdb was in, I jotted it down by hand
>> (the notifier part of it will be notifying kdb of the fault).
>> CONFIG_DEBUG_STACK_OVERFLOW and DEBUG_STACK_USAGE were not set.
>>
>> I should disclose that I have a hack in which may make my stack
>> frames slightly larger than they should be: check against yours.
>> So it may not be an overflow for anyone else, but still a trace
>> to worry about.
>
> Here's the trace translated to the stack space used by each function.
> There are a few piggy ext4 functions that we can try to shrink, but
> the real problem is just how deep the whole stack is getting.
>
> From the syscall to the lowest-level ext4 function is 3712 bytes, and
> everything from there to the schedule() which then triggered the GPF
> was another 3728 of stack space....

Is there a script which you used to generate this stack trace to
function size mapping, or did you do it by hand?  I've always wanted
such a script, but the tricky part is that there is so much garbage on
the stack that any automated stack parsing is almost useless.
Alternately, it would seem trivial to have the stack dumper print the
relative address of each symbol, and the delta from the previous
symbol...

To be honest, I think the stack size limitation is becoming a serious
problem in itself.  While some stack-size reduction effort is actually
useful in removing inefficiency, I think a lot of crazy and inefficient
things are done to try to minimize the stack usage (e.g. lots of
kmalloc/kfree of temporary arrays instead of just putting them on the
stack), which end up consuming _more_ total memory.

This can be seen with deep storage stacks that are using the network on
both ends, like NFS+{XFS, ext4}+LVM+DM+{fcoib,iSCSI}+driver+kmalloc or
similar...  The below stack isn't even using something so convoluted.

> 240 schedule+0x25a
> 368 io_schedule+0x35
>  32 get_request_wait+0xc6
> 160 __make_request+0x36d
> 112 generic_make_request+0x2f2
> 208 submit_bio+0xe1
> 144 swap_writepage+0xa3
>  80 pageout+0x151
> 128 shrink_page_list+0x2db
> 176 shrink_inactive_list+0x2d3
> 256 shrink_zone+0x17d
> 224 shrink_zones+0x0xa3
> 128 do_try_to_free_pages+0x87
> 144 try_to_free_mem_cgroup_pages+0x8e
> 112 mem_cgroup_hierarchical_reclaim+0x220
> 176 mem_cgroup_do_charge+0xdc
> 128 __mem_cgroup_try_charge+0x19c
> 128 mem_cgroup_charge_common+0xa8
> 128 mem_cgroup_cache_charge+0x19a
> 128 add_to_page_cache_locked+0x57
>  96 add_to_page_cache_lru+0x3e
>  80 find_or_create_page+0x69
> 112 grow_dev_page+0x4a
>  96 grow_buffers+0x41
>  64 __getblk_slow+0xd7
>  80 __getblk+0x44
>  80 __ext4_get_inode_loc+0x12c
> 176 ext4_get_inode_loc+0x30
>  48 ext4_reserve_inode_write+0x21
>  80 ext4_mark_inode_dirty+0x3b
> 160 ext4_dirty_inode+0x3e
>  64 __mark_inode_dirty+0x32
>  80 linux/fs.h       mark_inode_dirty
>   0 linux/quotaops.h dquot_alloc_space
>   0 linux/quotaops.h dquot_alloc_block
>   0 ext4_mb_new_blocks+0xc2
> 144 ext4_alloc_blocks+0x189
> 208 ext4_alloc_branch+0x73
> 208 ext4_ind_map_blocks+0x148
> 272 ext4_map_blocks+0x148
> 112 ext4_getblk+0x5f
> 144 ext4_bread+0x36
>  96 ext4_append+0x52
>  96 do_split+0x5b
> 224 ext4_dx_add_entry+0x4b4
> 304 ext4_add_entry+0x7c
> 176 ext4_add_nondir+0x2e
>  80 ext4_create+0xf5
> 144 vfs_create+0x83
>  96 __open_namei_create+0x59
>  96 do_last+0x13b
> 112 do_filp_open+0x2ae
> 384 do_sys_open+0x72
> 128 sys_open+0x27
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
* Re: ext4 deep stack with mark_page_dirty reclaim
  2011-03-15 2:25   ` Andreas Dilger
@ 2011-03-15 15:22     ` David Sterba
  2011-03-15 16:26       ` Chris Mason
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2011-03-15 15:22 UTC
To: linux-kernel, linux-mm; +Cc: adilger

On Mon, Mar 14, 2011 at 07:25:10PM -0700, Andreas Dilger wrote:
> Is there a script which you used to generate this stack trace to
> function size mapping, or did you do it by hand?  I've always wanted
> such a script, but the tricky part is that there is so much garbage on
> the stack that any automated stack parsing is almost useless.
> Alternately, it would seem trivial to have the stack dumper print the
> relative address of each symbol, and the delta from the previous
> symbol...

> > 240 schedule+0x25a
> > 368 io_schedule+0x35
> >  32 get_request_wait+0xc6

from the callstack:

ffff88007a704338 schedule+0x25a
ffff88007a7044a8 io_schedule+0x35
ffff88007a7044c8 get_request_wait+0xc6

subtract the values and you get the ones Ted posted,

e.g. for get_request_wait:

0xffff88007a7044c8 - 0xffff88007a7044a8 = 32

There's a script, scripts/checkstack.pl, which tries to determine stack
usage from 'objdump -d' by looking for the 'sub 0x123,%rsp' instruction
and reporting the 0x123 as stack consumption.  It does not give the same
results; for get_request_wait:

ffffffff81216205:       48 83 ec 68             sub    $0x68,%rsp

reported as 104.

dave
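Both measurements David describes are easy to reproduce. The sketch below is illustrative only: `frame_deltas` and `sub_rsp_immediate` are hypothetical helper names, the regular expression is a simplified stand-in for what scripts/checkstack.pl does rather than a port of it, and the parser assumes every trace row starts with a hex address (so the inlined `linux/fs.h` rows from the full trace would have to be filtered out first).

```python
import re

# Three consecutive entries copied from the hand-written trace.
TRACE = """\
ffff88007a704338 schedule+0x25a
ffff88007a7044a8 io_schedule+0x35
ffff88007a7044c8 get_request_wait+0xc6
"""

# The objdump line quoted for get_request_wait.
OBJDUMP_LINE = "ffffffff81216205: 48 83 ec 68 sub $0x68,%rsp"


def frame_deltas(trace):
    """Return (delta, symbol) pairs: each address minus the one listed before it."""
    entries = []
    for line in trace.splitlines():
        addr, symbol = line.split(None, 1)
        entries.append((int(addr, 16), symbol))
    return [(cur - prev, sym)
            for (prev, _), (cur, sym) in zip(entries, entries[1:])]


def sub_rsp_immediate(line):
    """Return the immediate of a 'sub $0x..,%rsp' instruction, or None."""
    match = re.search(r"sub\s+\$0x([0-9a-f]+),%rsp", line)
    return int(match.group(1), 16) if match else None


deltas = frame_deltas(TRACE)
for delta, symbol in deltas:
    print(f"{delta:5d} {symbol}")
print("prologue reserves", sub_rsp_immediate(OBJDUMP_LINE), "bytes")
```

For get_request_wait the trace delta is 32 while the prologue immediate is 0x68 = 104, which is the mismatch noted in the message; the thread does not settle which figure is closer to the real frame size.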
* Re: ext4 deep stack with mark_page_dirty reclaim
  2011-03-15 15:22     ` David Sterba
@ 2011-03-15 16:26       ` Chris Mason
  2011-03-23 14:02         ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Mason @ 2011-03-15 16:26 UTC
To: dave; +Cc: linux-kernel, linux-mm, adilger

Excerpts from David Sterba's message of 2011-03-15 11:22:22 -0400:
> On Mon, Mar 14, 2011 at 07:25:10PM -0700, Andreas Dilger wrote:
> > Is there a script which you used to generate this stack trace to
> > function size mapping, or did you do it by hand?  I've always wanted
> > such a script, but the tricky part is that there is so much garbage on
> > the stack that any automated stack parsing is almost useless.
> > Alternately, it would seem trivial to have the stack dumper print the
> > relative address of each symbol, and the delta from the previous
> > symbol...
>
> > > 240 schedule+0x25a
> > > 368 io_schedule+0x35
> > >  32 get_request_wait+0xc6
>
> from the callstack:
>
> ffff88007a704338 schedule+0x25a
> ffff88007a7044a8 io_schedule+0x35
> ffff88007a7044c8 get_request_wait+0xc6
>
> subtract the values and you get the ones Ted posted,
>
> e.g. for get_request_wait:
>
> 0xffff88007a7044c8 - 0xffff88007a7044a8 = 32
>
> There's a script, scripts/checkstack.pl, which tries to determine stack
> usage from 'objdump -d' by looking for the 'sub 0x123,%rsp' instruction
> and reporting the 0x123 as stack consumption.  It does not give the same
> results; for get_request_wait:
>
> ffffffff81216205:       48 83 ec 68             sub    $0x68,%rsp
>
> reported as 104.

Also, the ftrace stack usage tracer gives more verbose output that
includes the size of each function.

-chris
* Re: ext4 deep stack with mark_page_dirty reclaim
  2011-03-15 16:26       ` Chris Mason
@ 2011-03-23 14:02         ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2011-03-23 14:02 UTC
To: Chris Mason; +Cc: dave, linux-kernel, linux-mm, adilger

On Tue, Mar 15, 2011 at 12:26:43PM -0400, Chris Mason wrote:
> Also, the ftrace stack usage tracer gives more verbose output that
> includes the size of each function.

Yet another one on the list is the -fstack-usage option in new gcc 4.6 [*].
It creates a file with .su extension containing lines in format

file:line:char:function_name stack_size linkage_type

e.g.

a.c:168:5:main 224 static

dave

* http://gcc.gnu.org/gcc-4.6/changes.html
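A line in that .su format can be split mechanically. The following is a rough sketch under stated assumptions: the `StackUsage` tuple and `parse_su_line` are hypothetical names, fields are assumed to be whitespace-separated, and the function name is assumed to contain no spaces or colons (true for the C example above, not necessarily for C++ symbols). GCC's switch is spelled `-fstack-usage`, and its documentation describes the last field as a qualifier (`static`, `dynamic`, `bounded`) describing how the stack is adjusted rather than the symbol's linkage.

```python
from typing import NamedTuple


class StackUsage(NamedTuple):
    file: str
    line: int
    column: int
    function: str
    size: int        # bytes of stack used by the function
    qualifier: str   # e.g. "static"; the message calls this linkage_type


def parse_su_line(text):
    """Parse one 'file:line:char:function size qualifier' line."""
    # Split the two trailing fields off first, so the location part
    # is whatever remains on the left.
    location, size, qualifier = text.rsplit(None, 2)
    file, line, column, function = location.split(":", 3)
    return StackUsage(file, int(line), int(column), function,
                      int(size), qualifier)


entry = parse_su_line("a.c:168:5:main 224 static")
print(f"{entry.function} uses {entry.size} bytes ({entry.qualifier})")
```

Sorting the parsed entries of a whole build by `size` would give a per-function ranking comparable in spirit to Ted's table, though the numbers come from the compiler rather than from a live stack.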
* Re: ext4 deep stack with mark_page_dirty reclaim
  2011-03-14 19:20 ext4 deep stack with mark_page_dirty reclaim Hugh Dickins
  2011-03-14 20:46 ` Ted Ts'o
@ 2011-03-14 22:46 ` Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2011-03-14 22:46 UTC
To: Hugh Dickins; +Cc: Theodore Ts'o, linux-ext4, linux-kernel, linux-mm

Direct reclaim (in the cgroup variant) at its work.  We had a couple of
flamewars on this before, but this trivial example with reclaim from
the most simple case (swap space) shows that we really should never
reclaim from memory allocation callers for stack usage reasons.
end of thread, other threads:[~2011-03-23 14:03 UTC | newest]

Thread overview: 7+ messages
2011-03-14 19:20 ext4 deep stack with mark_page_dirty reclaim Hugh Dickins
2011-03-14 20:46 ` Ted Ts'o
2011-03-15 2:25   ` Andreas Dilger
2011-03-15 15:22     ` David Sterba
2011-03-15 16:26       ` Chris Mason
2011-03-23 14:02         ` David Sterba
2011-03-14 22:46 ` Christoph Hellwig