linux-fsdevel.vger.kernel.org archive mirror
* fs: clear_inode failed with nrpages not zero!
@ 2014-02-26  8:40 hitmoon
  2014-02-26 12:31 ` Jan Kara
From: hitmoon @ 2014-02-26  8:40 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Andrew Morton, jack, hannes

Hi all:

     I am running an official Red Hat 2.6.32-279 kernel. Under heavy
workload and memory pressure (in my case, running the LTP test suite for
about 20 hours), a kernel oops happened. Concretely, a test case process
opens a file, truncates it to 128M, mmaps it, munmaps it and closes the
file, and this cycle was repeating when the kernel hung. From the vmcore I
found that it hit BUG_ON(inode->i_data.nrpages) in clear_inode(), which
means truncate_inode_pages() failed to decrease nrpages to 0. I have
googled this problem and found no clear solution, only more confusion. The
comment on truncate_inode_pages() says that nrpages may not be zero after
it returns.
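For anyone trying to reproduce this, the cycle described above can be approximated from user space roughly as follows. This is only a sketch under stated assumptions: the helper name truncate_mmap_cycle() is hypothetical, the file path and size are supplied by the caller, and touching one byte per page is a guess, since the thread does not show what the LTP test case actually does with the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one open/truncate/mmap/munmap/close iteration,
 * as described in the report. Returns 0 on success, -1 on failure. */
int truncate_mmap_cycle(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Set the file size; the report used 128M. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Assumption: dirty one byte per page so the file's page cache is
     * actually populated before the mapping is torn down. */
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = 1;

    if (munmap(p, len) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

In the reported scenario this would run in a tight loop, e.g. truncate_mmap_cycle(path, 128UL << 20), while the rest of the test load keeps the machine under memory pressure.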

     My understanding is that page reclaim might still be in the middle of
deleting a page. Jan Kara once posted a patch that uses a spinlock to keep
the radix tree and nrpages in sync, and this kernel already contains that
patch. Here is the problem: when the kernel hangs, nrpages is not a small
number like 1 or 2 but a much bigger one, more than 500 or 700! So I think
that even with that synchronization done before clear_inode(),
truncate_inode_pages() together with the other reclaim functions failed to
bring nrpages down to zero. Dumping the vmcore, I also found that the radix
tree is not empty either; some slots are still occupied.
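The synchronization mentioned above rests on one invariant: the page count and the tree are changed together under the same lock, so anyone holding that lock sees them agree. A user-space toy model of that invariant, purely for illustration (all names here are hypothetical; in the kernel the real fields are struct address_space's page_tree, tree_lock and nrpages, and the real code is considerably more involved):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 64   /* hypothetical fixed-size stand-in for the radix tree */

/* Toy page cache: slots[] plays the radix tree, nrpages the counter.
 * Both are only ever modified while tree_lock is held. */
struct toy_cache {
    pthread_mutex_t tree_lock;
    bool slots[TOY_NSLOTS];
    unsigned long nrpages;
};

void toy_cache_init(struct toy_cache *c)
{
    pthread_mutex_init(&c->tree_lock, NULL);
    for (size_t i = 0; i < TOY_NSLOTS; i++)
        c->slots[i] = false;
    c->nrpages = 0;
}

/* Insert: the tree slot and the counter change in one critical section. */
bool toy_add_page(struct toy_cache *c, size_t index)
{
    bool added = false;
    assert(index < TOY_NSLOTS);
    pthread_mutex_lock(&c->tree_lock);
    if (!c->slots[index]) {
        c->slots[index] = true;
        c->nrpages++;
        added = true;
    }
    pthread_mutex_unlock(&c->tree_lock);
    return added;
}

/* Delete: likewise, so for an observer holding tree_lock the counter
 * can never lag behind the tree. */
bool toy_delete_page(struct toy_cache *c, size_t index)
{
    bool removed = false;
    assert(index < TOY_NSLOTS);
    pthread_mutex_lock(&c->tree_lock);
    if (c->slots[index]) {
        c->slots[index] = false;
        c->nrpages--;
        removed = true;
    }
    pthread_mutex_unlock(&c->tree_lock);
    return removed;
}

/* Under the lock, count occupied slots and compare with nrpages. */
bool toy_cache_consistent(struct toy_cache *c)
{
    unsigned long occupied = 0;
    pthread_mutex_lock(&c->tree_lock);
    for (size_t i = 0; i < TOY_NSLOTS; i++)
        occupied += c->slots[i] ? 1 : 0;
    bool ok = (occupied == c->nrpages);
    pthread_mutex_unlock(&c->tree_lock);
    return ok;
}
```

With such an invariant in place, a large nrpages together with occupied tree slots (as seen in the vmcore) would suggest pages genuinely left in the cache rather than a counter lagging behind a deletion in progress.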

     Then I think:
     1. The fault might happen in pagevec_lookup, which returns no page
even though the radix tree is in fact not empty. Because the lookup uses
RCU, is it possible that a race condition happens during the lookup and
makes the function return early? If so, how does it happen?
     2. I found that Johannes Weiner posted a patch
(http://www.spinics.net/lists/linux-fsdevel/msg72395.html) which contains
the following code:

+	if (nrpages || nrshadows) {
+		/*
+		 * As truncation uses a lockless tree lookup, cycle
+		 * the tree lock to make sure any ongoing tree
+		 * modification that does not see AS_EXITING is
+		 * completed before starting the final truncate.
+		 */
+		spin_lock_irq(&mapping->tree_lock);
+		spin_unlock_irq(&mapping->tree_lock);
+
+		truncate_inode_pages(mapping, 0);
+	}

     which wraps truncate_inode_pages in the function
truncate_inode_pages_final. Is this relevant to my problem?
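The idea behind the quoted hunk (publish a flag, then take and release the lock once so that any modifier that started before the flag was visible has finished before the lockless truncate begins) can be illustrated with a user-space analogue. This is a sketch only: a pthread mutex stands in for tree_lock, an atomic bool stands in for the AS_EXITING flag, and every function name is hypothetical; it is not how the kernel code is structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_NSLOTS 64   /* hypothetical stand-in for the tree */

struct ex_mapping {
    pthread_mutex_t tree_lock;
    atomic_bool exiting;          /* plays the role of AS_EXITING */
    atomic_bool slots[EX_NSLOTS]; /* atomic so the lookup can skip the lock */
    unsigned long nrentries;      /* only touched under tree_lock */
};

void ex_init(struct ex_mapping *m)
{
    pthread_mutex_init(&m->tree_lock, NULL);
    atomic_init(&m->exiting, false);
    for (size_t i = 0; i < EX_NSLOTS; i++)
        atomic_init(&m->slots[i], false);
    m->nrentries = 0;
}

/* A tree modifier (e.g. reclaim leaving an entry behind). It tests the
 * flag inside the lock and adds nothing once the mapping is exiting. */
bool ex_insert(struct ex_mapping *m, size_t index)
{
    bool added = false;
    assert(index < EX_NSLOTS);
    pthread_mutex_lock(&m->tree_lock);
    if (!atomic_load(&m->exiting) && !atomic_load(&m->slots[index])) {
        atomic_store(&m->slots[index], true);
        m->nrentries++;
        added = true;
    }
    pthread_mutex_unlock(&m->tree_lock);
    return added;
}

/* Final truncate: set the flag, cycle the lock, then scan locklessly
 * and delete whatever the scan finds under the lock. */
void ex_truncate_final(struct ex_mapping *m)
{
    atomic_store(&m->exiting, true);

    /* Wait out any ex_insert() already inside the critical section that
     * may have read exiting == false; later callers will see true. */
    pthread_mutex_lock(&m->tree_lock);
    pthread_mutex_unlock(&m->tree_lock);

    for (size_t i = 0; i < EX_NSLOTS; i++) {
        if (!atomic_load(&m->slots[i]))   /* lockless lookup */
            continue;
        pthread_mutex_lock(&m->tree_lock);
        if (atomic_load(&m->slots[i])) {
            atomic_store(&m->slots[i], false);
            m->nrentries--;
        }
        pthread_mutex_unlock(&m->tree_lock);
    }
}

unsigned long ex_nrentries(struct ex_mapping *m)
{
    pthread_mutex_lock(&m->tree_lock);
    unsigned long n = m->nrentries;
    pthread_mutex_unlock(&m->tree_lock);
    return n;
}
```

Without the lock cycle, a modifier that tested the flag just before it was set could still be adding an entry while the lockless scan has already moved past that slot, leaving an entry (and a non-zero count) behind.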

     Any suggestions would be appreciated!


* Re: fs: clear_inode failed with nrpages not zero!
@ 2014-02-26 14:44 xiaoqiang zhao
From: xiaoqiang zhao @ 2014-02-26 14:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Andrew Morton, hannes

Resending, because the Gmail Android app cannot send plain-text mail. Sorry!

Thanks, Kara! I will try to get a better understanding of this problem.

Jan Kara <jack@suse.cz> wrote:

>  Hello,
>
>On Wed 26-02-14 16:40:44, hitmoon wrote:
>>    I am running an official Red Hat 2.6.32-279 kernel.
>  Well, in that case you should consider contacting RH support instead of
>a general community forum... Also, 2.6.32 is pretty old and RH (similarly to
>other enterprise distributors) has lots of stuff added on top of it, so it
>is hard to help you.
>
>> Under heavy workload and memory pressure (in my case, running the LTP
>> test suite for about 20 hours), a kernel oops happened. Concretely, a
>> test case process opens a file, truncates it to 128M, mmaps it, munmaps
>> it and closes the file, and this cycle was repeating when the kernel
>> hung. From the vmcore I found that it hit BUG_ON(inode->i_data.nrpages)
>> in clear_inode(), which means truncate_inode_pages() failed to decrease
>> nrpages to 0. I have googled this problem and found no clear solution,
>> only more confusion. The comment on truncate_inode_pages() says that
>> nrpages may not be zero after it returns.
>> 
>>     My understanding is that page reclaim might still be in the middle
>> of deleting a page. Jan Kara once posted a patch that uses a spinlock
>> to keep the radix tree and nrpages in sync, and this kernel already
>> contains that patch. Here is the problem: when the kernel hangs,
>> nrpages is not a small number like 1 or 2 but a much bigger one, more
>> than 500 or 700! So I think that even with that synchronization done
>> before clear_inode(), truncate_inode_pages() together with the other
>> reclaim functions failed to bring nrpages down to zero. Dumping the
>> vmcore, I also found that the radix tree is not empty either; some
>> slots are still occupied.
>> 
>>     Then I think:
>>     1. The fault might happen in pagevec_lookup, which returns no
>> page even though the radix tree is in fact not empty. Because the
>> lookup uses RCU, is it possible that a race condition happens during
>> the lookup and makes the function return early? If so, how does it
>> happen?
>>     2. I found that Johannes Weiner posted a patch
>> (http://www.spinics.net/lists/linux-fsdevel/msg72395.html),
>> which contains the following code:
>> 
>> +	if (nrpages || nrshadows) {
>> +		/*
>> +		 * As truncation uses a lockless tree lookup, cycle
>> +		 * the tree lock to make sure any ongoing tree
>> +		 * modification that does not see AS_EXITING is
>> +		 * completed before starting the final truncate.
>> +		 */
>> +		spin_lock_irq(&mapping->tree_lock);
>> +		spin_unlock_irq(&mapping->tree_lock);
>> +
>> +		truncate_inode_pages(mapping, 0);
>> +	}
>> 
>>     which wraps truncate_inode_pages in the function
>> truncate_inode_pages_final. Is this relevant to my problem?
>  This shouldn't really be related. That is specific to Johannes' patch set
>adding new special radix tree entries.
>
>								Honza
>-- 
>Jan Kara <jack@suse.cz>
>SUSE Labs, CR
