From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx105.postini.com [74.125.245.105]) by kanga.kvack.org (Postfix) with SMTP id 24C396B005D for ; Wed, 24 Oct 2012 22:37:42 -0400 (EDT) Date: Wed, 24 Oct 2012 22:37:38 -0400 From: Dave Jones Subject: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org Cc: Linux Kernel Machine under significant load (4gb memory used, swap usage fluctuating) triggered this... WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] shmem_getpage_gfp+0xa5c/0xa70 [] ? shmem_getpage_gfp+0x29e/0xa70 [] shmem_fault+0x4f/0xa0 [] __do_fault+0x71/0x5c0 [] ? __lock_acquire+0x306/0x1ba0 [] ? local_clock+0x89/0xa0 [] handle_pte_fault+0x97/0xae0 [] ? sub_preempt_count+0x79/0xd0 [] ? delay_tsc+0xae/0x120 [] ? __const_udelay+0x28/0x30 [] handle_mm_fault+0x289/0x350 [] __do_page_fault+0x18e/0x530 [] ? local_clock+0x89/0xa0 [] ? get_parent_ip+0x11/0x50 [] ? get_parent_ip+0x11/0x50 [] ? sub_preempt_count+0x79/0xd0 [] ? rcu_user_exit+0xc9/0xf0 [] do_page_fault+0x2b/0x50 [] page_fault+0x28/0x30 [] ? copy_user_enhanced_fast_string+0x9/0x20 [] ? sys_futimesat+0x41/0xe0 [] ? syscall_trace_enter+0x25/0x2c0 [] ? tracesys+0x7e/0xe6 [] tracesys+0xe1/0xe6 1148 error = shmem_add_to_page_cache(page, mapping, index, 1149 gfp, swp_to_radix_entry(swap)); 1150 /* We already confirmed swap, and make no allocation */ 1151 VM_BUG_ON(error); 1152 } total used free shared buffers cached Mem: 3885528 2854064 1031464 0 9624 19208 -/+ buffers/cache: 2825232 1060296 Swap: 6029308 30656 5998652 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx188.postini.com [74.125.245.188]) by kanga.kvack.org (Postfix) with SMTP id 2CAF96B0062 for ; Thu, 25 Oct 2012 00:36:34 -0400 (EDT) Received: by mail-ia0-f169.google.com with SMTP id h37so1265245iak.14 for ; Wed, 24 Oct 2012 21:36:33 -0700 (PDT) Date: Wed, 24 Oct 2012 21:36:27 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121025023738.GA27001@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, 24 Oct 2012, Dave Jones wrote: > Machine under significant load (4gb memory used, swap usage fluctuating) > triggered this... > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_null+0x1a/0x20 > [] shmem_getpage_gfp+0xa5c/0xa70 > [] ? shmem_getpage_gfp+0x29e/0xa70 > [] shmem_fault+0x4f/0xa0 > [] __do_fault+0x71/0x5c0 > [] ? __lock_acquire+0x306/0x1ba0 > [] ? local_clock+0x89/0xa0 > [] handle_pte_fault+0x97/0xae0 > [] ? sub_preempt_count+0x79/0xd0 > [] ? delay_tsc+0xae/0x120 > [] ? __const_udelay+0x28/0x30 > [] handle_mm_fault+0x289/0x350 > [] __do_page_fault+0x18e/0x530 > [] ? local_clock+0x89/0xa0 > [] ? get_parent_ip+0x11/0x50 > [] ? get_parent_ip+0x11/0x50 > [] ? sub_preempt_count+0x79/0xd0 > [] ? rcu_user_exit+0xc9/0xf0 > [] do_page_fault+0x2b/0x50 > [] page_fault+0x28/0x30 > [] ? copy_user_enhanced_fast_string+0x9/0x20 > [] ? sys_futimesat+0x41/0xe0 > [] ? syscall_trace_enter+0x25/0x2c0 > [] ? tracesys+0x7e/0xe6 > [] tracesys+0xe1/0xe6 > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > 1149 gfp, swp_to_radix_entry(swap)); > 1150 /* We already confirmed swap, and make no allocation */ > 1151 VM_BUG_ON(error); > 1152 } That's very surprising. Easy enough to handle an error there, but of course I made it a VM_BUG_ON because it violates my assumptions: I rather need to understand how this can be, and I've no idea. Clutching at straws, I expect this is entirely irrelevant, but: there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor in current linux.git; rather, there's a VM_BUG_ON on line 1149. So you've inserted a couple of lines for some reason (more useful trinity behaviour, perhaps)? And have some config option I'm unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? Hugh > > > total used free shared buffers cached > Mem: 3885528 2854064 1031464 0 9624 19208 > -/+ buffers/cache: 2825232 1060296 > Swap: 6029308 30656 5998652 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx166.postini.com [74.125.245.166]) by kanga.kvack.org (Postfix) with SMTP id CD68B6B0070 for ; Thu, 25 Oct 2012 00:50:58 -0400 (EDT) Received: by mail-pa0-f41.google.com with SMTP id fa10so949285pad.14 for ; Wed, 24 Oct 2012 21:50:58 -0700 (PDT) Message-ID: <5088C51D.3060009@gmail.com> Date: Thu, 25 Oct 2012 12:50:37 +0800 From: Ni zhan Chen MIME-Version: 1.0 Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 10/25/2012 12:36 PM, Hugh Dickins wrote: > On Wed, 24 Oct 2012, Dave Jones wrote: > >> Machine under significant load (4gb memory used, swap usage fluctuating) >> triggered this... >> >> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 >> Call Trace: >> [] warn_slowpath_common+0x7f/0xc0 >> [] warn_slowpath_null+0x1a/0x20 >> [] shmem_getpage_gfp+0xa5c/0xa70 >> [] ? shmem_getpage_gfp+0x29e/0xa70 >> [] shmem_fault+0x4f/0xa0 >> [] __do_fault+0x71/0x5c0 >> [] ? __lock_acquire+0x306/0x1ba0 >> [] ? local_clock+0x89/0xa0 >> [] handle_pte_fault+0x97/0xae0 >> [] ? sub_preempt_count+0x79/0xd0 >> [] ? delay_tsc+0xae/0x120 >> [] ? __const_udelay+0x28/0x30 >> [] handle_mm_fault+0x289/0x350 >> [] __do_page_fault+0x18e/0x530 >> [] ? local_clock+0x89/0xa0 >> [] ? get_parent_ip+0x11/0x50 >> [] ? get_parent_ip+0x11/0x50 >> [] ? sub_preempt_count+0x79/0xd0 >> [] ? rcu_user_exit+0xc9/0xf0 >> [] do_page_fault+0x2b/0x50 >> [] page_fault+0x28/0x30 >> [] ? copy_user_enhanced_fast_string+0x9/0x20 >> [] ? sys_futimesat+0x41/0xe0 >> [] ? syscall_trace_enter+0x25/0x2c0 >> [] ? tracesys+0x7e/0xe6 >> [] tracesys+0xe1/0xe6 >> >> >> >> 1148 error = shmem_add_to_page_cache(page, mapping, index, >> 1149 gfp, swp_to_radix_entry(swap)); >> 1150 /* We already confirmed swap, and make no allocation */ >> 1151 VM_BUG_ON(error); >> 1152 } > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. > > Clutching at straws, I expect this is entirely irrelevant, but: > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > So you've inserted a couple of lines for some reason (more useful > trinity behaviour, perhaps)? And have some config option I'm > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? Hi Hugh, I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative rss in memcg memory.stat], one question: if function shmem_confirm_swap confirm the entry has already brought back from swap by a racing thread, then why call shmem_add_to_page_cache to add page from swapcache to pagecache again? otherwise, will goto unlock and then go to repeat? where I miss? Regards, Chen > > Hugh > >> >> total used free shared buffers cached >> Mem: 3885528 2854064 1031464 0 9624 19208 >> -/+ buffers/cache: 2825232 1060296 >> Swap: 6029308 30656 5998652 > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201]) by kanga.kvack.org (Postfix) with SMTP id 9319A6B0062 for ; Thu, 25 Oct 2012 02:59:50 -0400 (EDT) Received: by mail-pa0-f41.google.com with SMTP id fa10so1024951pad.14 for ; Wed, 24 Oct 2012 23:59:49 -0700 (PDT) Date: Wed, 24 Oct 2012 23:59:40 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <5088C51D.3060009@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Ni zhan Chen Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Ni zhan Chen wrote: > On 10/25/2012 12:36 PM, Hugh Dickins wrote: > > On Wed, 24 Oct 2012, Dave Jones wrote: > > > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > > triggered this... > > > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > > > > > 1148 error = shmem_add_to_page_cache(page, > > > mapping, index, > > > 1149 gfp, > > > swp_to_radix_entry(swap)); > > > 1150 /* We already confirmed swap, and make no > > > allocation */ > > > 1151 VM_BUG_ON(error); > > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > > of course I made it a VM_BUG_ON because it violates my assumptions: > > I rather need to understand how this can be, and I've no idea. > > > > Clutching at straws, I expect this is entirely irrelevant, but: > > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > > > So you've inserted a couple of lines for some reason (more useful > > trinity behaviour, perhaps)? And have some config option I'm > > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? > > Hi Hugh, > > I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative > rss in memcg memory.stat], one question: Well, yes, I added the VM_BUG_ON in that commit. > > if function shmem_confirm_swap confirm the entry has already brought back > from swap by a racing thread, The reverse: true confirms that the swap entry has not been brought back from swap by a racing thread; false indicates that there has been a race. > then why call shmem_add_to_page_cache to add > page from swapcache to pagecache again? Adding it to pagecache again, after such a race, would set error to -EEXIST (originating from radix_tree_insert); but we don't do that, we add it to pagecache when it has not already been added. Or that's the intention: but Dave seems to have found an unexpected exception, despite us holding the page lock across all this. (But if it weren't for the memcg and replace_page issues, I'd much prefer to let shmem_add_to_page_cache discover the race as before.) Hugh > otherwise, will goto unlock and then go to repeat? where I miss? > > Regards, > Chen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx125.postini.com [74.125.245.125]) by kanga.kvack.org (Postfix) with SMTP id D97F76B0062 for ; Thu, 25 Oct 2012 05:53:21 -0400 (EDT) Received: by mail-ie0-f169.google.com with SMTP id 10so2466308ied.14 for ; Thu, 25 Oct 2012 02:53:21 -0700 (PDT) Message-ID: <50890C06.5060305@gmail.com> Date: Thu, 25 Oct 2012 17:53:10 +0800 From: Ni zhan Chen MIME-Version: 1.0 Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> In-Reply-To: Content-Type: multipart/alternative; boundary="------------090709030502040204030605" Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------090709030502040204030605 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 10/25/2012 02:59 PM, Hugh Dickins wrote: > On Thu, 25 Oct 2012, Ni zhan Chen wrote: >> On 10/25/2012 12:36 PM, Hugh Dickins wrote: >>> On Wed, 24 Oct 2012, Dave Jones wrote: >>> >>>> Machine under significant load (4gb memory used, swap usage fluctuating) >>>> triggered this... >>>> >>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 >>>> >>>> 1148 error = shmem_add_to_page_cache(page, >>>> mapping, index, >>>> 1149 gfp, >>>> swp_to_radix_entry(swap)); >>>> 1150 /* We already confirmed swap, and make no >>>> allocation */ >>>> 1151 VM_BUG_ON(error); >>>> 1152 } >>> That's very surprising. Easy enough to handle an error there, but >>> of course I made it a VM_BUG_ON because it violates my assumptions: >>> I rather need to understand how this can be, and I've no idea. >>> >>> Clutching at straws, I expect this is entirely irrelevant, but: >>> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor >>> in current linux.git; rather, there's a VM_BUG_ON on line 1149. >>> >>> So you've inserted a couple of lines for some reason (more useful >>> trinity behaviour, perhaps)? And have some config option I'm >>> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? >> Hi Hugh, >> >> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative >> rss in memcg memory.stat], one question: > Well, yes, I added the VM_BUG_ON in that commit. > >> if function shmem_confirm_swap confirm the entry has already brought back >> from swap by a racing thread, > The reverse: true confirms that the swap entry has not been brought back > from swap by a racing thread; false indicates that there has been a race. > >> then why call shmem_add_to_page_cache to add >> page from swapcache to pagecache again? > Adding it to pagecache again, after such a race, would set error to > -EEXIST (originating from radix_tree_insert); but we don't do that, > we add it to pagecache when it has not already been added. > > Or that's the intention: but Dave seems to have found an unexpected > exception, despite us holding the page lock across all this. > > (But if it weren't for the memcg and replace_page issues, I'd much > prefer to let shmem_add_to_page_cache discover the race as before.) > > Hugh Hi Hugh Thanks for your response. You mean the -EEXIST originating from radix_tree_insert, in radix_tree_insert: if (slot != NULL) return -EEXIST; But why slot should be NULL? if no race, the pagecache related radix tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? Regards, Chen > >> otherwise, will goto unlock and then go to repeat? where I miss? >> >> Regards, >> Chen --------------090709030502040204030605 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit
On 10/25/2012 02:59 PM, Hugh Dickins wrote:
On Thu, 25 Oct 2012, Ni zhan Chen wrote:
On 10/25/2012 12:36 PM, Hugh Dickins wrote:
On Wed, 24 Oct 2012, Dave Jones wrote:

Machine under significant load (4gb memory used, swap usage fluctuating)
triggered this...

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49

1148                         error = shmem_add_to_page_cache(page,
mapping, index,
1149                                                 gfp,
swp_to_radix_entry(swap));
1150                         /* We already confirmed swap, and make no
allocation */
1151                         VM_BUG_ON(error);
1152                 }
That's very surprising.  Easy enough to handle an error there, but
of course I made it a VM_BUG_ON because it violates my assumptions:
I rather need to understand how this can be, and I've no idea.

Clutching at straws, I expect this is entirely irrelevant, but:
there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
in current linux.git; rather, there's a VM_BUG_ON on line 1149.

So you've inserted a couple of lines for some reason (more useful
trinity behaviour, perhaps)?  And have some config option I'm
unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
Hi Hugh,

I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative
rss in memcg memory.stat], one question:
Well, yes, I added the VM_BUG_ON in that commit.

if function shmem_confirm_swap confirm the entry has already brought back
from swap by a racing thread,
The reverse: true confirms that the swap entry has not been brought back
from swap by a racing thread; false indicates that there has been a race.

then why call shmem_add_to_page_cache to add
page from swapcache to pagecache again?
Adding it to pagecache again, after such a race, would set error to
-EEXIST (originating from radix_tree_insert); but we don't do that,
we add it to pagecache when it has not already been added.

Or that's the intention: but Dave seems to have found an unexpected
exception, despite us holding the page lock across all this.

(But if it weren't for the memcg and replace_page issues, I'd much
prefer to let shmem_add_to_page_cache discover the race as before.)

Hugh

Hi Hugh

Thanks for your response. You mean the -EEXIST originating from radix_tree_insert, in radix_tree_insert:
if (slot != NULL)
    return -EEXIST;
But why slot should be NULL? if no race, the pagecache related radix tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss?

Regards,
Chen


otherwise, will goto unlock and then go to repeat? where I miss?

Regards,
Chen

    

--------------090709030502040204030605-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201]) by kanga.kvack.org (Postfix) with SMTP id BC80F6B0071 for ; Thu, 25 Oct 2012 06:21:44 -0400 (EDT) Received: by mail-pb0-f41.google.com with SMTP id rq2so2092538pbb.14 for ; Thu, 25 Oct 2012 03:21:44 -0700 (PDT) Message-ID: <508912B0.7080805@gmail.com> Date: Thu, 25 Oct 2012 18:21:36 +0800 From: Ni zhan Chen MIME-Version: 1.0 Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 10/25/2012 02:59 PM, Hugh Dickins wrote: > On Thu, 25 Oct 2012, Ni zhan Chen wrote: >> On 10/25/2012 12:36 PM, Hugh Dickins wrote: >>> On Wed, 24 Oct 2012, Dave Jones wrote: >>> >>>> Machine under significant load (4gb memory used, swap usage fluctuating) >>>> triggered this... >>>> >>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 >>>> >>>> 1148 error = shmem_add_to_page_cache(page, >>>> mapping, index, >>>> 1149 gfp, >>>> swp_to_radix_entry(swap)); >>>> 1150 /* We already confirmed swap, and make no >>>> allocation */ >>>> 1151 VM_BUG_ON(error); >>>> 1152 } >>> That's very surprising. Easy enough to handle an error there, but >>> of course I made it a VM_BUG_ON because it violates my assumptions: >>> I rather need to understand how this can be, and I've no idea. >>> >>> Clutching at straws, I expect this is entirely irrelevant, but: >>> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor >>> in current linux.git; rather, there's a VM_BUG_ON on line 1149. >>> >>> So you've inserted a couple of lines for some reason (more useful >>> trinity behaviour, perhaps)? And have some config option I'm >>> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? >> Hi Hugh, >> >> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative >> rss in memcg memory.stat], one question: > Well, yes, I added the VM_BUG_ON in that commit. > >> if function shmem_confirm_swap confirm the entry has already brought back >> from swap by a racing thread, > The reverse: true confirms that the swap entry has not been brought back > from swap by a racing thread; false indicates that there has been a race. > >> then why call shmem_add_to_page_cache to add >> page from swapcache to pagecache again? > Adding it to pagecache again, after such a race, would set error to > -EEXIST (originating from radix_tree_insert); but we don't do that, > we add it to pagecache when it has not already been added. > > Or that's the intention: but Dave seems to have found an unexpected > exception, despite us holding the page lock across all this. > > (But if it weren't for the memcg and replace_page issues, I'd much > prefer to let shmem_add_to_page_cache discover the race as before.) > > Hugh Hi Hugh Thanks for your response. You mean the -EEXIST originating from radix_tree_insert, in radix_tree_insert: if (slot != NULL) return -EEXIST; But why slot should be NULL? if no race, the pagecache related radix tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? Regards, Chen > >> otherwise, will goto unlock and then go to repeat? where I miss? >> >> Regards, >> Chen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx142.postini.com [74.125.245.142]) by kanga.kvack.org (Postfix) with SMTP id 850996B0062 for ; Thu, 25 Oct 2012 07:14:15 -0400 (EDT) Date: Thu, 25 Oct 2012 07:14:11 -0400 From: Dave Jones Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121025111411.GB24886@redhat.com> References: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > 1149 gfp, swp_to_radix_entry(swap)); > > 1150 /* We already confirmed swap, and make no allocation */ > > 1151 VM_BUG_ON(error); > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. > > Clutching at straws, I expect this is entirely irrelevant, but: > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > So you've inserted a couple of lines for some reason (more useful > trinity behaviour, perhaps)? detritus from the recent mpol_to_str bug that I was chasing. Shouldn't be relevant... diff -durpN '--exclude-from=/home/davej/.exclude' src/git-trees/kernel/linux/mm> --- src/git-trees/kernel/linux/mm/shmem.c 2012-10-12 10:01:46.613408580 -> +++ linux-dj/mm/shmem.c 2012-10-15 12:31:32.979653309 -0400 @@ -885,13 +885,15 @@ redirty: static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol) { char buffer[64]; + int ret; if (!mpol || mpol->mode == MPOL_DEFAULT) return; /* show nothing */ - mpol_to_str(buffer, sizeof(buffer), mpol, 1); - - seq_printf(seq, ",mpol=%s", buffer); + memset(buffer, 0, sizeof(buffer)); + ret = mpol_to_str(buffer, sizeof(buffer), mpol, 1); + if (ret > 0) + seq_printf(seq, ",mpol=%s", buffer); } > And have some config option I'm > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? Yes, I do have this.. -#define VM_BUG_ON(cond) BUG_ON(cond) +#define VM_BUG_ON(cond) WARN_ON(cond) because I got tired of things not going over my usb serial port when I hit them a while ago. BUG_ON is pretty unfriendly to bug finding. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx124.postini.com [74.125.245.124]) by kanga.kvack.org (Postfix) with SMTP id 9249E6B0072 for ; Thu, 25 Oct 2012 16:52:19 -0400 (EDT) Date: Thu, 25 Oct 2012 16:52:13 -0400 From: Johannes Weiner Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121025205213.GB4771@cmpxchg.org> References: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > On Wed, 24 Oct 2012, Dave Jones wrote: > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > triggered this... > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > Call Trace: > > [] warn_slowpath_common+0x7f/0xc0 > > [] warn_slowpath_null+0x1a/0x20 > > [] shmem_getpage_gfp+0xa5c/0xa70 > > [] ? shmem_getpage_gfp+0x29e/0xa70 > > [] shmem_fault+0x4f/0xa0 > > [] __do_fault+0x71/0x5c0 > > [] ? __lock_acquire+0x306/0x1ba0 > > [] ? local_clock+0x89/0xa0 > > [] handle_pte_fault+0x97/0xae0 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? delay_tsc+0xae/0x120 > > [] ? __const_udelay+0x28/0x30 > > [] handle_mm_fault+0x289/0x350 > > [] __do_page_fault+0x18e/0x530 > > [] ? local_clock+0x89/0xa0 > > [] ? get_parent_ip+0x11/0x50 > > [] ? get_parent_ip+0x11/0x50 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? rcu_user_exit+0xc9/0xf0 > > [] do_page_fault+0x2b/0x50 > > [] page_fault+0x28/0x30 > > [] ? copy_user_enhanced_fast_string+0x9/0x20 > > [] ? sys_futimesat+0x41/0xe0 > > [] ? syscall_trace_enter+0x25/0x2c0 > > [] ? tracesys+0x7e/0xe6 > > [] tracesys+0xe1/0xe6 > > > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > 1149 gfp, swp_to_radix_entry(swap)); > > 1150 /* We already confirmed swap, and make no allocation */ > > 1151 VM_BUG_ON(error); > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. Could it be concurrent truncation clearing out the entry between shmem_confirm_swap() and shmem_add_to_page_cache()? I don't see anything preventing that. The empty slot would not match the expected swap entry this call passes in and the returned error would be -ENOENT. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx104.postini.com [74.125.245.104]) by kanga.kvack.org (Postfix) with SMTP id 0F70D6B0072 for ; Thu, 25 Oct 2012 17:27:47 -0400 (EDT) Received: by mail-ie0-f169.google.com with SMTP id 10so3668133ied.14 for ; Thu, 25 Oct 2012 14:27:46 -0700 (PDT) Date: Thu, 25 Oct 2012 14:27:41 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <508912B0.7080805@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> <508912B0.7080805@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Ni zhan Chen Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Ni zhan Chen wrote: > On 10/25/2012 02:59 PM, Hugh Dickins wrote: > > On Thu, 25 Oct 2012, Ni zhan Chen wrote: > > > > > > I think it maybe caused by your commit [d189922862e03ce: shmem: fix > > > negative > > > rss in memcg memory.stat], one question: > > Well, yes, I added the VM_BUG_ON in that commit. > > > > > if function shmem_confirm_swap confirm the entry has already brought back > > > from swap by a racing thread, > > The reverse: true confirms that the swap entry has not been brought back > > from swap by a racing thread; false indicates that there has been a race. > > > > > then why call shmem_add_to_page_cache to add > > > page from swapcache to pagecache again? > > Adding it to pagecache again, after such a race, would set error to > > -EEXIST (originating from radix_tree_insert); but we don't do that, > > we add it to pagecache when it has not already been added. > > > > Or that's the intention: but Dave seems to have found an unexpected > > exception, despite us holding the page lock across all this. > > > > (But if it weren't for the memcg and replace_page issues, I'd much > > prefer to let shmem_add_to_page_cache discover the race as before.) > > > > Hugh > > Hi Hugh > > Thanks for your response. You mean the -EEXIST originating from > radix_tree_insert, in radix_tree_insert: > if (slot != NULL) > return -EEXIST; > But why slot should be NULL? if no race, the pagecache related radix tree > entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? I was describing what would happen in a case that should not exist, that you had thought the common case. In actuality, the entry should not be NULL, it should be as you say there. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx116.postini.com [74.125.245.116]) by kanga.kvack.org (Postfix) with SMTP id 8C0B76B0072 for ; Thu, 25 Oct 2012 17:28:59 -0400 (EDT) Received: by mail-ie0-f169.google.com with SMTP id 10so3669620ied.14 for ; Thu, 25 Oct 2012 14:28:59 -0700 (PDT) Date: Thu, 25 Oct 2012 14:28:59 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121025111411.GB24886@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121025111411.GB24886@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Dave Jones wrote: > On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > > Clutching at straws, I expect this is entirely irrelevant, but: > > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > > > So you've inserted a couple of lines for some reason (more useful > > trinity behaviour, perhaps)? > > detritus from the recent mpol_to_str bug that I was chasing. > Shouldn't be relevant... > > > And have some config option I'm > > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? > > Yes, I do have this.. > > -#define VM_BUG_ON(cond) BUG_ON(cond) > +#define VM_BUG_ON(cond) WARN_ON(cond) > > because I got tired of things not going over my usb serial port when I hit them > a while ago. BUG_ON is pretty unfriendly to bug finding. Makes sense, thanks for the reassurance. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx197.postini.com [74.125.245.197]) by kanga.kvack.org (Postfix) with SMTP id 48CCE6B0072 for ; Thu, 25 Oct 2012 17:48:39 -0400 (EDT) Received: by mail-ie0-f169.google.com with SMTP id 10so3695746ied.14 for ; Thu, 25 Oct 2012 14:48:38 -0700 (PDT) Date: Thu, 25 Oct 2012 14:48:39 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121025205213.GB4771@cmpxchg.org> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121025205213.GB4771@cmpxchg.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Johannes Weiner wrote: > On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > On Wed, 24 Oct 2012, Dave Jones wrote: > > > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > > triggered this... > > > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > > 1149 gfp, swp_to_radix_entry(swap)); > > > 1150 /* We already confirmed swap, and make no allocation */ > > > 1151 VM_BUG_ON(error); > > > 1152 } > > > > That's very surprising. Easy enough to handle an error there, but > > of course I made it a VM_BUG_ON because it violates my assumptions: > > I rather need to understand how this can be, and I've no idea. > > Could it be concurrent truncation clearing out the entry between > shmem_confirm_swap() and shmem_add_to_page_cache()? I don't see > anything preventing that. > > The empty slot would not match the expected swap entry this call > passes in and the returned error would be -ENOENT. Excellent notion, many thanks Hannes, I believe you've got it. I've hit that truncation problem in swapoff (and commented on it in shmem_unuse_inode), but never hit it or considered it here. I think of the page lock as holding it stable, but truncation's free_swap_and_cache only does a trylock on the swapcache page, so we're not secured against that possibility. So I'd like to change it to VM_BUG_ON(error && error != -ENOENT), but there's a little tidying up to do in the -ENOENT case, which needs more thought. A delete_from_swap_cache(page) - though we can be lazy and leave that to reclaim for such a rare occurrence - and probably a mem_cgroup uncharge; but the memcg hooks are always the hardest to get right, I'll have think about that one carefully. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx196.postini.com [74.125.245.196]) by kanga.kvack.org (Postfix) with SMTP id 9DC246B0071 for ; Thu, 25 Oct 2012 21:48:29 -0400 (EDT) Received: by mail-da0-f41.google.com with SMTP id i14so1160662dad.14 for ; Thu, 25 Oct 2012 18:48:28 -0700 (PDT) Message-ID: <5089EBE1.1050009@gmail.com> Date: Fri, 26 Oct 2012 09:48:17 +0800 From: Ni zhan Chen MIME-Version: 1.0 Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> <508912B0.7080805@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 10/26/2012 05:27 AM, Hugh Dickins wrote: > On Thu, 25 Oct 2012, Ni zhan Chen wrote: >> On 10/25/2012 02:59 PM, Hugh Dickins wrote: >>> On Thu, 25 Oct 2012, Ni zhan Chen wrote: >>>> I think it maybe caused by your commit [d189922862e03ce: shmem: fix >>>> negative >>>> rss in memcg memory.stat], one question: >>> Well, yes, I added the VM_BUG_ON in that commit. >>> >>>> if function shmem_confirm_swap confirm the entry has already brought back >>>> from swap by a racing thread, >>> The reverse: true confirms that the swap entry has not been brought back >>> from swap by a racing thread; false indicates that there has been a race. >>> >>>> then why call shmem_add_to_page_cache to add >>>> page from swapcache to pagecache again? >>> Adding it to pagecache again, after such a race, would set error to >>> -EEXIST (originating from radix_tree_insert); but we don't do that, >>> we add it to pagecache when it has not already been added. >>> >>> Or that's the intention: but Dave seems to have found an unexpected >>> exception, despite us holding the page lock across all this. >>> >>> (But if it weren't for the memcg and replace_page issues, I'd much >>> prefer to let shmem_add_to_page_cache discover the race as before.) >>> >>> Hugh >> Hi Hugh >> >> Thanks for your response. You mean the -EEXIST originating from >> radix_tree_insert, in radix_tree_insert: >> if (slot != NULL) >> return -EEXIST; >> But why slot should be NULL? if no race, the pagecache related radix tree >> entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? > I was describing what would happen in a case that should not exist, > that you had thought the common case. In actuality, the entry should > not be NULL, it should be as you say there. Thanks for your patience. So in the common case, the entry should be the value I mentioned, then why has this check? if (slot != NULL) return -EEXIST; the common case will return -EEXIST. > > Hugh > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx140.postini.com [74.125.245.140]) by kanga.kvack.org (Postfix) with SMTP id 902A56B0072 for ; Thu, 25 Oct 2012 22:15:17 -0400 (EDT) Received: by mail-ia0-f169.google.com with SMTP id h37so2318852iak.14 for ; Thu, 25 Oct 2012 19:15:16 -0700 (PDT) Message-ID: <5089F22D.70007@gmail.com> Date: Fri, 26 Oct 2012 10:15:09 +0800 From: Ni zhan Chen MIME-Version: 1.0 Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> <20121025205213.GB4771@cmpxchg.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Johannes Weiner , Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 10/26/2012 05:48 AM, Hugh Dickins wrote: > On Thu, 25 Oct 2012, Johannes Weiner wrote: >> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: >>> On Wed, 24 Oct 2012, Dave Jones wrote: >>> >>>> Machine under significant load (4gb memory used, swap usage fluctuating) >>>> triggered this... >>>> >>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 >>>> >>>> 1148 error = shmem_add_to_page_cache(page, mapping, index, >>>> 1149 gfp, swp_to_radix_entry(swap)); >>>> 1150 /* We already confirmed swap, and make no allocation */ >>>> 1151 VM_BUG_ON(error); >>>> 1152 } >>> That's very surprising. Easy enough to handle an error there, but >>> of course I made it a VM_BUG_ON because it violates my assumptions: >>> I rather need to understand how this can be, and I've no idea. >> Could it be concurrent truncation clearing out the entry between >> shmem_confirm_swap() and shmem_add_to_page_cache()? I don't see >> anything preventing that. >> >> The empty slot would not match the expected swap entry this call >> passes in and the returned error would be -ENOENT. > Excellent notion, many thanks Hannes, I believe you've got it. > > I've hit that truncation problem in swapoff (and commented on it > in shmem_unuse_inode), but never hit it or considered it here. > I think of the page lock as holding it stable, but truncation's > free_swap_and_cache only does a trylock on the swapcache page, > so we're not secured against that possibility. Hi Hugh, Even though free_swap_and_cache only does a trylock on the swapcache page, but it doens't call delete_from_swap_cache and the associated entry should still be there, I am interested in what you have already introduce to protect it? > > So I'd like to change it to VM_BUG_ON(error && error != -ENOENT), > but there's a little tidying up to do in the -ENOENT case, which Do you mean radix_tree_insert will return -ENOENT if the associated entry is not present? Why I can't find this return value in the function radix_tree_insert? > needs more thought. A delete_from_swap_cache(page) - though we > can be lazy and leave that to reclaim for such a rare occurrence - > and probably a mem_cgroup uncharge; but the memcg hooks are always > the hardest to get right, I'll have think about that one carefully. > > Hugh > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx178.postini.com [74.125.245.178]) by kanga.kvack.org (Postfix) with SMTP id 14C946B004D for ; Thu, 1 Nov 2012 15:10:57 -0400 (EDT) Date: Thu, 1 Nov 2012 15:10:52 -0400 From: Dave Jones Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121101191052.GA5884@redhat.com> References: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > On Wed, 24 Oct 2012, Dave Jones wrote: > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > triggered this... > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > Call Trace: > > [] warn_slowpath_common+0x7f/0xc0 > > [] warn_slowpath_null+0x1a/0x20 > > [] shmem_getpage_gfp+0xa5c/0xa70 > > [] ? shmem_getpage_gfp+0x29e/0xa70 > > [] shmem_fault+0x4f/0xa0 > > [] __do_fault+0x71/0x5c0 > > [] ? __lock_acquire+0x306/0x1ba0 > > [] ? local_clock+0x89/0xa0 > > [] handle_pte_fault+0x97/0xae0 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? delay_tsc+0xae/0x120 > > [] ? __const_udelay+0x28/0x30 > > [] handle_mm_fault+0x289/0x350 > > [] __do_page_fault+0x18e/0x530 > > [] ? local_clock+0x89/0xa0 > > [] ? get_parent_ip+0x11/0x50 > > [] ? get_parent_ip+0x11/0x50 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? rcu_user_exit+0xc9/0xf0 > > [] do_page_fault+0x2b/0x50 > > [] page_fault+0x28/0x30 > > [] ? copy_user_enhanced_fast_string+0x9/0x20 > > [] ? sys_futimesat+0x41/0xe0 > > [] ? syscall_trace_enter+0x25/0x2c0 > > [] ? tracesys+0x7e/0xe6 > > [] tracesys+0xe1/0xe6 > > > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > 1149 gfp, swp_to_radix_entry(swap)); > > 1150 /* We already confirmed swap, and make no allocation */ > > 1151 VM_BUG_ON(error); > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. I just noticed we had a user report hitting this same warning, but with a different trace.. : [] warn_slowpath_common+0x7f/0xc0 : [] warn_slowpath_null+0x1a/0x20 : [] shmem_getpage_gfp+0x7f3/0x830 : [] ? vma_adjust+0x3ed/0x620 : [] shmem_file_aio_read+0x1f2/0x380 : [] do_sync_read+0xa7/0xe0 : [] vfs_read+0xa9/0x180 : [] sys_read+0x4a/0x90 : [] system_call_fastpath+0x16/0x1b Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx128.postini.com [74.125.245.128]) by kanga.kvack.org (Postfix) with SMTP id 8868F6B0075 for ; Thu, 1 Nov 2012 19:03:44 -0400 (EDT) Received: by mail-ie0-f169.google.com with SMTP id 10so5317236ied.14 for ; Thu, 01 Nov 2012 16:03:43 -0700 (PDT) Date: Thu, 1 Nov 2012 16:03:40 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121101191052.GA5884@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 1 Nov 2012, Dave Jones wrote: > On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > On Wed, 24 Oct 2012, Dave Jones wrote: > > > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > > triggered this... > > > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > > 1149 gfp, swp_to_radix_entry(swap)); > > > 1150 /* We already confirmed swap, and make no allocation */ > > > 1151 VM_BUG_ON(error); > > > 1152 } > > > > That's very surprising. Easy enough to handle an error there, but > > of course I made it a VM_BUG_ON because it violates my assumptions: > > I rather need to understand how this can be, and I've no idea. > > I just noticed we had a user report hitting this same warning, but > with a different trace.. > > : [] warn_slowpath_common+0x7f/0xc0 > : [] warn_slowpath_null+0x1a/0x20 > : [] shmem_getpage_gfp+0x7f3/0x830 > : [] ? vma_adjust+0x3ed/0x620 > : [] shmem_file_aio_read+0x1f2/0x380 > : [] do_sync_read+0xa7/0xe0 > : [] vfs_read+0xa9/0x180 > : [] sys_read+0x4a/0x90 > : [] system_call_fastpath+0x16/0x1b Equally explicable by Hannes's hypothesis; but useful supporting evidence, thank you. Except... earlier in the thread you explained how you hacked #define VM_BUG_ON(cond) WARN_ON(cond) to get this to come out as a warning instead of a bug, and now it looks as if "a user" has here done the same. Which is very much a user's right, of course; but does make me wonder whether that user might actually be davej ;) Never mind, whatever, it's more justification for the fix - which I've honestly not forgotten, but somehow not got around to sending (with a couple of others even longer outstanding). On its way shortly, for some unpredictable value of shortly. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx132.postini.com [74.125.245.132]) by kanga.kvack.org (Postfix) with SMTP id 793926B0068 for ; Thu, 1 Nov 2012 19:20:34 -0400 (EDT) Date: Thu, 1 Nov 2012 19:20:30 -0400 From: Dave Jones Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121101232030.GA25519@redhat.com> References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote: > > I just noticed we had a user report hitting this same warning, but > > with a different trace.. > > > > : [] warn_slowpath_common+0x7f/0xc0 > > : [] warn_slowpath_null+0x1a/0x20 > > : [] shmem_getpage_gfp+0x7f3/0x830 > > : [] ? vma_adjust+0x3ed/0x620 > > : [] shmem_file_aio_read+0x1f2/0x380 > > : [] do_sync_read+0xa7/0xe0 > > : [] vfs_read+0xa9/0x180 > > : [] sys_read+0x4a/0x90 > > : [] system_call_fastpath+0x16/0x1b > > Equally explicable by Hannes's hypothesis; > but useful supporting evidence, thank you. > > Except... earlier in the thread you explained how you hacked > #define VM_BUG_ON(cond) WARN_ON(cond) > to get this to come out as a warning instead of a bug, > and now it looks as if "a user" has here done the same. > > Which is very much a user's right, of course; but does > make me wonder whether that user might actually be davej ;) indirectly. I made the same change in the Fedora kernel a while ago to test a hypothesis that we weren't getting any VM_BUG_ON reports. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx125.postini.com [74.125.245.125]) by kanga.kvack.org (Postfix) with SMTP id 72C0F6B004D for ; Thu, 1 Nov 2012 19:48:46 -0400 (EDT) Received: by mail-ie0-f169.google.com with SMTP id 10so5364784ied.14 for ; Thu, 01 Nov 2012 16:48:45 -0700 (PDT) Date: Thu, 1 Nov 2012 16:48:41 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121101232030.GA25519@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 1 Nov 2012, Dave Jones wrote: > On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote: > > > > Except... earlier in the thread you explained how you hacked > > #define VM_BUG_ON(cond) WARN_ON(cond) > > to get this to come out as a warning instead of a bug, > > and now it looks as if "a user" has here done the same. > > > > Which is very much a user's right, of course; but does > > make me wonder whether that user might actually be davej ;) > > indirectly. I made the same change in the Fedora kernel a while ago > to test a hypothesis that we weren't getting any VM_BUG_ON reports. Fedora turns on CONFIG_DEBUG_VM? All mm developers should thank you for the wider testing exposure; but I'm not so sure that Fedora users should thank you for turning it on - really it's for mm developers to wrap around !assertions or more expensive checks (e.g. checking calls) in their development. Or did I read a few months ago that some change had been made to such definitions, and VM_BUG_ON(contents) are evaluated even when the config option is off? I do hope I'm mistaken on that. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx168.postini.com [74.125.245.168]) by kanga.kvack.org (Postfix) with SMTP id 5830B6B004D for ; Thu, 1 Nov 2012 21:43:42 -0400 (EDT) Date: Thu, 1 Nov 2012 21:43:36 -0400 From: Dave Jones Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121102014336.GA1727@redhat.com> References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote: > On Thu, 1 Nov 2012, Dave Jones wrote: > > On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote: > > > > > > Except... earlier in the thread you explained how you hacked > > > #define VM_BUG_ON(cond) WARN_ON(cond) > > > to get this to come out as a warning instead of a bug, > > > and now it looks as if "a user" has here done the same. > > > > > > Which is very much a user's right, of course; but does > > > make me wonder whether that user might actually be davej ;) > > > > indirectly. I made the same change in the Fedora kernel a while ago > > to test a hypothesis that we weren't getting any VM_BUG_ON reports. > > Fedora turns on CONFIG_DEBUG_VM? Yes. > All mm developers should thank you for the wider testing exposure; > but I'm not so sure that Fedora users should thank you for turning > it on - really it's for mm developers to wrap around !assertions or > more expensive checks (e.g. checking calls) in their development. The last time I did some benchmarking the impact wasn't as ridiculous as say lockdep, or spinlock debug. Maybe the benchmarks I was using weren't pushing the VM very hard, but it seemed to me that the value in getting info in potential problems early was higher than a small performance increase. > Or did I read a few months ago that some change had been made to > such definitions, and VM_BUG_ON(contents) are evaluated even when > the config option is off? I do hope I'm mistaken on that. Pretty sure that isn't the case. I remember Andrew chastising people a few times for putting checks in VM_BUG_ON's that needed to stay around even when the config option was off. Perhaps you were thinking of one of those incidents ? Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx148.postini.com [74.125.245.148]) by kanga.kvack.org (Postfix) with SMTP id 35CF46B0044 for ; Fri, 2 Nov 2012 19:26:00 -0400 (EDT) Received: by mail-ie0-f169.google.com with SMTP id 10so7148628ied.14 for ; Fri, 02 Nov 2012 16:25:59 -0700 (PDT) Date: Fri, 2 Nov 2012 16:26:03 -0700 (PDT) From: Hugh Dickins Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121102014336.GA1727@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu, 1 Nov 2012, Dave Jones wrote: > On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote: > > > > Fedora turns on CONFIG_DEBUG_VM? > > Yes. > > > All mm developers should thank you for the wider testing exposure; > > but I'm not so sure that Fedora users should thank you for turning > > it on - really it's for mm developers to wrap around !assertions or > > more expensive checks (e.g. checking calls) in their development. > > The last time I did some benchmarking the impact wasn't as ridiculous > as say lockdep, or spinlock debug. I think you're safe to assume that (outside of an individual developer's private tree) it will never be nearly as heavy as lockdep or debug pagealloc. I hadn't thought of spinlock debug as a heavy one, but yes, I guess it would be heavier than almost all VM_BUG_ON()s. > Maybe the benchmarks I was using > weren't pushing the VM very hard, but it seemed to me that the value > in getting info in potential problems early was higher than a small > performance increase. We thank you. I may have been over-estimating how much we put inside those VM_BUG_ON()s, sorry. Just so long as you're aware that there's a danger that one day we might slip something heavier in there. Those few explicit #ifdef CONFIG_DEBUG_VMs sometimes found in mm/ are probably the worst: you might want to check on the current crop. > > > Or did I read a few months ago that some change had been made to > > such definitions, and VM_BUG_ON(contents) are evaluated even when > > the config option is off? I do hope I'm mistaken on that. > > Pretty sure that isn't the case. I remember Andrew chastising people > a few times for putting checks in VM_BUG_ON's that needed to stay around > even when the config option was off. Perhaps you were thinking of one > of those incidents ? Avoiding side-effects in BUG_ON and VM_BUG_ON. Yes, that comes up from time to time, and I'm a believer on that. I think the discussion I'm mis/remembering sprung out of one of those: someone was surprised by the disassembly they found when it was configured off. The correct answer is to try it for myself and see. Not today. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172]) by kanga.kvack.org (Postfix) with SMTP id 25F7E6B004D for ; Mon, 5 Nov 2012 20:32:47 -0500 (EST) Received: by mail-ie0-f169.google.com with SMTP id 10so11318678ied.14 for ; Mon, 05 Nov 2012 17:32:46 -0800 (PST) Date: Mon, 5 Nov 2012 17:32:41 -0800 (PST) From: Hugh Dickins Subject: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Dave Jones , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora has converted to WARNING) in shmem_getpage_gfp(): WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] shmem_getpage_gfp+0xa5c/0xa70 [] shmem_fault+0x4f/0xa0 [] __do_fault+0x71/0x5c0 [] handle_pte_fault+0x97/0xae0 [] handle_mm_fault+0x289/0x350 [] __do_page_fault+0x18e/0x530 [] do_page_fault+0x2b/0x50 [] page_fault+0x28/0x30 [] tracesys+0xe1/0xe6 Thanks to Johannes for pointing to truncation: free_swap_and_cache() only does a trylock on the page, so the page lock we've held since before confirming swap is not enough to protect against truncation. What cleanup is needed in this case? Just delete_from_swap_cache(), which takes care of the memcg uncharge. Reported-by: Dave Jones Hypothesis-by: Johannes Weiner Signed-off-by: Hugh Dickins Cc: stable@vger.kernel.org --- mm/shmem.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) --- 3.7-rc4/mm/shmem.c 2012-10-14 16:16:58.361309122 -0700 +++ linux/mm/shmem.c 2012-11-01 14:31:04.288185742 -0700 @@ -1145,8 +1145,22 @@ repeat: if (!error) { error = shmem_add_to_page_cache(page, mapping, index, gfp, swp_to_radix_entry(swap)); - /* We already confirmed swap, and make no allocation */ - VM_BUG_ON(error); + /* + * We already confirmed swap under page lock, and make + * no memory allocation here, so usually no possibility + * of error; but free_swap_and_cache() only trylocks a + * page, so it is just possible that the entry has been + * truncated or holepunched since swap was confirmed. + * shmem_undo_range() will have done some of the + * unaccounting, now delete_from_swap_cache() will do + * the rest (including mem_cgroup_uncharge_swapcache). + * Reset swap.val? No, leave it so "failed" goes back to + * "repeat": reading a hole and writing should succeed. + */ + if (error) { + VM_BUG_ON(error != -ENOENT); + delete_from_swap_cache(page); + } } if (error) goto failed; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172]) by kanga.kvack.org (Postfix) with SMTP id A4D776B004D for ; Tue, 6 Nov 2012 08:54:10 -0500 (EST) Date: Tue, 6 Nov 2012 08:54:02 -0500 From: Dave Jones Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Message-ID: <20121106135402.GA3543@redhat.com> References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: > - /* We already confirmed swap, and make no allocation */ > - VM_BUG_ON(error); > + /* > + * We already confirmed swap under page lock, and make > + * no memory allocation here, so usually no possibility > + * of error; but free_swap_and_cache() only trylocks a > + * page, so it is just possible that the entry has been > + * truncated or holepunched since swap was confirmed. > + * shmem_undo_range() will have done some of the > + * unaccounting, now delete_from_swap_cache() will do > + * the rest (including mem_cgroup_uncharge_swapcache). > + * Reset swap.val? No, leave it so "failed" goes back to > + * "repeat": reading a hole and writing should succeed. > + */ > + if (error) { > + VM_BUG_ON(error != -ENOENT); > + delete_from_swap_cache(page); > + } > } I ran with this overnight, and still hit the (new!) VM_BUG_ON Perhaps we should print out what 'error' was too ? I'll rebuild with that.. ------------[ cut here ]------------ WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Hardware name: 2012 Client Platform Modules linked in: fuse ipt_ULOG scsi_transport_iscsi binfmt_misc dn_rtmsg nfnetlink nfc caif_socket caif af_802154 phonet af_rxrpc can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 p8022 psnap llc ax25 lockd sunrpc bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables gspca_ov519 gspca_main videodev kvm_intel usb_debug kvm crc32c_intel ghash_clmulni_intel microcode pcspkr i2c_i801 e1000e uinput i915 video i2c_algo_bit drm_kms_helper drm i2c_core Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] shmem_getpage_gfp+0xa5c/0xa70 [] ? shmem_getpage_gfp+0x29e/0xa70 [] shmem_fault+0x4f/0xa0 [] __do_fault+0x71/0x5c0 [] ? __lock_acquire+0x306/0x1ba0 [] ? local_clock+0x89/0xa0 [] handle_pte_fault+0x97/0xae0 [] ? sub_preempt_count+0x79/0xd0 [] ? delay_tsc+0xae/0x120 [] ? __const_udelay+0x28/0x30 [] handle_mm_fault+0x289/0x350 [] __do_page_fault+0x18e/0x530 [] ? getname_flags.part.32+0x30/0x150 [] ? getname_flags.part.32+0x30/0x150 [] ? set_track+0x8c/0x1a0 [] ? __slab_alloc+0x531/0x59e [] ? trace_hardirqs_on_caller+0x15d/0x1e0 [] ? rcu_user_exit+0xc9/0xf0 [] do_page_fault+0x2b/0x50 [] page_fault+0x28/0x30 [] ? strncpy_from_user+0x6c/0x120 [] getname_flags.part.32+0x86/0x150 [] getname+0x3a/0x60 [] sys_symlinkat+0x24/0x90 [] ? tracesys+0x7e/0xe6 [] tracesys+0xe1/0xe6 ---[ end trace 4ba438264ea16e97 ]--- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115]) by kanga.kvack.org (Postfix) with SMTP id 2A7246B0044 for ; Tue, 6 Nov 2012 18:48:25 -0500 (EST) Received: by mail-ie0-f169.google.com with SMTP id 10so1900763ied.14 for ; Tue, 06 Nov 2012 15:48:24 -0800 (PST) Date: Tue, 6 Nov 2012 15:48:20 -0800 (PST) From: Hugh Dickins Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <20121106135402.GA3543@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones Cc: Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue, 6 Nov 2012, Dave Jones wrote: > On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: > > > - /* We already confirmed swap, and make no allocation */ > > - VM_BUG_ON(error); > > + /* > > + * We already confirmed swap under page lock, and make > > + * no memory allocation here, so usually no possibility > > + * of error; but free_swap_and_cache() only trylocks a > > + * page, so it is just possible that the entry has been > > + * truncated or holepunched since swap was confirmed. > > + * shmem_undo_range() will have done some of the > > + * unaccounting, now delete_from_swap_cache() will do > > + * the rest (including mem_cgroup_uncharge_swapcache). > > + * Reset swap.val? No, leave it so "failed" goes back to > > + * "repeat": reading a hole and writing should succeed. > > + */ > > + if (error) { > > + VM_BUG_ON(error != -ENOENT); > > + delete_from_swap_cache(page); > > + } > > } > > I ran with this overnight, Thanks a lot... > and still hit the (new!) VM_BUG_ON ... but that's even more surprising than your original report. > > Perhaps we should print out what 'error' was too ? I'll rebuild with that.. Thanks; though I thought the error was going to turn out too boring, and was preparing a debug patch for you to show the expected and found values too. But then got very puzzled... > > ------------[ cut here ]------------ > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > Hardware name: 2012 Client Platform > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 That's the very same line number as in your original report, despite the long comment which the patch adds. Are you sure that kernel was built with the patch in? I wouldn't usually question you, but I'm going mad trying to understand how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that line, and when I was preparing the debug patch, I was thinking that an error from shmem_radix_tree_replace could also be -EEXIST, for when a different something rather than nothing is found [*]. But that's not the case, shmem_radix_tree_replace returns either 0 or -ENOENT. So if error != -ENOENT, that means shmem_add_to_page_cache went the radix_tree_insert route instead of the shmem_radix_tree_replace route; which means that its 'expected' is NULL, so swp_to_radix_entry(swap) is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt the radix_tree might be, I do not understand the new VM_BUG_ON firing. Please tell me it was the wrong kernel! Hugh [*] But in thinking it over, I realize that if shmem_radix_tree_replace had returned -EEXIST for the "wrong something" case, I would have been wrong to BUG on that; because just as truncation could remove an entry, something else could immediately after instantiate a new page there. So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's not saying what I had intended to say with it, and would have been wrong to say that anyway. It just looks stupid to me now, rather like inserting a VM_BUG_ON(false) - but that does become interesting when you report that you've hit it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx140.postini.com [74.125.245.140]) by kanga.kvack.org (Postfix) with SMTP id 1680A6B004D for ; Wed, 7 Nov 2012 17:38:38 -0500 (EST) Date: Wed, 7 Nov 2012 17:38:30 -0500 From: Dave Jones Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Message-ID: <20121107223830.GA12561@redhat.com> References: <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue, Nov 06, 2012 at 03:48:20PM -0800, Hugh Dickins wrote: > > ------------[ cut here ]------------ > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > Hardware name: 2012 Client Platform > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 > > That's the very same line number as in your original report, despite > the long comment which the patch adds. Are you sure that kernel was > built with the patch in? I just changed the code by hand, and opted not to paste the comment in. It is plausible that I built that kernel and forgot to reboot into it, but I'm 99.9% sure that that wasn't the case. Unfortunatly I can't check immediately, as that machine for reasons unknown no longer wants to get past the BIOS POST check. I'll see if I can reproduce it on a different test box until I get that one back up. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx144.postini.com [74.125.245.144]) by kanga.kvack.org (Postfix) with SMTP id 8279B6B0070 for ; Tue, 13 Nov 2012 20:36:39 -0500 (EST) Received: by mail-yh0-f41.google.com with SMTP id 47so301967yhr.14 for ; Tue, 13 Nov 2012 17:36:38 -0800 (PST) Date: Tue, 13 Nov 2012 17:36:33 -0800 (PST) From: Hugh Dickins Subject: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON fix In-Reply-To: <20121107223830.GA12561@redhat.com> Message-ID: References: <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <20121107223830.GA12561@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Dave Jones , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org We're still hoping to hear back from Dave Jones: but either way, please fold this patch into the earlier fix for 3.7 and -stable. Remove its VM_BUG_ON: because either it's as I believe, a tautology which cannot happen, and does not assert what I'd intended when I put it in, and would even be wrong if it did (a non-NULL entry can validly materialize there); or Dave actually hit it on his updated kernel, in which case more research will be needed, but for upstream we do not want a user to BUG there. Signed-off-by: Hugh Dickins --- mm/shmem.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) --- mmotm/mm/shmem.c 2012-11-09 09:43:46.908046342 -0800 +++ linux/mm/shmem.c 2012-11-13 17:16:38.532528959 -0800 @@ -1158,10 +1158,8 @@ repeat: * Reset swap.val? No, leave it so "failed" goes back to * "repeat": reading a hole and writing should succeed. */ - if (error) { - VM_BUG_ON(error != -ENOENT); + if (error) delete_from_swap_cache(page); - } } if (error) goto failed; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx200.postini.com [74.125.245.200]) by kanga.kvack.org (Postfix) with SMTP id C2C446B002B for ; Tue, 13 Nov 2012 22:07:16 -0500 (EST) Received: by mail-ie0-f169.google.com with SMTP id 10so14627304ied.14 for ; Tue, 13 Nov 2012 19:07:16 -0800 (PST) Message-ID: <50A30ADD.9000209@gmail.com> Date: Wed, 14 Nov 2012 11:07:09 +0800 From: Jaegeuk Hanse MIME-Version: 1.0 Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 11/07/2012 07:48 AM, Hugh Dickins wrote: > On Tue, 6 Nov 2012, Dave Jones wrote: >> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: >> >> > - /* We already confirmed swap, and make no allocation */ >> > - VM_BUG_ON(error); >> > + /* >> > + * We already confirmed swap under page lock, and make >> > + * no memory allocation here, so usually no possibility >> > + * of error; but free_swap_and_cache() only trylocks a >> > + * page, so it is just possible that the entry has been >> > + * truncated or holepunched since swap was confirmed. >> > + * shmem_undo_range() will have done some of the >> > + * unaccounting, now delete_from_swap_cache() will do >> > + * the rest (including mem_cgroup_uncharge_swapcache). >> > + * Reset swap.val? No, leave it so "failed" goes back to >> > + * "repeat": reading a hole and writing should succeed. >> > + */ >> > + if (error) { >> > + VM_BUG_ON(error != -ENOENT); >> > + delete_from_swap_cache(page); >> > + } >> > } >> >> I ran with this overnight, > Thanks a lot... > >> and still hit the (new!) VM_BUG_ON > ... but that's even more surprising than your original report. > >> Perhaps we should print out what 'error' was too ? I'll rebuild with that.. > Thanks; though I thought the error was going to turn out too boring, > and was preparing a debug patch for you to show the expected and found > values too. But then got very puzzled... > >> ------------[ cut here ]------------ >> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >> Hardware name: 2012 Client Platform >> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 > That's the very same line number as in your original report, despite > the long comment which the patch adds. Are you sure that kernel was > built with the patch in? > > I wouldn't usually question you, but I'm going mad trying to understand > how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that > line, and when I was preparing the debug patch, I was thinking that an > error from shmem_radix_tree_replace could also be -EEXIST, for when a > different something rather than nothing is found [*]. But that's not > the case, shmem_radix_tree_replace returns either 0 or -ENOENT. > > So if error != -ENOENT, that means shmem_add_to_page_cache went the > radix_tree_insert route instead of the shmem_radix_tree_replace route; > which means that its 'expected' is NULL, so swp_to_radix_entry(swap) > is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt > the radix_tree might be, I do not understand the new VM_BUG_ON firing. > > Please tell me it was the wrong kernel! > Hugh > > [*] But in thinking it over, I realize that if shmem_radix_tree_replace > had returned -EEXIST for the "wrong something" case, I would have been > wrong to BUG on that; because just as truncation could remove an entry, > something else could immediately after instantiate a new page there. Hi Hugh, As you said, swp_to_radix_entry() does an "| 2", so even if truncation could remove an entry and something else could immediately after instantiate a new page there, but the expected parameter will not be NULL, the result is radix_tree_insert will not be called and shmem_add_to_page_cache will not return -EEXIST, then why trigger BUG_ON ? Regards, Jaegeuk > So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's > not saying what I had intended to say with it, and would have been > wrong to say that anyway. It just looks stupid to me now, rather > like inserting a VM_BUG_ON(false) - but that does become interesting > when you report that you've hit it. > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx110.postini.com [74.125.245.110]) by kanga.kvack.org (Postfix) with SMTP id D6E516B0072 for ; Tue, 13 Nov 2012 22:50:34 -0500 (EST) Received: by mail-ye0-f169.google.com with SMTP id q11so561yen.14 for ; Tue, 13 Nov 2012 19:50:34 -0800 (PST) Date: Tue, 13 Nov 2012 19:50:25 -0800 (PST) From: Hugh Dickins Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <50A30ADD.9000209@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Jaegeuk Hanse Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, 14 Nov 2012, Jaegeuk Hanse wrote: > On 11/07/2012 07:48 AM, Hugh Dickins wrote: > > On Tue, 6 Nov 2012, Dave Jones wrote: > > > On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: > > > > > > > - /* We already confirmed swap, and make no > > > allocation */ > > > > - VM_BUG_ON(error); > > > > + /* > > > > + * We already confirmed swap under page lock, > > > and make > > > > + * no memory allocation here, so usually no > > > possibility > > > > + * of error; but free_swap_and_cache() only > > > trylocks a > > > > + * page, so it is just possible that the > > > entry has been > > > > + * truncated or holepunched since swap was > > > confirmed. > > > > + * shmem_undo_range() will have done some of > > > the > > > > + * unaccounting, now delete_from_swap_cache() > > > will do > > > > + * the rest (including > > > mem_cgroup_uncharge_swapcache). > > > > + * Reset swap.val? No, leave it so "failed" > > > goes back to > > > > + * "repeat": reading a hole and writing > > > should succeed. > > > > + */ > > > > + if (error) { > > > > + VM_BUG_ON(error != -ENOENT); > > > > + delete_from_swap_cache(page); > > > > + } > > > > } > > > > > > I ran with this overnight, > > Thanks a lot... > > > > > and still hit the (new!) VM_BUG_ON > > ... but that's even more surprising than your original report. > > > > > Perhaps we should print out what 'error' was too ? I'll rebuild with > > > that.. > > Thanks; though I thought the error was going to turn out too boring, > > and was preparing a debug patch for you to show the expected and found > > values too. But then got very puzzled... > > > > > ------------[ cut here ]------------ > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > Hardware name: 2012 Client Platform > > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 > > That's the very same line number as in your original report, despite > > the long comment which the patch adds. Are you sure that kernel was > > built with the patch in? > > > > I wouldn't usually question you, but I'm going mad trying to understand > > how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that > > line, and when I was preparing the debug patch, I was thinking that an > > error from shmem_radix_tree_replace could also be -EEXIST, for when a > > different something rather than nothing is found [*]. But that's not > > the case, shmem_radix_tree_replace returns either 0 or -ENOENT. > > > > So if error != -ENOENT, that means shmem_add_to_page_cache went the > > radix_tree_insert route instead of the shmem_radix_tree_replace route; > > which means that its 'expected' is NULL, so swp_to_radix_entry(swap) > > is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt > > the radix_tree might be, I do not understand the new VM_BUG_ON firing. > > > > Please tell me it was the wrong kernel! > > Hugh > > > > [*] But in thinking it over, I realize that if shmem_radix_tree_replace > > had returned -EEXIST for the "wrong something" case, I would have been > > wrong to BUG on that; because just as truncation could remove an entry, > > something else could immediately after instantiate a new page there. > > Hi Hugh, > > As you said, swp_to_radix_entry() does an "| 2", so even if truncation could > remove an entry and something else could immediately after instantiate a new > page there, but the expected parameter will not be NULL, the result is > radix_tree_insert will not be called and shmem_add_to_page_cache will not > return -EEXIST, then why trigger BUG_ON ? Why insert the VM_BUG_ON? Because at the time I thought that it asserted something useful; but I was mistaken, as explained above. How can the VM_BUG_ON trigger (without stack corruption, or something of that kind)? I have no idea. We are in agreement: I now think that VM_BUG_ON is misleading and silly, and sent Andrew a further patch to remove it a just couple of hours ago. Originally I was waiting to hear further from Dave; but his test machine was giving trouble, and it occurred to me that, never mind whether he says he has hit it again, or he has not hit it again, the answer is the same: don't send that VM_BUG_ON upstream. Hugh > > Regards, > Jaegeuk > > > So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's > > not saying what I had intended to say with it, and would have been > > wrong to say that anyway. It just looks stupid to me now, rather > > like inserting a VM_BUG_ON(false) - but that does become interesting > > when you report that you've hit it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx163.postini.com [74.125.245.163]) by kanga.kvack.org (Postfix) with SMTP id B50656B0062 for ; Wed, 14 Nov 2012 01:14:47 -0500 (EST) Date: Wed, 14 Nov 2012 01:14:37 -0500 From: Dave Jones Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Message-ID: <20121114061437.GA23458@redhat.com> References: <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Jaegeuk Hanse , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote: > Originally I was waiting to hear further from Dave; but his test > machine was giving trouble, and it occurred to me that, never mind > whether he says he has hit it again, or he has not hit it again, > the answer is the same: don't send that VM_BUG_ON upstream. Sorry, I'm supposedly on vacation. That said, a replacement test box has been running tests since last Friday without hitting that case. Maybe it was the last death throes of that other machine before it gave up the ghost completely. Does sound like an awful coincidence though. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx102.postini.com [74.125.245.102]) by kanga.kvack.org (Postfix) with SMTP id 108CF6B009E for ; Wed, 14 Nov 2012 05:06:04 -0500 (EST) Received: by mail-pa0-f41.google.com with SMTP id fa10so211655pad.14 for ; Wed, 14 Nov 2012 02:06:03 -0800 (PST) Date: Wed, 14 Nov 2012 02:06:00 -0800 (PST) From: Hugh Dickins Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <20121114061437.GA23458@redhat.com> Message-ID: References: <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <20121114061437.GA23458@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones Cc: Jaegeuk Hanse , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, 14 Nov 2012, Dave Jones wrote: > On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote: > > > Originally I was waiting to hear further from Dave; but his test > > machine was giving trouble, and it occurred to me that, never mind > > whether he says he has hit it again, or he has not hit it again, > > the answer is the same: don't send that VM_BUG_ON upstream. > > Sorry, I'm supposedly on vacation. Sorry for breaking in upon that, and thank you for responding even so. > That said, a replacement test box has been running tests since last Friday > without hitting that case. Maybe it was the last death throes of > that other machine before it gave up the ghost completely. > > Does sound like an awful coincidence though. I'm still clinging to your 0.1% possibility that it was not the intended kernel running. Anyway, I'm not going to worry about it further, until we see another hit - please do keep the VM_BUG_ON in your test kernel (i.e. resist that temptation to race in from your vacation to apply today's patch!), even though the right thing for 3.7 was to remove it. Thanks, Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx113.postini.com [74.125.245.113]) by kanga.kvack.org (Postfix) with SMTP id 1B9E26B0089 for ; Thu, 15 Nov 2012 02:39:59 -0500 (EST) Received: by mail-da0-f41.google.com with SMTP id i14so604102dad.14 for ; Wed, 14 Nov 2012 23:39:58 -0800 (PST) Message-ID: <50A49C46.9040406@gmail.com> Date: Thu, 15 Nov 2012 15:39:50 +0800 From: Jaegeuk Hanse MIME-Version: 1.0 Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 11/14/2012 11:50 AM, Hugh Dickins wrote: > On Wed, 14 Nov 2012, Jaegeuk Hanse wrote: >> On 11/07/2012 07:48 AM, Hugh Dickins wrote: >>> On Tue, 6 Nov 2012, Dave Jones wrote: >>>> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: >>>> >>>> > - /* We already confirmed swap, and make no >>>> allocation */ >>>> > - VM_BUG_ON(error); >>>> > + /* >>>> > + * We already confirmed swap under page lock, >>>> and make >>>> > + * no memory allocation here, so usually no >>>> possibility >>>> > + * of error; but free_swap_and_cache() only >>>> trylocks a >>>> > + * page, so it is just possible that the >>>> entry has been >>>> > + * truncated or holepunched since swap was >>>> confirmed. >>>> > + * shmem_undo_range() will have done some of >>>> the >>>> > + * unaccounting, now delete_from_swap_cache() >>>> will do >>>> > + * the rest (including >>>> mem_cgroup_uncharge_swapcache). >>>> > + * Reset swap.val? No, leave it so "failed" >>>> goes back to >>>> > + * "repeat": reading a hole and writing >>>> should succeed. >>>> > + */ >>>> > + if (error) { >>>> > + VM_BUG_ON(error != -ENOENT); >>>> > + delete_from_swap_cache(page); >>>> > + } >>>> > } >>>> >>>> I ran with this overnight, >>> Thanks a lot... >>> >>>> and still hit the (new!) VM_BUG_ON >>> ... but that's even more surprising than your original report. >>> >>>> Perhaps we should print out what 'error' was too ? I'll rebuild with >>>> that.. >>> Thanks; though I thought the error was going to turn out too boring, >>> and was preparing a debug patch for you to show the expected and found >>> values too. But then got very puzzled... >>> >>>> ------------[ cut here ]------------ >>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >>>> Hardware name: 2012 Client Platform >>>> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 >>> That's the very same line number as in your original report, despite >>> the long comment which the patch adds. Are you sure that kernel was >>> built with the patch in? >>> >>> I wouldn't usually question you, but I'm going mad trying to understand >>> how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that >>> line, and when I was preparing the debug patch, I was thinking that an >>> error from shmem_radix_tree_replace could also be -EEXIST, for when a >>> different something rather than nothing is found [*]. But that's not >>> the case, shmem_radix_tree_replace returns either 0 or -ENOENT. >>> >>> So if error != -ENOENT, that means shmem_add_to_page_cache went the >>> radix_tree_insert route instead of the shmem_radix_tree_replace route; >>> which means that its 'expected' is NULL, so swp_to_radix_entry(swap) >>> is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt >>> the radix_tree might be, I do not understand the new VM_BUG_ON firing. >>> >>> Please tell me it was the wrong kernel! >>> Hugh >>> >>> [*] But in thinking it over, I realize that if shmem_radix_tree_replace >>> had returned -EEXIST for the "wrong something" case, I would have been >>> wrong to BUG on that; because just as truncation could remove an entry, >>> something else could immediately after instantiate a new page there. >> Hi Hugh, >> >> As you said, swp_to_radix_entry() does an "| 2", so even if truncation could >> remove an entry and something else could immediately after instantiate a new >> page there, but the expected parameter will not be NULL, the result is >> radix_tree_insert will not be called and shmem_add_to_page_cache will not >> return -EEXIST, then why trigger BUG_ON ? > Why insert the VM_BUG_ON? Because at the time I thought that it > asserted something useful; but I was mistaken, as explained above. > > How can the VM_BUG_ON trigger (without stack corruption, or something > of that kind)? I have no idea. > > We are in agreement: I now think that VM_BUG_ON is misleading and silly, > and sent Andrew a further patch to remove it a just couple of hours ago. > > Originally I was waiting to hear further from Dave; but his test > machine was giving trouble, and it occurred to me that, never mind > whether he says he has hit it again, or he has not hit it again, > the answer is the same: don't send that VM_BUG_ON upstream. > > Hugh Thanks Hugh. Another question. Why the function shmem_fallocate which you add to kernel need call shmem_getpage? Regards, Jaegeuk > >> Regards, >> Jaegeuk >> >>> So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's >>> not saying what I had intended to say with it, and would have been >>> wrong to say that anyway. It just looks stupid to me now, rather >>> like inserting a VM_BUG_ON(false) - but that does become interesting >>> when you report that you've hit it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx164.postini.com [74.125.245.164]) by kanga.kvack.org (Postfix) with SMTP id 3EF436B005A for ; Thu, 15 Nov 2012 14:56:09 -0500 (EST) Received: by mail-gg0-f169.google.com with SMTP id i1so413979ggm.14 for ; Thu, 15 Nov 2012 11:56:08 -0800 (PST) Date: Thu, 15 Nov 2012 11:56:05 -0800 (PST) From: Hugh Dickins Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <50A49C46.9040406@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Jaegeuk Hanse Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Offtopic... On Thu, 15 Nov 2012, Jaegeuk Hanse wrote: > > Another question. Why the function shmem_fallocate which you add to kernel > need call shmem_getpage? Because shmem_getpage(_gfp) is where shmem's page lookup and allocation complexities are handled. I assume the question behind your question is: why does shmem actually allocate pages for its fallocate, instead of just reserving the space? I did play with just reserving the space, with more special entries in the radix_tree to note the reservations made. It should be doable for the vm_enough_memory and sbinfo->used_blocks reservations. What absolutely deterred me from taking that path was the mem_cgroup case: shmem and swap and memcg are not easy to get working right together, and nobody would thank me for complicating memcg just for shmem_fallocate. By allocating pages, the pre-existing memcg code just works; if we used reservations instead, we would have to track their memcg charges in some additional new way. I see no justification for that complication. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx173.postini.com [74.125.245.173]) by kanga.kvack.org (Postfix) with SMTP id 6BAB36B0070 for ; Thu, 15 Nov 2012 19:40:22 -0500 (EST) Received: by mail-pa0-f41.google.com with SMTP id fa10so1603236pad.14 for ; Thu, 15 Nov 2012 16:40:21 -0800 (PST) Message-ID: <50A58B6E.8090609@gmail.com> Date: Fri, 16 Nov 2012 08:40:14 +0800 From: Jaegeuk Hanse MIME-Version: 1.0 Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 11/16/2012 03:56 AM, Hugh Dickins wrote: > Offtopic... > > On Thu, 15 Nov 2012, Jaegeuk Hanse wrote: >> Another question. Why the function shmem_fallocate which you add to kernel >> need call shmem_getpage? > Because shmem_getpage(_gfp) is where shmem's > page lookup and allocation complexities are handled. > > I assume the question behind your question is: why does shmem actually > allocate pages for its fallocate, instead of just reserving the space? Yeah, this is what I want to know. > > I did play with just reserving the space, with more special entries in > the radix_tree to note the reservations made. It should be doable for > the vm_enough_memory and sbinfo->used_blocks reservations. > > What absolutely deterred me from taking that path was the mem_cgroup > case: shmem and swap and memcg are not easy to get working right together, > and nobody would thank me for complicating memcg just for shmem_fallocate. > > By allocating pages, the pre-existing memcg code just works; if we used > reservations instead, we would have to track their memcg charges in some > additional new way. I see no justification for that complication. Oh, I see, thanks Hugh. :-) > Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx202.postini.com [74.125.245.202]) by kanga.kvack.org (Postfix) with SMTP id 2049C6B006C for ; Fri, 16 Nov 2012 04:34:30 -0500 (EST) Received: by mail-pb0-f41.google.com with SMTP id xa7so1983104pbc.14 for ; Fri, 16 Nov 2012 01:34:29 -0800 (PST) Message-ID: <50A6089B.7010708@gmail.com> Date: Fri, 16 Nov 2012 17:34:19 +0800 From: Jaegeuk Hanse MIME-Version: 1.0 Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 11/16/2012 03:56 AM, Hugh Dickins wrote: > Offtopic... > > On Thu, 15 Nov 2012, Jaegeuk Hanse wrote: >> Another question. Why the function shmem_fallocate which you add to kernel >> need call shmem_getpage? > Because shmem_getpage(_gfp) is where shmem's > page lookup and allocation complexities are handled. > > I assume the question behind your question is: why does shmem actually > allocate pages for its fallocate, instead of just reserving the space? > > I did play with just reserving the space, with more special entries in > the radix_tree to note the reservations made. It should be doable for > the vm_enough_memory and sbinfo->used_blocks reservations. > > What absolutely deterred me from taking that path was the mem_cgroup > case: shmem and swap and memcg are not easy to get working right together, > and nobody would thank me for complicating memcg just for shmem_fallocate. > > By allocating pages, the pre-existing memcg code just works; if we used > reservations instead, we would have to track their memcg charges in some > additional new way. I see no justification for that complication. Hi Hugh Some questions about your shmem/tmpfs: misc and fallocate patchset. - Since shmem_setattr can truncate tmpfs files, why need add another similar codes in function shmem_fallocate? What's the trick? - in tmpfs: support fallocate preallocation patch changelog: "Christoph Hellwig: What for exactly? Please explain why preallocating on tmpfs would make any sense. Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or -EOPNOTSUPP] on fallocate is just ugly." Could shmem/tmpfs fallocate prevent one process truncate the file which the second process mmap() and get SIGBUS when the second process access mmap but out of current size of file? Regards, Jaegeuk > Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx104.postini.com [74.125.245.104]) by kanga.kvack.org (Postfix) with SMTP id 518FA6B005A for ; Fri, 16 Nov 2012 23:48:48 -0500 (EST) Received: by mail-gh0-f169.google.com with SMTP id r1so709637ghr.14 for ; Fri, 16 Nov 2012 20:48:47 -0800 (PST) Date: Fri, 16 Nov 2012 20:48:46 -0800 (PST) From: Hugh Dickins Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <50A6089B.7010708@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> <50A6089B.7010708@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Jaegeuk Hanse Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Further offtopic.. On Fri, 16 Nov 2012, Jaegeuk Hanse wrote: > Some questions about your shmem/tmpfs: misc and fallocate patchset. > > - Since shmem_setattr can truncate tmpfs files, why need add another similar > codes in function shmem_fallocate? What's the trick? I don't know if I understand you. In general, hole-punching is different from truncation. Supporting the hole-punch mode of the fallocate system call is different from supporting truncation. They're closely related, and share code, but meet different specifications. > - in tmpfs: support fallocate preallocation patch changelog: > "Christoph Hellwig: What for exactly? Please explain why preallocating on > tmpfs would make any sense. > Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on > the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or > -EOPNOTSUPP] on fallocate is just ugly." > Could shmem/tmpfs fallocate prevent one process truncate the file which the > second process mmap() and get SIGBUS when the second process access mmap but > out of current size of file? Again, I don't know if I understand you. fallocate does not prevent truncation or races or SIGBUS. I believe that Kay meant that without using fallocate to allocate the memory in advance, systemd found it hard to protect itself from the possibility of getting a SIGBUS, if access to a shmem mapping happened to run out of memory/space in the middle. I never grasped why writing the file in advance was not good enough: fallocate happened to be what they hoped to use, and it was hard to deny it, given that tmpfs already supported hole-punching, and was about to convert to the fallocate interface for that. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx175.postini.com [74.125.245.175]) by kanga.kvack.org (Postfix) with SMTP id E62286B004D for ; Sat, 17 Nov 2012 19:58:08 -0500 (EST) Received: by mail-da0-f41.google.com with SMTP id e20so108729dak.14 for ; Sat, 17 Nov 2012 16:58:08 -0800 (PST) Message-ID: <50A83289.6020108@gmail.com> Date: Sun, 18 Nov 2012 08:57:45 +0800 From: Jaegeuk Hanse MIME-Version: 1.0 Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> <50A6089B.7010708@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 11/17/2012 12:48 PM, Hugh Dickins wrote: > Further offtopic.. Thanks for your explanation, Hugh. :-) > > On Fri, 16 Nov 2012, Jaegeuk Hanse wrote: >> Some questions about your shmem/tmpfs: misc and fallocate patchset. >> >> - Since shmem_setattr can truncate tmpfs files, why need add another similar >> codes in function shmem_fallocate? What's the trick? > I don't know if I understand you. In general, hole-punching is different > from truncation. Supporting the hole-punch mode of the fallocate system > call is different from supporting truncation. They're closely related, > and share code, but meet different specifications. What's the different between shmem/tmpfs hole-punching and truncate_setsize/truncate_pagecache? Do you mean one is punch hole in the file and the other one is shrink or extent the size of a file? >> - in tmpfs: support fallocate preallocation patch changelog: >> "Christoph Hellwig: What for exactly? Please explain why preallocating on >> tmpfs would make any sense. >> Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on >> the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or >> -EOPNOTSUPP] on fallocate is just ugly." >> Could shmem/tmpfs fallocate prevent one process truncate the file which the >> second process mmap() and get SIGBUS when the second process access mmap but >> out of current size of file? > Again, I don't know if I understand you. fallocate does not prevent > truncation or races or SIGBUS. I believe that Kay meant that without > using fallocate to allocate the memory in advance, systemd found it hard > to protect itself from the possibility of getting a SIGBUS, if access to > a shmem mapping happened to run out of memory/space in the middle. IIUC, it will return VM_xxx_OOM instead of SIGBUS if run out of memory. Then how can get SIGBUS in this scene? Regards, Jaegeuk > I never grasped why writing the file in advance was not good enough: > fallocate happened to be what they hoped to use, and it was hard to > deny it, given that tmpfs already supported hole-punching, and was > about to convert to the fallocate interface for that. > Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx137.postini.com [74.125.245.137]) by kanga.kvack.org (Postfix) with SMTP id 927E46B004D for ; Sat, 17 Nov 2012 20:48:22 -0500 (EST) Received: by mail-pb0-f41.google.com with SMTP id xa7so2940568pbc.14 for ; Sat, 17 Nov 2012 17:48:21 -0800 (PST) Message-ID: <50A83E5E.9060300@gmail.com> Date: Sun, 18 Nov 2012 09:48:14 +0800 From: Jaegeuk Hanse MIME-Version: 1.0 Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> <50A6089B.7010708@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org On 11/17/2012 12:48 PM, Hugh Dickins wrote: > Further offtopic.. Hi Hugh, - I see you add this in vfs.txt: + fallocate: called by the VFS to preallocate blocks or punch a hole. I want to know if it's necessary to add it to man page since users still don't know fallocate can punch a hole from man fallocate. - in function shmem_fallocate: + else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced) + error = -ENOMEM; If this changelog "shmem_fallocate() compare counts and give up once the reactivated pages have started to coming back to writepage (approximately: some zones would in fact recycle faster than others)." describe why need this change? If the answer is yes, I have two questions. 1) how can guarantee it really don't need preallocation if just one or a few pages always reactivated, in this scene, nr_unswapped maybe grow bigger enough than shmem_falloc.nr_falloced 2) why return -ENOMEM, it's not really OOM, is it a trick or ...? Regards, Jaegeuk > > On Fri, 16 Nov 2012, Jaegeuk Hanse wrote: >> Some questions about your shmem/tmpfs: misc and fallocate patchset. >> >> - Since shmem_setattr can truncate tmpfs files, why need add another similar >> codes in function shmem_fallocate? What's the trick? > I don't know if I understand you. In general, hole-punching is different > from truncation. Supporting the hole-punch mode of the fallocate system > call is different from supporting truncation. They're closely related, > and share code, but meet different specifications. > >> - in tmpfs: support fallocate preallocation patch changelog: >> "Christoph Hellwig: What for exactly? Please explain why preallocating on >> tmpfs would make any sense. >> Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on >> the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or >> -EOPNOTSUPP] on fallocate is just ugly." >> Could shmem/tmpfs fallocate prevent one process truncate the file which the >> second process mmap() and get SIGBUS when the second process access mmap but >> out of current size of file? > Again, I don't know if I understand you. fallocate does not prevent > truncation or races or SIGBUS. I believe that Kay meant that without > using fallocate to allocate the memory in advance, systemd found it hard > to protect itself from the possibility of getting a SIGBUS, if access to > a shmem mapping happened to run out of memory/space in the middle. > > I never grasped why writing the file in advance was not good enough: > fallocate happened to be what they hoped to use, and it was hard to > deny it, given that tmpfs already supported hole-punching, and was > about to convert to the fallocate interface for that. > > Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934357Ab2JYCho (ORCPT ); Wed, 24 Oct 2012 22:37:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:22571 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932069Ab2JYChn (ORCPT ); Wed, 24 Oct 2012 22:37:43 -0400 Date: Wed, 24 Oct 2012 22:37:38 -0400 From: Dave Jones To: linux-mm@kvack.org Cc: Linux Kernel Subject: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121025023738.GA27001@redhat.com> Mail-Followup-To: Dave Jones , linux-mm@kvack.org, Linux Kernel MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Machine under significant load (4gb memory used, swap usage fluctuating) triggered this... WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] shmem_getpage_gfp+0xa5c/0xa70 [] ? shmem_getpage_gfp+0x29e/0xa70 [] shmem_fault+0x4f/0xa0 [] __do_fault+0x71/0x5c0 [] ? __lock_acquire+0x306/0x1ba0 [] ? local_clock+0x89/0xa0 [] handle_pte_fault+0x97/0xae0 [] ? sub_preempt_count+0x79/0xd0 [] ? delay_tsc+0xae/0x120 [] ? __const_udelay+0x28/0x30 [] handle_mm_fault+0x289/0x350 [] __do_page_fault+0x18e/0x530 [] ? local_clock+0x89/0xa0 [] ? get_parent_ip+0x11/0x50 [] ? get_parent_ip+0x11/0x50 [] ? sub_preempt_count+0x79/0xd0 [] ? rcu_user_exit+0xc9/0xf0 [] do_page_fault+0x2b/0x50 [] page_fault+0x28/0x30 [] ? copy_user_enhanced_fast_string+0x9/0x20 [] ? sys_futimesat+0x41/0xe0 [] ? syscall_trace_enter+0x25/0x2c0 [] ? tracesys+0x7e/0xe6 [] tracesys+0xe1/0xe6 1148 error = shmem_add_to_page_cache(page, mapping, index, 1149 gfp, swp_to_radix_entry(swap)); 1150 /* We already confirmed swap, and make no allocation */ 1151 VM_BUG_ON(error); 1152 } total used free shared buffers cached Mem: 3885528 2854064 1031464 0 9624 19208 -/+ buffers/cache: 2825232 1060296 Swap: 6029308 30656 5998652 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932904Ab2JYEgf (ORCPT ); Thu, 25 Oct 2012 00:36:35 -0400 Received: from mail-ie0-f174.google.com ([209.85.223.174]:63220 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932112Ab2JYEge (ORCPT ); Thu, 25 Oct 2012 00:36:34 -0400 Date: Wed, 24 Oct 2012 21:36:27 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121025023738.GA27001@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 24 Oct 2012, Dave Jones wrote: > Machine under significant load (4gb memory used, swap usage fluctuating) > triggered this... > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_null+0x1a/0x20 > [] shmem_getpage_gfp+0xa5c/0xa70 > [] ? shmem_getpage_gfp+0x29e/0xa70 > [] shmem_fault+0x4f/0xa0 > [] __do_fault+0x71/0x5c0 > [] ? __lock_acquire+0x306/0x1ba0 > [] ? local_clock+0x89/0xa0 > [] handle_pte_fault+0x97/0xae0 > [] ? sub_preempt_count+0x79/0xd0 > [] ? delay_tsc+0xae/0x120 > [] ? __const_udelay+0x28/0x30 > [] handle_mm_fault+0x289/0x350 > [] __do_page_fault+0x18e/0x530 > [] ? local_clock+0x89/0xa0 > [] ? get_parent_ip+0x11/0x50 > [] ? get_parent_ip+0x11/0x50 > [] ? sub_preempt_count+0x79/0xd0 > [] ? rcu_user_exit+0xc9/0xf0 > [] do_page_fault+0x2b/0x50 > [] page_fault+0x28/0x30 > [] ? copy_user_enhanced_fast_string+0x9/0x20 > [] ? sys_futimesat+0x41/0xe0 > [] ? syscall_trace_enter+0x25/0x2c0 > [] ? tracesys+0x7e/0xe6 > [] tracesys+0xe1/0xe6 > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > 1149 gfp, swp_to_radix_entry(swap)); > 1150 /* We already confirmed swap, and make no allocation */ > 1151 VM_BUG_ON(error); > 1152 } That's very surprising. Easy enough to handle an error there, but of course I made it a VM_BUG_ON because it violates my assumptions: I rather need to understand how this can be, and I've no idea. Clutching at straws, I expect this is entirely irrelevant, but: there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor in current linux.git; rather, there's a VM_BUG_ON on line 1149. So you've inserted a couple of lines for some reason (more useful trinity behaviour, perhaps)? And have some config option I'm unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? Hugh > > > total used free shared buffers cached > Mem: 3885528 2854064 1031464 0 9624 19208 > -/+ buffers/cache: 2825232 1060296 > Swap: 6029308 30656 5998652 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752833Ab2JYEu7 (ORCPT ); Thu, 25 Oct 2012 00:50:59 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:41278 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750838Ab2JYEu6 (ORCPT ); Thu, 25 Oct 2012 00:50:58 -0400 Message-ID: <5088C51D.3060009@gmail.com> Date: Thu, 25 Oct 2012 12:50:37 +0800 From: Ni zhan Chen User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/25/2012 12:36 PM, Hugh Dickins wrote: > On Wed, 24 Oct 2012, Dave Jones wrote: > >> Machine under significant load (4gb memory used, swap usage fluctuating) >> triggered this... >> >> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 >> Call Trace: >> [] warn_slowpath_common+0x7f/0xc0 >> [] warn_slowpath_null+0x1a/0x20 >> [] shmem_getpage_gfp+0xa5c/0xa70 >> [] ? shmem_getpage_gfp+0x29e/0xa70 >> [] shmem_fault+0x4f/0xa0 >> [] __do_fault+0x71/0x5c0 >> [] ? __lock_acquire+0x306/0x1ba0 >> [] ? local_clock+0x89/0xa0 >> [] handle_pte_fault+0x97/0xae0 >> [] ? sub_preempt_count+0x79/0xd0 >> [] ? delay_tsc+0xae/0x120 >> [] ? __const_udelay+0x28/0x30 >> [] handle_mm_fault+0x289/0x350 >> [] __do_page_fault+0x18e/0x530 >> [] ? local_clock+0x89/0xa0 >> [] ? get_parent_ip+0x11/0x50 >> [] ? get_parent_ip+0x11/0x50 >> [] ? sub_preempt_count+0x79/0xd0 >> [] ? rcu_user_exit+0xc9/0xf0 >> [] do_page_fault+0x2b/0x50 >> [] page_fault+0x28/0x30 >> [] ? copy_user_enhanced_fast_string+0x9/0x20 >> [] ? sys_futimesat+0x41/0xe0 >> [] ? syscall_trace_enter+0x25/0x2c0 >> [] ? tracesys+0x7e/0xe6 >> [] tracesys+0xe1/0xe6 >> >> >> >> 1148 error = shmem_add_to_page_cache(page, mapping, index, >> 1149 gfp, swp_to_radix_entry(swap)); >> 1150 /* We already confirmed swap, and make no allocation */ >> 1151 VM_BUG_ON(error); >> 1152 } > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. > > Clutching at straws, I expect this is entirely irrelevant, but: > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > So you've inserted a couple of lines for some reason (more useful > trinity behaviour, perhaps)? And have some config option I'm > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? Hi Hugh, I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative rss in memcg memory.stat], one question: if function shmem_confirm_swap confirm the entry has already brought back from swap by a racing thread, then why call shmem_add_to_page_cache to add page from swapcache to pagecache again? otherwise, will goto unlock and then go to repeat? where I miss? Regards, Chen > > Hugh > >> >> total used free shared buffers cached >> Mem: 3885528 2854064 1031464 0 9624 19208 >> -/+ buffers/cache: 2825232 1060296 >> Swap: 6029308 30656 5998652 > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758254Ab2JYG7v (ORCPT ); Thu, 25 Oct 2012 02:59:51 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:38520 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757458Ab2JYG7u (ORCPT ); Thu, 25 Oct 2012 02:59:50 -0400 Date: Wed, 24 Oct 2012 23:59:40 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Ni zhan Chen cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <5088C51D.3060009@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Ni zhan Chen wrote: > On 10/25/2012 12:36 PM, Hugh Dickins wrote: > > On Wed, 24 Oct 2012, Dave Jones wrote: > > > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > > triggered this... > > > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > > > > > 1148 error = shmem_add_to_page_cache(page, > > > mapping, index, > > > 1149 gfp, > > > swp_to_radix_entry(swap)); > > > 1150 /* We already confirmed swap, and make no > > > allocation */ > > > 1151 VM_BUG_ON(error); > > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > > of course I made it a VM_BUG_ON because it violates my assumptions: > > I rather need to understand how this can be, and I've no idea. > > > > Clutching at straws, I expect this is entirely irrelevant, but: > > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > > > So you've inserted a couple of lines for some reason (more useful > > trinity behaviour, perhaps)? And have some config option I'm > > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? > > Hi Hugh, > > I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative > rss in memcg memory.stat], one question: Well, yes, I added the VM_BUG_ON in that commit. > > if function shmem_confirm_swap confirm the entry has already brought back > from swap by a racing thread, The reverse: true confirms that the swap entry has not been brought back from swap by a racing thread; false indicates that there has been a race. > then why call shmem_add_to_page_cache to add > page from swapcache to pagecache again? Adding it to pagecache again, after such a race, would set error to -EEXIST (originating from radix_tree_insert); but we don't do that, we add it to pagecache when it has not already been added. Or that's the intention: but Dave seems to have found an unexpected exception, despite us holding the page lock across all this. (But if it weren't for the memcg and replace_page issues, I'd much prefer to let shmem_add_to_page_cache discover the race as before.) Hugh > otherwise, will goto unlock and then go to repeat? where I miss? > > Regards, > Chen From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935581Ab2JYKVp (ORCPT ); Thu, 25 Oct 2012 06:21:45 -0400 Received: from mail-pa0-f46.google.com ([209.85.220.46]:49000 "EHLO mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935163Ab2JYKVo (ORCPT ); Thu, 25 Oct 2012 06:21:44 -0400 Message-ID: <508912B0.7080805@gmail.com> Date: Thu, 25 Oct 2012 18:21:36 +0800 From: Ni zhan Chen User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/25/2012 02:59 PM, Hugh Dickins wrote: > On Thu, 25 Oct 2012, Ni zhan Chen wrote: >> On 10/25/2012 12:36 PM, Hugh Dickins wrote: >>> On Wed, 24 Oct 2012, Dave Jones wrote: >>> >>>> Machine under significant load (4gb memory used, swap usage fluctuating) >>>> triggered this... >>>> >>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 >>>> >>>> 1148 error = shmem_add_to_page_cache(page, >>>> mapping, index, >>>> 1149 gfp, >>>> swp_to_radix_entry(swap)); >>>> 1150 /* We already confirmed swap, and make no >>>> allocation */ >>>> 1151 VM_BUG_ON(error); >>>> 1152 } >>> That's very surprising. Easy enough to handle an error there, but >>> of course I made it a VM_BUG_ON because it violates my assumptions: >>> I rather need to understand how this can be, and I've no idea. >>> >>> Clutching at straws, I expect this is entirely irrelevant, but: >>> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor >>> in current linux.git; rather, there's a VM_BUG_ON on line 1149. >>> >>> So you've inserted a couple of lines for some reason (more useful >>> trinity behaviour, perhaps)? And have some config option I'm >>> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? >> Hi Hugh, >> >> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative >> rss in memcg memory.stat], one question: > Well, yes, I added the VM_BUG_ON in that commit. > >> if function shmem_confirm_swap confirm the entry has already brought back >> from swap by a racing thread, > The reverse: true confirms that the swap entry has not been brought back > from swap by a racing thread; false indicates that there has been a race. > >> then why call shmem_add_to_page_cache to add >> page from swapcache to pagecache again? > Adding it to pagecache again, after such a race, would set error to > -EEXIST (originating from radix_tree_insert); but we don't do that, > we add it to pagecache when it has not already been added. > > Or that's the intention: but Dave seems to have found an unexpected > exception, despite us holding the page lock across all this. > > (But if it weren't for the memcg and replace_page issues, I'd much > prefer to let shmem_add_to_page_cache discover the race as before.) > > Hugh Hi Hugh Thanks for your response. You mean the -EEXIST originating from radix_tree_insert, in radix_tree_insert: if (slot != NULL) return -EEXIST; But why slot should be NULL? if no race, the pagecache related radix tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? Regards, Chen > >> otherwise, will goto unlock and then go to repeat? where I miss? >> >> Regards, >> Chen From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935188Ab2JYLOS (ORCPT ); Thu, 25 Oct 2012 07:14:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:22265 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933181Ab2JYLOQ (ORCPT ); Thu, 25 Oct 2012 07:14:16 -0400 Date: Thu, 25 Oct 2012 07:14:11 -0400 From: Dave Jones To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121025111411.GB24886@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > 1149 gfp, swp_to_radix_entry(swap)); > > 1150 /* We already confirmed swap, and make no allocation */ > > 1151 VM_BUG_ON(error); > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. > > Clutching at straws, I expect this is entirely irrelevant, but: > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > So you've inserted a couple of lines for some reason (more useful > trinity behaviour, perhaps)? detritus from the recent mpol_to_str bug that I was chasing. Shouldn't be relevant... diff -durpN '--exclude-from=/home/davej/.exclude' src/git-trees/kernel/linux/mm> --- src/git-trees/kernel/linux/mm/shmem.c 2012-10-12 10:01:46.613408580 -> +++ linux-dj/mm/shmem.c 2012-10-15 12:31:32.979653309 -0400 @@ -885,13 +885,15 @@ redirty: static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol) { char buffer[64]; + int ret; if (!mpol || mpol->mode == MPOL_DEFAULT) return; /* show nothing */ - mpol_to_str(buffer, sizeof(buffer), mpol, 1); - - seq_printf(seq, ",mpol=%s", buffer); + memset(buffer, 0, sizeof(buffer)); + ret = mpol_to_str(buffer, sizeof(buffer), mpol, 1); + if (ret > 0) + seq_printf(seq, ",mpol=%s", buffer); } > And have some config option I'm > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? Yes, I do have this.. -#define VM_BUG_ON(cond) BUG_ON(cond) +#define VM_BUG_ON(cond) WARN_ON(cond) because I got tired of things not going over my usb serial port when I hit them a while ago. BUG_ON is pretty unfriendly to bug finding. Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965721Ab2JYUwX (ORCPT ); Thu, 25 Oct 2012 16:52:23 -0400 Received: from zene.cmpxchg.org ([85.214.230.12]:59929 "EHLO zene.cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934995Ab2JYUwV (ORCPT ); Thu, 25 Oct 2012 16:52:21 -0400 Date: Thu, 25 Oct 2012 16:52:13 -0400 From: Johannes Weiner To: Hugh Dickins Cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121025205213.GB4771@cmpxchg.org> References: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > On Wed, 24 Oct 2012, Dave Jones wrote: > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > triggered this... > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > Call Trace: > > [] warn_slowpath_common+0x7f/0xc0 > > [] warn_slowpath_null+0x1a/0x20 > > [] shmem_getpage_gfp+0xa5c/0xa70 > > [] ? shmem_getpage_gfp+0x29e/0xa70 > > [] shmem_fault+0x4f/0xa0 > > [] __do_fault+0x71/0x5c0 > > [] ? __lock_acquire+0x306/0x1ba0 > > [] ? local_clock+0x89/0xa0 > > [] handle_pte_fault+0x97/0xae0 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? delay_tsc+0xae/0x120 > > [] ? __const_udelay+0x28/0x30 > > [] handle_mm_fault+0x289/0x350 > > [] __do_page_fault+0x18e/0x530 > > [] ? local_clock+0x89/0xa0 > > [] ? get_parent_ip+0x11/0x50 > > [] ? get_parent_ip+0x11/0x50 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? rcu_user_exit+0xc9/0xf0 > > [] do_page_fault+0x2b/0x50 > > [] page_fault+0x28/0x30 > > [] ? copy_user_enhanced_fast_string+0x9/0x20 > > [] ? sys_futimesat+0x41/0xe0 > > [] ? syscall_trace_enter+0x25/0x2c0 > > [] ? tracesys+0x7e/0xe6 > > [] tracesys+0xe1/0xe6 > > > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > 1149 gfp, swp_to_radix_entry(swap)); > > 1150 /* We already confirmed swap, and make no allocation */ > > 1151 VM_BUG_ON(error); > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. Could it be concurrent truncation clearing out the entry between shmem_confirm_swap() and shmem_add_to_page_cache()? I don't see anything preventing that. The empty slot would not match the expected swap entry this call passes in and the returned error would be -ENOENT. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751128Ab2JYV1s (ORCPT ); Thu, 25 Oct 2012 17:27:48 -0400 Received: from mail-ie0-f174.google.com ([209.85.223.174]:57224 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750809Ab2JYV1q (ORCPT ); Thu, 25 Oct 2012 17:27:46 -0400 Date: Thu, 25 Oct 2012 14:27:41 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Ni zhan Chen cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <508912B0.7080805@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> <508912B0.7080805@gmail.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Ni zhan Chen wrote: > On 10/25/2012 02:59 PM, Hugh Dickins wrote: > > On Thu, 25 Oct 2012, Ni zhan Chen wrote: > > > > > > I think it maybe caused by your commit [d189922862e03ce: shmem: fix > > > negative > > > rss in memcg memory.stat], one question: > > Well, yes, I added the VM_BUG_ON in that commit. > > > > > if function shmem_confirm_swap confirm the entry has already brought back > > > from swap by a racing thread, > > The reverse: true confirms that the swap entry has not been brought back > > from swap by a racing thread; false indicates that there has been a race. > > > > > then why call shmem_add_to_page_cache to add > > > page from swapcache to pagecache again? > > Adding it to pagecache again, after such a race, would set error to > > -EEXIST (originating from radix_tree_insert); but we don't do that, > > we add it to pagecache when it has not already been added. > > > > Or that's the intention: but Dave seems to have found an unexpected > > exception, despite us holding the page lock across all this. > > > > (But if it weren't for the memcg and replace_page issues, I'd much > > prefer to let shmem_add_to_page_cache discover the race as before.) > > > > Hugh > > Hi Hugh > > Thanks for your response. You mean the -EEXIST originating from > radix_tree_insert, in radix_tree_insert: > if (slot != NULL) > return -EEXIST; > But why slot should be NULL? if no race, the pagecache related radix tree > entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? I was describing what would happen in a case that should not exist, that you had thought the common case. In actuality, the entry should not be NULL, it should be as you say there. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751513Ab2JYV3D (ORCPT ); Thu, 25 Oct 2012 17:29:03 -0400 Received: from mail-ia0-f174.google.com ([209.85.210.174]:60499 "EHLO mail-ia0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751283Ab2JYV27 (ORCPT ); Thu, 25 Oct 2012 17:28:59 -0400 Date: Thu, 25 Oct 2012 14:28:59 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121025111411.GB24886@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121025111411.GB24886@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Dave Jones wrote: > On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > > Clutching at straws, I expect this is entirely irrelevant, but: > > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor > > in current linux.git; rather, there's a VM_BUG_ON on line 1149. > > > > So you've inserted a couple of lines for some reason (more useful > > trinity behaviour, perhaps)? > > detritus from the recent mpol_to_str bug that I was chasing. > Shouldn't be relevant... > > > And have some config option I'm > > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning? > > Yes, I do have this.. > > -#define VM_BUG_ON(cond) BUG_ON(cond) > +#define VM_BUG_ON(cond) WARN_ON(cond) > > because I got tired of things not going over my usb serial port when I hit them > a while ago. BUG_ON is pretty unfriendly to bug finding. Makes sense, thanks for the reassurance. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751796Ab2JYVsk (ORCPT ); Thu, 25 Oct 2012 17:48:40 -0400 Received: from mail-ie0-f174.google.com ([209.85.223.174]:57642 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751079Ab2JYVsj (ORCPT ); Thu, 25 Oct 2012 17:48:39 -0400 Date: Thu, 25 Oct 2012 14:48:39 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Johannes Weiner cc: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121025205213.GB4771@cmpxchg.org> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121025205213.GB4771@cmpxchg.org> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 25 Oct 2012, Johannes Weiner wrote: > On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > On Wed, 24 Oct 2012, Dave Jones wrote: > > > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > > triggered this... > > > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > > 1149 gfp, swp_to_radix_entry(swap)); > > > 1150 /* We already confirmed swap, and make no allocation */ > > > 1151 VM_BUG_ON(error); > > > 1152 } > > > > That's very surprising. Easy enough to handle an error there, but > > of course I made it a VM_BUG_ON because it violates my assumptions: > > I rather need to understand how this can be, and I've no idea. > > Could it be concurrent truncation clearing out the entry between > shmem_confirm_swap() and shmem_add_to_page_cache()? I don't see > anything preventing that. > > The empty slot would not match the expected swap entry this call > passes in and the returned error would be -ENOENT. Excellent notion, many thanks Hannes, I believe you've got it. I've hit that truncation problem in swapoff (and commented on it in shmem_unuse_inode), but never hit it or considered it here. I think of the page lock as holding it stable, but truncation's free_swap_and_cache only does a trylock on the swapcache page, so we're not secured against that possibility. So I'd like to change it to VM_BUG_ON(error && error != -ENOENT), but there's a little tidying up to do in the -ENOENT case, which needs more thought. A delete_from_swap_cache(page) - though we can be lazy and leave that to reclaim for such a rare occurrence - and probably a mem_cgroup uncharge; but the memcg hooks are always the hardest to get right, I'll have think about that one carefully. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751991Ab2JZBsa (ORCPT ); Thu, 25 Oct 2012 21:48:30 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:36039 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750712Ab2JZBs3 (ORCPT ); Thu, 25 Oct 2012 21:48:29 -0400 Message-ID: <5089EBE1.1050009@gmail.com> Date: Fri, 26 Oct 2012 09:48:17 +0800 From: Ni zhan Chen User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> <5088C51D.3060009@gmail.com> <508912B0.7080805@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/26/2012 05:27 AM, Hugh Dickins wrote: > On Thu, 25 Oct 2012, Ni zhan Chen wrote: >> On 10/25/2012 02:59 PM, Hugh Dickins wrote: >>> On Thu, 25 Oct 2012, Ni zhan Chen wrote: >>>> I think it maybe caused by your commit [d189922862e03ce: shmem: fix >>>> negative >>>> rss in memcg memory.stat], one question: >>> Well, yes, I added the VM_BUG_ON in that commit. >>> >>>> if function shmem_confirm_swap confirm the entry has already brought back >>>> from swap by a racing thread, >>> The reverse: true confirms that the swap entry has not been brought back >>> from swap by a racing thread; false indicates that there has been a race. >>> >>>> then why call shmem_add_to_page_cache to add >>>> page from swapcache to pagecache again? >>> Adding it to pagecache again, after such a race, would set error to >>> -EEXIST (originating from radix_tree_insert); but we don't do that, >>> we add it to pagecache when it has not already been added. >>> >>> Or that's the intention: but Dave seems to have found an unexpected >>> exception, despite us holding the page lock across all this. >>> >>> (But if it weren't for the memcg and replace_page issues, I'd much >>> prefer to let shmem_add_to_page_cache discover the race as before.) >>> >>> Hugh >> Hi Hugh >> >> Thanks for your response. You mean the -EEXIST originating from >> radix_tree_insert, in radix_tree_insert: >> if (slot != NULL) >> return -EEXIST; >> But why slot should be NULL? if no race, the pagecache related radix tree >> entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss? > I was describing what would happen in a case that should not exist, > that you had thought the common case. In actuality, the entry should > not be NULL, it should be as you say there. Thanks for your patience. So in the common case, the entry should be the value I mentioned, then why has this check? if (slot != NULL) return -EEXIST; the common case will return -EEXIST. > > Hugh > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753079Ab2JZCPT (ORCPT ); Thu, 25 Oct 2012 22:15:19 -0400 Received: from mail-ia0-f174.google.com ([209.85.210.174]:49338 "EHLO mail-ia0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751165Ab2JZCPR (ORCPT ); Thu, 25 Oct 2012 22:15:17 -0400 Message-ID: <5089F22D.70007@gmail.com> Date: Fri, 26 Oct 2012 10:15:09 +0800 From: Ni zhan Chen User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: Hugh Dickins CC: Johannes Weiner , Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] References: <20121025023738.GA27001@redhat.com> <20121025205213.GB4771@cmpxchg.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/26/2012 05:48 AM, Hugh Dickins wrote: > On Thu, 25 Oct 2012, Johannes Weiner wrote: >> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: >>> On Wed, 24 Oct 2012, Dave Jones wrote: >>> >>>> Machine under significant load (4gb memory used, swap usage fluctuating) >>>> triggered this... >>>> >>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 >>>> >>>> 1148 error = shmem_add_to_page_cache(page, mapping, index, >>>> 1149 gfp, swp_to_radix_entry(swap)); >>>> 1150 /* We already confirmed swap, and make no allocation */ >>>> 1151 VM_BUG_ON(error); >>>> 1152 } >>> That's very surprising. Easy enough to handle an error there, but >>> of course I made it a VM_BUG_ON because it violates my assumptions: >>> I rather need to understand how this can be, and I've no idea. >> Could it be concurrent truncation clearing out the entry between >> shmem_confirm_swap() and shmem_add_to_page_cache()? I don't see >> anything preventing that. >> >> The empty slot would not match the expected swap entry this call >> passes in and the returned error would be -ENOENT. > Excellent notion, many thanks Hannes, I believe you've got it. > > I've hit that truncation problem in swapoff (and commented on it > in shmem_unuse_inode), but never hit it or considered it here. > I think of the page lock as holding it stable, but truncation's > free_swap_and_cache only does a trylock on the swapcache page, > so we're not secured against that possibility. Hi Hugh, Even though free_swap_and_cache only does a trylock on the swapcache page, but it doens't call delete_from_swap_cache and the associated entry should still be there, I am interested in what you have already introduce to protect it? > > So I'd like to change it to VM_BUG_ON(error && error != -ENOENT), > but there's a little tidying up to do in the -ENOENT case, which Do you mean radix_tree_insert will return -ENOENT if the associated entry is not present? Why I can't find this return value in the function radix_tree_insert? > needs more thought. A delete_from_swap_cache(page) - though we > can be lazy and leave that to reclaim for such a rare occurrence - > and probably a mem_cgroup uncharge; but the memcg hooks are always > the hardest to get right, I'll have think about that one carefully. > > Hugh > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762161Ab2KATLA (ORCPT ); Thu, 1 Nov 2012 15:11:00 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39543 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752285Ab2KATK5 (ORCPT ); Thu, 1 Nov 2012 15:10:57 -0400 Date: Thu, 1 Nov 2012 15:10:52 -0400 From: Dave Jones To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121101191052.GA5884@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20121025023738.GA27001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > On Wed, 24 Oct 2012, Dave Jones wrote: > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > triggered this... > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 > > Call Trace: > > [] warn_slowpath_common+0x7f/0xc0 > > [] warn_slowpath_null+0x1a/0x20 > > [] shmem_getpage_gfp+0xa5c/0xa70 > > [] ? shmem_getpage_gfp+0x29e/0xa70 > > [] shmem_fault+0x4f/0xa0 > > [] __do_fault+0x71/0x5c0 > > [] ? __lock_acquire+0x306/0x1ba0 > > [] ? local_clock+0x89/0xa0 > > [] handle_pte_fault+0x97/0xae0 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? delay_tsc+0xae/0x120 > > [] ? __const_udelay+0x28/0x30 > > [] handle_mm_fault+0x289/0x350 > > [] __do_page_fault+0x18e/0x530 > > [] ? local_clock+0x89/0xa0 > > [] ? get_parent_ip+0x11/0x50 > > [] ? get_parent_ip+0x11/0x50 > > [] ? sub_preempt_count+0x79/0xd0 > > [] ? rcu_user_exit+0xc9/0xf0 > > [] do_page_fault+0x2b/0x50 > > [] page_fault+0x28/0x30 > > [] ? copy_user_enhanced_fast_string+0x9/0x20 > > [] ? sys_futimesat+0x41/0xe0 > > [] ? syscall_trace_enter+0x25/0x2c0 > > [] ? tracesys+0x7e/0xe6 > > [] tracesys+0xe1/0xe6 > > > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > 1149 gfp, swp_to_radix_entry(swap)); > > 1150 /* We already confirmed swap, and make no allocation */ > > 1151 VM_BUG_ON(error); > > 1152 } > > That's very surprising. Easy enough to handle an error there, but > of course I made it a VM_BUG_ON because it violates my assumptions: > I rather need to understand how this can be, and I've no idea. I just noticed we had a user report hitting this same warning, but with a different trace.. : [] warn_slowpath_common+0x7f/0xc0 : [] warn_slowpath_null+0x1a/0x20 : [] shmem_getpage_gfp+0x7f3/0x830 : [] ? vma_adjust+0x3ed/0x620 : [] shmem_file_aio_read+0x1f2/0x380 : [] do_sync_read+0xa7/0xe0 : [] vfs_read+0xa9/0x180 : [] sys_read+0x4a/0x90 : [] system_call_fastpath+0x16/0x1b Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2992664Ab2KAXDs (ORCPT ); Thu, 1 Nov 2012 19:03:48 -0400 Received: from mail-ie0-f174.google.com ([209.85.223.174]:51942 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2992576Ab2KAXDo (ORCPT ); Thu, 1 Nov 2012 19:03:44 -0400 Date: Thu, 1 Nov 2012 16:03:40 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121101191052.GA5884@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 1 Nov 2012, Dave Jones wrote: > On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote: > > On Wed, 24 Oct 2012, Dave Jones wrote: > > > > > Machine under significant load (4gb memory used, swap usage fluctuating) > > > triggered this... > > > > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > > > > 1148 error = shmem_add_to_page_cache(page, mapping, index, > > > 1149 gfp, swp_to_radix_entry(swap)); > > > 1150 /* We already confirmed swap, and make no allocation */ > > > 1151 VM_BUG_ON(error); > > > 1152 } > > > > That's very surprising. Easy enough to handle an error there, but > > of course I made it a VM_BUG_ON because it violates my assumptions: > > I rather need to understand how this can be, and I've no idea. > > I just noticed we had a user report hitting this same warning, but > with a different trace.. > > : [] warn_slowpath_common+0x7f/0xc0 > : [] warn_slowpath_null+0x1a/0x20 > : [] shmem_getpage_gfp+0x7f3/0x830 > : [] ? vma_adjust+0x3ed/0x620 > : [] shmem_file_aio_read+0x1f2/0x380 > : [] do_sync_read+0xa7/0xe0 > : [] vfs_read+0xa9/0x180 > : [] sys_read+0x4a/0x90 > : [] system_call_fastpath+0x16/0x1b Equally explicable by Hannes's hypothesis; but useful supporting evidence, thank you. Except... earlier in the thread you explained how you hacked #define VM_BUG_ON(cond) WARN_ON(cond) to get this to come out as a warning instead of a bug, and now it looks as if "a user" has here done the same. Which is very much a user's right, of course; but does make me wonder whether that user might actually be davej ;) Never mind, whatever, it's more justification for the fix - which I've honestly not forgotten, but somehow not got around to sending (with a couple of others even longer outstanding). On its way shortly, for some unpredictable value of shortly. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756949Ab2KAXUj (ORCPT ); Thu, 1 Nov 2012 19:20:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55064 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751565Ab2KAXUg (ORCPT ); Thu, 1 Nov 2012 19:20:36 -0400 Date: Thu, 1 Nov 2012 19:20:30 -0400 From: Dave Jones To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121101232030.GA25519@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote: > > I just noticed we had a user report hitting this same warning, but > > with a different trace.. > > > > : [] warn_slowpath_common+0x7f/0xc0 > > : [] warn_slowpath_null+0x1a/0x20 > > : [] shmem_getpage_gfp+0x7f3/0x830 > > : [] ? vma_adjust+0x3ed/0x620 > > : [] shmem_file_aio_read+0x1f2/0x380 > > : [] do_sync_read+0xa7/0xe0 > > : [] vfs_read+0xa9/0x180 > > : [] sys_read+0x4a/0x90 > > : [] system_call_fastpath+0x16/0x1b > > Equally explicable by Hannes's hypothesis; > but useful supporting evidence, thank you. > > Except... earlier in the thread you explained how you hacked > #define VM_BUG_ON(cond) WARN_ON(cond) > to get this to come out as a warning instead of a bug, > and now it looks as if "a user" has here done the same. > > Which is very much a user's right, of course; but does > make me wonder whether that user might actually be davej ;) indirectly. I made the same change in the Fedora kernel a while ago to test a hypothesis that we weren't getting any VM_BUG_ON reports. Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2992551Ab2KAXsr (ORCPT ); Thu, 1 Nov 2012 19:48:47 -0400 Received: from mail-ie0-f174.google.com ([209.85.223.174]:54242 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751565Ab2KAXsq (ORCPT ); Thu, 1 Nov 2012 19:48:46 -0400 Date: Thu, 1 Nov 2012 16:48:41 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121101232030.GA25519@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 1 Nov 2012, Dave Jones wrote: > On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote: > > > > Except... earlier in the thread you explained how you hacked > > #define VM_BUG_ON(cond) WARN_ON(cond) > > to get this to come out as a warning instead of a bug, > > and now it looks as if "a user" has here done the same. > > > > Which is very much a user's right, of course; but does > > make me wonder whether that user might actually be davej ;) > > indirectly. I made the same change in the Fedora kernel a while ago > to test a hypothesis that we weren't getting any VM_BUG_ON reports. Fedora turns on CONFIG_DEBUG_VM? All mm developers should thank you for the wider testing exposure; but I'm not so sure that Fedora users should thank you for turning it on - really it's for mm developers to wrap around !assertions or more expensive checks (e.g. checking calls) in their development. Or did I read a few months ago that some change had been made to such definitions, and VM_BUG_ON(contents) are evaluated even when the config option is off? I do hope I'm mistaken on that. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762600Ab2KBBno (ORCPT ); Thu, 1 Nov 2012 21:43:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:26821 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1762542Ab2KBBnn (ORCPT ); Thu, 1 Nov 2012 21:43:43 -0400 Date: Thu, 1 Nov 2012 21:43:36 -0400 From: Dave Jones To: Hugh Dickins Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Message-ID: <20121102014336.GA1727@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote: > On Thu, 1 Nov 2012, Dave Jones wrote: > > On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote: > > > > > > Except... earlier in the thread you explained how you hacked > > > #define VM_BUG_ON(cond) WARN_ON(cond) > > > to get this to come out as a warning instead of a bug, > > > and now it looks as if "a user" has here done the same. > > > > > > Which is very much a user's right, of course; but does > > > make me wonder whether that user might actually be davej ;) > > > > indirectly. I made the same change in the Fedora kernel a while ago > > to test a hypothesis that we weren't getting any VM_BUG_ON reports. > > Fedora turns on CONFIG_DEBUG_VM? Yes. > All mm developers should thank you for the wider testing exposure; > but I'm not so sure that Fedora users should thank you for turning > it on - really it's for mm developers to wrap around !assertions or > more expensive checks (e.g. checking calls) in their development. The last time I did some benchmarking the impact wasn't as ridiculous as say lockdep, or spinlock debug. Maybe the benchmarks I was using weren't pushing the VM very hard, but it seemed to me that the value in getting info in potential problems early was higher than a small performance increase. > Or did I read a few months ago that some change had been made to > such definitions, and VM_BUG_ON(contents) are evaluated even when > the config option is off? I do hope I'm mistaken on that. Pretty sure that isn't the case. I remember Andrew chastising people a few times for putting checks in VM_BUG_ON's that needed to stay around even when the config option was off. Perhaps you were thinking of one of those incidents ? Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759389Ab2KBX0C (ORCPT ); Fri, 2 Nov 2012 19:26:02 -0400 Received: from mail-ie0-f174.google.com ([209.85.223.174]:50539 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752705Ab2KBXZ7 (ORCPT ); Fri, 2 Nov 2012 19:25:59 -0400 Date: Fri, 2 Nov 2012 16:26:03 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] In-Reply-To: <20121102014336.GA1727@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 1 Nov 2012, Dave Jones wrote: > On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote: > > > > Fedora turns on CONFIG_DEBUG_VM? > > Yes. > > > All mm developers should thank you for the wider testing exposure; > > but I'm not so sure that Fedora users should thank you for turning > > it on - really it's for mm developers to wrap around !assertions or > > more expensive checks (e.g. checking calls) in their development. > > The last time I did some benchmarking the impact wasn't as ridiculous > as say lockdep, or spinlock debug. I think you're safe to assume that (outside of an individual developer's private tree) it will never be nearly as heavy as lockdep or debug pagealloc. I hadn't thought of spinlock debug as a heavy one, but yes, I guess it would be heavier than almost all VM_BUG_ON()s. > Maybe the benchmarks I was using > weren't pushing the VM very hard, but it seemed to me that the value > in getting info in potential problems early was higher than a small > performance increase. We thank you. I may have been over-estimating how much we put inside those VM_BUG_ON()s, sorry. Just so long as you're aware that there's a danger that one day we might slip something heavier in there. Those few explicit #ifdef CONFIG_DEBUG_VMs sometimes found in mm/ are probably the worst: you might want to check on the current crop. > > > Or did I read a few months ago that some change had been made to > > such definitions, and VM_BUG_ON(contents) are evaluated even when > > the config option is off? I do hope I'm mistaken on that. > > Pretty sure that isn't the case. I remember Andrew chastising people > a few times for putting checks in VM_BUG_ON's that needed to stay around > even when the config option was off. Perhaps you were thinking of one > of those incidents ? Avoiding side-effects in BUG_ON and VM_BUG_ON. Yes, that comes up from time to time, and I'm a believer on that. I think the discussion I'm mis/remembering sprung out of one of those: someone was surprised by the disassembly they found when it was configured off. The correct answer is to try it for myself and see. Not today. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754973Ab2KFBcs (ORCPT ); Mon, 5 Nov 2012 20:32:48 -0500 Received: from mail-ie0-f174.google.com ([209.85.223.174]:60298 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754822Ab2KFBcq (ORCPT ); Mon, 5 Nov 2012 20:32:46 -0500 Date: Mon, 5 Nov 2012 17:32:41 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Andrew Morton cc: Dave Jones , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora has converted to WARNING) in shmem_getpage_gfp(): WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] shmem_getpage_gfp+0xa5c/0xa70 [] shmem_fault+0x4f/0xa0 [] __do_fault+0x71/0x5c0 [] handle_pte_fault+0x97/0xae0 [] handle_mm_fault+0x289/0x350 [] __do_page_fault+0x18e/0x530 [] do_page_fault+0x2b/0x50 [] page_fault+0x28/0x30 [] tracesys+0xe1/0xe6 Thanks to Johannes for pointing to truncation: free_swap_and_cache() only does a trylock on the page, so the page lock we've held since before confirming swap is not enough to protect against truncation. What cleanup is needed in this case? Just delete_from_swap_cache(), which takes care of the memcg uncharge. Reported-by: Dave Jones Hypothesis-by: Johannes Weiner Signed-off-by: Hugh Dickins Cc: stable@vger.kernel.org --- mm/shmem.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) --- 3.7-rc4/mm/shmem.c 2012-10-14 16:16:58.361309122 -0700 +++ linux/mm/shmem.c 2012-11-01 14:31:04.288185742 -0700 @@ -1145,8 +1145,22 @@ repeat: if (!error) { error = shmem_add_to_page_cache(page, mapping, index, gfp, swp_to_radix_entry(swap)); - /* We already confirmed swap, and make no allocation */ - VM_BUG_ON(error); + /* + * We already confirmed swap under page lock, and make + * no memory allocation here, so usually no possibility + * of error; but free_swap_and_cache() only trylocks a + * page, so it is just possible that the entry has been + * truncated or holepunched since swap was confirmed. + * shmem_undo_range() will have done some of the + * unaccounting, now delete_from_swap_cache() will do + * the rest (including mem_cgroup_uncharge_swapcache). + * Reset swap.val? No, leave it so "failed" goes back to + * "repeat": reading a hole and writing should succeed. + */ + if (error) { + VM_BUG_ON(error != -ENOENT); + delete_from_swap_cache(page); + } } if (error) goto failed; From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752029Ab2KFNyN (ORCPT ); Tue, 6 Nov 2012 08:54:13 -0500 Received: from mx1.redhat.com ([209.132.183.28]:52409 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750989Ab2KFNyM (ORCPT ); Tue, 6 Nov 2012 08:54:12 -0500 Date: Tue, 6 Nov 2012 08:54:02 -0500 From: Dave Jones To: Hugh Dickins Cc: Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Message-ID: <20121106135402.GA3543@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: > - /* We already confirmed swap, and make no allocation */ > - VM_BUG_ON(error); > + /* > + * We already confirmed swap under page lock, and make > + * no memory allocation here, so usually no possibility > + * of error; but free_swap_and_cache() only trylocks a > + * page, so it is just possible that the entry has been > + * truncated or holepunched since swap was confirmed. > + * shmem_undo_range() will have done some of the > + * unaccounting, now delete_from_swap_cache() will do > + * the rest (including mem_cgroup_uncharge_swapcache). > + * Reset swap.val? No, leave it so "failed" goes back to > + * "repeat": reading a hole and writing should succeed. > + */ > + if (error) { > + VM_BUG_ON(error != -ENOENT); > + delete_from_swap_cache(page); > + } > } I ran with this overnight, and still hit the (new!) VM_BUG_ON Perhaps we should print out what 'error' was too ? I'll rebuild with that.. ------------[ cut here ]------------ WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Hardware name: 2012 Client Platform Modules linked in: fuse ipt_ULOG scsi_transport_iscsi binfmt_misc dn_rtmsg nfnetlink nfc caif_socket caif af_802154 phonet af_rxrpc can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 p8022 psnap llc ax25 lockd sunrpc bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables gspca_ov519 gspca_main videodev kvm_intel usb_debug kvm crc32c_intel ghash_clmulni_intel microcode pcspkr i2c_i801 e1000e uinput i915 video i2c_algo_bit drm_kms_helper drm i2c_core Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] shmem_getpage_gfp+0xa5c/0xa70 [] ? shmem_getpage_gfp+0x29e/0xa70 [] shmem_fault+0x4f/0xa0 [] __do_fault+0x71/0x5c0 [] ? __lock_acquire+0x306/0x1ba0 [] ? local_clock+0x89/0xa0 [] handle_pte_fault+0x97/0xae0 [] ? sub_preempt_count+0x79/0xd0 [] ? delay_tsc+0xae/0x120 [] ? __const_udelay+0x28/0x30 [] handle_mm_fault+0x289/0x350 [] __do_page_fault+0x18e/0x530 [] ? getname_flags.part.32+0x30/0x150 [] ? getname_flags.part.32+0x30/0x150 [] ? set_track+0x8c/0x1a0 [] ? __slab_alloc+0x531/0x59e [] ? trace_hardirqs_on_caller+0x15d/0x1e0 [] ? rcu_user_exit+0xc9/0xf0 [] do_page_fault+0x2b/0x50 [] page_fault+0x28/0x30 [] ? strncpy_from_user+0x6c/0x120 [] getname_flags.part.32+0x86/0x150 [] getname+0x3a/0x60 [] sys_symlinkat+0x24/0x90 [] ? tracesys+0x7e/0xe6 [] tracesys+0xe1/0xe6 ---[ end trace 4ba438264ea16e97 ]--- Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753941Ab2KFXs0 (ORCPT ); Tue, 6 Nov 2012 18:48:26 -0500 Received: from mail-ia0-f174.google.com ([209.85.210.174]:62811 "EHLO mail-ia0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753314Ab2KFXsY (ORCPT ); Tue, 6 Nov 2012 18:48:24 -0500 Date: Tue, 6 Nov 2012 15:48:20 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones cc: Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <20121106135402.GA3543@redhat.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 6 Nov 2012, Dave Jones wrote: > On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: > > > - /* We already confirmed swap, and make no allocation */ > > - VM_BUG_ON(error); > > + /* > > + * We already confirmed swap under page lock, and make > > + * no memory allocation here, so usually no possibility > > + * of error; but free_swap_and_cache() only trylocks a > > + * page, so it is just possible that the entry has been > > + * truncated or holepunched since swap was confirmed. > > + * shmem_undo_range() will have done some of the > > + * unaccounting, now delete_from_swap_cache() will do > > + * the rest (including mem_cgroup_uncharge_swapcache). > > + * Reset swap.val? No, leave it so "failed" goes back to > > + * "repeat": reading a hole and writing should succeed. > > + */ > > + if (error) { > > + VM_BUG_ON(error != -ENOENT); > > + delete_from_swap_cache(page); > > + } > > } > > I ran with this overnight, Thanks a lot... > and still hit the (new!) VM_BUG_ON ... but that's even more surprising than your original report. > > Perhaps we should print out what 'error' was too ? I'll rebuild with that.. Thanks; though I thought the error was going to turn out too boring, and was preparing a debug patch for you to show the expected and found values too. But then got very puzzled... > > ------------[ cut here ]------------ > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > Hardware name: 2012 Client Platform > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 That's the very same line number as in your original report, despite the long comment which the patch adds. Are you sure that kernel was built with the patch in? I wouldn't usually question you, but I'm going mad trying to understand how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that line, and when I was preparing the debug patch, I was thinking that an error from shmem_radix_tree_replace could also be -EEXIST, for when a different something rather than nothing is found [*]. But that's not the case, shmem_radix_tree_replace returns either 0 or -ENOENT. So if error != -ENOENT, that means shmem_add_to_page_cache went the radix_tree_insert route instead of the shmem_radix_tree_replace route; which means that its 'expected' is NULL, so swp_to_radix_entry(swap) is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt the radix_tree might be, I do not understand the new VM_BUG_ON firing. Please tell me it was the wrong kernel! Hugh [*] But in thinking it over, I realize that if shmem_radix_tree_replace had returned -EEXIST for the "wrong something" case, I would have been wrong to BUG on that; because just as truncation could remove an entry, something else could immediately after instantiate a new page there. So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's not saying what I had intended to say with it, and would have been wrong to say that anyway. It just looks stupid to me now, rather like inserting a VM_BUG_ON(false) - but that does become interesting when you report that you've hit it. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754208Ab2KGWik (ORCPT ); Wed, 7 Nov 2012 17:38:40 -0500 Received: from mx1.redhat.com ([209.132.183.28]:15996 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751404Ab2KGWij (ORCPT ); Wed, 7 Nov 2012 17:38:39 -0500 Date: Wed, 7 Nov 2012 17:38:30 -0500 From: Dave Jones To: Hugh Dickins Cc: Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Message-ID: <20121107223830.GA12561@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 06, 2012 at 03:48:20PM -0800, Hugh Dickins wrote: > > ------------[ cut here ]------------ > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > Hardware name: 2012 Client Platform > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 > > That's the very same line number as in your original report, despite > the long comment which the patch adds. Are you sure that kernel was > built with the patch in? I just changed the code by hand, and opted not to paste the comment in. It is plausible that I built that kernel and forgot to reboot into it, but I'm 99.9% sure that that wasn't the case. Unfortunatly I can't check immediately, as that machine for reasons unknown no longer wants to get past the BIOS POST check. I'll see if I can reproduce it on a different test box until I get that one back up. Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756188Ab2KNBgm (ORCPT ); Tue, 13 Nov 2012 20:36:42 -0500 Received: from mail-ye0-f174.google.com ([209.85.213.174]:64021 "EHLO mail-ye0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756052Ab2KNBgk (ORCPT ); Tue, 13 Nov 2012 20:36:40 -0500 Date: Tue, 13 Nov 2012 17:36:33 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Andrew Morton cc: Dave Jones , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON fix In-Reply-To: <20121107223830.GA12561@redhat.com> Message-ID: References: <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <20121107223830.GA12561@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We're still hoping to hear back from Dave Jones: but either way, please fold this patch into the earlier fix for 3.7 and -stable. Remove its VM_BUG_ON: because either it's as I believe, a tautology which cannot happen, and does not assert what I'd intended when I put it in, and would even be wrong if it did (a non-NULL entry can validly materialize there); or Dave actually hit it on his updated kernel, in which case more research will be needed, but for upstream we do not want a user to BUG there. Signed-off-by: Hugh Dickins --- mm/shmem.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) --- mmotm/mm/shmem.c 2012-11-09 09:43:46.908046342 -0800 +++ linux/mm/shmem.c 2012-11-13 17:16:38.532528959 -0800 @@ -1158,10 +1158,8 @@ repeat: * Reset swap.val? No, leave it so "failed" goes back to * "repeat": reading a hole and writing should succeed. */ - if (error) { - VM_BUG_ON(error != -ENOENT); + if (error) delete_from_swap_cache(page); - } } if (error) goto failed; From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932541Ab2KNDHS (ORCPT ); Tue, 13 Nov 2012 22:07:18 -0500 Received: from mail-ie0-f174.google.com ([209.85.223.174]:45704 "EHLO mail-ie0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932477Ab2KNDHQ (ORCPT ); Tue, 13 Nov 2012 22:07:16 -0500 Message-ID: <50A30ADD.9000209@gmail.com> Date: Wed, 14 Nov 2012 11:07:09 +0800 From: Jaegeuk Hanse User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/07/2012 07:48 AM, Hugh Dickins wrote: > On Tue, 6 Nov 2012, Dave Jones wrote: >> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: >> >> > - /* We already confirmed swap, and make no allocation */ >> > - VM_BUG_ON(error); >> > + /* >> > + * We already confirmed swap under page lock, and make >> > + * no memory allocation here, so usually no possibility >> > + * of error; but free_swap_and_cache() only trylocks a >> > + * page, so it is just possible that the entry has been >> > + * truncated or holepunched since swap was confirmed. >> > + * shmem_undo_range() will have done some of the >> > + * unaccounting, now delete_from_swap_cache() will do >> > + * the rest (including mem_cgroup_uncharge_swapcache). >> > + * Reset swap.val? No, leave it so "failed" goes back to >> > + * "repeat": reading a hole and writing should succeed. >> > + */ >> > + if (error) { >> > + VM_BUG_ON(error != -ENOENT); >> > + delete_from_swap_cache(page); >> > + } >> > } >> >> I ran with this overnight, > Thanks a lot... > >> and still hit the (new!) VM_BUG_ON > ... but that's even more surprising than your original report. > >> Perhaps we should print out what 'error' was too ? I'll rebuild with that.. > Thanks; though I thought the error was going to turn out too boring, > and was preparing a debug patch for you to show the expected and found > values too. But then got very puzzled... > >> ------------[ cut here ]------------ >> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >> Hardware name: 2012 Client Platform >> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 > That's the very same line number as in your original report, despite > the long comment which the patch adds. Are you sure that kernel was > built with the patch in? > > I wouldn't usually question you, but I'm going mad trying to understand > how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that > line, and when I was preparing the debug patch, I was thinking that an > error from shmem_radix_tree_replace could also be -EEXIST, for when a > different something rather than nothing is found [*]. But that's not > the case, shmem_radix_tree_replace returns either 0 or -ENOENT. > > So if error != -ENOENT, that means shmem_add_to_page_cache went the > radix_tree_insert route instead of the shmem_radix_tree_replace route; > which means that its 'expected' is NULL, so swp_to_radix_entry(swap) > is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt > the radix_tree might be, I do not understand the new VM_BUG_ON firing. > > Please tell me it was the wrong kernel! > Hugh > > [*] But in thinking it over, I realize that if shmem_radix_tree_replace > had returned -EEXIST for the "wrong something" case, I would have been > wrong to BUG on that; because just as truncation could remove an entry, > something else could immediately after instantiate a new page there. Hi Hugh, As you said, swp_to_radix_entry() does an "| 2", so even if truncation could remove an entry and something else could immediately after instantiate a new page there, but the expected parameter will not be NULL, the result is radix_tree_insert will not be called and shmem_add_to_page_cache will not return -EEXIST, then why trigger BUG_ON ? Regards, Jaegeuk > So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's > not saying what I had intended to say with it, and would have been > wrong to say that anyway. It just looks stupid to me now, rather > like inserting a VM_BUG_ON(false) - but that does become interesting > when you report that you've hit it. > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932646Ab2KNDuf (ORCPT ); Tue, 13 Nov 2012 22:50:35 -0500 Received: from mail-gh0-f179.google.com ([209.85.160.179]:35555 "EHLO mail-gh0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932576Ab2KNDue (ORCPT ); Tue, 13 Nov 2012 22:50:34 -0500 Date: Tue, 13 Nov 2012 19:50:25 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Jaegeuk Hanse cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <50A30ADD.9000209@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 14 Nov 2012, Jaegeuk Hanse wrote: > On 11/07/2012 07:48 AM, Hugh Dickins wrote: > > On Tue, 6 Nov 2012, Dave Jones wrote: > > > On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: > > > > > > > - /* We already confirmed swap, and make no > > > allocation */ > > > > - VM_BUG_ON(error); > > > > + /* > > > > + * We already confirmed swap under page lock, > > > and make > > > > + * no memory allocation here, so usually no > > > possibility > > > > + * of error; but free_swap_and_cache() only > > > trylocks a > > > > + * page, so it is just possible that the > > > entry has been > > > > + * truncated or holepunched since swap was > > > confirmed. > > > > + * shmem_undo_range() will have done some of > > > the > > > > + * unaccounting, now delete_from_swap_cache() > > > will do > > > > + * the rest (including > > > mem_cgroup_uncharge_swapcache). > > > > + * Reset swap.val? No, leave it so "failed" > > > goes back to > > > > + * "repeat": reading a hole and writing > > > should succeed. > > > > + */ > > > > + if (error) { > > > > + VM_BUG_ON(error != -ENOENT); > > > > + delete_from_swap_cache(page); > > > > + } > > > > } > > > > > > I ran with this overnight, > > Thanks a lot... > > > > > and still hit the (new!) VM_BUG_ON > > ... but that's even more surprising than your original report. > > > > > Perhaps we should print out what 'error' was too ? I'll rebuild with > > > that.. > > Thanks; though I thought the error was going to turn out too boring, > > and was preparing a debug patch for you to show the expected and found > > values too. But then got very puzzled... > > > > > ------------[ cut here ]------------ > > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() > > > Hardware name: 2012 Client Platform > > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 > > That's the very same line number as in your original report, despite > > the long comment which the patch adds. Are you sure that kernel was > > built with the patch in? > > > > I wouldn't usually question you, but I'm going mad trying to understand > > how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that > > line, and when I was preparing the debug patch, I was thinking that an > > error from shmem_radix_tree_replace could also be -EEXIST, for when a > > different something rather than nothing is found [*]. But that's not > > the case, shmem_radix_tree_replace returns either 0 or -ENOENT. > > > > So if error != -ENOENT, that means shmem_add_to_page_cache went the > > radix_tree_insert route instead of the shmem_radix_tree_replace route; > > which means that its 'expected' is NULL, so swp_to_radix_entry(swap) > > is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt > > the radix_tree might be, I do not understand the new VM_BUG_ON firing. > > > > Please tell me it was the wrong kernel! > > Hugh > > > > [*] But in thinking it over, I realize that if shmem_radix_tree_replace > > had returned -EEXIST for the "wrong something" case, I would have been > > wrong to BUG on that; because just as truncation could remove an entry, > > something else could immediately after instantiate a new page there. > > Hi Hugh, > > As you said, swp_to_radix_entry() does an "| 2", so even if truncation could > remove an entry and something else could immediately after instantiate a new > page there, but the expected parameter will not be NULL, the result is > radix_tree_insert will not be called and shmem_add_to_page_cache will not > return -EEXIST, then why trigger BUG_ON ? Why insert the VM_BUG_ON? Because at the time I thought that it asserted something useful; but I was mistaken, as explained above. How can the VM_BUG_ON trigger (without stack corruption, or something of that kind)? I have no idea. We are in agreement: I now think that VM_BUG_ON is misleading and silly, and sent Andrew a further patch to remove it a just couple of hours ago. Originally I was waiting to hear further from Dave; but his test machine was giving trouble, and it occurred to me that, never mind whether he says he has hit it again, or he has not hit it again, the answer is the same: don't send that VM_BUG_ON upstream. Hugh > > Regards, > Jaegeuk > > > So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's > > not saying what I had intended to say with it, and would have been > > wrong to say that anyway. It just looks stupid to me now, rather > > like inserting a VM_BUG_ON(false) - but that does become interesting > > when you report that you've hit it. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752810Ab2KNGOu (ORCPT ); Wed, 14 Nov 2012 01:14:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:24651 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751587Ab2KNGOt (ORCPT ); Wed, 14 Nov 2012 01:14:49 -0500 Date: Wed, 14 Nov 2012 01:14:37 -0500 From: Dave Jones To: Hugh Dickins Cc: Jaegeuk Hanse , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Message-ID: <20121114061437.GA23458@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Jaegeuk Hanse , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote: > Originally I was waiting to hear further from Dave; but his test > machine was giving trouble, and it occurred to me that, never mind > whether he says he has hit it again, or he has not hit it again, > the answer is the same: don't send that VM_BUG_ON upstream. Sorry, I'm supposedly on vacation. That said, a replacement test box has been running tests since last Friday without hitting that case. Maybe it was the last death throes of that other machine before it gave up the ghost completely. Does sound like an awful coincidence though. Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161066Ab2KNKGG (ORCPT ); Wed, 14 Nov 2012 05:06:06 -0500 Received: from mail-pa0-f46.google.com ([209.85.220.46]:48837 "EHLO mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161016Ab2KNKGD (ORCPT ); Wed, 14 Nov 2012 05:06:03 -0500 Date: Wed, 14 Nov 2012 02:06:00 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones cc: Jaegeuk Hanse , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <20121114061437.GA23458@redhat.com> Message-ID: References: <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <20121114061437.GA23458@redhat.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 14 Nov 2012, Dave Jones wrote: > On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote: > > > Originally I was waiting to hear further from Dave; but his test > > machine was giving trouble, and it occurred to me that, never mind > > whether he says he has hit it again, or he has not hit it again, > > the answer is the same: don't send that VM_BUG_ON upstream. > > Sorry, I'm supposedly on vacation. Sorry for breaking in upon that, and thank you for responding even so. > That said, a replacement test box has been running tests since last Friday > without hitting that case. Maybe it was the last death throes of > that other machine before it gave up the ghost completely. > > Does sound like an awful coincidence though. I'm still clinging to your 0.1% possibility that it was not the intended kernel running. Anyway, I'm not going to worry about it further, until we see another hit - please do keep the VM_BUG_ON in your test kernel (i.e. resist that temptation to race in from your vacation to apply today's patch!), even though the right thing for 3.7 was to remove it. Thanks, Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754156Ab2KOHj7 (ORCPT ); Thu, 15 Nov 2012 02:39:59 -0500 Received: from mail-pb0-f46.google.com ([209.85.160.46]:61175 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751970Ab2KOHj6 (ORCPT ); Thu, 15 Nov 2012 02:39:58 -0500 Message-ID: <50A49C46.9040406@gmail.com> Date: Thu, 15 Nov 2012 15:39:50 +0800 From: Jaegeuk Hanse User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/14/2012 11:50 AM, Hugh Dickins wrote: > On Wed, 14 Nov 2012, Jaegeuk Hanse wrote: >> On 11/07/2012 07:48 AM, Hugh Dickins wrote: >>> On Tue, 6 Nov 2012, Dave Jones wrote: >>>> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote: >>>> >>>> > - /* We already confirmed swap, and make no >>>> allocation */ >>>> > - VM_BUG_ON(error); >>>> > + /* >>>> > + * We already confirmed swap under page lock, >>>> and make >>>> > + * no memory allocation here, so usually no >>>> possibility >>>> > + * of error; but free_swap_and_cache() only >>>> trylocks a >>>> > + * page, so it is just possible that the >>>> entry has been >>>> > + * truncated or holepunched since swap was >>>> confirmed. >>>> > + * shmem_undo_range() will have done some of >>>> the >>>> > + * unaccounting, now delete_from_swap_cache() >>>> will do >>>> > + * the rest (including >>>> mem_cgroup_uncharge_swapcache). >>>> > + * Reset swap.val? No, leave it so "failed" >>>> goes back to >>>> > + * "repeat": reading a hole and writing >>>> should succeed. >>>> > + */ >>>> > + if (error) { >>>> > + VM_BUG_ON(error != -ENOENT); >>>> > + delete_from_swap_cache(page); >>>> > + } >>>> > } >>>> >>>> I ran with this overnight, >>> Thanks a lot... >>> >>>> and still hit the (new!) VM_BUG_ON >>> ... but that's even more surprising than your original report. >>> >>>> Perhaps we should print out what 'error' was too ? I'll rebuild with >>>> that.. >>> Thanks; though I thought the error was going to turn out too boring, >>> and was preparing a debug patch for you to show the expected and found >>> values too. But then got very puzzled... >>> >>>> ------------[ cut here ]------------ >>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() >>>> Hardware name: 2012 Client Platform >>>> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54 >>> That's the very same line number as in your original report, despite >>> the long comment which the patch adds. Are you sure that kernel was >>> built with the patch in? >>> >>> I wouldn't usually question you, but I'm going mad trying to understand >>> how the VM_BUG_ON(error != -ENOENT) fires. At the time I wrote that >>> line, and when I was preparing the debug patch, I was thinking that an >>> error from shmem_radix_tree_replace could also be -EEXIST, for when a >>> different something rather than nothing is found [*]. But that's not >>> the case, shmem_radix_tree_replace returns either 0 or -ENOENT. >>> >>> So if error != -ENOENT, that means shmem_add_to_page_cache went the >>> radix_tree_insert route instead of the shmem_radix_tree_replace route; >>> which means that its 'expected' is NULL, so swp_to_radix_entry(swap) >>> is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt >>> the radix_tree might be, I do not understand the new VM_BUG_ON firing. >>> >>> Please tell me it was the wrong kernel! >>> Hugh >>> >>> [*] But in thinking it over, I realize that if shmem_radix_tree_replace >>> had returned -EEXIST for the "wrong something" case, I would have been >>> wrong to BUG on that; because just as truncation could remove an entry, >>> something else could immediately after instantiate a new page there. >> Hi Hugh, >> >> As you said, swp_to_radix_entry() does an "| 2", so even if truncation could >> remove an entry and something else could immediately after instantiate a new >> page there, but the expected parameter will not be NULL, the result is >> radix_tree_insert will not be called and shmem_add_to_page_cache will not >> return -EEXIST, then why trigger BUG_ON ? > Why insert the VM_BUG_ON? Because at the time I thought that it > asserted something useful; but I was mistaken, as explained above. > > How can the VM_BUG_ON trigger (without stack corruption, or something > of that kind)? I have no idea. > > We are in agreement: I now think that VM_BUG_ON is misleading and silly, > and sent Andrew a further patch to remove it a just couple of hours ago. > > Originally I was waiting to hear further from Dave; but his test > machine was giving trouble, and it occurred to me that, never mind > whether he says he has hit it again, or he has not hit it again, > the answer is the same: don't send that VM_BUG_ON upstream. > > Hugh Thanks Hugh. Another question. Why the function shmem_fallocate which you add to kernel need call shmem_getpage? Regards, Jaegeuk > >> Regards, >> Jaegeuk >> >>> So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's >>> not saying what I had intended to say with it, and would have been >>> wrong to say that anyway. It just looks stupid to me now, rather >>> like inserting a VM_BUG_ON(false) - but that does become interesting >>> when you report that you've hit it. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1768877Ab2KOT4K (ORCPT ); Thu, 15 Nov 2012 14:56:10 -0500 Received: from mail-yh0-f46.google.com ([209.85.213.46]:38009 "EHLO mail-yh0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1768751Ab2KOT4J (ORCPT ); Thu, 15 Nov 2012 14:56:09 -0500 Date: Thu, 15 Nov 2012 11:56:05 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Jaegeuk Hanse cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <50A49C46.9040406@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Offtopic... On Thu, 15 Nov 2012, Jaegeuk Hanse wrote: > > Another question. Why the function shmem_fallocate which you add to kernel > need call shmem_getpage? Because shmem_getpage(_gfp) is where shmem's page lookup and allocation complexities are handled. I assume the question behind your question is: why does shmem actually allocate pages for its fallocate, instead of just reserving the space? I did play with just reserving the space, with more special entries in the radix_tree to note the reservations made. It should be doable for the vm_enough_memory and sbinfo->used_blocks reservations. What absolutely deterred me from taking that path was the mem_cgroup case: shmem and swap and memcg are not easy to get working right together, and nobody would thank me for complicating memcg just for shmem_fallocate. By allocating pages, the pre-existing memcg code just works; if we used reservations instead, we would have to track their memcg charges in some additional new way. I see no justification for that complication. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751361Ab2KPAkX (ORCPT ); Thu, 15 Nov 2012 19:40:23 -0500 Received: from mail-pb0-f46.google.com ([209.85.160.46]:50292 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751083Ab2KPAkW (ORCPT ); Thu, 15 Nov 2012 19:40:22 -0500 Message-ID: <50A58B6E.8090609@gmail.com> Date: Fri, 16 Nov 2012 08:40:14 +0800 From: Jaegeuk Hanse User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/16/2012 03:56 AM, Hugh Dickins wrote: > Offtopic... > > On Thu, 15 Nov 2012, Jaegeuk Hanse wrote: >> Another question. Why the function shmem_fallocate which you add to kernel >> need call shmem_getpage? > Because shmem_getpage(_gfp) is where shmem's > page lookup and allocation complexities are handled. > > I assume the question behind your question is: why does shmem actually > allocate pages for its fallocate, instead of just reserving the space? Yeah, this is what I want to know. > > I did play with just reserving the space, with more special entries in > the radix_tree to note the reservations made. It should be doable for > the vm_enough_memory and sbinfo->used_blocks reservations. > > What absolutely deterred me from taking that path was the mem_cgroup > case: shmem and swap and memcg are not easy to get working right together, > and nobody would thank me for complicating memcg just for shmem_fallocate. > > By allocating pages, the pre-existing memcg code just works; if we used > reservations instead, we would have to track their memcg charges in some > additional new way. I see no justification for that complication. Oh, I see, thanks Hugh. :-) > Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751562Ab2KPJeb (ORCPT ); Fri, 16 Nov 2012 04:34:31 -0500 Received: from mail-pa0-f46.google.com ([209.85.220.46]:63823 "EHLO mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751231Ab2KPJe3 (ORCPT ); Fri, 16 Nov 2012 04:34:29 -0500 Message-ID: <50A6089B.7010708@gmail.com> Date: Fri, 16 Nov 2012 17:34:19 +0800 From: Jaegeuk Hanse User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/16/2012 03:56 AM, Hugh Dickins wrote: > Offtopic... > > On Thu, 15 Nov 2012, Jaegeuk Hanse wrote: >> Another question. Why the function shmem_fallocate which you add to kernel >> need call shmem_getpage? > Because shmem_getpage(_gfp) is where shmem's > page lookup and allocation complexities are handled. > > I assume the question behind your question is: why does shmem actually > allocate pages for its fallocate, instead of just reserving the space? > > I did play with just reserving the space, with more special entries in > the radix_tree to note the reservations made. It should be doable for > the vm_enough_memory and sbinfo->used_blocks reservations. > > What absolutely deterred me from taking that path was the mem_cgroup > case: shmem and swap and memcg are not easy to get working right together, > and nobody would thank me for complicating memcg just for shmem_fallocate. > > By allocating pages, the pre-existing memcg code just works; if we used > reservations instead, we would have to track their memcg charges in some > additional new way. I see no justification for that complication. Hi Hugh Some questions about your shmem/tmpfs: misc and fallocate patchset. - Since shmem_setattr can truncate tmpfs files, why need add another similar codes in function shmem_fallocate? What's the trick? - in tmpfs: support fallocate preallocation patch changelog: "Christoph Hellwig: What for exactly? Please explain why preallocating on tmpfs would make any sense. Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or -EOPNOTSUPP] on fallocate is just ugly." Could shmem/tmpfs fallocate prevent one process truncate the file which the second process mmap() and get SIGBUS when the second process access mmap but out of current size of file? Regards, Jaegeuk > Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754055Ab2KQEst (ORCPT ); Fri, 16 Nov 2012 23:48:49 -0500 Received: from mail-gh0-f174.google.com ([209.85.160.174]:43341 "EHLO mail-gh0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752054Ab2KQEsr (ORCPT ); Fri, 16 Nov 2012 23:48:47 -0500 Date: Fri, 16 Nov 2012 20:48:46 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Jaegeuk Hanse cc: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON In-Reply-To: <50A6089B.7010708@gmail.com> Message-ID: References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> <50A6089B.7010708@gmail.com> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Further offtopic.. On Fri, 16 Nov 2012, Jaegeuk Hanse wrote: > Some questions about your shmem/tmpfs: misc and fallocate patchset. > > - Since shmem_setattr can truncate tmpfs files, why need add another similar > codes in function shmem_fallocate? What's the trick? I don't know if I understand you. In general, hole-punching is different from truncation. Supporting the hole-punch mode of the fallocate system call is different from supporting truncation. They're closely related, and share code, but meet different specifications. > - in tmpfs: support fallocate preallocation patch changelog: > "Christoph Hellwig: What for exactly? Please explain why preallocating on > tmpfs would make any sense. > Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on > the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or > -EOPNOTSUPP] on fallocate is just ugly." > Could shmem/tmpfs fallocate prevent one process truncate the file which the > second process mmap() and get SIGBUS when the second process access mmap but > out of current size of file? Again, I don't know if I understand you. fallocate does not prevent truncation or races or SIGBUS. I believe that Kay meant that without using fallocate to allocate the memory in advance, systemd found it hard to protect itself from the possibility of getting a SIGBUS, if access to a shmem mapping happened to run out of memory/space in the middle. I never grasped why writing the file in advance was not good enough: fallocate happened to be what they hoped to use, and it was hard to deny it, given that tmpfs already supported hole-punching, and was about to convert to the fallocate interface for that. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752535Ab2KRA6K (ORCPT ); Sat, 17 Nov 2012 19:58:10 -0500 Received: from mail-da0-f46.google.com ([209.85.210.46]:48608 "EHLO mail-da0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752496Ab2KRA6I (ORCPT ); Sat, 17 Nov 2012 19:58:08 -0500 Message-ID: <50A83289.6020108@gmail.com> Date: Sun, 18 Nov 2012 08:57:45 +0800 From: Jaegeuk Hanse User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> <50A6089B.7010708@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/17/2012 12:48 PM, Hugh Dickins wrote: > Further offtopic.. Thanks for your explanation, Hugh. :-) > > On Fri, 16 Nov 2012, Jaegeuk Hanse wrote: >> Some questions about your shmem/tmpfs: misc and fallocate patchset. >> >> - Since shmem_setattr can truncate tmpfs files, why need add another similar >> codes in function shmem_fallocate? What's the trick? > I don't know if I understand you. In general, hole-punching is different > from truncation. Supporting the hole-punch mode of the fallocate system > call is different from supporting truncation. They're closely related, > and share code, but meet different specifications. What's the different between shmem/tmpfs hole-punching and truncate_setsize/truncate_pagecache? Do you mean one is punch hole in the file and the other one is shrink or extent the size of a file? >> - in tmpfs: support fallocate preallocation patch changelog: >> "Christoph Hellwig: What for exactly? Please explain why preallocating on >> tmpfs would make any sense. >> Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on >> the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or >> -EOPNOTSUPP] on fallocate is just ugly." >> Could shmem/tmpfs fallocate prevent one process truncate the file which the >> second process mmap() and get SIGBUS when the second process access mmap but >> out of current size of file? > Again, I don't know if I understand you. fallocate does not prevent > truncation or races or SIGBUS. I believe that Kay meant that without > using fallocate to allocate the memory in advance, systemd found it hard > to protect itself from the possibility of getting a SIGBUS, if access to > a shmem mapping happened to run out of memory/space in the middle. IIUC, it will return VM_xxx_OOM instead of SIGBUS if run out of memory. Then how can get SIGBUS in this scene? Regards, Jaegeuk > I never grasped why writing the file in advance was not good enough: > fallocate happened to be what they hoped to use, and it was hard to > deny it, given that tmpfs already supported hole-punching, and was > about to convert to the fallocate interface for that. > Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752550Ab2KRBsX (ORCPT ); Sat, 17 Nov 2012 20:48:23 -0500 Received: from mail-pb0-f46.google.com ([209.85.160.46]:40729 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752501Ab2KRBsW (ORCPT ); Sat, 17 Nov 2012 20:48:22 -0500 Message-ID: <50A83E5E.9060300@gmail.com> Date: Sun, 18 Nov 2012 09:48:14 +0800 From: Jaegeuk Hanse User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121028 Thunderbird/16.0.2 MIME-Version: 1.0 To: Hugh Dickins CC: Dave Jones , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON References: <20121025023738.GA27001@redhat.com> <20121101191052.GA5884@redhat.com> <20121101232030.GA25519@redhat.com> <20121102014336.GA1727@redhat.com> <20121106135402.GA3543@redhat.com> <50A30ADD.9000209@gmail.com> <50A49C46.9040406@gmail.com> <50A6089B.7010708@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/17/2012 12:48 PM, Hugh Dickins wrote: > Further offtopic.. Hi Hugh, - I see you add this in vfs.txt: + fallocate: called by the VFS to preallocate blocks or punch a hole. I want to know if it's necessary to add it to man page since users still don't know fallocate can punch a hole from man fallocate. - in function shmem_fallocate: + else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced) + error = -ENOMEM; If this changelog "shmem_fallocate() compare counts and give up once the reactivated pages have started to coming back to writepage (approximately: some zones would in fact recycle faster than others)." describe why need this change? If the answer is yes, I have two questions. 1) how can guarantee it really don't need preallocation if just one or a few pages always reactivated, in this scene, nr_unswapped maybe grow bigger enough than shmem_falloc.nr_falloced 2) why return -ENOMEM, it's not really OOM, is it a trick or ...? Regards, Jaegeuk > > On Fri, 16 Nov 2012, Jaegeuk Hanse wrote: >> Some questions about your shmem/tmpfs: misc and fallocate patchset. >> >> - Since shmem_setattr can truncate tmpfs files, why need add another similar >> codes in function shmem_fallocate? What's the trick? > I don't know if I understand you. In general, hole-punching is different > from truncation. Supporting the hole-punch mode of the fallocate system > call is different from supporting truncation. They're closely related, > and share code, but meet different specifications. > >> - in tmpfs: support fallocate preallocation patch changelog: >> "Christoph Hellwig: What for exactly? Please explain why preallocating on >> tmpfs would make any sense. >> Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on >> the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or >> -EOPNOTSUPP] on fallocate is just ugly." >> Could shmem/tmpfs fallocate prevent one process truncate the file which the >> second process mmap() and get SIGBUS when the second process access mmap but >> out of current size of file? > Again, I don't know if I understand you. fallocate does not prevent > truncation or races or SIGBUS. I believe that Kay meant that without > using fallocate to allocate the memory in advance, systemd found it hard > to protect itself from the possibility of getting a SIGBUS, if access to > a shmem mapping happened to run out of memory/space in the middle. > > I never grasped why writing the file in advance was not good enough: > fallocate happened to be what they hoped to use, and it was hard to > deny it, given that tmpfs already supported hole-punching, and was > about to convert to the fallocate interface for that. > > Hugh