* RE: XFS memory allocation deadlock in 2.6.38
From: Sean Noonan @ 2011-03-23 19:39 UTC (permalink / raw)
To: Sean Noonan, 'linux-kernel@vger.kernel.org'
Cc: Martin Bligh, Trammell Hudson, Christos Zoulas, 'linux-xfs@oss.sgi.com', Stephen Degler

I believe this patch fixes the behavior:

diff --git a/mm/memory.c b/mm/memory.c
index e48945a..740d5ab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3461,7 +3461,9 @@ int make_pages_present(unsigned long addr, unsigned long end)
 	 * to break COW, except for shared mappings because these don't COW
 	 * and we would not want to dirty them for nothing.
 	 */
-	write = (vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE;
+	write = (vma->vm_flags & VM_WRITE) != 0;
+	if (write && ((vma->vm_flags & VM_SHARED) != 0) && (vma->vm_file == NULL))
+		write = 0;
 	BUG_ON(addr >= end);
 	BUG_ON(end > vma->vm_end);
 	len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;

This was traced to the following commit:

5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272 is the first bad commit
commit 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272
Author: Michel Lespinasse <walken@google.com>
Date:   Thu Jan 13 15:46:09 2011 -0800

    mlock: avoid dirtying pages and triggering writeback

    When faulting in pages for mlock(), we want to break COW for anonymous or
    file pages within VM_WRITABLE, non-VM_SHARED vmas.  However, there is no
    need to write-fault into VM_SHARED vmas since shared file pages can be
    mlocked first and dirtied later, when/if they actually get written to.
    Skipping the write fault is desirable, as we don't want to unnecessarily
    cause these pages to be dirtied and queued for writeback.

    Signed-off-by: Michel Lespinasse <walken@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Nick Piggin <npiggin@kernel.dk>
    Cc: Theodore Tso <tytso@google.com>
    Cc: Michael Rubin <mrubin@google.com>
    Cc: Suleiman Souhlal <suleiman@google.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 604eede2f45b7e5276ce9725b715ed15a868861d 3c175eadf4cf33d4f78d4d455c9a04f3df2c199e M	mm

-----Original Message-----
From: Sean Noonan
Sent: Monday, March 21, 2011 12:20
To: 'linux-kernel@vger.kernel.org'
Cc: Trammell Hudson; Martin Bligh; Stephen Degler; Christos Zoulas
Subject: XFS memory allocation deadlock in 2.6.38

This message was originally posted to the XFS mailing list, but received no responses.  Thus, I am sending it to LKML on the advice of Martin.

Using the attached program, we are able to reproduce this bug reliably.

$ make vmtest
$ ./vmtest /xfs/hugefile.dat $(( 16 * 1024 * 1024 * 1024 ))	# vmtest <path_to_file> <size_in_bytes>
/xfs/hugefile.dat: mapped 17179869184 bytes in 33822066943 ticks
749660: avg 13339 max 234667 ticks
371945: avg 26885 max 281616 ticks
---
At this point, we see the following on the console:
[593492.694806] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[593506.724367] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[593524.837717] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[593556.742386] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)

This is the same message presented in http://oss.sgi.com/bugzilla/show_bug.cgi?id=410

We started testing with 2.6.38-rc7 and have seen this bug through to the .0 release.  This does not appear to be present in 2.6.33, but we have not done testing in between.  We have tested with ext4 and do not encounter this bug.

CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_XFS_DEBUG is not set
# CONFIG_VXFS_FS is not set

Here is the stack from the process:
[<ffffffff81357553>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff812ddf1e>] xfs_ilock+0x7e/0x110
[<ffffffff8130132f>] __xfs_get_blocks+0x8f/0x4e0
[<ffffffff813017b1>] xfs_get_blocks+0x11/0x20
[<ffffffff8114ba3e>] __block_write_begin+0x1ee/0x5b0
[<ffffffff8114be9d>] block_page_mkwrite+0x9d/0xf0
[<ffffffff81307e05>] xfs_vm_page_mkwrite+0x15/0x20
[<ffffffff810f2ddb>] do_wp_page+0x54b/0x820
[<ffffffff810f347c>] handle_pte_fault+0x3cc/0x820
[<ffffffff810f5145>] handle_mm_fault+0x175/0x2f0
[<ffffffff8102e399>] do_page_fault+0x159/0x470
[<ffffffff816cf6cf>] page_fault+0x1f/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

# uname -a
Linux testhost 2.6.38 #2 SMP PREEMPT Fri Mar 18 15:00:59 GMT 2011 x86_64 GNU/Linux

Please let me know if additional information is required.

Thanks!

Sean

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS memory allocation deadlock in 2.6.38
From: Christoph Hellwig @ 2011-03-24 17:43 UTC (permalink / raw)
To: Sean Noonan
Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'linux-xfs@oss.sgi.com', linux-mm, walken

Michel,

can you take a look at this bug report?  It looks like a regression in your mlock handling changes.

On Wed, Mar 23, 2011 at 03:39:05PM -0400, Sean Noonan wrote:
> I believe this patch fixes the behavior:
> [...]
---end quoted text---
* Re: XFS memory allocation deadlock in 2.6.38
From: Michel Lespinasse @ 2011-03-24 23:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Trammell Hudson, Christos Zoulas, Sean Noonan, Martin Bligh, linux-kernel@vger.kernel.org, Stephen Degler, linux-xfs@oss.sgi.com, linux-mm

On Thu, Mar 24, 2011 at 10:43 AM, Christoph Hellwig <hch@infradead.org> wrote:
> Michel,
>
> can you take a look at this bug report?  It looks like a regression
> in your mlock handling changes.

I had a quick look, and at this point I can describe how the patch affects the behavior of this test, but not why this causes a deadlock with XFS.

The test creates a writable, shared mapping of a file that does not have data blocks allocated on disk, and also uses the MAP_POPULATE flag.

Before 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272, make_pages_present during the mmap would cause data blocks to get allocated on disk with an xfs_vm_page_mkwrite call, and then the file pages would get mapped as writable ptes.

After 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272, make_pages_present does NOT cause data blocks to get allocated on disk.  Instead, xfs_vm_readpages is called, which (I suppose) does not allocate the data blocks and returns zero-filled pages instead, which get mapped as read-only ptes.  Later, the test tries writing into the mmap'ed block, causing minor page faults, xfs_vm_page_mkwrite calls and data block allocations to occur.

Regarding the deadlock: I am curious to see if it could be made to happen before 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272.  Could you test what happens if you remove the MAP_POPULATE flag from your mmap call, and instead read all pages from userspace right after the mmap?  I expect you would then be able to trigger the deadlock before 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272.

This leaves the issue of the change of behavior for MAP_POPULATE on ftruncated file holes.  I'm not sure what to say there, though, because MAP_POPULATE is documented to cause file read-ahead (and it still does after 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272), but that doesn't say anything about block allocation.

Hope this helps,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
* RE: XFS memory allocation deadlock in 2.6.38
From: Sean Noonan @ 2011-03-28 14:58 UTC (permalink / raw)
To: 'Michel Lespinasse', Christoph Hellwig
Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, linux-kernel@vger.kernel.org, Stephen Degler, linux-xfs@oss.sgi.com, linux-mm@kvack.org

> Regarding the deadlock: I am curious to see if it could be made to
> happen before 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272.  Could you test
> what happens if you remove the MAP_POPULATE flag from your mmap call,
> and instead read all pages from userspace right after the mmap?  I
> expect you would then be able to trigger the deadlock before
> 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272.

I still see the deadlock without MAP_POPULATE.

Sean
* Re: XFS memory allocation deadlock in 2.6.38
From: Michel Lespinasse @ 2011-03-28 21:06 UTC (permalink / raw)
To: Sean Noonan
Cc: linux-xfs@oss.sgi.com, Christos Zoulas, Trammell Hudson, Martin Bligh, linux-kernel@vger.kernel.org, Stephen Degler, Christoph Hellwig, linux-mm@kvack.org

On Mon, Mar 28, 2011 at 7:58 AM, Sean Noonan <Sean.Noonan@twosigma.com> wrote:
>> Regarding the deadlock: I am curious to see if it could be made to
>> happen before 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272.  Could you test
>> what happens if you remove the MAP_POPULATE flag from your mmap call,
>> and instead read all pages from userspace right after the mmap?  I
>> expect you would then be able to trigger the deadlock before
>> 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272.
>
> I still see the deadlock without MAP_POPULATE

Could you test if you see the deadlock before 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272 without MAP_POPULATE?

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
* RE: XFS memory allocation deadlock in 2.6.38
From: Sean Noonan @ 2011-03-28 21:34 UTC (permalink / raw)
To: 'Michel Lespinasse'
Cc: linux-xfs@oss.sgi.com, Christos Zoulas, Trammell Hudson, Martin Bligh, linux-kernel@vger.kernel.org, Stephen Degler, Christoph Hellwig, linux-mm@kvack.org

> Could you test if you see the deadlock before
> 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272 without MAP_POPULATE ?

Built and tested 72ddc8f72270758951ccefb7d190f364d20215ab.
Confirmed that the original bug does not present in this version.
Confirmed that removing MAP_POPULATE does cause the deadlock to occur.

Here is the stack of the test:
# cat /proc/3846/stack
[<ffffffff812e8a64>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff81271c1d>] xfs_ilock+0x9d/0x110
[<ffffffff81271cae>] xfs_ilock_map_shared+0x1e/0x50
[<ffffffff81294985>] __xfs_get_blocks+0xc5/0x4e0
[<ffffffff81294dcc>] xfs_get_blocks+0xc/0x10
[<ffffffff811322c2>] do_mpage_readpage+0x462/0x660
[<ffffffff8113250a>] mpage_readpage+0x4a/0x60
[<ffffffff81295433>] xfs_vm_readpage+0x13/0x20
[<ffffffff810bb850>] filemap_fault+0x2d0/0x4e0
[<ffffffff810d8680>] __do_fault+0x50/0x510
[<ffffffff810da542>] handle_mm_fault+0x1a2/0xe60
[<ffffffff8102a466>] do_page_fault+0x146/0x440
[<ffffffff8164e6cf>] page_fault+0x1f/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

xfssyncd is stuck in D state.
# cat /proc/2484/stack
[<ffffffff8106ee1c>] down+0x3c/0x50
[<ffffffff81297802>] xfs_buf_lock+0x72/0x170
[<ffffffff8128762d>] xfs_getsb+0x1d/0x50
[<ffffffff8128e6af>] xfs_trans_getsb+0x5f/0x150
[<ffffffff8128821e>] xfs_mod_sb+0x4e/0xe0
[<ffffffff8126e4ea>] xfs_fs_log_dummy+0x5a/0xb0
[<ffffffff812a2a13>] xfs_sync_worker+0x83/0x90
[<ffffffff812a28e2>] xfssyncd+0x172/0x220
[<ffffffff81069576>] kthread+0x96/0xa0
[<ffffffff81003354>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

Sean
* Re: XFS memory allocation deadlock in 2.6.38
From: Michel Lespinasse @ 2011-03-29 0:25 UTC (permalink / raw)
To: Sean Noonan
Cc: linux-xfs@oss.sgi.com, Christos Zoulas, Trammell Hudson, Martin Bligh, linux-kernel@vger.kernel.org, Stephen Degler, Christoph Hellwig, linux-mm@kvack.org

On Mon, Mar 28, 2011 at 2:34 PM, Sean Noonan <Sean.Noonan@twosigma.com> wrote:
>> Could you test if you see the deadlock before
>> 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272 without MAP_POPULATE ?
>
> Built and tested 72ddc8f72270758951ccefb7d190f364d20215ab.
> Confirmed that the original bug does not present in this version.
> Confirmed that removing MAP_POPULATE does cause the deadlock to occur.

It seems that the test (without MAP_POPULATE) reveals that the root cause is an xfs bug, which had been hidden up to now by MAP_POPULATE preallocating disk blocks (but could always be triggered by the same test without the MAP_POPULATE flag).

I'm not sure how to go about debugging the xfs deadlock; it would probably be best if an xfs person could have a look?

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
* Re: XFS memory allocation deadlock in 2.6.38
From: Dave Chinner @ 2011-03-29 1:51 UTC (permalink / raw)
To: Sean Noonan
Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, linux-kernel@vger.kernel.org, Stephen Degler, Christoph Hellwig, linux-mm@kvack.org, linux-xfs@oss.sgi.com, 'Michel Lespinasse'

On Mon, Mar 28, 2011 at 05:34:09PM -0400, Sean Noonan wrote:
> > Could you test if you see the deadlock before
> > 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272 without MAP_POPULATE ?
>
> Built and tested 72ddc8f72270758951ccefb7d190f364d20215ab.
> Confirmed that the original bug does not present in this version.
> Confirmed that removing MAP_POPULATE does cause the deadlock to occur.
>
> Here is the stack of the test:
> # cat /proc/3846/stack
> [<ffffffff812e8a64>] call_rwsem_down_read_failed+0x14/0x30
> [<ffffffff81271c1d>] xfs_ilock+0x9d/0x110
> [<ffffffff81271cae>] xfs_ilock_map_shared+0x1e/0x50
> [<ffffffff81294985>] __xfs_get_blocks+0xc5/0x4e0
> [<ffffffff81294dcc>] xfs_get_blocks+0xc/0x10
> [<ffffffff811322c2>] do_mpage_readpage+0x462/0x660
> [<ffffffff8113250a>] mpage_readpage+0x4a/0x60
> [<ffffffff81295433>] xfs_vm_readpage+0x13/0x20
> [<ffffffff810bb850>] filemap_fault+0x2d0/0x4e0
> [<ffffffff810d8680>] __do_fault+0x50/0x510
> [<ffffffff810da542>] handle_mm_fault+0x1a2/0xe60
> [<ffffffff8102a466>] do_page_fault+0x146/0x440
> [<ffffffff8164e6cf>] page_fault+0x1f/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff

Something else is holding the inode locked here.

> xfssyncd is stuck in D state.
> # cat /proc/2484/stack
> [<ffffffff8106ee1c>] down+0x3c/0x50
> [<ffffffff81297802>] xfs_buf_lock+0x72/0x170
> [<ffffffff8128762d>] xfs_getsb+0x1d/0x50
> [<ffffffff8128e6af>] xfs_trans_getsb+0x5f/0x150
> [<ffffffff8128821e>] xfs_mod_sb+0x4e/0xe0
> [<ffffffff8126e4ea>] xfs_fs_log_dummy+0x5a/0xb0
> [<ffffffff812a2a13>] xfs_sync_worker+0x83/0x90
> [<ffffffff812a28e2>] xfssyncd+0x172/0x220
> [<ffffffff81069576>] kthread+0x96/0xa0
> [<ffffffff81003354>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff

And this is indicating that something else is holding the superblock locked here.  IOWs, whatever thread is having trouble with memory allocation is causing these threads to block, and so they can be ignored.  What's the stack trace of the thread that is throwing the "I can't allocate a page" errors?

As it is, the question I'd really like answered is how a machine with 48GB RAM can possibly be short of memory when running mmap() on a 16GB file.  The error that XFS is throwing indicates that the machine cannot allocate a single page of memory, so where has all your memory gone, and why hasn't the OOM killer been let off the leash?  What is consuming the other 32GB of RAM or preventing it from being allocated?

Also, I was unable to reproduce this at all on a machine with only 2GB of RAM, regardless of the kernel version and/or MAP_POPULATE, so I'm left to wonder what is special about your test system...

Perhaps the output of xfs_bmap -vvp <file> after a successful vs deadlocked run would be instructive....

Cheers,
Dave.
-- 
Dave Chinner
david@fromorbit.com
* RE: XFS memory allocation deadlock in 2.6.38
From: Sean Noonan @ 2011-03-29 2:49 UTC (permalink / raw)
To: 'Dave Chinner'
Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, linux-kernel@vger.kernel.org, Stephen Degler, Christoph Hellwig, linux-mm@kvack.org, linux-xfs@oss.sgi.com, 'Michel Lespinasse'

> As it is, the question I'd really like answered is how a machine with
> 48GB RAM can possibly be short of memory when running mmap() on a
> 16GB file.  The error that XFS is throwing indicates that the
> machine cannot allocate a single page of memory, so where has all
> your memory gone, and why hasn't the OOM killer been let off the
> leash?  What is consuming the other 32GB of RAM or preventing it
> from being allocated?

Here's meminfo while a test was deadlocking.  As you can see, we certainly aren't running out of RAM.

# cat /proc/meminfo
MemTotal:       49551548 kB
MemFree:        44139876 kB
Buffers:            5324 kB
Cached:          4970552 kB
SwapCached:            0 kB
Active:            52772 kB
Inactive:        4960624 kB
Active(anon):      37864 kB
Inactive(anon):        0 kB
Active(file):      14908 kB
Inactive(file):  4960624 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:           4914084 kB
Writeback:             0 kB
AnonPages:         37636 kB
Mapped:          4925460 kB
Shmem:               280 kB
Slab:             223212 kB
SReclaimable:     176280 kB
SUnreclaim:        46932 kB
KernelStack:        3968 kB
PageTables:        35228 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    47073968 kB
Committed_AS:      86556 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      380892 kB
VmallocChunk:   34331773836 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        2048 kB
DirectMap2M:     2086912 kB
DirectMap1G:    48234496 kB

> Perhaps the output of xfs_bmap -vvp <file> after a successful vs deadlocked run would be instructive....

I will try to get this tomorrow.

Sean
* RE: XFS memory allocation deadlock in 2.6.38
From: Sean Noonan @ 2011-03-29 19:05 UTC (permalink / raw)
To: Sean Noonan, 'Michel Lespinasse'
Cc: 'linux-xfs@oss.sgi.com', Christos Zoulas, Trammell Hudson, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org'

>> Could you test if you see the deadlock before
>> 5ecfda041e4b4bd858d25bbf5a16c2a6c06d7272 without MAP_POPULATE ?

> Built and tested 72ddc8f72270758951ccefb7d190f364d20215ab.
> Confirmed that the original bug does not present in this version.
> Confirmed that removing MAP_POPULATE does cause the deadlock to occur.

git bisect leads to this:

bdfb04301fa5fdd95f219539a9a5b9663b1e5fc2 is the first bad commit
commit bdfb04301fa5fdd95f219539a9a5b9663b1e5fc2
Author: Christoph Hellwig <hch@infradead.org>
Date:   Wed Jan 20 21:55:30 2010 +0000

    xfs: replace KM_LARGE with explicit vmalloc use

    We use the KM_LARGE flag to make kmem_alloc and friends use vmalloc
    if necessary.  As we only need this for a few boot/mount time
    allocations just switch to explicit vmalloc calls there.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Elder <aelder@sgi.com>

:040000 040000 1eed68ced17d8794fa842396c01c3b9677c6e709 d462932a318f8c823fa2a73156e980a688968cb2 M	fs
* Re: XFS memory allocation deadlock in 2.6.38
From: 'Christoph Hellwig' @ 2011-03-29 19:24 UTC (permalink / raw)
To: Sean Noonan
Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse'

Can you check if the brute force patch below helps?  If it does I still need to refine it a bit, but it could be that we are doing an allocation under an xfs lock that could recurse back into the filesystem.  We have a per-process flag to disable that for normal kmalloc allocation, but we lost it for vmalloc in the commit you bisected the regression to.

Index: xfs/fs/xfs/linux-2.6/kmem.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/kmem.h	2011-03-29 21:16:58.039224236 +0200
+++ xfs/fs/xfs/linux-2.6/kmem.h	2011-03-29 21:17:08.368223598 +0200
@@ -63,7 +63,7 @@ static inline void *kmem_zalloc_large(si
 {
 	void *ptr;
 
-	ptr = vmalloc(size);
+	ptr = __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
 	if (ptr)
 		memset(ptr, 0, size);
 	return ptr;
* Re: XFS memory allocation deadlock in 2.6.38
From: Johannes Weiner @ 2011-03-29 19:39 UTC (permalink / raw)
To: 'Christoph Hellwig'
Cc: Trammell Hudson, Christos Zoulas, Sean Noonan, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'linux-xfs@oss.sgi.com', 'linux-mm@kvack.org', 'Michel Lespinasse'

On Tue, Mar 29, 2011 at 03:24:34PM -0400, 'Christoph Hellwig' wrote:
> Can you check if the brute force patch below helps?  If it does I
> still need to refine it a bit, but it could be that we are doing
> an allocation under an xfs lock that could recurse back into the
> filesystem.  We have a per-process flag to disable that for normal
> kmalloc allocation, but we lost it for vmalloc in the commit you
> bisected the regression to.
>
> Index: xfs/fs/xfs/linux-2.6/kmem.h
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/kmem.h	2011-03-29 21:16:58.039224236 +0200
> +++ xfs/fs/xfs/linux-2.6/kmem.h	2011-03-29 21:17:08.368223598 +0200
> @@ -63,7 +63,7 @@ static inline void *kmem_zalloc_large(si
>  {
>  	void *ptr;
>  
> -	ptr = vmalloc(size);
> +	ptr = __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
>  	if (ptr)
>  		memset(ptr, 0, size);
>  	return ptr;

Note that vmalloc is currently broken in that it does a GFP_KERNEL allocation if it has to allocate page table pages, even when invoked with GFP_NOFS:

http://marc.info/?l=linux-mm&m=128942194520631&w=4
* Re: XFS memory allocation deadlock in 2.6.38
From: 'Christoph Hellwig' @ 2011-03-29 19:43 UTC (permalink / raw)
To: Johannes Weiner
Cc: Trammell Hudson, Christos Zoulas, Sean Noonan, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse'

On Tue, Mar 29, 2011 at 09:39:07PM +0200, Johannes Weiner wrote:
> > -	ptr = vmalloc(size);
> > +	ptr = __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
> >  	if (ptr)
> >  		memset(ptr, 0, size);
> >  	return ptr;
>
> Note that vmalloc is currently broken in that it does a GFP_KERNEL
> allocation if it has to allocate page table pages, even when invoked
> with GFP_NOFS:
>
> http://marc.info/?l=linux-mm&m=128942194520631&w=4

Oh great.  In that case we had a chance to hit the deadlock even before the offending commit, just a much smaller one.
* RE: XFS memory allocation deadlock in 2.6.38
From: Sean Noonan @ 2011-03-29 19:46 UTC (permalink / raw)
To: 'Christoph Hellwig'
Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'linux-xfs@oss.sgi.com', 'linux-mm@kvack.org', 'Michel Lespinasse'

> Can you check if the brute force patch below helps?

No such luck.
* Re: XFS memory allocation deadlock in 2.6.38
From: 'Christoph Hellwig' @ 2011-03-29 20:02 UTC (permalink / raw)
To: Sean Noonan
Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse'

On Tue, Mar 29, 2011 at 03:46:21PM -0400, Sean Noonan wrote:
> > Can you check if the brute force patch below helps?
>
> No such luck.

Actually thinking about it - we never do the vmalloc under any fs lock, so this can't be the reason.  But nothing else in the patch springs to mind either, so to narrow this down does reverting the patch on 2.6.38 also fix it?  The revert isn't quite trivial due to changes since then, so here's the patch I came up with:

Index: xfs/fs/xfs/linux-2.6/kmem.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/kmem.c	2011-03-29 21:55:12.871726512 +0200
+++ xfs/fs/xfs/linux-2.6/kmem.c	2011-03-29 21:55:31.648723706 +0200
@@ -16,6 +16,7 @@
  * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
  */
 #include <linux/mm.h>
+#include <linux/vmalloc.h>
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
@@ -25,25 +26,8 @@
 #include "kmem.h"
 #include "xfs_message.h"
 
-/*
- * Greedy allocation.  May fail and may return vmalloced memory.
- *
- * Must be freed using kmem_free_large.
- */
-void *
-kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
-{
-	void *ptr;
-	size_t kmsize = maxsize;
-
-	while (!(ptr = kmem_zalloc_large(kmsize))) {
-		if ((kmsize >>= 1) <= minsize)
-			kmsize = minsize;
-	}
-	if (ptr)
-		*size = kmsize;
-	return ptr;
-}
+#define MAX_VMALLOCS	6
+#define MAX_SLAB_SIZE	0x20000
 
 void *
 kmem_alloc(size_t size, unsigned int __nocast flags)
@@ -52,8 +36,19 @@ kmem_alloc(size_t size, unsigned int __n
 	gfp_t lflags = kmem_flags_convert(flags);
 	void *ptr;
 
+#ifdef DEBUG
+	if (unlikely(!(flags & KM_LARGE) && (size > PAGE_SIZE))) {
+		printk(KERN_WARNING "Large %s attempt, size=%ld\n",
+			__func__, (long)size);
+		dump_stack();
+	}
+#endif
+
 	do {
-		ptr = kmalloc(size, lflags);
+		if (size < MAX_SLAB_SIZE || retries > MAX_VMALLOCS)
+			ptr = kmalloc(size, lflags);
+		else
+			ptr = __vmalloc(size, lflags, PAGE_KERNEL);
 		if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
 			return ptr;
 		if (!(++retries % 100))
@@ -75,6 +70,27 @@ kmem_zalloc(size_t size, unsigned int __
 	return ptr;
 }
 
+void *
+kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize,
+		   unsigned int __nocast flags)
+{
+	void *ptr;
+	size_t kmsize = maxsize;
+	unsigned int kmflags = (flags & ~KM_SLEEP) | KM_NOSLEEP;
+
+	while (!(ptr = kmem_zalloc(kmsize, kmflags))) {
+		if ((kmsize <= minsize) && (flags & KM_NOSLEEP))
+			break;
+		if ((kmsize >>= 1) <= minsize) {
+			kmsize = minsize;
+			kmflags = flags;
+		}
+	}
+	if (ptr)
+		*size = kmsize;
+	return ptr;
+}
+
 void
 kmem_free(const void *ptr)
 {
Index: xfs/fs/xfs/linux-2.6/kmem.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/kmem.h	2011-03-29 21:55:12.879725146 +0200
+++ xfs/fs/xfs/linux-2.6/kmem.h	2011-03-29 21:55:31.652725467 +0200
@@ -21,7 +21,6 @@
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
-#include <linux/vmalloc.h>
 
 /*
  * General memory allocation interfaces
@@ -31,6 +30,7 @@
 #define KM_NOSLEEP	0x0002u
 #define KM_NOFS		0x0004u
 #define KM_MAYFAIL	0x0008u
+#define KM_LARGE	0x0010u
 
 /*
  * We use a special process flag to avoid recursive callbacks into
@@ -42,7 +42,7 @@ kmem_flags_convert(unsigned int __nocast
 {
 	gfp_t lflags;
 
-	BUG_ON(flags & ~(KM_SLEEP|KM_NOSLEEP|KM_NOFS|KM_MAYFAIL));
+	BUG_ON(flags & ~(KM_SLEEP|KM_NOSLEEP|KM_NOFS|KM_MAYFAIL|KM_LARGE));
 
 	if (flags & KM_NOSLEEP) {
 		lflags = GFP_ATOMIC | __GFP_NOWARN;
@@ -56,25 +56,10 @@ kmem_flags_convert(unsigned int __nocast
 
 extern void *kmem_alloc(size_t, unsigned int __nocast);
 extern void *kmem_zalloc(size_t, unsigned int __nocast);
+extern void *kmem_zalloc_greedy(size_t *, size_t, size_t, unsigned int __nocast);
 extern void *kmem_realloc(const void *, size_t, size_t, unsigned int __nocast);
 extern void  kmem_free(const void *);
 
-static inline void *kmem_zalloc_large(size_t size)
-{
-	void *ptr;
-
-	ptr = vmalloc(size);
-	if (ptr)
-		memset(ptr, 0, size);
-	return ptr;
-}
-static inline void kmem_free_large(void *ptr)
-{
-	vfree(ptr);
-}
-
-extern void *kmem_zalloc_greedy(size_t *, size_t, size_t);
-
 /*
  * Zone interfaces
  */
Index: xfs/fs/xfs/quota/xfs_qm.c
===================================================================
--- xfs.orig/fs/xfs/quota/xfs_qm.c	2011-03-29 21:55:12.859726589 +0200
+++ xfs/fs/xfs/quota/xfs_qm.c	2011-03-29 21:55:41.387278609 +0200
@@ -110,11 +110,12 @@ xfs_Gqm_init(void)
 	 */
 	udqhash = kmem_zalloc_greedy(&hsize,
 				     XFS_QM_HASHSIZE_LOW * sizeof(xfs_dqhash_t),
-				     XFS_QM_HASHSIZE_HIGH * sizeof(xfs_dqhash_t));
+				     XFS_QM_HASHSIZE_HIGH * sizeof(xfs_dqhash_t),
+				     KM_SLEEP | KM_MAYFAIL | KM_LARGE);
 	if (!udqhash)
 		goto out;
 
-	gdqhash = kmem_zalloc_large(hsize);
+	gdqhash = kmem_zalloc(hsize, KM_SLEEP | KM_LARGE);
 	if (!gdqhash)
 		goto out_free_udqhash;
 
@@ -171,7 +172,7 @@ xfs_Gqm_init(void)
 	return xqm;
 
  out_free_udqhash:
-	kmem_free_large(udqhash);
+	kmem_free(udqhash);
 out:
 	return NULL;
 }
@@ -194,8 +195,8 @@ xfs_qm_destroy(
 		xfs_qm_list_destroy(&(xqm->qm_usr_dqhtable[i]));
 		xfs_qm_list_destroy(&(xqm->qm_grp_dqhtable[i]));
 	}
-	kmem_free_large(xqm->qm_usr_dqhtable);
- kmem_free_large(xqm->qm_grp_dqhtable); + kmem_free(xqm->qm_usr_dqhtable); + kmem_free(xqm->qm_grp_dqhtable); xqm->qm_usr_dqhtable = NULL; xqm->qm_grp_dqhtable = NULL; xqm->qm_dqhashmask = 0; Index: xfs/fs/xfs/xfs_itable.c =================================================================== --- xfs.orig/fs/xfs/xfs_itable.c 2011-03-29 21:55:12.851725366 +0200 +++ xfs/fs/xfs/xfs_itable.c 2011-03-29 21:55:31.660724287 +0200 @@ -259,10 +259,8 @@ xfs_bulkstat( (XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog); nimask = ~(nicluster - 1); nbcluster = nicluster >> mp->m_sb.sb_inopblog; - irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4); - if (!irbuf) - return ENOMEM; - + irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4, + KM_SLEEP | KM_MAYFAIL | KM_LARGE); nirbuf = irbsize / sizeof(*irbuf); /* @@ -527,7 +525,7 @@ xfs_bulkstat( /* * Done, we're either out of filesystem or space to put the data. */ - kmem_free_large(irbuf); + kmem_free(irbuf); *ubcountp = ubelem; /* * Found some inodes, return them now and return the error next time. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38 2011-03-29 20:02 ` 'Christoph Hellwig' @ 2011-03-29 20:23 ` Sean Noonan 2011-03-29 22:42 ` Dave Chinner 1 sibling, 0 replies; 32+ messages in thread From: Sean Noonan @ 2011-03-29 20:23 UTC (permalink / raw) To: 'Christoph Hellwig' Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'linux-xfs@oss.sgi.com', 'linux-mm@kvack.org', 'Michel Lespinasse' > mind either, so to narrow this down does reverting the patch on > 2.6.38 also fix it? The revert isn't quite trivial due to changes > since then, so here's the patch I came up with: This patch does fix the problem. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: XFS memory allocation deadlock in 2.6.38 2011-03-29 20:02 ` 'Christoph Hellwig' 2011-03-29 20:23 ` Sean Noonan @ 2011-03-29 22:42 ` Dave Chinner 2011-03-29 22:45 ` Sean Noonan 2011-03-30 9:23 ` 'Christoph Hellwig' 1 sibling, 2 replies; 32+ messages in thread From: Dave Chinner @ 2011-03-29 22:42 UTC (permalink / raw) To: 'Christoph Hellwig' Cc: Trammell Hudson, Christos Zoulas, Sean Noonan, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'linux-xfs@oss.sgi.com', 'linux-mm@kvack.org', 'Michel Lespinasse' On Tue, Mar 29, 2011 at 04:02:56PM -0400, 'Christoph Hellwig' wrote: > On Tue, Mar 29, 2011 at 03:46:21PM -0400, Sean Noonan wrote: > > > Can you check if the brute force patch below helps? > > > > No such luck. > > Actually thinking about it - we never do the vmalloc under any fs lock, > so this can't be the reason. But nothing else in the patch spring to > mind either, so to narrow this down does reverting the patch on > 2.6.38 also fix it? The revert isn't quite trivial due to changes > since then, so here's the patch I came up with: > > > Index: xfs/fs/xfs/linux-2.6/kmem.c > =================================================================== > --- xfs.orig/fs/xfs/linux-2.6/kmem.c 2011-03-29 21:55:12.871726512 +0200 > +++ xfs/fs/xfs/linux-2.6/kmem.c 2011-03-29 21:55:31.648723706 +0200 > @@ -16,6 +16,7 @@ > * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > */ > #include <linux/mm.h> > +#include <linux/vmalloc.h> > #include <linux/highmem.h> > #include <linux/slab.h> > #include <linux/swap.h> > @@ -25,25 +26,8 @@ > #include "kmem.h" > #include "xfs_message.h" > > -/* > - * Greedy allocation. May fail and may return vmalloced memory. > - * > - * Must be freed using kmem_free_large. 
> - */ > -void * > -kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > -{ > - void *ptr; > - size_t kmsize = maxsize; > - > - while (!(ptr = kmem_zalloc_large(kmsize))) { > - if ((kmsize >>= 1) <= minsize) > - kmsize = minsize; > - } > - if (ptr) > - *size = kmsize; > - return ptr; > -} > +#define MAX_VMALLOCS 6 > +#define MAX_SLAB_SIZE 0x20000 Why those values for the magic numbers? .... > Index: xfs/fs/xfs/quota/xfs_qm.c > =================================================================== > --- xfs.orig/fs/xfs/quota/xfs_qm.c 2011-03-29 21:55:12.859726589 +0200 > +++ xfs/fs/xfs/quota/xfs_qm.c 2011-03-29 21:55:41.387278609 +0200 > @@ -110,11 +110,12 @@ xfs_Gqm_init(void) > */ > udqhash = kmem_zalloc_greedy(&hsize, > XFS_QM_HASHSIZE_LOW * sizeof(xfs_dqhash_t), > - XFS_QM_HASHSIZE_HIGH * sizeof(xfs_dqhash_t)); > + XFS_QM_HASHSIZE_HIGH * sizeof(xfs_dqhash_t), > + KM_SLEEP | KM_MAYFAIL | KM_LARGE); > if (!udqhash) > goto out; > > - gdqhash = kmem_zalloc_large(hsize); > + gdqhash = kmem_zalloc(hsize, KM_SLEEP | KM_LARGE); Needs a KM_MAYFAIL as well? > if (!gdqhash) > goto out_free_udqhash; > .... > Index: xfs/fs/xfs/xfs_itable.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_itable.c 2011-03-29 21:55:12.851725366 +0200 > +++ xfs/fs/xfs/xfs_itable.c 2011-03-29 21:55:31.660724287 +0200 > @@ -259,10 +259,8 @@ xfs_bulkstat( > (XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog); > nimask = ~(nicluster - 1); > nbcluster = nicluster >> mp->m_sb.sb_inopblog; > - irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4); > - if (!irbuf) > - return ENOMEM; > - > + irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4, > + KM_SLEEP | KM_MAYFAIL | KM_LARGE); > nirbuf = irbsize / sizeof(*irbuf); Need to keep the if (!irbuf) check as KM_MAYFAIL is passed. 
Cheers, Dave -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38 2011-03-29 22:42 ` Dave Chinner @ 2011-03-29 22:45 ` Sean Noonan 2011-03-30 9:23 ` 'Christoph Hellwig' 1 sibling, 0 replies; 32+ messages in thread From: Sean Noonan @ 2011-03-29 22:45 UTC (permalink / raw) To: 'Dave Chinner', 'Christoph Hellwig' Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'linux-xfs@oss.sgi.com', 'linux-mm@kvack.org', 'Michel Lespinasse' > Need to keep the if (!irbuf) check as KM_MAYFAIL is passed. It wasn't in before the bug presented, so leaving it in wouldn't be a true test as to whether the bug has been tracked to the correct place. I'll test again with the if (!irbuf). Sean _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: XFS memory allocation deadlock in 2.6.38 2011-03-29 22:42 ` Dave Chinner 2011-03-29 22:45 ` Sean Noonan @ 2011-03-30 9:23 ` 'Christoph Hellwig' 1 sibling, 0 replies; 32+ messages in thread From: 'Christoph Hellwig' @ 2011-03-30 9:23 UTC (permalink / raw) To: Dave Chinner Cc: Trammell Hudson, Christos Zoulas, Sean Noonan, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse' On Wed, Mar 30, 2011 at 09:42:30AM +1100, Dave Chinner wrote: > > +#define MAX_VMALLOCS 6 > > +#define MAX_SLAB_SIZE 0x20000 > > Why those values for the magic numbers? Ask the person who added it originally; it's just a revert to the code before my commit to clean up our vmalloc usage. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38 2011-03-29 19:24 ` 'Christoph Hellwig' 2011-03-29 19:39 ` Johannes Weiner 2011-03-29 19:46 ` Sean Noonan @ 2011-03-29 19:54 ` Sean Noonan 2011-03-30 0:09 ` Dave Chinner 2 siblings, 1 reply; 32+ messages in thread From: Sean Noonan @ 2011-03-29 19:54 UTC (permalink / raw) To: 'Christoph Hellwig' Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'linux-xfs@oss.sgi.com', 'linux-mm@kvack.org', 'Michel Lespinasse' > Can you check if the brute force patch below helps? Not sure if this helps at all, but here is the stack from all three processes involved. This is without MAP_POPULATE and with the patch you just sent. # ps aux | grep 'D[+]*[[:space:]]' USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2314 0.2 0.0 0 0 ? D 19:44 0:00 [flush-8:0] root 2402 0.0 0.0 0 0 ? D 19:44 0:00 [xfssyncd/sda9] root 3861 2.6 9.9 16785280 4912848 pts/0 D+ 19:45 0:07 ./vmtest /xfs/hugefile.dat 17179869184 # for p in 2314 2402 3861; do echo $p; cat /proc/$p/stack; done 2314 [<ffffffff810d634a>] congestion_wait+0x7a/0x130 [<ffffffff8129721c>] kmem_alloc+0x6c/0xf0 [<ffffffff8127c07e>] xfs_inode_item_format+0x36e/0x3b0 [<ffffffff8128401f>] xfs_log_commit_cil+0x4f/0x3b0 [<ffffffff8128ff31>] _xfs_trans_commit+0x1f1/0x2b0 [<ffffffff8127c716>] xfs_iomap_write_allocate+0x1a6/0x340 [<ffffffff81298883>] xfs_map_blocks+0x193/0x2c0 [<ffffffff812992fa>] xfs_vm_writepage+0x1ca/0x520 [<ffffffff810c4bd2>] __writepage+0x12/0x40 [<ffffffff810c53dd>] write_cache_pages+0x1dd/0x4f0 [<ffffffff810c573c>] generic_writepages+0x4c/0x70 [<ffffffff812986b8>] xfs_vm_writepages+0x58/0x70 [<ffffffff810c577c>] do_writepages+0x1c/0x40 [<ffffffff811247d1>] writeback_single_inode+0xf1/0x240 [<ffffffff81124edd>] writeback_sb_inodes+0xdd/0x1b0 [<ffffffff81125966>] writeback_inodes_wb+0x76/0x160 [<ffffffff81125d93>] wb_writeback+0x343/0x550 [<ffffffff81126126>] wb_do_writeback+0x186/0x2e0 [<ffffffff81126342>] 
bdi_writeback_thread+0xc2/0x310 [<ffffffff81067846>] kthread+0x96/0xa0 [<ffffffff8165a414>] kernel_thread_helper+0x4/0x10 [<ffffffffffffffff>] 0xffffffffffffffff 2402 [<ffffffff8106d0ec>] down+0x3c/0x50 [<ffffffff8129a7bd>] xfs_buf_lock+0x5d/0x170 [<ffffffff8128a87d>] xfs_getsb+0x1d/0x50 [<ffffffff81291bcf>] xfs_trans_getsb+0x5f/0x150 [<ffffffff8128b80e>] xfs_mod_sb+0x4e/0xe0 [<ffffffff81271dbf>] xfs_fs_log_dummy+0x4f/0x90 [<ffffffff812a61c1>] xfs_sync_worker+0x81/0x90 [<ffffffff812a6092>] xfssyncd+0x172/0x220 [<ffffffff81067846>] kthread+0x96/0xa0 [<ffffffff8165a414>] kernel_thread_helper+0x4/0x10 [<ffffffffffffffff>] 0xffffffffffffffff 3861 [<ffffffff812ec744>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffff812754dd>] xfs_ilock+0x9d/0x110 [<ffffffff8127556e>] xfs_ilock_map_shared+0x1e/0x50 [<ffffffff81297c45>] __xfs_get_blocks+0xc5/0x4e0 [<ffffffff8129808c>] xfs_get_blocks+0xc/0x10 [<ffffffff81135ca2>] do_mpage_readpage+0x462/0x660 [<ffffffff81135eea>] mpage_readpage+0x4a/0x60 [<ffffffff812986e3>] xfs_vm_readpage+0x13/0x20 [<ffffffff810bd150>] filemap_fault+0x2d0/0x4e0 [<ffffffff810db0a0>] __do_fault+0x50/0x4f0 [<ffffffff810db85e>] handle_pte_fault+0x7e/0xc90 [<ffffffff810ddbf8>] handle_mm_fault+0x138/0x230 [<ffffffff8102b37c>] do_page_fault+0x12c/0x420 [<ffffffff81658fcf>] page_fault+0x1f/0x30 [<ffffffffffffffff>] 0xffffffffffffffff _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: XFS memory allocation deadlock in 2.6.38 2011-03-29 19:54 ` Sean Noonan @ 2011-03-30 0:09 ` Dave Chinner 2011-03-30 1:32 ` Sean Noonan 2011-03-30 9:30 ` 'Christoph Hellwig' 0 siblings, 2 replies; 32+ messages in thread From: Dave Chinner @ 2011-03-30 0:09 UTC (permalink / raw) To: Sean Noonan Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse' On Tue, Mar 29, 2011 at 03:54:12PM -0400, Sean Noonan wrote: > > Can you check if the brute force patch below helps? > > Not sure if this helps at all, but here is the stack from all three processes involved. This is without MAP_POPULATE and with the patch you just sent. > > # ps aux | grep 'D[+]*[[:space:]]' > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > root 2314 0.2 0.0 0 0 ? D 19:44 0:00 [flush-8:0] > root 2402 0.0 0.0 0 0 ? D 19:44 0:00 [xfssyncd/sda9] > root 3861 2.6 9.9 16785280 4912848 pts/0 D+ 19:45 0:07 ./vmtest /xfs/hugefile.dat 17179869184 > > # for p in 2314 2402 3861; do echo $p; cat /proc/$p/stack; done > 2314 > [<ffffffff810d634a>] congestion_wait+0x7a/0x130 > [<ffffffff8129721c>] kmem_alloc+0x6c/0xf0 > [<ffffffff8127c07e>] xfs_inode_item_format+0x36e/0x3b0 > [<ffffffff8128401f>] xfs_log_commit_cil+0x4f/0x3b0 > [<ffffffff8128ff31>] _xfs_trans_commit+0x1f1/0x2b0 > [<ffffffff8127c716>] xfs_iomap_write_allocate+0x1a6/0x340 > [<ffffffff81298883>] xfs_map_blocks+0x193/0x2c0 > [<ffffffff812992fa>] xfs_vm_writepage+0x1ca/0x520 > [<ffffffff810c4bd2>] __writepage+0x12/0x40 > [<ffffffff810c53dd>] write_cache_pages+0x1dd/0x4f0 > [<ffffffff810c573c>] generic_writepages+0x4c/0x70 > [<ffffffff812986b8>] xfs_vm_writepages+0x58/0x70 > [<ffffffff810c577c>] do_writepages+0x1c/0x40 > [<ffffffff811247d1>] writeback_single_inode+0xf1/0x240 > [<ffffffff81124edd>] writeback_sb_inodes+0xdd/0x1b0 > [<ffffffff81125966>] writeback_inodes_wb+0x76/0x160 > [<ffffffff81125d93>] 
wb_writeback+0x343/0x550 > [<ffffffff81126126>] wb_do_writeback+0x186/0x2e0 > [<ffffffff81126342>] bdi_writeback_thread+0xc2/0x310 > [<ffffffff81067846>] kthread+0x96/0xa0 > [<ffffffff8165a414>] kernel_thread_helper+0x4/0x10 > [<ffffffffffffffff>] 0xffffffffffffffff So, it's trying to allocate a buffer for the inode extent list, so should only be a couple of hundred bytes, and at most ~2kB if you are using large inodes. That still doesn't seem like it should be having memory allocation problems here with 44GB of free RAM.... Hmmmm. I wonder - the process is doing a random walk of 16GB, so it's probably created tens of thousands of delayed allocation extents before any real allocation was done. xfs_inode_item_format() uses the in-core data fork size for the extent buffer allocation which in this case would be much larger than what can possibly fit inside the inode data fork. Lets see - worst case is 8GB of sparse blocks, which is 2^21 delalloc blocks, which gives a worst case allocation size of 2^21 * sizeof(struct xfs_bmbt_rec), which is roughly 64MB. Which would overflow the return value. Even at 1k delalloc extents, we'll be asking for an order-15 allocation when all we really need is an order-0 allocation. Ok, so that looks like root cause of the problem. can you try the patch below to see if it fixes the problem (without any other patches applied or reverted). Cheers,, Dave. -- Dave Chinner david@fromorbit.com xfs: fix extent format buffer allocation size From: Dave Chinner <dchinner@redhat.com> When formatting an inode item, we have to allocate a separate buffer to hold extents when there are delayed allocation extents on the inode and it is in extent format. The allocation size is derived from the in-core data fork representation, which accounts for delayed allocation extents, while the on-disk representation does not contain any delalloc extents. 
As a result of this mismatch, the allocated buffer can be far larger than needed to hold the real extent list which, due to the fact the inode is in extent format, is limited to the size of the literal area of the inode. However, we can have thousands of delalloc extents, resulting in an allocation size orders of magnitude larger than is needed to hold all the real extents. Fix this by limiting the size of the buffer being allocated to the size of the literal area of the inodes in the filesystem (i.e. the maximum size an inode fork can grow to). Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_inode_item.c | 69 ++++++++++++++++++++++++++++------------------ 1 files changed, 42 insertions(+), 27 deletions(-) diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 46cc401..12cdc39 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -198,6 +198,43 @@ xfs_inode_item_size( } /* + * xfs_inode_item_format_extents - convert in-core extents to on-disk form + * + * For either the data or attr fork in extent format, we need to endian convert + * the in-core extents as we place them into the on-disk inode. In this case, we + * need to do this conversion before we write the extents into the log. Because + * we don't have the disk inode to write into here, we allocate a buffer and + * format the extents into it via xfs_iextents_copy(). We free the buffer in + * the unlock routine after the copy for the log has been made. + * + * For the data fork, there can be delayed allocation extents + * in the inode as well, so the in-core data fork can be much larger than the + * on-disk data representation of real inodes. Hence we need to limit the size + * of the allocation to what will fit in the inode fork, otherwise we could be + * asking for excessively large allocation sizes.
+ */ +STATIC void +xfs_inode_item_format_extents( + struct xfs_inode *ip, + struct xfs_log_iovec *vecp, + int whichfork, + int type) +{ + xfs_bmbt_rec_t *ext_buffer; + + ext_buffer = kmem_alloc(XFS_IFORK_SIZE(ip, whichfork), + KM_SLEEP | KM_NOFS); + if (whichfork == XFS_DATA_FORK) + ip->i_itemp->ili_extents_buf = ext_buffer; + else + ip->i_itemp->ili_aextents_buf = ext_buffer; + + vecp->i_addr = ext_buffer; + vecp->i_len = xfs_iextents_copy(ip, ext_buffer, whichfork); + vecp->i_type = type; +} + +/* * This is called to fill in the vector of log iovecs for the * given inode log item. It fills the first item with an inode * log format structure, the second with the on-disk inode structure, @@ -213,7 +250,6 @@ xfs_inode_item_format( struct xfs_inode *ip = iip->ili_inode; uint nvecs; size_t data_bytes; - xfs_bmbt_rec_t *ext_buffer; xfs_mount_t *mp; vecp->i_addr = &iip->ili_format; @@ -320,22 +356,8 @@ xfs_inode_item_format( } else #endif { - /* - * There are delayed allocation extents - * in the inode, or we need to convert - * the extents to on disk format. - * Use xfs_iextents_copy() - * to copy only the real extents into - * a separate buffer. We'll free the - * buffer in the unlock routine. 
- */ - ext_buffer = kmem_alloc(ip->i_df.if_bytes, - KM_SLEEP); - iip->ili_extents_buf = ext_buffer; - vecp->i_addr = ext_buffer; - vecp->i_len = xfs_iextents_copy(ip, ext_buffer, - XFS_DATA_FORK); - vecp->i_type = XLOG_REG_TYPE_IEXT; + xfs_inode_item_format_extents(ip, vecp, + XFS_DATA_FORK, XLOG_REG_TYPE_IEXT); } ASSERT(vecp->i_len <= ip->i_df.if_bytes); iip->ili_format.ilf_dsize = vecp->i_len; @@ -445,19 +467,12 @@ xfs_inode_item_format( */ vecp->i_addr = ip->i_afp->if_u1.if_extents; vecp->i_len = ip->i_afp->if_bytes; + vecp->i_type = XLOG_REG_TYPE_IATTR_EXT; #else ASSERT(iip->ili_aextents_buf == NULL); - /* - * Need to endian flip before logging - */ - ext_buffer = kmem_alloc(ip->i_afp->if_bytes, - KM_SLEEP); - iip->ili_aextents_buf = ext_buffer; - vecp->i_addr = ext_buffer; - vecp->i_len = xfs_iextents_copy(ip, ext_buffer, - XFS_ATTR_FORK); + xfs_inode_item_format_extents(ip, vecp, + XFS_ATTR_FORK, XLOG_REG_TYPE_IATTR_EXT); #endif - vecp->i_type = XLOG_REG_TYPE_IATTR_EXT; iip->ili_format.ilf_asize = vecp->i_len; vecp++; nvecs++; _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38 2011-03-30 0:09 ` Dave Chinner @ 2011-03-30 1:32 ` Sean Noonan 2011-03-30 1:44 ` Dave Chinner 2011-03-30 9:30 ` 'Christoph Hellwig' 1 sibling, 1 reply; 32+ messages in thread From: Sean Noonan @ 2011-03-30 1:32 UTC (permalink / raw) To: 'Dave Chinner' Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse' > Ok, so that looks like root cause of the problem. can you try the > patch below to see if it fixes the problem (without any other > patches applied or reverted). It looks like this does fix the deadlock problem. However, it appears to come at the price of significantly higher mmap startup costs. # ./vmtest /xfs/hugefile.dat $(( 16 * 1024 * 1024 * 1024 )) /xfs/d-1/hugefile.dat: mapped 17179869184 bytes in 324387362198 ticks Sean _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: XFS memory allocation deadlock in 2.6.38 2011-03-30 1:32 ` Sean Noonan @ 2011-03-30 1:44 ` Dave Chinner 2011-03-30 1:52 ` Sean Noonan 0 siblings, 1 reply; 32+ messages in thread From: Dave Chinner @ 2011-03-30 1:44 UTC (permalink / raw) To: Sean Noonan Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse' On Tue, Mar 29, 2011 at 09:32:06PM -0400, Sean Noonan wrote: > > Ok, so that looks like root cause of the problem. can you try the > > patch below to see if it fixes the problem (without any other > > patches applied or reverted). > > It looks like this does fix the deadlock problem. However, it > appears to come at the price of significantly higher mmap startup > costs. It shouldn't make any difference to startup costs: the current code uses read faults to populate the region, which doesn't cause any allocation to occur, so this code is not executed during the populate phase. Is this repeatable or is it just a one-off result? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38 2011-03-30 1:44 ` Dave Chinner @ 2011-03-30 1:52 ` Sean Noonan 0 siblings, 0 replies; 32+ messages in thread From: Sean Noonan @ 2011-03-30 1:52 UTC (permalink / raw) To: 'Dave Chinner' Cc: Trammell Hudson, Christos Zoulas, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse' > Is this repeatable or is it just a one-off result? It was repeated three times before I sent the email, but I can't reproduce it again now. Call it a fluke. Sean _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: XFS memory allocation deadlock in 2.6.38 2011-03-30 0:09 ` Dave Chinner 2011-03-30 1:32 ` Sean Noonan @ 2011-03-30 9:30 ` 'Christoph Hellwig' 1 sibling, 0 replies; 32+ messages in thread From: 'Christoph Hellwig' @ 2011-03-30 9:30 UTC (permalink / raw) To: Dave Chinner Cc: Trammell Hudson, Christos Zoulas, Sean Noonan, Martin Bligh, 'linux-kernel@vger.kernel.org', Stephen Degler, 'Christoph Hellwig', 'linux-mm@kvack.org', 'linux-xfs@oss.sgi.com', 'Michel Lespinasse' On Wed, Mar 30, 2011 at 11:09:42AM +1100, Dave Chinner wrote: > + ext_buffer = kmem_alloc(XFS_IFORK_SIZE(ip, whichfork), > + KM_SLEEP | KM_NOFS); The old code didn't use KM_NOFS, and I don't think it needed it either, as we call the iop_format handlers inside the region covered by the PF_FSTRANS flag. Also I think the routine needs to be under #ifndef XFS_NATIVE_HOST, as we do not use it for big endian builds. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 32+ messages in thread
* XFS memory allocation deadlock in 2.6.38
@ 2011-03-18 15:46 Sean Noonan
2011-03-22 10:00 ` Christoph Hellwig
2011-03-22 23:25 ` Dave Chinner
0 siblings, 2 replies; 32+ messages in thread
From: Sean Noonan @ 2011-03-18 15:46 UTC (permalink / raw)
To: 'linux-xfs@oss.sgi.com'; +Cc: Martin Bligh, Trammell Hudson
[-- Attachment #1: Type: text/plain, Size: 2040 bytes --]
Using the attached program, we are able to reproduce this bug reliably.
$ make vmtest
$ ./vmtest /xfs/hugefile.dat $(( 16 * 1024 * 1024 * 1024 )) # vmtest <path_to_file> <size_in_bytes>
/xfs/hugefile.dat: mapped 17179869184 bytes in 33822066943 ticks
749660: avg 13339 max 234667 ticks
371945: avg 26885 max 281616 ticks
---
At this point, we see the following on the console:
[593492.694806] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[593506.724367] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[593524.837717] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[593556.742386] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
This is the same message presented in
http://oss.sgi.com/bugzilla/show_bug.cgi?id=410
We started testing with 2.6.38-rc7 and have seen this bug through to the .0 release. This does not appear to be present in 2.6.33, but we have not done testing in between. We have tested with ext4 and do not encounter this bug.
CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
# CONFIG_XFS_DEBUG is not set
# CONFIG_VXFS_FS is not set
Here is the stack from the process:
[<ffffffff81357553>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff812ddf1e>] xfs_ilock+0x7e/0x110
[<ffffffff8130132f>] __xfs_get_blocks+0x8f/0x4e0
[<ffffffff813017b1>] xfs_get_blocks+0x11/0x20
[<ffffffff8114ba3e>] __block_write_begin+0x1ee/0x5b0
[<ffffffff8114be9d>] block_page_mkwrite+0x9d/0xf0
[<ffffffff81307e05>] xfs_vm_page_mkwrite+0x15/0x20
[<ffffffff810f2ddb>] do_wp_page+0x54b/0x820
[<ffffffff810f347c>] handle_pte_fault+0x3cc/0x820
[<ffffffff810f5145>] handle_mm_fault+0x175/0x2f0
[<ffffffff8102e399>] do_page_fault+0x159/0x470
[<ffffffff816cf6cf>] page_fault+0x1f/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
# uname -a
Linux testhost 2.6.38 #2 SMP PREEMPT Fri Mar 18 15:00:59 GMT 2011 x86_64 GNU/Linux
Please let me know if additional information is required.
Thanks!
Sean
[-- Attachment #2: vmtest.c --]
[-- Type: text/plain, Size: 2185 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <inttypes.h>
#include <errno.h>
#include <fcntl.h>
#include <err.h>
static inline uint64_t
rdtsc(void)
{
uint32_t low, high;
__asm__ __volatile__("rdtsc" : "=a"(low), "=d"(high));
return low | ((uint64_t) high) << 32;
}
void *
mmapfile(
const char * filename,
uint64_t len
)
{
int perms = 0666;
int open_flag = O_RDWR | O_CREAT;
int prot_flags = PROT_READ | PROT_WRITE;
const int fd = open(filename, open_flag, perms);
if (fd < 0)
goto fail;
// Ensure that the file is empty and the right size
if (ftruncate(fd, 0) < 0)
goto fail;
if (ftruncate(fd, len) < 0)
goto fail;
// Map the entire actual length of the file
void * const base = mmap(
NULL,
len,
prot_flags,
MAP_SHARED | MAP_POPULATE,
fd,
0
);
if (base == MAP_FAILED)
goto fail;
close(fd);
return base;
fail:
err(1, "%s: Unable to map %"PRIu64" bytes", filename, len);
}
int main(
int argc,
char ** argv
)
{
const char * filename = argv[1];
const uint64_t len = argc > 2 ? strtoul(argv[2], NULL, 0) : (5ul << 30);
const uint64_t max_index = len / sizeof(uint64_t);
uint64_t mmap_time = -rdtsc();
uint64_t * const buf = mmapfile(filename, len);
mmap_time += rdtsc();
fprintf(stderr, "%s: mapped %"PRIu64" bytes in %"PRIu64" ticks\n",
filename,
len,
mmap_time
);
while (1)
{
uint64_t max = 0;
uint64_t sum = 0;
uint64_t i;
const uint64_t loop_start = rdtsc();
const uint64_t iters = 1 << 30;
uint64_t start = loop_start;
for (i = 0 ; i < iters ; i++)
{
uint64_t idx = lrand48() % max_index;
buf[idx] += start;
uint64_t end = rdtsc();
const uint64_t delta = end - start;
start = end;
sum += delta;
if (delta > max)
max = delta;
// Force a report every 10 billion ticks ~= 3 seconds
if (end - loop_start > 10e9)
break;
}
printf("%"PRIu64": avg %"PRIu64" max %"PRIu64" ticks\n",
i,
i ? sum / i : 0,
max
);
}
return 0;
}
[-- Attachment #3: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: XFS memory allocation deadlock in 2.6.38
  2011-03-18 15:46 Sean Noonan
@ 2011-03-22 10:00 ` Christoph Hellwig
  2011-03-22 13:53   ` Sean Noonan
  2011-03-22 23:25 ` Dave Chinner
  1 sibling, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2011-03-22 10:00 UTC (permalink / raw)
To: Sean Noonan
Cc: 'linux-xfs@oss.sgi.com', Martin Bligh, Trammell Hudson

> We started testing with 2.6.38-rc7 and have seen this bug through to the .0 release.
> This does not appear to be present in 2.6.33, but we have not done testing in between.
> We have tested with ext4 and do not encounter this bug.

Does reverting commit aea1b9532143218f8599ecedbbd6bfbf812385e1 fix the issue for you?

^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38
  2011-03-22 10:00 ` Christoph Hellwig
@ 2011-03-22 13:53   ` Sean Noonan
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Noonan @ 2011-03-22 13:53 UTC (permalink / raw)
To: 'Christoph Hellwig'
Cc: 'linux-xfs@oss.sgi.com', Martin Bligh, Christos Zoulas, Stephen Degler, Trammell Hudson

No, reverting this commit did not fix the issue.

We happened to have 2.6.36.1 and 2.6.37.2 available and tested against both of those. I was not able to reproduce the bug with either version.

Sean

-----Original Message-----
From: Christoph Hellwig [mailto:hch@infradead.org]
Sent: Tuesday, March 22, 2011 06:00
To: Sean Noonan
Cc: 'linux-xfs@oss.sgi.com'; Martin Bligh; Trammell Hudson
Subject: Re: XFS memory allocation deadlock in 2.6.38

> We started testing with 2.6.38-rc7 and have seen this bug through to the .0 release.
> This does not appear to be present in 2.6.33, but we have not done testing in between.
> We have tested with ext4 and do not encounter this bug.

Does reverting commit aea1b9532143218f8599ecedbbd6bfbf812385e1 fix the issue for you?

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: XFS memory allocation deadlock in 2.6.38
  2011-03-18 15:46 Sean Noonan
  2011-03-22 10:00 ` Christoph Hellwig
@ 2011-03-22 23:25 ` Dave Chinner
  2011-03-23 21:33   ` Sean Noonan
  1 sibling, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2011-03-22 23:25 UTC (permalink / raw)
To: Sean Noonan
Cc: 'linux-xfs@oss.sgi.com', Martin Bligh, Trammell Hudson

On Fri, Mar 18, 2011 at 11:46:04AM -0400, Sean Noonan wrote:
> Using the attached program, we are able to reproduce this bug reliably.
> $ make vmtest
> $ ./vmtest /xfs/hugefile.dat $(( 16 * 1024 * 1024 * 1024 )) # vmtest <path_to_file> <size_in_bytes>
> /xfs/hugefile.dat: mapped 17179869184 bytes in 33822066943 ticks
> 749660: avg 13339 max 234667 ticks
> 371945: avg 26885 max 281616 ticks
> ---
> At this point, we see the following on the console:
> [593492.694806] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> [593506.724367] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> [593524.837717] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> [593556.742386] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)

What is the configuration of the machine you are testing on? I can't reproduce this on a current 2.6.39-tot tree on a 2p/2GB RAM VM that has its blockdev images on a single SATA drive....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38
  2011-03-22 23:25 ` Dave Chinner
@ 2011-03-23 21:33   ` Sean Noonan
  2011-03-23 22:03     ` Dave Chinner
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Noonan @ 2011-03-23 21:33 UTC (permalink / raw)
To: 'Dave Chinner'
Cc: 'linux-xfs@oss.sgi.com', Martin Bligh, Trammell Hudson

> What is the configuration of the machine you are testing on? I can't
> reproduce this on a current 2.6.39-tot tree on a 2p/2GB RAM VM that
> has its blockdev images on a single SATA drive....

The machine we are testing with has 12 cores, 12 hyperthreaded siblings, and 48GB of RAM. The filesystem is backed by a partition on one large hardware RAID. There is a second bug we've run into that I'll report tomorrow.

Could you point me to the current tree you are working with? 2.6.39-tot doesn't mean anything to me.

Sean

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: XFS memory allocation deadlock in 2.6.38
  2011-03-23 21:33 ` Sean Noonan
@ 2011-03-23 22:03   ` Dave Chinner
  2011-03-23 22:33     ` Sean Noonan
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2011-03-23 22:03 UTC (permalink / raw)
To: Sean Noonan
Cc: 'linux-xfs@oss.sgi.com', Martin Bligh, Trammell Hudson

On Wed, Mar 23, 2011 at 05:33:36PM -0400, Sean Noonan wrote:
> > What is the configuration of the machine you are testing on? I can't
> > reproduce this on a current 2.6.39-tot tree on a 2p/2GB RAM VM that
> > has its blockdev images on a single SATA drive....
>
> The machine we are testing with has 12 cores, 12 hyperthreaded
> siblings, and 48GB of RAM. The filesystem is backed by a
> partition on one large hardware RAID. There is a second bug we've
> run into that I'll report tomorrow.

So why would creating a 16GB file cause an OOM condition if you have 48GB RAM? That doesn't make a whole lot of sense to me. Is that memory free, or are other things running and consuming memory?

> Could you point me to the current tree you are working with?
> 2.6.39-tot doesn't mean anything to me.

tot = Top of Tree. IOWs, 2.6.39-tot means the latest development kernel Linus has released.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: XFS memory allocation deadlock in 2.6.38
  2011-03-23 22:03 ` Dave Chinner
@ 2011-03-23 22:33   ` Sean Noonan
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Noonan @ 2011-03-23 22:33 UTC (permalink / raw)
To: 'Dave Chinner'
Cc: 'linux-xfs@oss.sgi.com', Martin Bligh, Trammell Hudson

> So why would creating a 16GB file cause an OOM condition if you
> have 48GB RAM? That doesn't make a whole lot of sense to me. Is
> that memory free, or are other things running and consuming memory?

As far as I know this isn't an OOM condition. It is a bug, which is why I sent a patch that fixes this specific case earlier today.

Sean

^ permalink raw reply	[flat|nested] 32+ messages in thread
end of thread, other threads:[~2011-03-30 9:27 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <081DDE43F61F3D43929A181B477DCA95639B52FD@MSXAOA6.twosigma.com>
2011-03-23 19:39 ` XFS memory allocation deadlock in 2.6.38 Sean Noonan
2011-03-24 17:43 ` Christoph Hellwig
2011-03-24 23:45 ` Michel Lespinasse
2011-03-28 14:58 ` Sean Noonan
2011-03-28 21:06 ` Michel Lespinasse
2011-03-28 21:34 ` Sean Noonan
2011-03-29 0:25 ` Michel Lespinasse
2011-03-29 1:51 ` Dave Chinner
2011-03-29 2:49 ` Sean Noonan
2011-03-29 19:05 ` Sean Noonan
2011-03-29 19:24 ` 'Christoph Hellwig'
2011-03-29 19:39 ` Johannes Weiner
2011-03-29 19:43 ` 'Christoph Hellwig'
2011-03-29 19:46 ` Sean Noonan
2011-03-29 20:02 ` 'Christoph Hellwig'
2011-03-29 20:23 ` Sean Noonan
2011-03-29 22:42 ` Dave Chinner
2011-03-29 22:45 ` Sean Noonan
2011-03-30 9:23 ` 'Christoph Hellwig'
2011-03-29 19:54 ` Sean Noonan
2011-03-30 0:09 ` Dave Chinner
2011-03-30 1:32 ` Sean Noonan
2011-03-30 1:44 ` Dave Chinner
2011-03-30 1:52 ` Sean Noonan
2011-03-30 9:30 ` 'Christoph Hellwig'
2011-03-18 15:46 Sean Noonan
2011-03-22 10:00 ` Christoph Hellwig
2011-03-22 13:53 ` Sean Noonan
2011-03-22 23:25 ` Dave Chinner
2011-03-23 21:33 ` Sean Noonan
2011-03-23 22:03 ` Dave Chinner
2011-03-23 22:33 ` Sean Noonan