* Re: [PATCH] xfs: flush vmap aliases when mapping fails
  [not found] <1299713876-7747-1-git-send-email-david@fromorbit.com>

From: Christoph Hellwig @ 2011-03-10  7:37 UTC
To: Dave Chinner; +Cc: xfs, npiggin, linux-mm

On Thu, Mar 10, 2011 at 10:37:56AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> On 32 bit systems, vmalloc space is limited and XFS can chew through
> it quickly as the vmalloc space is lazily freed. This can result in
> failure to map buffers, even when there are apparently large amounts
> of vmalloc space available. Hence, if we fail to map a buffer, purge
> the aliases that have not yet been freed to hopefully free up enough
> vmalloc space to allow a retry to succeed.

IMHO this should be done by vm_map_ram internally.  If we can't get
the core code fixes, we can put this in as a last resort.

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_buf.c |   14 +++++++++++---
>  1 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
> index 3cc671c..a5a260f 100644
> --- a/fs/xfs/linux-2.6/xfs_buf.c
> +++ b/fs/xfs/linux-2.6/xfs_buf.c
> @@ -455,9 +455,17 @@ _xfs_buf_map_pages(
>  		bp->b_addr = page_address(bp->b_pages[0]) + bp->b_offset;
>  		bp->b_flags |= XBF_MAPPED;
>  	} else if (flags & XBF_MAPPED) {
> -		bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
> -					-1, PAGE_KERNEL);
> -		if (unlikely(bp->b_addr == NULL))
> +		int retried = 0;
> +
> +		do {
> +			bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
> +						-1, PAGE_KERNEL);
> +			if (bp->b_addr)
> +				break;
> +			vm_unmap_aliases();
> +		} while (retried++ <= 1);
> +
> +		if (!bp->b_addr)
>  			return -ENOMEM;
>  		bp->b_addr += bp->b_offset;
>  		bp->b_flags |= XBF_MAPPED;
> --
> 1.7.2.3

---end quoted text---
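Roughly what a vm_map_ram-internal variant could look like, as an
untested sketch only: __vm_map_ram is a hypothetical stand-in for the
existing allocation path in mm/vmalloc.c, and the retry policy is
lifted straight from Dave's patch, so both are assumptions rather
than merged code.

void *vm_map_ram(struct page **pages, unsigned int count,
		 int node, pgprot_t prot)
{
	void *mem;
	int retried = 0;

	do {
		/* hypothetical helper wrapping today's allocation path */
		mem = __vm_map_ram(pages, count, node, prot);
		if (mem)
			break;
		/* purge lazily-freed vmap areas, then retry the mapping */
		vm_unmap_aliases();
	} while (retried++ <= 1);

	return mem;
}

With this in place, the retry loop in _xfs_buf_map_pages() would
collapse back to a single vm_map_ram() call.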
* Re: [PATCH] xfs: flush vmap aliases when mapping fails

From: Dave Chinner @ 2011-03-10 22:49 UTC
To: Christoph Hellwig; +Cc: xfs, npiggin, linux-mm

On Thu, Mar 10, 2011 at 02:37:51AM -0500, Christoph Hellwig wrote:
> On Thu, Mar 10, 2011 at 10:37:56AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > On 32 bit systems, vmalloc space is limited and XFS can chew through
> > it quickly as the vmalloc space is lazily freed. This can result in
> > failure to map buffers, even when there are apparently large amounts
> > of vmalloc space available. Hence, if we fail to map a buffer, purge
> > the aliases that have not yet been freed to hopefully free up enough
> > vmalloc space to allow a retry to succeed.
>
> IMHO this should be done by vm_map_ram internally. If we can't get
> the core code fixes, we can put this in as a last resort.

OK. The patch was done as part of the triage for this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=27492

where the vmalloc space on 32 bit systems is getting exhausted. I
can easily move this flush-and-retry into the vmap code.

FWIW, while the VM folk might be paying attention to vmap related
stuff, this vmap BUG() also needs triage:

https://bugzilla.kernel.org/show_bug.cgi?id=27002

And, finally, the mm-vmap-area-cache.patch in the current mmotm also
needs to be pushed forward, because we've been getting reports of
excessive CPU time being spent walking the vmap area rbtree during
vm_map_ram operations, and this patch supposedly fixes that
problem....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] xfs: flush vmap aliases when mapping fails

From: Christoph Hellwig @ 2011-03-17 14:24 UTC
To: Dave Chinner; +Cc: Christoph Hellwig, linux-mm, npiggin, xfs

On Fri, Mar 11, 2011 at 09:49:45AM +1100, Dave Chinner wrote:
> > IMHO this should be done by vm_map_ram internally. If we can't get
> > the core code fixes, we can put this in as a last resort.
>
> OK. The patch was done as part of the triage for this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=27492
>
> where the vmalloc space on 32 bit systems is getting exhausted. I
> can easily move this flush-and-retry into the vmap code.

Looks like we're not going to make any progress on the VM side for
this, so I think we'll need the XFS variant for 2.6.39.

Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH] xfs: flush vmap aliases when mapping fails

From: Johannes Weiner @ 2011-03-21 12:25 UTC
To: Dave Chinner
Cc: Christoph Hellwig, Nick Piggin, Hugh Dickins, Andrew Morton, xfs, linux-mm

On Fri, Mar 11, 2011 at 09:49:45AM +1100, Dave Chinner wrote:
> On Thu, Mar 10, 2011 at 02:37:51AM -0500, Christoph Hellwig wrote:
> > On Thu, Mar 10, 2011 at 10:37:56AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > On 32 bit systems, vmalloc space is limited and XFS can chew through
> > > it quickly as the vmalloc space is lazily freed. This can result in
> > > failure to map buffers, even when there are apparently large amounts
> > > of vmalloc space available. Hence, if we fail to map a buffer, purge
> > > the aliases that have not yet been freed to hopefully free up enough
> > > vmalloc space to allow a retry to succeed.
> >
> > IMHO this should be done by vm_map_ram internally. If we can't get
> > the core code fixes, we can put this in as a last resort.
>
> OK. The patch was done as part of the triage for this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=27492
>
> where the vmalloc space on 32 bit systems is getting exhausted. I
> can easily move this flush-and-retry into the vmap code.

The problem appears to be with the way vmap blocks are allocated. It
would explain the symptoms perfectly: failing allocations long before
vmap space is exhausted.

I had the following test patch applied to a vanilla -mmotm and a
patched one:

---
diff --git a/init/main.c b/init/main.c
index 4a9479e..62f92f9 100644
--- a/init/main.c
+++ b/init/main.c
@@ -559,6 +559,9 @@ asmlinkage void __init start_kernel(void)
 	if (panic_later)
 		panic(panic_later, panic_param);
 
+	extern void vmalloc_test(void);
+	vmalloc_test();
+
 	lockdep_info();
 
 	/*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cbd9f9f..d6f75dc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1116,6 +1116,16 @@ void __init vmalloc_init(void)
 	vmap_initialized = true;
 }
 
+void vmalloc_test(void)
+{
+	struct page *pages[] = { ZERO_PAGE(0) };
+	unsigned long total = 0;
+
+	while (vm_map_ram(pages, 1, -1, PAGE_KERNEL))
+		total++;
+	panic("Vmapped %lu single pages\n", total);
+}
+
 /**
  * map_kernel_range_noflush - map kernel VM area with the specified pages
  * @addr: start of the VM area to map
---

where the results are:

  vanilla: Kernel panic - not syncing: Vmapped 15360 single pages
  patched: Kernel panic - not syncing: Vmapped 30464 single pages

The patch with a more accurate problem description is attached at the
end of this email.

> FWIW, while the VM folk might be paying attention to vmap related
> stuff, this vmap BUG() also needs triage:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=27002

I stared at this bug and the XFS code for a while over the weekend.
What you are doing in there is really scary!

So xfs_buf_free() does vm_unmap_ram if the buffer has the XBF_MAPPED
flag set and spans multiple pages (b_page_count > 1).

In xlog_sync() you have that split case where you do XFS_BUF_SET_PTR
on that in-core log's l_xbuf, which changes that buffer to, as far as
I could understand, linear kernel memory. Later, in xlog_dealloc_log,
you call xfs_buf_free() on that buffer.

I was unable to determine whether this can ever be more than one page
in the buffer for the split case.
But if this is the case, you end up invoking vm_unmap_ram() on
something you never vm_map_ram'd, which could explain why this
triggers the BUG_ON() for the dirty area map.

But even if this is all fine and working, it looks subtle as hell.
This BUG_ON() is not necessarily a sign of a faulty vmap allocator;
it could just as well indicate a faulty caller.

> And, finally, the mm-vmap-area-cache.patch in the current mmotm also
> needs to be pushed forward, because we've been getting reports of
> excessive CPU time being spent walking the vmap area rbtree during
> vm_map_ram operations, and this patch supposedly fixes that
> problem....

It looks good to me. After Nick's original hole-searching code did my
head in, I am especially fond of Hugh's simplifications in that area ;-)
So for what it's worth:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

And here is the patch that should improve on the vmap exhaustion
problems observed with XFS on 32-bit. It removes the guard page
allocation from the basic vmap area allocator and leaves it to
__get_vmap_area(), and thus vmalloc, to take care of the guard page.

If it's deemed necessary to have guard pages for vm_map_ram() as well,
I think that should be handled in there instead. This patch does not
do so.

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: [patch] mm: vmalloc: remove guard pages from between basic vmap areas

The vmap allocator is used, among other things, to allocate per-cpu
vmap blocks, where each vmap block is naturally aligned to its own
size. Obviously, leaving a guard page after each vmap area forbids
packing vmap blocks efficiently and can make the kernel run out of
possible vmap blocks long before vmap space is exhausted.

The vmap code to map a user-supplied page array into linear vmalloc
space insists on using a vmap block (instead of falling back to a
custom area) when the area size is below a certain threshold. With
heavy users of this interface (e.g. XFS) and limited vmalloc space on
32-bit, vmap block exhaustion is a real problem.

Remove the guard page from this allocator level. It's still there for
vmalloc allocations, but enforced higher up.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmalloc.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cbd9f9f..5d8666b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -307,7 +307,7 @@ nocache:
 	/* find starting point for our search */
 	if (free_vmap_cache) {
 		first = rb_entry(free_vmap_cache, struct vmap_area, rb_node);
-		addr = ALIGN(first->va_end + PAGE_SIZE, align);
+		addr = ALIGN(first->va_end, align);
 		if (addr < vstart)
 			goto nocache;
 		if (addr + size - 1 < addr)
@@ -338,10 +338,10 @@ nocache:
 	}
 
 	/* from the starting point, walk areas until a suitable hole is found */
-	while (addr + size >= first->va_start && addr + size <= vend) {
+	while (addr + size > first->va_start && addr + size <= vend) {
 		if (addr + cached_hole_size < first->va_start)
 			cached_hole_size = first->va_start - addr;
-		addr = ALIGN(first->va_end + PAGE_SIZE, align);
+		addr = ALIGN(first->va_end, align);
 		if (addr + size - 1 < addr)
 			goto overflow;
--
1.7.4
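To make the packing effect concrete, here is a small userspace
calculation. The 2 MiB block size is an assumption for illustration
(the real size depends on VMAP_BBMAP_BITS, the page size, and the
architecture), but the 2x address-space cost it demonstrates matches
the roughly doubled count in the test results above:

#include <stdio.h>

#define PAGE_SZ		(4UL << 10)	/* 4 KiB page (assumed) */
#define BLOCK_SZ	(2UL << 20)	/* 2 MiB vmap block (assumed) */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	/* vmap blocks are naturally aligned, so the end of the first
	 * block is itself block-aligned */
	unsigned long va_end = BLOCK_SZ;

	/* old search: skip a guard page, then re-align to block size */
	unsigned long with_guard = ALIGN_UP(va_end + PAGE_SZ, BLOCK_SZ);
	/* patched search: align directly; blocks pack back to back */
	unsigned long without_guard = ALIGN_UP(va_end, BLOCK_SZ);

	printf("next block with guard page:    %lu MiB\n", with_guard >> 20);
	printf("next block without guard page: %lu MiB\n", without_guard >> 20);
	/* with the guard page, every naturally-aligned block burns
	 * twice its size in address space */
	return 0;
}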
* Re: [PATCH] xfs: flush vmap aliases when mapping fails

From: Johannes Weiner @ 2011-03-22 12:57 UTC
To: Dave Chinner
Cc: Christoph Hellwig, Nick Piggin, Hugh Dickins, Andrew Morton, xfs, linux-mm

On Mon, Mar 21, 2011 at 01:25:26PM +0100, Johannes Weiner wrote:
> On Fri, Mar 11, 2011 at 09:49:45AM +1100, Dave Chinner wrote:
> > FWIW, while the VM folk might be paying attention to vmap related
> > stuff, this vmap BUG() also needs triage:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=27002
>
> I stared at this bug and the XFS code for a while over the weekend.
> What you are doing in there is really scary!
>
> So xfs_buf_free() does vm_unmap_ram if the buffer has the XBF_MAPPED
> flag set and spans multiple pages (b_page_count > 1).
>
> In xlog_sync() you have that split case where you do XFS_BUF_SET_PTR
> on that in-core log's l_xbuf, which changes that buffer to, as far as
> I could understand, linear kernel memory. Later, in xlog_dealloc_log,
> you call xfs_buf_free() on that buffer.
>
> I was unable to determine whether this can ever be more than one page
> in the buffer for the split case. But if this is the case, you end up
> invoking vm_unmap_ram() on something you never vm_map_ram'd, which
> could explain why this triggers the BUG_ON() for the dirty area map.

Blech, that's bogus, please pardon my rashness.

I looked over the vmalloc side several times but could not spot
anything that would explain this crash.

However, when you switched from vunmap to vm_unmap_ram, you had to
add the area size parameter.

I am guessing that the base address was always correct; vunmap would
have caught an error in it. But the new size argument could be too
large and crash the kernel when it reaches into the next area, which
has already been freed (and marked in the dirty bitmap).

I have given up on verifying that what xlog_sync() does to l_xbuf is
okay. It would be good if you could confirm that it leaves the buffer
in a state where b_addr - b_offset and b_page_count correctly describe
the exact vmap area.

	Hannes
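The contract in question, reduced to a minimal sketch (illustrative
helper names, not XFS code): the count handed to vm_unmap_ram() must
be exactly the count that was handed to vm_map_ram() for that address.

#include <linux/mm.h>
#include <linux/vmalloc.h>

static void *map_pages(struct page **pages, unsigned int nr)
{
	return vm_map_ram(pages, nr, -1, PAGE_KERNEL);
}

static void unmap_pages(void *addr, unsigned int nr)
{
	/*
	 * nr must match the count mapped at addr.  If it is larger,
	 * the unmap reaches into the neighbouring vmap area, and if
	 * that area was already freed, its dirty-bitmap state is what
	 * trips the BUG_ON() seen in the report.
	 */
	vm_unmap_ram(addr, nr);
}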
* Re: [PATCH] xfs: flush vmap aliases when mapping fails

From: Dave Chinner @ 2011-03-27 23:54 UTC
To: Johannes Weiner
Cc: Christoph Hellwig, Nick Piggin, Hugh Dickins, Andrew Morton, xfs, linux-mm

On Tue, Mar 22, 2011 at 01:57:36PM +0100, Johannes Weiner wrote:
> On Mon, Mar 21, 2011 at 01:25:26PM +0100, Johannes Weiner wrote:
> > On Fri, Mar 11, 2011 at 09:49:45AM +1100, Dave Chinner wrote:
> > > FWIW, while the VM folk might be paying attention to vmap related
> > > stuff, this vmap BUG() also needs triage:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=27002
> >
> > I stared at this bug and the XFS code for a while over the weekend.
> > What you are doing in there is really scary!
> >
> > So xfs_buf_free() does vm_unmap_ram if the buffer has the XBF_MAPPED
> > flag set and spans multiple pages (b_page_count > 1).
> >
> > In xlog_sync() you have that split case where you do XFS_BUF_SET_PTR
> > on that in-core log's l_xbuf, which changes that buffer to, as far as
> > I could understand, linear kernel memory. Later, in xlog_dealloc_log,
> > you call xfs_buf_free() on that buffer.
> >
> > I was unable to determine whether this can ever be more than one page
> > in the buffer for the split case. But if this is the case, you end up
> > invoking vm_unmap_ram() on something you never vm_map_ram'd, which
> > could explain why this triggers the BUG_ON() for the dirty area map.
>
> Blech, that's bogus, please pardon my rashness.
>
> I looked over the vmalloc side several times but could not spot
> anything that would explain this crash.
>
> However, when you switched from vunmap to vm_unmap_ram, you had to
> add the area size parameter.
>
> I am guessing that the base address was always correct; vunmap would
> have caught an error in it. But the new size argument could be too
> large and crash the kernel when it reaches into the next area, which
> has already been freed (and marked in the dirty bitmap).
>
> I have given up on verifying that what xlog_sync() does to l_xbuf is
> okay. It would be good if you could confirm that it leaves the buffer
> in a state where b_addr - b_offset and b_page_count correctly describe
> the exact vmap area.

Thanks for looking at this, Hannes. A fresh set of eyes always helps.

However, I don't think that l_xbuf is the only source of potential
problems w.r.t. the mapped region size when the buffer is freed. This
was reported on #xfs overnight (http://pastebin.com/raw.php?i=P99pjDTn):

[  248.794327] XFS mounting filesystem md0
[  248.970190] Starting XFS recovery on filesystem: md0 (logdev: internal)
[  249.434782] ------------[ cut here ]------------
[  249.434962] kernel BUG at mm/vmalloc.c:942!
[  249.435053] invalid opcode: 0000 [#1] SMP
[  249.435200] last sysfs file: /sys/devices/virtual/block/dm-5/dm/name
[  249.435291] CPU 1
[  249.435324] Modules linked in: arc4 ecb ves1820 rt61pci crc_itu_t eeprom_93cx6 rt2x00pci rt2x00lib mac80211 budget budget_core saa7146 ttpci_eeprom dvb_core ftdi_sio usbserial evdev button cfg80211 shpchp pci_hotplug r8168 serio_raw pcspkr e1000e edac_core ohci_hcd
[  249.436509]
[  249.436597] Pid: 2739, comm: mount Not tainted 2.6.38 #31 System manufacturer System Product Name/M4A785D-M PRO
[  249.436893] RIP: 0010:[<ffffffff810cb44e>]  [<ffffffff810cb44e>] vm_unmap_ram+0x9a/0x133
[  249.437078] RSP: 0018:ffff8801156fba88  EFLAGS: 00010246
[  249.437168] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  249.437260] RDX: 0000000000000000 RSI: 0000000000000041 RDI: 0000000000000001
[  249.437353] RBP: ffff8801156fbaa8 R08: 0000000000000000 R09: ffff8801125c4490
[  249.437445] R10: ffff880114ff5780 R11: dead000000200200 R12: 0000000000000006
[  249.437537] R13: ffffc900106e6000 R14: 0000000000040000 R15: ffff880114ff0dc0
[  249.437631] FS:  00007fba61615740(0000) GS:ffff8800dfd00000(0000) knlGS:0000000000000000
[  249.437777] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  249.437867] CR2: 00007fba61627000 CR3: 0000000114ff3000 CR4: 00000000000006e0
[  249.437959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  249.438051] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  249.438144] Process mount (pid: 2739, threadinfo ffff8801156fa000, task ffff8801190c4290)
[  249.438288] Stack:
[  249.438374]  ffff880114ff0dc0 ffff880114ff0c80 0000000000000008 0000000000003590
[  249.438637]  ffff8801156fbac8 ffffffff811c167a ffff880119293800 ffff880114ff0c80
[  249.438677]  ffff8801156fbad8 ffffffff811b1b79 ffff8801156fbbf8 ffffffff811b4b37
[  249.438677] Call Trace:
[  249.438677]  [<ffffffff811c167a>] xfs_buf_free+0x38/0x78
[  249.438677]  [<ffffffff811b1b79>] xlog_put_bp+0x9/0xb
[  249.438677]  [<ffffffff811b4b37>] xlog_do_recovery_pass+0x5c8/0x5f4
[  249.438677]  [<ffffffff811b4bbb>] xlog_do_log_recovery+0x58/0x91
[  249.438677]  [<ffffffff811b300a>] ? xlog_find_tail+0x2a6/0x2fb
[  249.438677]  [<ffffffff811b4c07>] xlog_do_recover+0x13/0xed
[  249.438677]  [<ffffffff811b4e1b>] xlog_recover+0x7e/0x89
[  249.438677]  [<ffffffff811aedb0>] xfs_log_mount+0xdb/0x149
[  249.438677]  [<ffffffff811b714e>] xfs_mountfs+0x310/0x5c3
[  249.438677]  [<ffffffff811b7de1>] ? xfs_mru_cache_create+0x126/0x173
[  249.438677]  [<ffffffff811c8ecb>] xfs_fs_fill_super+0x183/0x2c4
[  249.438677]  [<ffffffff810e2d11>] mount_bdev+0x147/0x1ba
[  249.438677]  [<ffffffff811c8d48>] ? xfs_fs_fill_super+0x0/0x2c4
[  249.438677]  [<ffffffff811c7259>] xfs_fs_mount+0x10/0x12
[  249.438677]  [<ffffffff810e1f4f>] vfs_kern_mount+0x61/0x132
[  249.438677]  [<ffffffff810e207e>] do_kern_mount+0x48/0xda
[  249.438677]  [<ffffffff810f8aff>] do_mount+0x6ae/0x71b
[  249.438677]  [<ffffffff810f8dfd>] sys_mount+0x87/0xc8
[  249.438677]  [<ffffffff8102a8bb>] system_call_fastpath+0x16/0x1b
[  249.438677] Code: d1 e8 75 f8 48 be 00 00 00 00 00 37 00 00 48 c7 c7 b0 98 62 81 49 8d 74 35 00 48 c1 ee 16 e8 a2 8c 12 00 48 85 c0 48 89 c3 75 02 <0f> 0b 4b 8d 74 35 00 4c 89 ef e8 2d fa ff ff 48 89 df e8 21 3a
[  249.438677] RIP  [<ffffffff810cb44e>] vm_unmap_ram+0x9a/0x133
[  249.438677]  RSP <ffff8801156fba88>
[  249.443421] ---[ end trace 2360f16b307700c6 ]---

Which is basically reading the log, not writing to it, which is where
l_xbuf comes into play.
The log reading code plays a _lot_ of tricks with buffer offsets and
sizes (if the simple l_xbuf tricks scare you, do not look at this
code ;). Hence it's definitely possible that the size of the region
being passed back to vm_unmap_ram() is wrong in some of the error
cases. I'll spend some more time verifying whether these paths restore
the buffer correctly before freeing it.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
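One defensive pattern that would take the guesswork out of the free
path, as a hypothetical sketch only (the vbuf structure and the
b_vmap_count field are illustrative; neither exists in XFS): record
the exact page count at map time, so that later games with
b_page_count or b_offset cannot desynchronise the unmap.

#include <linux/mm.h>
#include <linux/vmalloc.h>

struct vbuf {
	void		*b_addr;
	unsigned int	b_offset;
	struct page	**b_pages;
	unsigned int	b_page_count;
	unsigned int	b_vmap_count;	/* pages actually vm_map_ram'd */
};

static int vbuf_map(struct vbuf *bp)
{
	bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
				-1, PAGE_KERNEL);
	if (!bp->b_addr)
		return -ENOMEM;
	/* remember exactly what was mapped, before anyone fiddles
	 * with b_page_count or re-points the buffer */
	bp->b_vmap_count = bp->b_page_count;
	bp->b_addr += bp->b_offset;
	return 0;
}

static void vbuf_unmap(struct vbuf *bp)
{
	/* unmap with the recorded count, whatever the buffer claims now */
	vm_unmap_ram(bp->b_addr - bp->b_offset, bp->b_vmap_count);
}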
* Re: [PATCH] xfs: flush vmap aliases when mapping fails

From: Johannes Weiner @ 2011-03-18 14:24 UTC
To: Christoph Hellwig; +Cc: Dave Chinner, xfs, npiggin, linux-mm

On Thu, Mar 10, 2011 at 02:37:51AM -0500, Christoph Hellwig wrote:
> On Thu, Mar 10, 2011 at 10:37:56AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > On 32 bit systems, vmalloc space is limited and XFS can chew through
> > it quickly as the vmalloc space is lazily freed. This can result in
> > failure to map buffers, even when there are apparently large amounts
> > of vmalloc space available. Hence, if we fail to map a buffer, purge
> > the aliases that have not yet been freed to hopefully free up enough
> > vmalloc space to allow a retry to succeed.
>
> IMHO this should be done by vm_map_ram internally. If we can't get
> the core code fixes, we can put this in as a last resort.

Agreed, this should be fixed in the vmalloc allocator. It is already
supposed to purge the lazy-freed mappings before it fails an
allocation; I am trying to figure out what's going on.

Your proposed workaround looks fine to me until vmalloc is fixed.