linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	linux-mm@kvack.org, Nick Piggin <npiggin@suse.de>,
	Greg Kroah-Hartman <gregkh@suse.de>
Subject: [54/74] mm: purge fragmented percpu vmap blocks
Date: Thu, 04 Feb 2010 09:12:25 -0800	[thread overview]
Message-ID: <20100204171518.697924361@linux.site> (raw)
In-Reply-To: <20100204171850.GA16539@kroah.com>

2.6.32-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Nick Piggin <npiggin@suse.de>

commit 02b709df817c0db174f249cc59e5f7fd01b64d92 upstream.

Improve handling of fragmented per-CPU vmaps.  We previously don't free
up per-CPU maps until all its addresses have been used and freed.  So
fragmented blocks could fill up vmalloc space even if they actually had
no active vmap regions within them.

Add some logic to allow all CPUs to have these blocks purged in the case
of failure to allocate a new vm area, and also put some logic to trim
such blocks of a current CPU if we hit them in the allocation path (so
as to avoid a large build up of them).

Christoph reported some vmap allocation failures when using the per CPU
vmap APIs in XFS, which cannot be reproduced after this patch and the
previous bug fix.

Cc: linux-mm@kvack.org
Tested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/vmalloc.c |   91 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 80 insertions(+), 11 deletions(-)

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -509,6 +509,9 @@ static unsigned long lazy_max_pages(void
 
 static atomic_t vmap_lazy_nr = ATOMIC_INIT(0);
 
+/* for per-CPU blocks */
+static void purge_fragmented_blocks_allcpus(void);
+
 /*
  * Purges all lazily-freed vmap areas.
  *
@@ -539,6 +542,9 @@ static void __purge_vmap_area_lazy(unsig
 	} else
 		spin_lock(&purge_lock);
 
+	if (sync)
+		purge_fragmented_blocks_allcpus();
+
 	rcu_read_lock();
 	list_for_each_entry_rcu(va, &vmap_area_list, list) {
 		if (va->flags & VM_LAZY_FREE) {
@@ -678,6 +684,7 @@ struct vmap_block {
 	DECLARE_BITMAP(dirty_map, VMAP_BBMAP_BITS);
 	struct list_head free_list;
 	struct rcu_head rcu_head;
+	struct list_head purge;
 };
 
 /* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
@@ -782,12 +789,61 @@ static void free_vmap_block(struct vmap_
 	call_rcu(&vb->rcu_head, rcu_free_vb);
 }
 
+static void purge_fragmented_blocks(int cpu)
+{
+	LIST_HEAD(purge);
+	struct vmap_block *vb;
+	struct vmap_block *n_vb;
+	struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
+
+		if (!(vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS))
+			continue;
+
+		spin_lock(&vb->lock);
+		if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
+			vb->free = 0; /* prevent further allocs after releasing lock */
+			vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
+			bitmap_fill(vb->alloc_map, VMAP_BBMAP_BITS);
+			bitmap_fill(vb->dirty_map, VMAP_BBMAP_BITS);
+			spin_lock(&vbq->lock);
+			list_del_rcu(&vb->free_list);
+			spin_unlock(&vbq->lock);
+			spin_unlock(&vb->lock);
+			list_add_tail(&vb->purge, &purge);
+		} else
+			spin_unlock(&vb->lock);
+	}
+	rcu_read_unlock();
+
+	list_for_each_entry_safe(vb, n_vb, &purge, purge) {
+		list_del(&vb->purge);
+		free_vmap_block(vb);
+	}
+}
+
+static void purge_fragmented_blocks_thiscpu(void)
+{
+	purge_fragmented_blocks(smp_processor_id());
+}
+
+static void purge_fragmented_blocks_allcpus(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		purge_fragmented_blocks(cpu);
+}
+
 static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 {
 	struct vmap_block_queue *vbq;
 	struct vmap_block *vb;
 	unsigned long addr = 0;
 	unsigned int order;
+	int purge = 0;
 
 	BUG_ON(size & ~PAGE_MASK);
 	BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC);
@@ -800,24 +856,37 @@ again:
 		int i;
 
 		spin_lock(&vb->lock);
+		if (vb->free < 1UL << order)
+			goto next;
 		i = bitmap_find_free_region(vb->alloc_map,
 						VMAP_BBMAP_BITS, order);
 
-		if (i >= 0) {
-			addr = vb->va->va_start + (i << PAGE_SHIFT);
-			BUG_ON(addr_to_vb_idx(addr) !=
-					addr_to_vb_idx(vb->va->va_start));
-			vb->free -= 1UL << order;
-			if (vb->free == 0) {
-				spin_lock(&vbq->lock);
-				list_del_rcu(&vb->free_list);
-				spin_unlock(&vbq->lock);
+		if (i < 0) {
+			if (vb->free + vb->dirty == VMAP_BBMAP_BITS) {
+				/* fragmented and no outstanding allocations */
+				BUG_ON(vb->dirty != VMAP_BBMAP_BITS);
+				purge = 1;
 			}
-			spin_unlock(&vb->lock);
-			break;
+			goto next;
+		}
+		addr = vb->va->va_start + (i << PAGE_SHIFT);
+		BUG_ON(addr_to_vb_idx(addr) !=
+				addr_to_vb_idx(vb->va->va_start));
+		vb->free -= 1UL << order;
+		if (vb->free == 0) {
+			spin_lock(&vbq->lock);
+			list_del_rcu(&vb->free_list);
+			spin_unlock(&vbq->lock);
 		}
 		spin_unlock(&vb->lock);
+		break;
+next:
+		spin_unlock(&vb->lock);
 	}
+
+	if (purge)
+		purge_fragmented_blocks_thiscpu();
+
 	put_cpu_var(vmap_cpu_blocks);
 	rcu_read_unlock();
 



  parent reply	other threads:[~2010-02-04 17:22 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-04 17:18 [00/74] 2.6.32.8-stable review Greg KH
2010-02-04 17:11 ` [01/74] [SCSI] scsi_lib: Fix bug in completion of bidi commands Greg KH
2010-02-04 17:11 ` [02/74] [SCSI] mptsas: Fix issue with chain pools allocation on katmai Greg KH
2010-02-04 17:11 ` [03/74] mm: add new read_cache_page_gfp() helper function Greg KH
2010-02-04 17:11 ` [04/74] drm/i915: Selectively enable self-reclaim Greg KH
2010-02-04 17:11 ` [05/74] firewire: ohci: fix crashes with TSB43AB23 on 64bit systems Greg KH
2010-02-04 17:11 ` [06/74] S390: fix single stepped svcs with TRACE_IRQFLAGS=y Greg KH
2010-02-04 17:11 ` [07/74] x86: Set hotpluggable nodes in nodes_possible_map Greg KH
2010-02-04 17:11 ` [08/74] x86: Remove "x86 CPU features in debugfs" (CONFIG_X86_CPU_DEBUG) Greg KH
2010-02-04 17:11 ` [09/74] libata: retry FS IOs even if it has failed with AC_ERR_INVALID Greg KH
2010-02-04 17:11 ` [10/74] [S390] zcrypt: Do not remove coprocessor for error 8/72 Greg KH
2010-02-04 17:11 ` [11/74] [S390] dasd: fix possible NULL pointer errors Greg KH
2010-02-11 23:15   ` Bastian Blank
2010-02-11 23:38     ` Greg KH
2010-02-04 17:11 ` [12/74] ACPI: Add a generic API for _OSC -v2 Greg KH
2010-02-04 17:11 ` [13/74] ACPI: Add platform-wide _OSC support Greg KH
2010-02-04 17:11 ` [14/74] ACPI: fix OSC regression that caused aer and pciehp not to load Greg KH
2010-02-04 17:11 ` [15/74] ACPI: Advertise to BIOS in _OSC: _OST on _PPC changes Greg KH
2010-02-04 17:11 ` [16/74] UBI: fix volume creation input checking Greg KH
2010-02-04 17:11 ` [17/74] e1000: enhance frame fragment detection Greg KH
2010-02-04 17:11 ` [18/74] e1000e: " Greg KH
2010-02-04 17:11 ` [19/74] e1000/e1000e: dont use small hardware rx buffers Greg KH
2010-02-04 17:11 ` [20/74] drm/i915: Reload hangcheck timer too for Ironlake Greg KH
2010-02-04 17:11 ` [21/74] Fix a leak in affs_fill_super() Greg KH
2010-02-04 17:11 ` [22/74] Fix failure exits in bfs_fill_super() Greg KH
2010-02-04 17:11 ` [23/74] fix oops in fs/9p late mount failure Greg KH
2010-02-04 17:11 ` [24/74] fix leak in romfs_fill_super() Greg KH
2010-02-04 17:11 ` [25/74] Fix remount races with symlink handling in affs Greg KH
2010-02-04 17:11 ` [26/74] fix affs parse_options() Greg KH
2010-02-04 17:11 ` [27/74] Fix failure exit in ipathfs Greg KH
2010-02-04 17:11 ` [28/74] mm: fix migratetype bug which slowed swapping Greg KH
2010-02-04 17:12 ` [29/74] FDPIC: Respect PT_GNU_STACK exec protection markings when creating NOMMU stack Greg KH
2010-02-04 17:12 ` [30/74] Split flush_old_exec into two functions Greg KH
2010-02-04 17:12 ` [31/74] sparc: TIF_ABI_PENDING bit removal Greg KH
2010-02-04 17:12 ` [32/74] x86: get rid of the insane TIF_ABI_PENDING bit Greg KH
2010-02-04 17:12 ` [33/74] Input: winbond-cir - remove dmesg spam Greg KH
2010-02-04 17:12 ` [34/74] x86: Disable HPET MSI on ATI SB700/SB800 Greg KH
2010-02-04 17:12 ` [35/74] iwlwifi: set default aggregation frame count limit to 31 Greg KH
2010-02-04 17:12 ` [36/74] drm/i915: only enable hotplug for detected outputs Greg KH
2010-02-04 17:12 ` [37/74] firewire: core: add_descriptor size check Greg KH
2010-02-04 17:12 ` [38/74] SECURITY: selinux, fix update_rlimit_cpu parameter Greg KH
2010-02-04 17:12 ` [39/74] regulator: Specify REGULATOR_CHANGE_STATUS for WM835x LED constraints Greg KH
2010-02-04 17:12 ` [40/74] x86: Add Dell OptiPlex 760 reboot quirk Greg KH
2010-02-04 17:12 ` [41/74] x86: Add quirk for Intel DG45FC board to avoid low memory corruption Greg KH
2010-02-04 17:12 ` [42/74] x86/amd-iommu: Fix possible integer overflow Greg KH
2010-02-04 17:12 ` [43/74] clocksource: fix compilation if no GENERIC_TIME Greg KH
2010-02-04 17:12 ` [44/74] tcp: update the netstamp_needed counter when cloning sockets Greg KH
2010-02-04 17:12 ` [45/74] sky2: Fix oops in sky2_xmit_frame() after TX timeout Greg KH
2010-02-04 17:12 ` [46/74] net: restore ip source validation Greg KH
2010-02-05 10:16   ` Sven Joachim
2010-02-04 17:12 ` [47/74] af_packet: Dont use skb after dev_queue_xmit() Greg KH
2010-02-04 17:12 ` [48/74] ax25: netrom: rose: Fix timer oopses Greg KH
2010-02-04 17:12 ` [49/74] KVM: allow userspace to adjust kvmclock offset Greg KH
2010-02-04 17:12 ` [50/74] oprofile/x86: add Xeon 7500 series support Greg KH
2010-02-04 17:12 ` [51/74] oprofile/x86: fix crash when profiling more than 28 events Greg KH
2010-02-04 17:12 ` [52/74] libata: retry link resume if necessary Greg KH
2010-02-04 17:12 ` [53/74] mm: percpu-vmap fix RCU list walking Greg KH
2010-02-04 17:12 ` Greg KH [this message]
2010-02-04 17:12 ` [55/74] block: fix bio_add_page for non trivial merge_bvec_fn case Greg KH
2010-02-04 17:12 ` [56/74] Fix flush_old_exec()/setup_new_exec() split Greg KH
2010-02-04 17:12 ` [57/74] random: drop weird m_time/a_time manipulation Greg KH
2010-02-04 17:12 ` [58/74] random: Remove unused inode variable Greg KH
2010-02-04 17:12 ` [59/74] block: fix bugs in bio-integrity mempool usage Greg KH
2010-02-04 17:12 ` [60/74] usb: r8a66597-hdc disable interrupts fix Greg KH
2010-02-04 17:12 ` [61/74] connector: Delete buggy notification code Greg KH
2010-02-04 17:12 ` [62/74] be2net: Bug fix to support newer generation of BE ASIC Greg KH
2010-02-04 17:12 ` [63/74] be2net: Fix memset() arg ordering Greg KH
2010-02-04 17:12 ` [64/74] mm: flush dcache before writing into page to avoid alias Greg KH
2010-02-04 17:12 ` [65/74] mac80211: fix NULL pointer dereference when ftrace is enabled Greg KH
2010-02-04 17:12 ` [66/74] imxfb: correct location of callbacks in suspend and resume Greg KH
2010-02-04 17:12 ` [67/74] mx3fb: some debug and initialisation fixes Greg KH
2010-02-04 17:12 ` [68/74] starfire: clean up properly if firmware loading fails Greg KH
2010-02-04 17:12 ` [69/74] kernel/cred.c: use kmem_cache_free Greg KH
2010-02-04 17:12 ` [70/74] uartlite: fix crash when using as console Greg KH
2010-02-04 17:12 ` [71/74] pktcdvd: removing device does not remove its sysfs dir Greg KH
2010-02-04 17:12 ` [72/74] ath9k: fix eeprom INI values override for 2GHz-only cards Greg KH
2010-02-04 17:12 ` [73/74] ath9k: fix beacon slot/buffer leak Greg KH
2010-02-04 17:12 ` [74/74] powerpc: TIF_ABI_PENDING bit removal Greg KH
2010-02-05  7:36 ` [Stable-review] [00/74] 2.6.32.8-stable review Nikola Ciprich
2010-02-05 17:12   ` Greg KH
2010-02-07 10:26     ` Nikola Ciprich
2010-02-05 16:53 ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100204171518.697924361@linux.site \
    --to=gregkh@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=stable-review@kernel.org \
    --cc=stable@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).