From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Dave Jones <davej@redhat.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Vladimir Davydov <vdavydov@parallels.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.15 61/61] slab: fix oops when reading /proc/slab_allocators
Date: Tue, 24 Jun 2014 11:51:44 -0400
Message-ID: <20140624154955.567948901@linuxfoundation.org>
In-Reply-To: <20140624154952.751713761@linuxfoundation.org>

3.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

commit 03787301420376ae41fbaf4267f4a6253d152ac5 upstream.

Commit b1cb0982bdd6 ("change the management method of free objects of
the slab") introduced a bug in the slab leak detector
('/proc/slab_allocators').  The detector works as follows:

 1. traverse all objects on all the slabs.
 2. determine whether each object is active or not.
 3. if active, print who allocated the object.

But that commit changed how free objects are managed, so the logic that
determines whether an object is active also changed.  Before, we
regarded objects in the cpu caches as inactive; with that commit, we
mistakenly regard objects in the cpu caches as active.

This introduces a kernel oops if DEBUG_PAGEALLOC is enabled.  If
DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who
corrupts free memory in the slab.  It unmaps the page table mapping if an
object is free and maps it if the object is active.  When the slab leak
detector checks an object in the cpu caches, it mistakenly thinks the
object is active and tries to access the object's memory to retrieve the
caller of the allocation.  At this point, the page table mapping for this
object doesn't exist, so an oops occurs.
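
The misclassification can be illustrated with a small user-space model.
This is only a sketch: struct model_slab and the model_* helpers are
hypothetical names made up for illustration and are not the mm/slab.c
code.  It assumes that refilling a cpu cache advances the slab's active
count without any caller having allocated the object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in mm/slab.c. */
#define MODEL_NUM_OBJS 4

enum { MODEL_FREE = 0, MODEL_ACTIVE = 1 };

struct model_slab {
	/* Entries [active, MODEL_NUM_OBJS) hold the indices still free in the slab. */
	unsigned char freelist[MODEL_NUM_OBJS];
	/* Number of objects that have left the slab's freelist. */
	unsigned int active;
	/* Per-object status byte, modelling the buffer the fix adds. */
	unsigned char status[MODEL_NUM_OBJS];
};

static void model_init(struct model_slab *s)
{
	unsigned int i;

	for (i = 0; i < MODEL_NUM_OBJS; i++) {
		s->freelist[i] = (unsigned char)i;
		s->status[i] = MODEL_FREE;
	}
	s->active = 0;
}

/* Move one object from the slab into a cpu cache; nobody has allocated it yet. */
static unsigned int model_refill_cpu_cache(struct model_slab *s)
{
	return s->freelist[s->active++];
}

/* A caller really allocates the object: only now is it marked active. */
static void model_alloc(struct model_slab *s, unsigned int idx)
{
	s->status[idx] = MODEL_ACTIVE;
}

/* Old check: an object is inactive only if it is still on the slab's freelist. */
static bool old_is_active(const struct model_slab *s, unsigned int idx)
{
	unsigned int j;

	for (j = s->active; j < MODEL_NUM_OBJS; j++)
		if (s->freelist[j] == idx)
			return false;
	return true;
}

/* New check: consult the per-object status byte. */
static bool new_is_active(const struct model_slab *s, unsigned int idx)
{
	return s->status[idx] == MODEL_ACTIVE;
}
```

In this model, an object returned by model_refill_cpu_cache() is reported
as active by old_is_active() but as free by new_is_active() until
model_alloc() is called for it.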

The following is the oops message reported by Dave:

It blew up when something tried to read /proc/slab_allocators
(Just cat it, and you should see the oops below)

  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  [snip...]
  CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131
  task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000
  RIP: 0010:[<ffffffffaa1a8f4a>]  [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180
  RSP: 0018:ffff880076925de0  EFLAGS: 00010002
  RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7
  RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000
  RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000
  R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0
  FS:  00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0
  DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
  Call Trace:
    leaks_show+0xce/0x240
    seq_read+0x28e/0x490
    proc_reg_read+0x3d/0x80
    vfs_read+0x9b/0x160
    SyS_read+0x58/0xb0
    tracesys+0xd4/0xd9
  Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46
  RIP   handle_slab+0x8a/0x180

To fix the problem, I introduce an object status buffer on each slab.
With this, we can track object status precisely, so the slab leak detector
will not access free objects and no kernel oops will occur.  The memory
overhead of this fix is imposed only when CONFIG_DEBUG_SLAB_LEAK is
enabled, which is mainly used for debugging, so the overhead isn't a big
problem.
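
As a rough user-space sketch of the layout (not the kernel code itself):
the status bytes sit directly after the freelist index array, one byte
per object.  The model_* names and the one-byte model_freelist_idx_t are
assumptions for illustration; the real freelist_idx_t may be wider.

```c
#include <stddef.h>

/* Stand-in for freelist_idx_t; assumed to be one byte in this sketch. */
typedef unsigned char model_freelist_idx_t;

/* Round x up to a multiple of a (a must be a power of two), like ALIGN(). */
static size_t model_align(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Size of the per-slab management area: one index per object, plus one
 * status byte per object when leak debugging is on, optionally aligned.
 */
static size_t model_freelist_size(size_t nr_objs, size_t align, int leak_debug)
{
	size_t size = nr_objs * sizeof(model_freelist_idx_t);

	if (leak_debug)
		size += nr_objs * sizeof(char);
	if (align)
		size = model_align(size, align);
	return size;
}

/* The status byte of object idx lives right after the nr_objs index entries. */
static char *model_status_slot(void *freelist, size_t nr_objs, size_t idx)
{
	return (char *)freelist + nr_objs * sizeof(model_freelist_idx_t) + idx;
}
```

For example, with 10 objects and an alignment of 8, this model needs 16
bytes without the status buffer and 24 bytes with it.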

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/slab.c |   90 ++++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 19 deletions(-)

--- a/mm/slab.c
+++ b/mm/slab.c
@@ -386,6 +386,39 @@ static void **dbg_userword(struct kmem_c
 
 #endif
 
+#define OBJECT_FREE (0)
+#define OBJECT_ACTIVE (1)
+
+#ifdef CONFIG_DEBUG_SLAB_LEAK
+
+static void set_obj_status(struct page *page, int idx, int val)
+{
+	int freelist_size;
+	char *status;
+	struct kmem_cache *cachep = page->slab_cache;
+
+	freelist_size = cachep->num * sizeof(freelist_idx_t);
+	status = (char *)page->freelist + freelist_size;
+	status[idx] = val;
+}
+
+static inline unsigned int get_obj_status(struct page *page, int idx)
+{
+	int freelist_size;
+	char *status;
+	struct kmem_cache *cachep = page->slab_cache;
+
+	freelist_size = cachep->num * sizeof(freelist_idx_t);
+	status = (char *)page->freelist + freelist_size;
+
+	return status[idx];
+}
+
+#else
+static inline void set_obj_status(struct page *page, int idx, int val) {}
+
+#endif
+
 /*
  * Do not go above this order unless 0 objects fit into the slab or
  * overridden on the command line.
@@ -576,12 +609,30 @@ static inline struct array_cache *cpu_ca
 	return cachep->array[smp_processor_id()];
 }
 
+static size_t calculate_freelist_size(int nr_objs, size_t align)
+{
+	size_t freelist_size;
+
+	freelist_size = nr_objs * sizeof(freelist_idx_t);
+	if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
+		freelist_size += nr_objs * sizeof(char);
+
+	if (align)
+		freelist_size = ALIGN(freelist_size, align);
+
+	return freelist_size;
+}
+
 static int calculate_nr_objs(size_t slab_size, size_t buffer_size,
 				size_t idx_size, size_t align)
 {
 	int nr_objs;
+	size_t remained_size;
 	size_t freelist_size;
+	int extra_space = 0;
 
+	if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
+		extra_space = sizeof(char);
 	/*
 	 * Ignore padding for the initial guess. The padding
 	 * is at most @align-1 bytes, and @buffer_size is at
@@ -590,14 +641,15 @@ static int calculate_nr_objs(size_t slab
 	 * into the memory allocation when taking the padding
 	 * into account.
 	 */
-	nr_objs = slab_size / (buffer_size + idx_size);
+	nr_objs = slab_size / (buffer_size + idx_size + extra_space);
 
 	/*
 	 * This calculated number will be either the right
 	 * amount, or one greater than what we want.
 	 */
-	freelist_size = slab_size - nr_objs * buffer_size;
-	if (freelist_size < ALIGN(nr_objs * idx_size, align))
+	remained_size = slab_size - nr_objs * buffer_size;
+	freelist_size = calculate_freelist_size(nr_objs, align);
+	if (remained_size < freelist_size)
 		nr_objs--;
 
 	return nr_objs;
@@ -635,7 +687,7 @@ static void cache_estimate(unsigned long
 	} else {
 		nr_objs = calculate_nr_objs(slab_size, buffer_size,
 					sizeof(freelist_idx_t), align);
-		mgmt_size = ALIGN(nr_objs * sizeof(freelist_idx_t), align);
+		mgmt_size = calculate_freelist_size(nr_objs, align);
 	}
 	*num = nr_objs;
 	*left_over = slab_size - nr_objs*buffer_size - mgmt_size;
@@ -2032,13 +2084,16 @@ static size_t calculate_slab_order(struc
 			break;
 
 		if (flags & CFLGS_OFF_SLAB) {
+			size_t freelist_size_per_obj = sizeof(freelist_idx_t);
 			/*
 			 * Max number of objs-per-slab for caches which
 			 * use off-slab slabs. Needed to avoid a possible
 			 * looping condition in cache_grow().
 			 */
+			if (IS_ENABLED(CONFIG_DEBUG_SLAB_LEAK))
+				freelist_size_per_obj += sizeof(char);
 			offslab_limit = size;
-			offslab_limit /= sizeof(freelist_idx_t);
+			offslab_limit /= freelist_size_per_obj;
 
  			if (num > offslab_limit)
 				break;
@@ -2285,8 +2340,7 @@ __kmem_cache_create (struct kmem_cache *
 	if (!cachep->num)
 		return -E2BIG;
 
-	freelist_size =
-		ALIGN(cachep->num * sizeof(freelist_idx_t), cachep->align);
+	freelist_size = calculate_freelist_size(cachep->num, cachep->align);
 
 	/*
 	 * If the slab has been placed off-slab, and we have enough space then
@@ -2299,7 +2353,7 @@ __kmem_cache_create (struct kmem_cache *
 
 	if (flags & CFLGS_OFF_SLAB) {
 		/* really off slab. No need for manual alignment */
-		freelist_size = cachep->num * sizeof(freelist_idx_t);
+		freelist_size = calculate_freelist_size(cachep->num, 0);
 
 #ifdef CONFIG_PAGE_POISONING
 		/* If we're going to use the generic kernel_map_pages()
@@ -2625,6 +2679,7 @@ static void cache_init_objs(struct kmem_
 		if (cachep->ctor)
 			cachep->ctor(objp);
 #endif
+		set_obj_status(page, i, OBJECT_FREE);
 		set_free_obj(page, i, i);
 	}
 }
@@ -2833,6 +2888,7 @@ static void *cache_free_debugcheck(struc
 	BUG_ON(objnr >= cachep->num);
 	BUG_ON(objp != index_to_obj(cachep, page, objnr));
 
+	set_obj_status(page, objnr, OBJECT_FREE);
 	if (cachep->flags & SLAB_POISON) {
 #ifdef CONFIG_DEBUG_PAGEALLOC
 		if ((cachep->size % PAGE_SIZE)==0 && OFF_SLAB(cachep)) {
@@ -2966,6 +3022,8 @@ static inline void cache_alloc_debugchec
 static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
 				gfp_t flags, void *objp, unsigned long caller)
 {
+	struct page *page;
+
 	if (!objp)
 		return objp;
 	if (cachep->flags & SLAB_POISON) {
@@ -2996,6 +3054,9 @@ static void *cache_alloc_debugcheck_afte
 		*dbg_redzone1(cachep, objp) = RED_ACTIVE;
 		*dbg_redzone2(cachep, objp) = RED_ACTIVE;
 	}
+
+	page = virt_to_head_page(objp);
+	set_obj_status(page, obj_to_index(cachep, page, objp), OBJECT_ACTIVE);
 	objp += obj_offset(cachep);
 	if (cachep->ctor && cachep->flags & SLAB_POISON)
 		cachep->ctor(objp);
@@ -4232,21 +4293,12 @@ static void handle_slab(unsigned long *n
 						struct page *page)
 {
 	void *p;
-	int i, j;
+	int i;
 
 	if (n[0] == n[1])
 		return;
 	for (i = 0, p = page->s_mem; i < c->num; i++, p += c->size) {
-		bool active = true;
-
-		for (j = page->active; j < c->num; j++) {
-			/* Skip freed item */
-			if (get_free_obj(page, j) == i) {
-				active = false;
-				break;
-			}
-		}
-		if (!active)
+		if (get_obj_status(page, i) != OBJECT_ACTIVE)
 			continue;
 
 		if (!add_caller(n, (unsigned long)*dbg_userword(c, p)))


