Re: [PATCH] zsmalloc: fix zs_can_compact() integer overflow

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	"[4.3+]" <stable@vger.kernel.org>
Subject: Re: [PATCH] zsmalloc: fix zs_can_compact() integer overflow
Date: Mon, 9 May 2016 17:41:55 +0900	[thread overview]
Message-ID: <20160509084155.GA507@swordfish> (raw)
In-Reply-To: <20160509080707.GB5434@blaptop>

Hello,

On (05/09/16 17:07), Minchan Kim wrote:
[..]
> > Depending on the circumstances, OBJ_ALLOCATED can become less
> > than OBJ_USED, which can result in either very high or negative
> > `total_scan' value calculated in do_shrink_slab().
> 
> So, do you see pr_err("shrink_slab: %pF negative objects xxxx)
> in vmscan.c and skip shrinking?

yes

 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62


> It would be better to explain what's the result without this patch
> and end-user effect for going -stable.

it seems that not every overflowed value returned from zs_can_compact()
is getting detected in do_shrink_slab():

	freeable = shrinker->count_objects(shrinker, shrinkctl);
	if (freeable == 0)
		return 0;

	/*
	 * copy the current shrinker scan count into a local variable
	 * and zero it so that other concurrent shrinker invocations
	 * don't also do this scanning work.
	 */
	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);

	total_scan = nr;
	delta = (4 * nr_scanned) / shrinker->seeks;
	delta *= freeable;
	do_div(delta, nr_eligible + 1);
	total_scan += delta;
	if (total_scan < 0) {
		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
		       shrinker->scan_objects, total_scan);
		total_scan = freeable;
	}

this calculation can hide the shrinker->count_objects() error. I added
some debugging code (on x86_64), and the output was:

[   59.041959] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.041963] vmscan: >> but total_scan > 0: 92679974445502
[   59.041964] vmscan: >> resulting total_scan: 92679974445502

[   59.192734] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.192737] vmscan: >> but total_scan > 0: 5830197242006811
[   59.192738] vmscan: >> resulting total_scan: 5830197242006811

[   59.259805] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.259809] vmscan: >> but total_scan > 0: 23649671889371219
[   59.259810] vmscan: >> resulting total_scan: 23649671889371219

[   76.279767] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   76.279770] vmscan: >> but total_scan > 0: 895907920044174
[   76.279771] vmscan: >> resulting total_scan: 895907920044174

[   84.807837] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   84.807841] vmscan: >> but total_scan > 0: 22634041808232578
[   84.807842] vmscan: >> resulting total_scan: 22634041808232578

so we can end up with insanely huge total_scan values.

[..]
> > @@ -2262,10 +2262,13 @@ static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage)
> 
> It seems this patch is based on my old page migration work?
> It's not go to the mainline yet but your patch which fixes the bug should
> be supposed to go to the -stable. So, I hope this patch first.

oops... my fat fingers! good catch, thanks! I have two versions: for -next and
-mmots (with your LRU rework applied, indeed). somehow I managed to cd to the
wrong dir. sorry, will resend.

	-ss

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	"[4.3+]" <stable@vger.kernel.org>
Subject: Re: [PATCH] zsmalloc: fix zs_can_compact() integer overflow
Date: Mon, 9 May 2016 17:41:55 +0900	[thread overview]
Message-ID: <20160509084155.GA507@swordfish> (raw)
In-Reply-To: <20160509080707.GB5434@blaptop>

Hello,

On (05/09/16 17:07), Minchan Kim wrote:
[..]
> > Depending on the circumstances, OBJ_ALLOCATED can become less
> > than OBJ_USED, which can result in either very high or negative
> > `total_scan' value calculated in do_shrink_slab().
> 
> So, do you see pr_err("shrink_slab: %pF negative objects xxxx)
> in vmscan.c and skip shrinking?

yes

 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62


> It would be better to explain what's the result without this patch
> and end-user effect for going -stable.

it seems that not every overflowed value returned from zs_can_compact()
is getting detected in do_shrink_slab():

	freeable = shrinker->count_objects(shrinker, shrinkctl);
	if (freeable == 0)
		return 0;

	/*
	 * copy the current shrinker scan count into a local variable
	 * and zero it so that other concurrent shrinker invocations
	 * don't also do this scanning work.
	 */
	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);

	total_scan = nr;
	delta = (4 * nr_scanned) / shrinker->seeks;
	delta *= freeable;
	do_div(delta, nr_eligible + 1);
	total_scan += delta;
	if (total_scan < 0) {
		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
		       shrinker->scan_objects, total_scan);
		total_scan = freeable;
	}

this calculation can hide the shrinker->count_objects() error. I added
some debugging code (on x86_64), and the output was:

[   59.041959] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.041963] vmscan: >> but total_scan > 0: 92679974445502
[   59.041964] vmscan: >> resulting total_scan: 92679974445502

[   59.192734] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.192737] vmscan: >> but total_scan > 0: 5830197242006811
[   59.192738] vmscan: >> resulting total_scan: 5830197242006811

[   59.259805] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.259809] vmscan: >> but total_scan > 0: 23649671889371219
[   59.259810] vmscan: >> resulting total_scan: 23649671889371219

[   76.279767] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   76.279770] vmscan: >> but total_scan > 0: 895907920044174
[   76.279771] vmscan: >> resulting total_scan: 895907920044174

[   84.807837] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   84.807841] vmscan: >> but total_scan > 0: 22634041808232578
[   84.807842] vmscan: >> resulting total_scan: 22634041808232578

so we can end up with insanely huge total_scan values.

[..]
> > @@ -2262,10 +2262,13 @@ static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage)
> 
> It seems this patch is based on my old page migration work?
> It's not go to the mainline yet but your patch which fixes the bug should
> be supposed to go to the -stable. So, I hope this patch first.

oops... my fat fingers! good catch, thanks! I have two versions: for -next and
-mmots (with your LRU rework applied, indeed). somehow I managed to cd to the
wrong dir. sorry, will resend.

	-ss

WARNING: multiple messages have this Message-ID (diff)

From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	"[4.3+]" <stable@vger.kernel.org>
Subject: Re: [PATCH] zsmalloc: fix zs_can_compact() integer overflow
Date: Mon, 9 May 2016 17:52:31 +0900	[thread overview]
Message-ID: <20160509084155.GA507@swordfish> (raw)
In-Reply-To: <20160509080707.GB5434@blaptop>

Hello,

On (05/09/16 17:07), Minchan Kim wrote:
[..]
> > Depending on the circumstances, OBJ_ALLOCATED can become less
> > than OBJ_USED, which can result in either very high or negative
> > `total_scan' value calculated in do_shrink_slab().
> 
> So, do you see pr_err("shrink_slab: %pF negative objects xxxx)
> in vmscan.c and skip shrinking?

yes

 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 : vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62


> It would be better to explain what's the result without this patch
> and end-user effect for going -stable.

the problem is that it seems that not every overflowed value returned from
zs_can_compact() is getting detected in do_shrink_slab():


	freeable = shrinker->count_objects(shrinker, shrinkctl);
	if (freeable == 0)
		return 0;

	/*
	 * copy the current shrinker scan count into a local variable
	 * and zero it so that other concurrent shrinker invocations
	 * don't also do this scanning work.
	 */
	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);

	total_scan = nr;
	delta = (4 * nr_scanned) / shrinker->seeks;
	delta *= freeable;
	do_div(delta, nr_eligible + 1);
	total_scan += delta;
	if (total_scan < 0) {
		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
		       shrinker->scan_objects, total_scan);
		total_scan = freeable;
	}


this calculation can hide the shrinker->count_objects() error. I added
some debugging code (on x86_64), and the output was:


[   59.041959] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.041963] vmscan: >> but total_scan > 0: 92679974445502
[   59.041964] vmscan: >> resulting total_scan: 92679974445502

[   59.192734] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.192737] vmscan: >> but total_scan > 0: 5830197242006811
[   59.192738] vmscan: >> resulting total_scan: 5830197242006811

[   59.259805] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   59.259809] vmscan: >> but total_scan > 0: 23649671889371219
[   59.259810] vmscan: >> resulting total_scan: 23649671889371219

[   76.279767] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   76.279770] vmscan: >> but total_scan > 0: 895907920044174
[   76.279771] vmscan: >> resulting total_scan: 895907920044174

[   84.807837] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[   84.807841] vmscan: >> but total_scan > 0: 22634041808232578
[   84.807842] vmscan: >> resulting total_scan: 22634041808232578


so we can end up with insanely huge total_scan, which is then used in
this while loop:

	while (total_scan >= batch_size ||
	       total_scan >= freeable) {
		unsigned long ret;
		unsigned long nr_to_scan = min(batch_size, total_scan);

		shrinkctl->nr_to_scan = nr_to_scan;
		ret = shrinker->scan_objects(shrinker, shrinkctl);
		if (ret == SHRINK_STOP)
			break;
		freed += ret;

		count_vm_events(SLABS_SCANNED, nr_to_scan);
		total_scan -= nr_to_scan;

		cond_resched();
	}

`total_scan >= batch_size' is true for a very-very long time, I guess.
'total_scan >= freeable' is also true for quite some time: freeable is `< 0'
and total_scan is 18446744073709551615, for example. so it's up to
shrinker->scan_objects() == SHRINK_STOP test, which is, I assume, a bit
too weak to rely on. so that's why I Cc'd -stable.

[..]
> > @@ -2262,10 +2262,13 @@ static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage)
> 
> It seems this patch is based on my old page migration work?
> It's not go to the mainline yet but your patch which fixes the bug should
> be supposed to go to the -stable. So, I hope this patch first.

oops... my fat fingers! good catch, thanks! I have two versions: for -next and
-mmots (with your LRU rework applied, indeed). somehow I managed to cd to the
wrong dir. sorry, will resend.

	-ss

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2016-05-09  8:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-09  7:35 [PATCH] zsmalloc: fix zs_can_compact() integer overflow Sergey Senozhatsky
2016-05-09  7:35 ` Sergey Senozhatsky
2016-05-09  8:07 ` Minchan Kim
2016-05-09  8:07   ` Minchan Kim
2016-05-09  8:41   ` Sergey Senozhatsky [this message]
2016-05-09  8:52     ` Sergey Senozhatsky
2016-05-09  8:41     ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160509084155.GA507@swordfish \
    --to=sergey.senozhatsky.work@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.