From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753864AbcEIXls (ORCPT <rfc822;w@1wt.eu>);
	Mon, 9 May 2016 19:41:48 -0400
Received: from LGEAMRELO13.lge.com ([156.147.23.53]:35233 "EHLO
	lgeamrelo13.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753710AbcEIXlp (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 9 May 2016 19:41:45 -0400
X-Original-SENDERIP: 156.147.1.125
X-Original-MAILFROM: minchan@kernel.org
X-Original-SENDERIP: 165.244.98.203
X-Original-MAILFROM: minchan@kernel.org
X-Original-SENDERIP: 10.177.223.161
X-Original-MAILFROM: minchan@kernel.org
Date: Tue, 10 May 2016 08:42:05 +0900
From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>, <linux-kernel@vger.kernel.org>,
        <linux-mm@kvack.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
        "[4.3+]" <stable@vger.kernel.org>
Subject: Re: [PATCH v2] zsmalloc: fix zs_can_compact() integer overflow
Message-ID: <20160509234205.GB4426@bbox>
References: <20160509140052.3389-1-sergey.senozhatsky@gmail.com>
MIME-Version: 1.0
In-Reply-To: <20160509140052.3389-1-sergey.senozhatsky@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-MIMETrack: Itemize by SMTP Server on LGEKRMHUB02/LGE/LG Group(Release 8.5.3FP6|November
 21, 2013) at 2016/05/10 08:41:41,
	Serialize by Router on LGEKRMHUB02/LGE/LG Group(Release 8.5.3FP6|November
 21, 2013) at 2016/05/10 08:41:41,
	Serialize complete at 2016/05/10 08:41:41
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 09, 2016 at 11:00:52PM +0900, Sergey Senozhatsky wrote:
> zs_can_compact() has two race conditions in its core calculation:
> 
> unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
> 				zs_stat_get(class, OBJ_USED);
> 
> 1) classes are not locked, so the numbers of allocated and used
>    objects can change by the concurrent ops happening on other CPUs
> 2) shrinker invokes it from preemptible context
> 
> Depending on the circumstances, thus, OBJ_ALLOCATED can become
> less than OBJ_USED, which can result in either very high or
> negative `total_scan' value calculated later in do_shrink_slab().
> 
> do_shrink_slab() has some logic to prevent those cases:
> 
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
> 
> However, due to the way `total_scan' is calculated, not every
> shrinker->count_objects() overflow can be spotted and handled.
> To demonstrate the latter, I added some debugging code to do_shrink_slab()
> (x86_64) and the results were:
> 
>  vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
>  vmscan: but total_scan > 0: 92679974445502
>  vmscan: resulting total_scan: 92679974445502
> [..]
>  vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
>  vmscan: but total_scan > 0: 22634041808232578
>  vmscan: resulting total_scan: 22634041808232578
> 
> Even though shrinker->count_objects() has returned an overflowed value,
> the resulting `total_scan' is positive, and, what is more worrisome, it
> is insanely huge. This value is getting used later on in
> shrinker->scan_objects() loop:
> 
>         while (total_scan >= batch_size ||
>                total_scan >= freeable) {
>                 unsigned long ret;
>                 unsigned long nr_to_scan = min(batch_size, total_scan);
> 
>                 shrinkctl->nr_to_scan = nr_to_scan;
>                 ret = shrinker->scan_objects(shrinker, shrinkctl);
>                 if (ret == SHRINK_STOP)
>                         break;
>                 freed += ret;
> 
>                 count_vm_events(SLABS_SCANNED, nr_to_scan);
>                 total_scan -= nr_to_scan;
> 
>                 cond_resched();
>         }
> 
> `total_scan >= batch_size' is true for a very-very long time and
> 'total_scan >= freeable' is also true for quite some time, because
> `freeable < 0' and `total_scan' is large enough, for example,
> 22634041808232578. The only break condition, in the given scheme of
> things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
> bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
> 
> To fix the issue, take a pool stat snapshot and use it instead of
> racy zs_stat_get() calls.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: <stable@vger.kernel.org>        [4.3+]
Acked-by: Minchan Kim <minchan@kernel.org>

Thanks!