From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753864AbcEIXls (ORCPT ); Mon, 9 May 2016 19:41:48 -0400 Received: from LGEAMRELO13.lge.com ([156.147.23.53]:35233 "EHLO lgeamrelo13.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753710AbcEIXlp (ORCPT ); Mon, 9 May 2016 19:41:45 -0400 X-Original-SENDERIP: 156.147.1.125 X-Original-MAILFROM: minchan@kernel.org X-Original-SENDERIP: 165.244.98.203 X-Original-MAILFROM: minchan@kernel.org X-Original-SENDERIP: 10.177.223.161 X-Original-MAILFROM: minchan@kernel.org Date: Tue, 10 May 2016 08:42:05 +0900 From: Minchan Kim To: Sergey Senozhatsky CC: Andrew Morton , , , Sergey Senozhatsky , "[4.3+]" Subject: Re: [PATCH v2] zsmalloc: fix zs_can_compact() integer overflow Message-ID: <20160509234205.GB4426@bbox> References: <20160509140052.3389-1-sergey.senozhatsky@gmail.com> MIME-Version: 1.0 In-Reply-To: <20160509140052.3389-1-sergey.senozhatsky@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-MIMETrack: Itemize by SMTP Server on LGEKRMHUB02/LGE/LG Group(Release 8.5.3FP6|November 21, 2013) at 2016/05/10 08:41:41, Serialize by Router on LGEKRMHUB02/LGE/LG Group(Release 8.5.3FP6|November 21, 2013) at 2016/05/10 08:41:41, Serialize complete at 2016/05/10 08:41:41 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 09, 2016 at 11:00:52PM +0900, Sergey Senozhatsky wrote: > zs_can_compact() has two race conditions in its core calculation: > > unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) - > zs_stat_get(class, OBJ_USED); > > 1) classes are not locked, so the numbers of allocated and used > objects can change by the concurrent ops happening on other CPUs > 2) shrinker invokes it from preemptible context > > Depending on the circumstances, thus, OBJ_ALLOCATED can become > less than OBJ_USED, which can result in either very high or > negative `total_scan' value calculated later in do_shrink_slab(). > > do_shrink_slab() has some logic to prevent those cases: > > vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 > vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 > vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64 > vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 > vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 > vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62 > > However, due to the way `total_scan' is calculated, not every > shrinker->count_objects() overflow can be spotted and handled. > To demonstrate the latter, I added some debugging code to do_shrink_slab() > (x86_64) and the results were: > > vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615] > vmscan: but total_scan > 0: 92679974445502 > vmscan: resulting total_scan: 92679974445502 > [..] > vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615] > vmscan: but total_scan > 0: 22634041808232578 > vmscan: resulting total_scan: 22634041808232578 > > Even though shrinker->count_objects() has returned an overflowed value, > the resulting `total_scan' is positive, and, what is more worrisome, it > is insanely huge. This value is getting used later on in > shrinker->scan_objects() loop: > > while (total_scan >= batch_size || > total_scan >= freeable) { > unsigned long ret; > unsigned long nr_to_scan = min(batch_size, total_scan); > > shrinkctl->nr_to_scan = nr_to_scan; > ret = shrinker->scan_objects(shrinker, shrinkctl); > if (ret == SHRINK_STOP) > break; > freed += ret; > > count_vm_events(SLABS_SCANNED, nr_to_scan); > total_scan -= nr_to_scan; > > cond_resched(); > } > > `total_scan >= batch_size' is true for a very-very long time and > 'total_scan >= freeable' is also true for quite some time, because > `freeable < 0' and `total_scan' is large enough, for example, > 22634041808232578. The only break condition, in the given scheme of > things, is shrinker->scan_objects() == SHRINK_STOP test, which is a > bit too weak to rely on, especially in heavy zsmalloc-usage scenarios. > > To fix the issue, take a pool stat snapshot and use it instead of > racy zs_stat_get() calls. > > Signed-off-by: Sergey Senozhatsky > Cc: Minchan Kim > Cc: [4.3+] Acked-by: Minchan Kim Thanks!