From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.4 41/73] zsmalloc: fix zs_can_compact() integer overflow
Date: Mon, 16 May 2016 18:15:11 -0700
Message-ID: <20160517011453.764919619@linuxfoundation.org>
In-Reply-To: <20160517011451.827433776@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

commit 44f43e99fe70833058482d183e99fdfd11220996 upstream.

zs_can_compact() has two race conditions in its core calculation:

unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
				zs_stat_get(class, OBJ_USED);

1) classes are not locked, so the numbers of allocated and used
   objects can be changed by concurrent operations on other CPUs
2) the shrinker invokes it from a preemptible context

As a result, OBJ_ALLOCATED can be observed as less than OBJ_USED. The
unsigned subtraction then wraps around, which can result in either a very
high or a negative `total_scan' value calculated later in do_shrink_slab().
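
For illustration, the wrap-around can be reproduced in user space. The
sketch below is not kernel code: the function names are hypothetical, and
the signed conversion assumes a two's-complement target such as x86_64
built with gcc or clang.

```c
#include <limits.h>

/*
 * Hypothetical stand-in for the unguarded calculation. The two
 * arguments model counters sampled at different moments, so
 * "allocated" may be observed below "used".
 */
static unsigned long wasted_unguarded(unsigned long allocated,
				      unsigned long used)
{
	/* Unsigned subtraction wraps modulo ULONG_MAX + 1. */
	return allocated - used;
}

/*
 * The same value as seen by a caller that stores it in a signed
 * long. The conversion is implementation-defined for values above
 * LONG_MAX; common compilers keep the bit pattern.
 */
static long wasted_as_long(unsigned long wasted)
{
	return (long)wasted;
}
```

With allocated = 10 and used = 11, wasted_unguarded() returns ULONG_MAX,
and wasted_as_long() of that value yields -1 on such targets.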

do_shrink_slab() has some logic to catch those cases, and logs them:

 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62

However, due to the way `total_scan' is calculated, not every
shrinker->count_objects() overflow can be spotted and handled.
To demonstrate the latter, I added some debugging code to do_shrink_slab()
(x86_64) and the results were:

 vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
 vmscan: but total_scan > 0: 92679974445502
 vmscan: resulting total_scan: 92679974445502
[..]
 vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
 vmscan: but total_scan > 0: 22634041808232578
 vmscan: resulting total_scan: 22634041808232578

Even though shrinker->count_objects() has returned an overflowed value,
the resulting `total_scan' is positive and, more worryingly, extremely
large. This value is later used in the shrinker->scan_objects() loop:

        while (total_scan >= batch_size ||
               total_scan >= freeable) {
                unsigned long ret;
                unsigned long nr_to_scan = min(batch_size, total_scan);

                shrinkctl->nr_to_scan = nr_to_scan;
                ret = shrinker->scan_objects(shrinker, shrinkctl);
                if (ret == SHRINK_STOP)
                        break;
                freed += ret;

                count_vm_events(SLABS_SCANNED, nr_to_scan);
                total_scan -= nr_to_scan;

                cond_resched();
        }

`total_scan >= batch_size' is true for a very long time and
`total_scan >= freeable' is also true for quite some time, because
`freeable < 0' and `total_scan' is large enough, for example,
22634041808232578. The only break condition, in the given scheme of
things, is the shrinker->scan_objects() == SHRINK_STOP test, which is
too weak to rely on, especially in heavy zsmalloc-usage scenarios.
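
To put a rough number on the first condition: while
`total_scan >= batch_size' holds, each pass subtracts exactly batch_size,
so the number of such passes is a plain division. The helper below is a
hypothetical user-space sketch, and the batch size of 128 used in the note
after it is an assumption (the kernel's default SHRINK_BATCH), not a value
taken from this report.

```c
#include <stdint.h>

/*
 * Number of full batches handed to scan_objects() before total_scan
 * drops below batch_size, assuming no call returns SHRINK_STOP.
 * Returns 0 for a zero batch size to avoid dividing by zero.
 */
static uint64_t full_batches(uint64_t total_scan, uint64_t batch_size)
{
	if (batch_size == 0)
		return 0;

	return total_scan / batch_size;
}
```

full_batches(22634041808232578ULL, 128) evaluates to 176828451626817,
that is, roughly 1.8 * 10^14 calls to scan_objects() unless one of them
returns SHRINK_STOP.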

To fix the issue, take a pool stat snapshot and use it instead of
racy zs_stat_get() calls, and return 0 when the snapshot shows no
wasted objects (obj_allocated <= obj_used).

Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/zsmalloc.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1732,10 +1732,13 @@ static struct page *isolate_source_page(
 static unsigned long zs_can_compact(struct size_class *class)
 {
 	unsigned long obj_wasted;
+	unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
+	unsigned long obj_used = zs_stat_get(class, OBJ_USED);
 
-	obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
-		zs_stat_get(class, OBJ_USED);
+	if (obj_allocated <= obj_used)
+		return 0;
 
+	obj_wasted = obj_allocated - obj_used;
 	obj_wasted /= get_maxobj_per_zspage(class->size,
 			class->pages_per_zspage);
 
