stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Hugh Dickins <hughd@google.com>
Subject: [ 19/61] swap: fix shmem swapping when more than 8 areas
Date: Wed, 20 Jun 2012 10:30:39 -0700	[thread overview]
Message-ID: <20120620173022.597329751@linuxfoundation.org> (raw)
In-Reply-To: <20120620173033.GA5634@kroah.com>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit 9b15b817f3d62409290fd56fe3cbb076a931bb0a upstream.

Minchan Kim reports that when a system has many swap areas, and tmpfs
swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
back the page cannot locate it, and the read fails with -ENOMEM.

Whoops.  Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
swp_entry_to_pte()) technique for determining maximum usable swap
offset, without stopping to realize that that actually depends upon the
pte swap encoding shifting swap offset to the higher bits and truncating
it there.  Whereas our radix_tree swap encoding leaves offset in the
lower bits: it's swap "type" (that is, index of swap area) that was
truncated.

Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().

This does not reduce the usable size of a swap area any further, it
leaves it as claimed when making the original commit: no change from 3.0
on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
per swapfile on i386 with PAE.  It's not a change I would have risked
five years ago, but with x86_64 supported for ten years, I believe it's
appropriate now.

Hmm, and what if some architecture implements its swap pte with offset
encoded below type? That would equally break the maximum usable swap
offset check.  Happily, they all follow the same tradition of encoding
offset above type, but I'll prepare a check on that for next.

Reported-and-Reviewed-and-Tested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/swapops.h |    8 +++++---
 mm/swapfile.c           |   12 ++++--------
 2 files changed, 9 insertions(+), 11 deletions(-)

--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -9,13 +9,15 @@
  * get good packing density in that tree, so the index should be dense in
  * the low-order bits.
  *
- * We arrange the `type' and `offset' fields so that `type' is at the five
+ * We arrange the `type' and `offset' fields so that `type' is at the seven
  * high-order bits of the swp_entry_t and `offset' is right-aligned in the
- * remaining bits.
+ * remaining bits.  Although `type' itself needs only five bits, we allow for
+ * shmem/tmpfs to shift it all up a further two bits: see swp_to_radix_entry().
  *
  * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
  */
-#define SWP_TYPE_SHIFT(e)	(sizeof(e.val) * 8 - MAX_SWAPFILES_SHIFT)
+#define SWP_TYPE_SHIFT(e)	((sizeof(e.val) * 8) - \
+			(MAX_SWAPFILES_SHIFT + RADIX_TREE_EXCEPTIONAL_SHIFT))
 #define SWP_OFFSET_MASK(e)	((1UL << SWP_TYPE_SHIFT(e)) - 1)
 
 /*
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1924,24 +1924,20 @@ static unsigned long read_swap_header(st
 
 	/*
 	 * Find out how many pages are allowed for a single swap
-	 * device. There are three limiting factors: 1) the number
+	 * device. There are two limiting factors: 1) the number
 	 * of bits for the swap offset in the swp_entry_t type, and
 	 * 2) the number of bits in the swap pte as defined by the
-	 * the different architectures, and 3) the number of free bits
-	 * in an exceptional radix_tree entry. In order to find the
+	 * different architectures. In order to find the
 	 * largest possible bit mask, a swap entry with swap type 0
 	 * and swap offset ~0UL is created, encoded to a swap pte,
 	 * decoded to a swp_entry_t again, and finally the swap
 	 * offset is extracted. This will mask all the bits from
 	 * the initial ~0UL mask that can't be encoded in either
 	 * the swp_entry_t or the architecture definition of a
-	 * swap pte.  Then the same is done for a radix_tree entry.
+	 * swap pte.
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
-			swp_entry_to_pte(swp_entry(0, ~0UL))));
-	maxpages = swp_offset(radix_to_swp_entry(
-			swp_to_radix_entry(swp_entry(0, maxpages)))) + 1;
-
+			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */



  parent reply	other threads:[~2012-06-20 17:30 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-20 17:30 [ 00/61] 3.4.4-stable review Greg KH
2012-06-20 17:30 ` [ 01/61] ARM i.MX53: Fix PLL4 base address Greg KH
2012-06-20 17:30 ` [ 02/61] ARM: imx6: exit coherency when shutting down a cpu Greg KH
2012-06-20 17:30 ` [ 03/61] ARM i.MX imx21ads: Fix overlapping static i/o mappings Greg KH
2012-06-20 17:30 ` [ 04/61] Revert "drm/i915/dp: Use auxch precharge value of 5 everywhere" Greg KH
2012-06-20 18:51   ` Adam Jackson
2012-06-20 19:01     ` Greg KH
2012-06-21 11:48       ` Wouter M. Koolen
2012-06-20 17:30 ` [ 05/61] drm/radeon: add some additional 6xx/7xx/EG register init Greg KH
2012-06-20 17:30 ` [ 06/61] drm via: initialize object_idr Greg KH
2012-06-20 17:30 ` [ 07/61] drm/udl: only bind to the video devices on the hub Greg KH
2012-06-20 17:30 ` [ 08/61] drm sis: initialize object_idr Greg KH
2012-06-20 17:30 ` [ 09/61] xen/hvc: Collapse error logic Greg KH
2012-06-20 17:30 ` [ 10/61] xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN Greg KH
2012-06-20 17:30 ` [ 11/61] xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness Greg KH
2012-06-20 17:30 ` [ 12/61] xen/setup: filter APERFMPERF cpuid feature out Greg KH
2012-06-20 17:30 ` [ 13/61] NFSv4.1: Fix a request leak on the back channel Greg KH
2012-06-20 17:30 ` [ 14/61] NFSv4: Fix unnecessary delegation returns in nfs4_do_open Greg KH
2012-06-20 17:30 ` [ 15/61] nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels Greg KH
2012-06-20 17:30 ` [ 16/61] tracing: Have tracing_off() actually turn tracing off Greg KH
2012-06-20 17:30 ` [ 17/61] rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer Greg KH
2012-06-20 17:30 ` [ 18/61] SCSI: mpt2sas: Fix unsafe using smp_processor_id() in preemptible Greg KH
2012-06-20 17:30 ` Greg KH [this message]
2012-06-20 17:30 ` [ 20/61] USB: option: Add Vodafone/Huawei K5005 support Greg KH
2012-06-20 17:30 ` [ 21/61] USB: option: Updated Huawei K4605 has better id Greg KH
2012-06-20 17:30 ` [ 22/61] USB: option: add more YUGA device ids Greg KH
2012-06-20 17:30 ` [ 23/61] USB: option: fix memory leak Greg KH
2012-06-20 17:30 ` [ 24/61] USB: option: fix port-data abuse Greg KH
2012-06-20 17:30 ` [ 25/61] kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() Greg KH
2012-06-20 17:30 ` [ 26/61] hfsplus: fix overflow in sector calculations in hfsplus_submit_bio Greg KH
2012-06-20 17:30 ` [ 27/61] hfsplus: fix bless ioctl when used with hardlinks Greg KH
2012-06-20 17:30 ` [ 28/61] Make hard_irq_disable() actually hard-disable interrupts Greg KH
2012-06-20 17:30 ` [ 29/61] xhci: Fix invalid loop check in xhci_free_tt_info() Greg KH
2012-06-20 17:30 ` [ 30/61] xhci: Dont free endpoints in xhci_mem_cleanup() Greg KH
2012-06-20 17:30 ` [ 31/61] xHCI: Increase the timeout for controller save/restore state operation Greg KH
2012-06-20 17:30 ` [ 32/61] usb-storage: Add 090c:1000 to unusal-devs Greg KH
2012-06-20 17:30 ` [ 33/61] USB: mos7840: Fix compilation of usb serial driver Greg KH
2012-06-20 17:30 ` [ 34/61] USB: qcserial: Add Sierra Wireless device IDs Greg KH
2012-06-20 17:30 ` [ 35/61] USB: mct_u232: Fix incorrect TIOCMSET return Greg KH
2012-06-20 17:30 ` [ 36/61] usb: musb: davinci: Fix build breakage Greg KH
2012-06-20 17:30 ` [ 37/61] usb: musb_gadget: fix crash caused by dangling pointer Greg KH
2012-06-20 17:30 ` [ 38/61] USB: fix PS3 EHCI systems Greg KH
2012-06-20 17:30 ` [ 39/61] USB: serial: cp210x: add Optris MS Pro usb id Greg KH
2012-06-20 17:31 ` [ 40/61] USB: ftdi-sio: Add support for RT Systems USB-RTS01 serial adapter Greg KH
2012-06-20 17:31 ` [ 41/61] USB: add NO_D3_DURING_SLEEP flag and revert 151b61284776be2 Greg KH
2012-06-20 17:31 ` [ 42/61] USB: cdc-wdm: Add Vodafone/Huawei K5005 support Greg KH
2012-06-20 17:31 ` [ 43/61] usb: cdc-acm: fix devices not unthrottled on open Greg KH
2012-06-20 17:31 ` [ 44/61] USB: serial: sierra: Add support for Sierra Wireless AirCard 320U modem Greg KH
2012-06-20 17:31 ` [ 45/61] USB: serial: Enforce USB driver and USB serial driver match Greg KH
2012-06-20 17:31 ` [ 46/61] USB: fix gathering of interface associations Greg KH
2012-06-20 17:31 ` [ 47/61] ASoC: wm8904: Fix GPIO and MICBIAS initialisation for regmap conversion Greg KH
2012-06-20 17:31 ` [ 48/61] hwrng: atmel-rng - fix data valid check Greg KH
2012-06-20 17:31 ` [ 49/61] edac: avoid mce decoding crash after edac driver unloaded Greg KH
2012-06-20 17:31 ` [ 50/61] edac: fix the error about memory type detection on SandyBridge Greg KH
2012-06-20 17:31 ` [ 51/61] 9p: BUG before corrupting memory Greg KH
2012-06-20 17:31 ` [ 52/61] remoteproc/omap: fix dev_err typo Greg KH
2012-06-20 17:31 ` [ 53/61] remoteproc: fix print format warnings Greg KH
2012-06-20 17:31 ` [ 54/61] remoteproc: fix missing fault indication in error-path Greg KH
2012-06-20 17:31 ` [ 55/61] e1000e: Disable ASPM L1 on 82574 Greg KH
2012-06-20 17:31 ` [ 56/61] e1000e: Remove special case for 82573/82574 ASPM L1 disablement Greg KH
2012-06-20 17:31 ` [ 57/61] ntp: Correct TAI offset during leap second Greg KH
2012-06-20 17:31 ` [ 58/61] iwlwifi: fix the Transmit Frame Descriptor rings Greg KH
2012-06-20 17:31 ` [ 59/61] iwlwifi: use correct supported firmware for 6035 and 6000g2 Greg KH
2012-06-20 17:31 ` [ 60/61] iwlwifi: fix TX power antenna access Greg KH
2012-06-20 17:31 ` [ 61/61] target: Return error to initiator if SET TARGET PORT GROUPS emulation fails Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120620173022.597329751@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).