From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
Sasha Levin <sasha.levin@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Konstantin Khlebnikov <koct9i@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Lukas Czerner <lczerner@redhat.com>,
Dave Jones <davej@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.15 031/109] shmem: fix splicing from a hole while its punched
Date: Sat, 26 Jul 2014 12:01:53 -0700 [thread overview]
Message-ID: <20140726190224.764325109@linuxfoundation.org> (raw)
In-Reply-To: <20140726190223.834037485@linuxfoundation.org>
3.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins <hughd@google.com>
commit b1a366500bd537b50c3aad26dc7df083ec03a448 upstream.
shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.
But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.
shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch). Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.
shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay. And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.
We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?
The origin of this part of the problem is my v3.1 commit d0823576bf4b
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c. It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.
Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.
Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.
But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/shmem.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -468,23 +468,20 @@ static void shmem_undo_range(struct inod
return;
index = start;
- for ( ; ; ) {
+ while (index < end) {
cond_resched();
pvec.nr = find_get_entries(mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE),
pvec.pages, indices);
if (!pvec.nr) {
- if (index == start || unfalloc)
+ /* If all gone or hole-punch or unfalloc, we're done */
+ if (index == start || end != -1)
break;
+ /* But if truncating, restart to make sure all gone */
index = start;
continue;
}
- if ((index == start || unfalloc) && indices[0] >= end) {
- pagevec_remove_exceptionals(&pvec);
- pagevec_release(&pvec);
- break;
- }
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -496,8 +493,12 @@ static void shmem_undo_range(struct inod
if (radix_tree_exceptional_entry(page)) {
if (unfalloc)
continue;
- nr_swaps_freed += !shmem_free_swap(mapping,
- index, page);
+ if (shmem_free_swap(mapping, index, page)) {
+ /* Swap was replaced by page: retry */
+ index--;
+ break;
+ }
+ nr_swaps_freed++;
continue;
}
@@ -506,6 +507,11 @@ static void shmem_undo_range(struct inod
if (page->mapping == mapping) {
VM_BUG_ON_PAGE(PageWriteback(page), page);
truncate_inode_page(mapping, page);
+ } else {
+ /* Page was replaced by swap: retry */
+ unlock_page(page);
+ index--;
+ break;
}
}
unlock_page(page);
next prev parent reply other threads:[~2014-07-26 19:35 UTC|newest]
Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-26 19:01 [PATCH 3.15 000/109] 3.15.7-stable review Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 001/109] usb: Check if port status is equal to RxDetect Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 002/109] usb: chipidea: udc: Disable auto ZLP generation on ep0 Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 003/109] media: gspca_pac7302: Add new usb-id for Genius i-Look 317 Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 004/109] ALSA: hda - Revert stream assignment order for Intel controllers Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 005/109] ALSA: hda - Fix broken PM due to incomplete i915 initialization Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 006/109] Drivers: hv: hv_fcopy: fix a race condition for SMP guest Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 007/109] Drivers: hv: util: Fix a bug in the KVP code Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 008/109] Revert "Bluetooth: Add a new PID/VID 0cf3/e005 for AR3012." Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 009/109] Bluetooth: Ignore H5 non-link packets in non-active state Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 010/109] fuse: timeout comparison fix Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 011/109] fuse: avoid scheduling while atomic Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 012/109] fuse: handle large user and group ID Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 013/109] fuse: ignore entry-timeout on LOOKUP_REVAL Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 014/109] iio:core: Handle error when mask type is not separate Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 015/109] tracing: instance_rmdir() leaks ftrace_event_file->filter Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 016/109] tracing: Fix graph tracer with stack tracer on other archs Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 017/109] tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 018/109] tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 019/109] xen/balloon: set ballooned out pages as invalid in p2m Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 020/109] xen/manage: fix potential deadlock when resuming the console Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 021/109] hwmon: (da9055) Dont use dash in the name attribute Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 022/109] hwmon: (da9052) " Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 023/109] hwmon: (adt7470) Fix writes to temperature limit registers Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 024/109] igb: Workaround for i210 Errata 25: Slow System Clock Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 025/109] igb: do a reset on SR-IOV re-init if device is down Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 026/109] quota: missing lock in dqcache_shrink_scan() Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 027/109] iwlwifi: update the 7265 series HW IDs Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 028/109] iwlwifi: dvm: dont enable CTS to self Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 029/109] shmem: fix faulting into a hole while its punched Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 030/109] shmem: fix faulting into a hole, not taking i_mutex Greg Kroah-Hartman
2014-07-26 19:01 ` Greg Kroah-Hartman [this message]
2014-07-26 19:01 ` [PATCH 3.15 032/109] net/mlx4_core: Fix the error flow when probing with invalid VF configuration Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 033/109] net/mlx4_en: Dont configure the HW vxlan parser when vxlan offloading isnt set Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 034/109] ip_tunnel: fix ip_tunnel_lookup Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 035/109] slip: Fix deadlock in write_wakeup Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 036/109] slcan: Port write_wakeup deadlock fix from slip Greg Kroah-Hartman
2014-07-26 19:01 ` [PATCH 3.15 037/109] net: sctp: propagate sysctl errors from proc_do* properly Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 038/109] net: filter: fix upper BPF instruction limit Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 039/109] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 040/109] net: sctp: check proc_dointvec result in proc_sctp_do_auth Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 041/109] 8021q: fix a potential memory leak Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 042/109] drivers: net: cpsw: fix dual EMAC stall when connected to same switch Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 044/109] net: fix UDP tunnel GSO of frag_list GRO packets Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 045/109] ipv4: fix dst race in sk_dst_get() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 046/109] ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 047/109] net: fix sparse warning in sk_dst_set() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 048/109] vlan: free percpu stats in device destructor Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 049/109] bnx2x: fix possible panic under memory stress Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 050/109] tcp: Fix divide by zero when pushing during tcp-repair Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 051/109] ipv4: icmp: Fix pMTU handling for rare case Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 054/109] net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 055/109] igmp: fix the problem when mc leave group Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 056/109] tcp: fix false undo corner cases Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 057/109] appletalk: Fix socket referencing in skb Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 058/109] net: mvneta: fix operation in 10 Mbit/s mode Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 059/109] net: mvneta: Fix big endian issue in mvneta_txq_desc_csum() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 060/109] netlink: Fix handling of error from netlink_dump() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 061/109] be2net: set EQ DB clear-intr bit in be_open() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 062/109] r8152: fix r8152_csum_workaround function Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 063/109] tipc: clear next-pointer of message fragments before reassembly Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 064/109] net: sctp: fix information leaks in ulpevent layer Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 065/109] net: pppoe: use correct channel MTU when using Multilink PPP Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 066/109] bonding: fix ad_select module param check Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 067/109] net-gre-gro: Fix a bug that breaks the forwarding path Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 068/109] sunvnet: clean up objects created in vnet_new() on vnet_exit() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 069/109] net: ppp: fix creating PPP pass and active filters Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 071/109] net: ppp: dont call sk_chk_filter twice Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 073/109] dns_resolver: Null-terminate the right string Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 074/109] ipv4: fix buffer overflow in ip_options_compile() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 075/109] xen-netback: Fix handling frag_list on grant op error path Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 076/109] xen-netback: Fix releasing frag_list skbs in " Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 077/109] xen-netback: Fix releasing header slot on " Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 078/109] xen-netback: Fix pointer incrementation to avoid incorrect logging Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 079/109] perf: Do not allow optimized switch for non-cloned events Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 081/109] mwifiex: fix Tx timeout issue Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 082/109] ring-buffer: Fix polling on trace_pipe Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 083/109] irqchip: gic: Add support for cortex a7 compatible string Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 084/109] irqchip: gic: Add binding probe for ARM GIC400 Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 085/109] irqchip: gic: Fix core ID calculation when topology is read from DT Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 086/109] drm/radeon: set default bl level to something reasonable Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 087/109] drm/qxl: return IRQ_NONE if it was not our irq Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 088/109] drm/radeon: avoid leaking edid data Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 089/109] Revert "drm/i915: reverse dp link param selection, prefer fast over wide again" Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 090/109] alarmtimer: Fix bug where relative alarm timers were treated as absolute Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 091/109] hwrng: fetch randomness only after device init Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 092/109] x86, tsc: Fix cpufreq lockup Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 093/109] cpufreq: move policy kobj to policy->cpu at resume Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 094/109] random: check for increase of entropy_count because of signed conversion Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 095/109] mtd: devices: elm: fix elm_context_save() and elm_context_restore() functions Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 096/109] dm thin metadata: do not allow the data block size to change Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.15 097/109] dm cache " Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 098/109] RDMA/cxgb4: Initialize the device status page Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 099/109] PM / sleep: Fix request_firmware() error at resume Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 100/109] locking/mutex: Disable optimistic spinning on some architectures Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 101/109] sched: Fix possible divide by zero in avg_atom() calculation Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 103/109] IB/mlx5: Enable "block multicast loopback" for kernel consumers Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 104/109] aio: protect reqs_available updates from changes in interrupt handlers Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 105/109] gpio: dwapb: drop irq_setup_generic_chip() Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 106/109] ARM: dts: imx: Add alias for ethernet controller Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 107/109] iwlwifi: mvm: disable CTS to Self Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 108/109] Dont trigger congestion wait on dirty-but-not-writeout pages Greg Kroah-Hartman
2014-07-26 19:03 ` [PATCH 3.15 109/109] ARC: Implement ptrace(PTRACE_GET_THREAD_AREA) Greg Kroah-Hartman
2014-07-27 7:04 ` [PATCH 3.15 000/109] 3.15.7-stable review Satoru Takeuchi
2014-07-27 14:51 ` Greg Kroah-Hartman
2014-07-27 15:01 ` Guenter Roeck
2014-07-27 15:09 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140726190224.764325109@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=davej@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=koct9i@gmail.com \
--cc=lczerner@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sasha.levin@oracle.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox