From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
Sasha Levin <sasha.levin@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Konstantin Khlebnikov <koct9i@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Lukas Czerner <lczerner@redhat.com>,
Dave Jones <davej@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 3.4 04/23] shmem: fix splicing from a hole while its punched
Date: Sat, 26 Jul 2014 12:02:04 -0700 [thread overview]
Message-ID: <20140726190149.855623777@linuxfoundation.org> (raw)
In-Reply-To: <20140726190149.723570428@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins <hughd@google.com>
commit b1a366500bd537b50c3aad26dc7df083ec03a448 upstream.
shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.
But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.
shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch). Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.
shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay. And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.
We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?
The origin of this part of the problem is my v3.1 commit d0823576bf4b
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c. It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.
Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.
Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.
But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/shmem.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -499,22 +499,19 @@ void shmem_truncate_range(struct inode *
}
index = start;
- for ( ; ; ) {
+ while (index <= end) {
cond_resched();
pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
pvec.pages, indices);
if (!pvec.nr) {
- if (index == start)
+ /* If all gone or hole-punch, we're done */
+ if (index == start || end != -1)
break;
+ /* But if truncating, restart to make sure all gone */
index = start;
continue;
}
- if (index == start && indices[0] > end) {
- shmem_deswap_pagevec(&pvec);
- pagevec_release(&pvec);
- break;
- }
mem_cgroup_uncharge_start();
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
@@ -524,8 +521,12 @@ void shmem_truncate_range(struct inode *
break;
if (radix_tree_exceptional_entry(page)) {
- nr_swaps_freed += !shmem_free_swap(mapping,
- index, page);
+ if (shmem_free_swap(mapping, index, page)) {
+ /* Swap was replaced by page: retry */
+ index--;
+ break;
+ }
+ nr_swaps_freed++;
continue;
}
@@ -533,6 +534,11 @@ void shmem_truncate_range(struct inode *
if (page->mapping == mapping) {
VM_BUG_ON(PageWriteback(page));
truncate_inode_page(mapping, page);
+ } else {
+ /* Page was replaced by swap: retry */
+ unlock_page(page);
+ index--;
+ break;
}
unlock_page(page);
}
next prev parent reply other threads:[~2014-07-26 20:19 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-26 19:02 [PATCH 3.4 00/23] 3.4.100-stable review Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 01/23] crypto: testmgr - update LZO compression test vectors Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 02/23] shmem: fix faulting into a hole while its punched Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 03/23] shmem: fix faulting into a hole, not taking i_mutex Greg Kroah-Hartman
2014-07-26 19:02 ` Greg Kroah-Hartman [this message]
2014-07-26 19:02 ` [PATCH 3.4 05/23] tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 06/23] 8021q: fix a potential memory leak Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 07/23] igmp: fix the problem when mc leave group Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 08/23] tcp: fix false undo corner cases Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 09/23] appletalk: Fix socket referencing in skb Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 10/23] be2net: set EQ DB clear-intr bit in be_open() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 11/23] tipc: clear next-pointer of message fragments before reassembly Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 12/23] net: sctp: fix information leaks in ulpevent layer Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 13/23] net: pppoe: use correct channel MTU when using Multilink PPP Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 14/23] sunvnet: clean up objects created in vnet_new() on vnet_exit() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 16/23] dns_resolver: Null-terminate the right string Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 17/23] ipv4: fix buffer overflow in ip_options_compile() Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 19/23] mwifiex: fix Tx timeout issue Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 20/23] drm/radeon: avoid leaking edid data Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 21/23] alarmtimer: Fix bug where relative alarm timers were treated as absolute Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 22/23] PM / sleep: Fix request_firmware() error at resume Greg Kroah-Hartman
2014-07-26 19:02 ` [PATCH 3.4 23/23] iommu/vt-d: Disable translation if already enabled Greg Kroah-Hartman
2014-07-27 14:58 ` [PATCH 3.4 00/23] 3.4.100-stable review Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140726190149.855623777@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=davej@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=koct9i@gmail.com \
--cc=lczerner@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sasha.levin@oracle.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox