linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michel Lespinasse <walken@google.com>,
	Lukas Czerner <lczerner@redhat.com>,
	Dave Jones <davej@redhat.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] shmem: fix splicing from a hole while it's punched
Date: Fri, 25 Jul 2014 16:33:44 +0200	[thread overview]
Message-ID: <20140725143344.GC4844@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.LSU.2.11.1407150331170.2584@eggly.anvils>

On Tue 15-07-14 03:33:02, Hugh Dickins wrote:
> shmem_fault() is the actual culprit in trinity's hole-punch starvation,
> and the most significant cause of such problems: since a page faulted is
> one that then appears page_mapped(), needing unmap_mapping_range() and
> i_mmap_mutex to be unmapped again.
> 
> But it is not the only way in which a page can be brought into a hole in
> the radix_tree while that hole is being punched; and Vlastimil's testing
> implies that if enough other processors are busy filling in the hole,
> then shmem_undo_range() can be kept from completing indefinitely.
> 
> shmem_file_splice_read() is the main other user of SGP_CACHE, which
> can instantiate shmem pagecache pages in the read-only case (without
> holding i_mutex, so perhaps concurrently with a hole-punch).  Probably
> it's silly not to use SGP_READ already (using the ZERO_PAGE for holes):
> which ought to be safe, but might bring surprises - not a change to be
> rushed.
> 
> shmem_read_mapping_page_gfp() is an internal interface used by
> drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
> shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
> called internally by the kernel (perhaps for a stacking filesystem,
> which might rely on holes to be reserved): it's unclear whether it
> could be provoked to keep hole-punch busy or not.
> 
> We could apply the same umbrella as now used in shmem_fault() to
> shmem_file_splice_read() and the others; but it looks ugly, and use
> over a range raises questions - should it actually be per page?  can
> these get starved themselves?
> 
> The origin of this part of the problem is my v3.1 commit d0823576bf4b
> ("mm: pincer in truncate_inode_pages_range"), once it was duplicated
> into shmem.c.  It seemed like a nice idea at the time, to ensure
> (barring RCU lookup fuzziness) that there's an instant when the entire
> hole is empty; but the indefinitely repeated scans to ensure that make
> it vulnerable.
> 
> Revert that "enhancement" to hole-punch from shmem_undo_range(), but
> retain the unproblematic rescanning when it's truncating; add a couple
> of comments there.
> 
> Remove the "indices[0] >= end" test: that is now handled satisfactorily
> by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
> to be worth avoiding here.
> 
> But if we do not always loop indefinitely, we do need to handle the case
> of swap swizzled back to page before shmem_free_swap() gets it: add a
> retry for that case, as suggested by Konstantin Khlebnikov; and for the
> case of page swizzled back to swap, as suggested by Johannes Weiner.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Konstantin Khlebnikov <koct9i@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Lukas Czerner <lczerner@redhat.com>
> Cc: Dave Jones <davej@redhat.com>
> Cc: <stable@vger.kernel.org>	[3.1+]

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
> Please replace mmotm's
> shmem-fix-faulting-into-a-hole-while-its-punched-take-2.patch
> by this patch: which is substantially the same as that, but with
> updated commit comment, and a page retry as indicated by Hannes.
> 
>  mm/shmem.c |   24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> --- 3.16-rc5+/mm/shmem.c	2014-07-14 20:34:28.196153828 -0700
> +++ 3.16-rc5++/mm/shmem.c	2014-07-14 20:35:14.156154916 -0700
> @@ -468,23 +468,20 @@ static void shmem_undo_range(struct inod
>  		return;
>  
>  	index = start;
> -	for ( ; ; ) {
> +	while (index < end) {
>  		cond_resched();
>  
>  		pvec.nr = find_get_entries(mapping, index,
>  				min(end - index, (pgoff_t)PAGEVEC_SIZE),
>  				pvec.pages, indices);
>  		if (!pvec.nr) {
> -			if (index == start || unfalloc)
> +			/* If all gone or hole-punch or unfalloc, we're done */
> +			if (index == start || end != -1)
>  				break;
> +			/* But if truncating, restart to make sure all gone */
>  			index = start;
>  			continue;
>  		}
> -		if ((index == start || unfalloc) && indices[0] >= end) {
> -			pagevec_remove_exceptionals(&pvec);
> -			pagevec_release(&pvec);
> -			break;
> -		}
>  		mem_cgroup_uncharge_start();
>  		for (i = 0; i < pagevec_count(&pvec); i++) {
>  			struct page *page = pvec.pages[i];
> @@ -496,8 +493,12 @@ static void shmem_undo_range(struct inod
>  			if (radix_tree_exceptional_entry(page)) {
>  				if (unfalloc)
>  					continue;
> -				nr_swaps_freed += !shmem_free_swap(mapping,
> -								index, page);
> +				if (shmem_free_swap(mapping, index, page)) {
> +					/* Swap was replaced by page: retry */
> +					index--;
> +					break;
> +				}
> +				nr_swaps_freed++;
>  				continue;
>  			}
>  
> @@ -506,6 +507,11 @@ static void shmem_undo_range(struct inod
>  				if (page->mapping == mapping) {
>  					VM_BUG_ON_PAGE(PageWriteback(page), page);
>  					truncate_inode_page(mapping, page);
> +				} else {
> +					/* Page was replaced by swap: retry */
> +					unlock_page(page);
> +					index--;
> +					break;
>  				}
>  			}
>  			unlock_page(page);
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-07-25 14:33 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-15 10:28 [PATCH 0/2] shmem: fix faulting into a hole while it's punched, take 3 Hugh Dickins
2014-07-15 10:31 ` [PATCH 1/2] shmem: fix faulting into a hole, not taking i_mutex Hugh Dickins
2014-07-15 16:07   ` Vlastimil Babka
2014-07-15 19:26     ` Hugh Dickins
2014-07-16  7:18       ` Vlastimil Babka
2014-07-25 14:25   ` Michal Hocko
2014-07-15 10:33 ` [PATCH 2/2] shmem: fix splicing from a hole while it's punched Hugh Dickins
2014-07-25 14:33   ` Michal Hocko [this message]
2014-07-17 16:10 ` [PATCH 0/2] shmem: fix faulting into a hole while it's punched, take 3 Vlastimil Babka
2014-07-17 16:12   ` Sasha Levin
2014-07-18 10:44     ` Sasha Levin
2014-07-19 23:44       ` Hugh Dickins
2014-07-22  3:24         ` Sasha Levin
2014-07-22  8:07           ` Hugh Dickins
2014-07-22 10:06             ` Vlastimil Babka
2014-07-22 12:09               ` Vlastimil Babka
2014-07-22 18:42                 ` Hugh Dickins
2014-07-22 23:19             ` Sasha Levin
2014-07-22 23:58               ` Hugh Dickins
2014-07-17 23:34   ` Hugh Dickins
2014-07-18  8:05     ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140725143344.GC4844@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=koct9i@gmail.com \
    --cc=lczerner@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sasha.levin@oracle.com \
    --cc=vbabka@suse.cz \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).