public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Hugh Dickins <hughd@google.com>, Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: mm: shm: hang in shmem_fallocate
Date: Tue, 24 Jun 2014 18:31:20 +0200	[thread overview]
Message-ID: <53A9A7D8.2020703@suse.cz> (raw)
In-Reply-To: <alpine.LSU.2.11.1406151921070.2850@eggly.anvils>

On 06/16/2014 04:29 AM, Hugh Dickins wrote:
> On Thu, 12 Jun 2014, Sasha Levin wrote:
>> On 02/09/2014 08:41 PM, Sasha Levin wrote:
>>> On 02/08/2014 10:25 PM, Hugh Dickins wrote:
>>>> Would trinity be likely to have a thread or process repeatedly faulting
>>>> in pages from the hole while it is being punched?
>>>
>>> I can see how trinity would do that, but just to be certain - Cc davej.
>>>
>>> On 02/08/2014 10:25 PM, Hugh Dickins wrote:
>>>> Does this happen with other holepunch filesystems?  If it does not,
>>>> I'd suppose it's because the tmpfs fault-in-newly-created-page path
>>>> is lighter than a consistent disk-based filesystem's has to be.
>>>> But we don't want to make the tmpfs path heavier to match them.
>>>
>>> No, this is strictly limited to tmpfs, and AFAIK trinity tests hole
>>> punching in other filesystems and I make sure to get a bunch of those
>>> mounted before starting testing.
>>
>> Just pinging this one again. I still see hangs in -next where the hang
>> location looks same as before:
>>
> 
> Please give this patch a try.  It fixes what I can reproduce, but given
> your unexplained page_mapped() BUG in this area, we know there's more
> yet to be understood, so perhaps this patch won't do enough for you.
> 

Hi,

since this got a CVE, I've been looking at backport to an older kernel where
fallocate(FALLOC_FL_PUNCH_HOLE) is not yet supported, and there's also no
range notification mechanism yet. There's just madvise(MADV_REMOVE) and since
it doesn't guarantee anything, it seems simpler just to give up retrying to
truncate really everything. Then I realized that maybe it would work for
current kernel as well, without having to add any checks in the page fault
path. The semantics of fallocate(FALLOC_FL_PUNCH_HOLE) might look different
from madvise(MADV_REMOVE), but it seems to me that as long as it does discard
the old data from the range, it's fine from any information leak point of view.
If someone races page faulting, it IMHO doesn't matter if he gets a new zeroed
page before the parallel truncate has ended, or right after it has ended.
So I'm posting it here as a RFC. I haven't thought about the
i915_gem_object_truncate caller yet. I think that this path wouldn't satisfy
the new "lstart < inode->i_size" condition, but I don't know if it's "vulnerable"
to the problem.

-----8<-----
From: Vlastimil Babka <vbabka@suse.cz>
Subject: [RFC PATCH] shmem: prevent livelock between page fault and hole punching

---
 mm/shmem.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index f484c27..6d6005c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -476,6 +476,25 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		if (!pvec.nr) {
 			if (index == start || unfalloc)
 				break;
+                        /* 
+                         * When this condition is true, it means we were
+                         * called from fallocate(FALLOC_FL_PUNCH_HOLE).
+                         * To prevent a livelock when someone else is faulting
+                         * pages back, we are content with single pass and do
+                         * not retry with index = start. It's important that
+                         * previous page content has been discarded, and
+                         * faulter(s) got new zeroed pages.
+                         *
+                         * The other callsites are shmem_setattr (for
+                         * truncation) and shmem_evict_inode, which set i_size
+                         * to truncated size or 0, respectively, and then call
+                         * us with lstart == inode->i_size. There we do want to
+                         * retry, and livelock cannot happen for other reasons.
+                         *
+                         * XXX what about i915_gem_object_truncate?
+                         */
+                        if (lstart < inode->i_size)
+                                break;
 			index = start;
 			continue;
 		}
-- 
1.8.4.5





  parent reply	other threads:[~2014-06-24 16:31 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-16  4:01 mm: shm: hang in shmem_fallocate Sasha Levin
2014-02-08 19:46 ` Sasha Levin
2014-02-09  3:25   ` Hugh Dickins
2014-02-10  1:41     ` Sasha Levin
2014-06-12 20:38       ` Sasha Levin
2014-06-16  2:29         ` Hugh Dickins
2014-06-17 20:32           ` Sasha Levin
2014-06-24 16:31           ` Vlastimil Babka [this message]
2014-06-25 22:36             ` Hugh Dickins
2014-06-26  9:14               ` Vlastimil Babka
2014-06-26 15:19                 ` Vlastimil Babka
2014-06-27  5:36                 ` Hugh Dickins
2014-07-01 11:52                   ` Vlastimil Babka
2014-07-02  1:49                     ` Hugh Dickins
2014-07-09 21:59                   ` Johannes Weiner
2014-07-09 22:48                     ` Hugh Dickins
2014-07-10  0:51                       ` Hugh Dickins
2014-06-26 15:11               ` Sasha Levin
2014-06-27  5:59                 ` Hugh Dickins
2014-06-27 14:50                   ` Sasha Levin
2014-06-27 18:03                     ` Hugh Dickins
2014-06-28 21:41                       ` Sasha Levin
2014-07-01 22:37                       ` mm: shmem: hang in shmem_fault (WAS: mm: shm: hang in shmem_fallocate) Sasha Levin
2014-07-02  0:17                         ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53A9A7D8.2020703@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox