From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9186E1B21AD
	for <linux-s390@vger.kernel.org>; Thu, 13 Feb 2025 10:02:56 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1739440978; cv=none; b=KRgr6wI97frWd0j8o1PkM36UTaJT/e5h+hqKDVVxGC7IlrZHqiob3Fu5ElxxZyQNK7GjDoHRmahuKiG8P+W1msx5Z2Mqi8rgorpHoo4MxDxk8VCoL1tkPoPY6W/wgN3Z1OYy+JChTlCfPQw+vZjNf1srcQHpvyXBVlCz8LWfYFk=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1739440978; c=relaxed/simple;
	bh=tX7AozPP6E5+MLf8cJoQS9mb5qbZJa5QwLF58GQEG44=;
	h=Message-ID:Date:MIME-Version:Subject:From:To:Cc:References:
	 In-Reply-To:Content-Type; b=PT/bC0Th2Oa8rygADOo3W1mNblQunLySVTszmNOYU+JS/5IOAuxFIpAJm6RMxRDn72D1EcZCQgWe4bDQBsd9RyCh2hYmOUxZRPaGaNknjPYI/Ln5qfPuBvP5u2xCE//MikOa4w0aOjv/GaVT55MIsQPrARrY9mL0PNAPHQTVN2c=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=eBZ8fW0v; arc=none smtp.client-ip=170.10.133.124
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="eBZ8fW0v"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1739440975;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:autocrypt:autocrypt;
	bh=3V20fc6va6aAlUHwEWCfoyPaQHzn8E4DTrNUGz56JdM=;
	b=eBZ8fW0voQLpvj0RqRFobObZazSu4uwiD/UocdcMnAFh1c7nUrpxUtDSiGM0IulCOV6weW
	K7u1vOHQQfs3jXkMsXHEcMpnDGByIaQbPjbevysNjlF3wyI3rRQ80xtW7yWLefMlL3mt4D
	HChG3wIjtuNZJiuHFU2yRkGoqMpvkGo=
Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com
 [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id
 us-mta-368-Ccf_59VvOgipsLCZNdNHrA-1; Thu, 13 Feb 2025 05:02:54 -0500
X-MC-Unique: Ccf_59VvOgipsLCZNdNHrA-1
X-Mimecast-MFC-AGG-ID: Ccf_59VvOgipsLCZNdNHrA
Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-43933b8d9b1so3536935e9.3
        for <linux-s390@vger.kernel.org>; Thu, 13 Feb 2025 02:02:53 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1739440973; x=1740045773;
        h=content-transfer-encoding:in-reply-to:organization:autocrypt
         :content-language:references:cc:to:from:subject:user-agent
         :mime-version:date:message-id:x-gm-message-state:from:to:cc:subject
         :date:message-id:reply-to;
        bh=3V20fc6va6aAlUHwEWCfoyPaQHzn8E4DTrNUGz56JdM=;
        b=oB8m2RnDXYEvVzBemcljj6iXNDjfbRPRzrQHwNdwqnCfKWlQQZy3NiFJige7yUG511
         kPQacUHc92CkbGZ36vmPvh7HzPG37d/I0cDC08CRNj6un1xqndmgiK6gO6QR2oQhxZf5
         BkKcAd9Vk3vblg/Mc0V0H2si11VKqMkrq8i8z0ch9bKdkw6GA4tIYxK8rMmBfmqP6NB2
         EZhKJHx6Q/TqhCeYgMleXsqSflZ/3eStmSFjrUj7nuXoHKfuyREON3JKvAqLF+DxJdSt
         zUIuH9pdJwLCTmap6tbMVF3T/FDyjp9jlBGH5LYqQ6EbdqOlgoIJaBiZOi/lWKtqeFIP
         VrKA==
X-Forwarded-Encrypted: i=1; AJvYcCXYWq4cAoLq9D1MeTdr+wT1MdXkpCs9WphBqCqAj24oyDu/ETViV/B6fbwe4kN6LoDkI74ZnTJ8Jqld@vger.kernel.org
X-Gm-Message-State: AOJu0YzOK79BTP2G/sJwpZ5mn2p/a8UOq8d9P7yIN6EHP3hyPjD303uR
	fnj/Gw4oWn+y7rlZQ9GciNBcX/C4SpgaleLSJ/Lw9bH/yawwKh7Kp15ki8ekaNI/kYe3xoENRNy
	hO8YElUifKB44rUowqFxJLXEhshfZbCNtWrV9hmmLHLgNh8vyTE6+/75Zu9A=
X-Gm-Gg: ASbGncsAtFzLr3eVq9/pRT6/unGwgdlRf8QTkRsDEPNN0ceo6zzuSXXiZ+pSydXTx0V
	5IJVwVAfKTlpZ5+E9ZxFIuWxBY2vwPjsci580goyr7y0tH2kOc0axozUiZ9Q2PgSS+VWczwzO0f
	zyK7VWSQ2TC6XAd7st+b3AuJ6PBYN2zobrBEp4Met5yFC1dllFWgqVi6Ami7bJb+fh2eRR7WqoM
	Hlq21KbXlEgFBm7JGuGwrHsbgd94vYLBUOmJqdb1aNzkHrqUPF84VAsxeTRzFvwGgSplfp2sPhk
	2TpaFPGCW6POCcWNCWYhA+C+m/A8lQz+wOc5nw3HXbI9sAByIPf83oB39RGOLBfDUAtg1MbUqt5
	VXRNCbRmXkzNC4LBtspTGlyXsNsVbTw==
X-Received: by 2002:a05:600c:444f:b0:434:feb1:adcf with SMTP id 5b1f17b1804b1-439601ae212mr25394245e9.25.1739440972678;
        Thu, 13 Feb 2025 02:02:52 -0800 (PST)
X-Google-Smtp-Source: AGHT+IELEHASa1QH5FtD/ayjVuaAGldYnBlLGmpEmax0pKtSHomSJebxcqI8HhgHajVqv2/9dpsvjw==
X-Received: by 2002:a05:600c:444f:b0:434:feb1:adcf with SMTP id 5b1f17b1804b1-439601ae212mr25393615e9.25.1739440972064;
        Thu, 13 Feb 2025 02:02:52 -0800 (PST)
Received: from ?IPV6:2003:cb:c718:100:347d:db94:161d:398f? (p200300cbc7180100347ddb94161d398f.dip0.t-ipconnect.de. [2003:cb:c718:100:347d:db94:161d:398f])
        by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-43961884251sm12557855e9.31.2025.02.13.02.02.49
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Thu, 13 Feb 2025 02:02:50 -0800 (PST)
Message-ID: <7c0b5675-a070-4248-bd29-5c27d07a4c5b@redhat.com>
Date: Thu, 13 Feb 2025 11:02:49 +0100
Precedence: bulk
X-Mailing-List: linux-s390@vger.kernel.org
List-Id: <linux-s390.vger.kernel.org>
List-Subscribe: <mailto:linux-s390+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-s390+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [GIT PULL v2 09/20] KVM: s390: move pv gmap functions into kvm
From: David Hildenbrand <david@redhat.com>
To: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, linux-s390@vger.kernel.org,
 frankja@linux.ibm.com, borntraeger@de.ibm.com
References: <20250131112510.48531-1-imbrenda@linux.ibm.com>
 <20250131112510.48531-10-imbrenda@linux.ibm.com>
 <d5ef124a-d353-4074-925e-a2721be3ce5d@redhat.com>
 <20250212184538.3c79d608@p-imbrenda>
 <f9a6c330-2721-40ed-a8f4-95192e8312a8@redhat.com>
Content-Language: en-US
Autocrypt: addr=david@redhat.com; keydata=
 xsFNBFXLn5EBEAC+zYvAFJxCBY9Tr1xZgcESmxVNI/0ffzE/ZQOiHJl6mGkmA1R7/uUpiCjJ
 dBrn+lhhOYjjNefFQou6478faXE6o2AhmebqT4KiQoUQFV4R7y1KMEKoSyy8hQaK1umALTdL
 QZLQMzNE74ap+GDK0wnacPQFpcG1AE9RMq3aeErY5tujekBS32jfC/7AnH7I0v1v1TbbK3Gp
 XNeiN4QroO+5qaSr0ID2sz5jtBLRb15RMre27E1ImpaIv2Jw8NJgW0k/D1RyKCwaTsgRdwuK
 Kx/Y91XuSBdz0uOyU/S8kM1+ag0wvsGlpBVxRR/xw/E8M7TEwuCZQArqqTCmkG6HGcXFT0V9
 PXFNNgV5jXMQRwU0O/ztJIQqsE5LsUomE//bLwzj9IVsaQpKDqW6TAPjcdBDPLHvriq7kGjt
 WhVhdl0qEYB8lkBEU7V2Yb+SYhmhpDrti9Fq1EsmhiHSkxJcGREoMK/63r9WLZYI3+4W2rAc
 UucZa4OT27U5ZISjNg3Ev0rxU5UH2/pT4wJCfxwocmqaRr6UYmrtZmND89X0KigoFD/XSeVv
 jwBRNjPAubK9/k5NoRrYqztM9W6sJqrH8+UWZ1Idd/DdmogJh0gNC0+N42Za9yBRURfIdKSb
 B3JfpUqcWwE7vUaYrHG1nw54pLUoPG6sAA7Mehl3nd4pZUALHwARAQABzSREYXZpZCBIaWxk
 ZW5icmFuZCA8ZGF2aWRAcmVkaGF0LmNvbT7CwZgEEwEIAEICGwMGCwkIBwMCBhUIAgkKCwQW
 AgMBAh4BAheAAhkBFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAl8Ox4kFCRKpKXgACgkQTd4Q
 9wD/g1oHcA//a6Tj7SBNjFNM1iNhWUo1lxAja0lpSodSnB2g4FCZ4R61SBR4l/psBL73xktp
 rDHrx4aSpwkRP6Epu6mLvhlfjmkRG4OynJ5HG1gfv7RJJfnUdUM1z5kdS8JBrOhMJS2c/gPf
 wv1TGRq2XdMPnfY2o0CxRqpcLkx4vBODvJGl2mQyJF/gPepdDfcT8/PY9BJ7FL6Hrq1gnAo4
 3Iv9qV0JiT2wmZciNyYQhmA1V6dyTRiQ4YAc31zOo2IM+xisPzeSHgw3ONY/XhYvfZ9r7W1l
 pNQdc2G+o4Di9NPFHQQhDw3YTRR1opJaTlRDzxYxzU6ZnUUBghxt9cwUWTpfCktkMZiPSDGd
 KgQBjnweV2jw9UOTxjb4LXqDjmSNkjDdQUOU69jGMUXgihvo4zhYcMX8F5gWdRtMR7DzW/YE
 BgVcyxNkMIXoY1aYj6npHYiNQesQlqjU6azjbH70/SXKM5tNRplgW8TNprMDuntdvV9wNkFs
 9TyM02V5aWxFfI42+aivc4KEw69SE9KXwC7FSf5wXzuTot97N9Phj/Z3+jx443jo2NR34XgF
 89cct7wJMjOF7bBefo0fPPZQuIma0Zym71cP61OP/i11ahNye6HGKfxGCOcs5wW9kRQEk8P9
 M/k2wt3mt/fCQnuP/mWutNPt95w9wSsUyATLmtNrwccz63XOwU0EVcufkQEQAOfX3n0g0fZz
 Bgm/S2zF/kxQKCEKP8ID+Vz8sy2GpDvveBq4H2Y34XWsT1zLJdvqPI4af4ZSMxuerWjXbVWb
 T6d4odQIG0fKx4F8NccDqbgHeZRNajXeeJ3R7gAzvWvQNLz4piHrO/B4tf8svmRBL0ZB5P5A
 2uhdwLU3NZuK22zpNn4is87BPWF8HhY0L5fafgDMOqnf4guJVJPYNPhUFzXUbPqOKOkL8ojk
 CXxkOFHAbjstSK5Ca3fKquY3rdX3DNo+EL7FvAiw1mUtS+5GeYE+RMnDCsVFm/C7kY8c2d0G
 NWkB9pJM5+mnIoFNxy7YBcldYATVeOHoY4LyaUWNnAvFYWp08dHWfZo9WCiJMuTfgtH9tc75
 7QanMVdPt6fDK8UUXIBLQ2TWr/sQKE9xtFuEmoQGlE1l6bGaDnnMLcYu+Asp3kDT0w4zYGsx
 5r6XQVRH4+5N6eHZiaeYtFOujp5n+pjBaQK7wUUjDilPQ5QMzIuCL4YjVoylWiBNknvQWBXS
 lQCWmavOT9sttGQXdPCC5ynI+1ymZC1ORZKANLnRAb0NH/UCzcsstw2TAkFnMEbo9Zu9w7Kv
 AxBQXWeXhJI9XQssfrf4Gusdqx8nPEpfOqCtbbwJMATbHyqLt7/oz/5deGuwxgb65pWIzufa
 N7eop7uh+6bezi+rugUI+w6DABEBAAHCwXwEGAEIACYCGwwWIQQb2cqtc1xMOkYN/MpN3hD3
 AP+DWgUCXw7HsgUJEqkpoQAKCRBN3hD3AP+DWrrpD/4qS3dyVRxDcDHIlmguXjC1Q5tZTwNB
 boaBTPHSy/Nksu0eY7x6HfQJ3xajVH32Ms6t1trDQmPx2iP5+7iDsb7OKAb5eOS8h+BEBDeq
 3ecsQDv0fFJOA9ag5O3LLNk+3x3q7e0uo06XMaY7UHS341ozXUUI7wC7iKfoUTv03iO9El5f
 XpNMx/YrIMduZ2+nd9Di7o5+KIwlb2mAB9sTNHdMrXesX8eBL6T9b+MZJk+mZuPxKNVfEQMQ
 a5SxUEADIPQTPNvBewdeI80yeOCrN+Zzwy/Mrx9EPeu59Y5vSJOx/z6OUImD/GhX7Xvkt3kq
 Er5KTrJz3++B6SH9pum9PuoE/k+nntJkNMmQpR4MCBaV/J9gIOPGodDKnjdng+mXliF3Ptu6
 3oxc2RCyGzTlxyMwuc2U5Q7KtUNTdDe8T0uE+9b8BLMVQDDfJjqY0VVqSUwImzTDLX9S4g/8
 kC4HRcclk8hpyhY2jKGluZO0awwTIMgVEzmTyBphDg/Gx7dZU1Xf8HFuE+UZ5UDHDTnwgv7E
 th6RC9+WrhDNspZ9fJjKWRbveQgUFCpe1sa77LAw+XFrKmBHXp9ZVIe90RMe2tRL06BGiRZr
 jPrnvUsUUsjRoRNJjKKA/REq+sAnhkNPPZ/NNMjaZ5b8Tovi8C0tmxiCHaQYqj7G2rgnT0kt
 WNyWQQ==
Organization: Red Hat
In-Reply-To: <f9a6c330-2721-40ed-a8f4-95192e8312a8@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 12.02.25 19:14, David Hildenbrand wrote:
> On 12.02.25 18:45, Claudio Imbrenda wrote:
>> On Wed, 12 Feb 2025 17:55:18 +0100
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 31.01.25 12:24, Claudio Imbrenda wrote:
>>>> Move gmap related functions from kernel/uv into kvm.
>>>>
>>>> Create a new file to collect gmap-related functions.
>>>>
>>>> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
>>>> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
>>>> [fixed unpack_one(), thanks mhartmay@linux.ibm.com]
>>>> Link: https://lore.kernel.org/r/20250123144627.312456-6-imbrenda@linux.ibm.com
>>>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
>>>> Message-ID: <20250123144627.312456-6-imbrenda@linux.ibm.com>
>>>> ---
>>>
>>> This patch breaks large folio splitting because you end up un-refing
>>> the wrong folios after a split. I tried to make it work, but, possibly
>>> because of other changes in this patch (or in others), I
>>> cannot get it to work and have to give up for today.
>>
>> Yes, I had also noticed that, and I already have a fix ready. In fact, my
>> fix was exactly like yours, except that I no longer pass the struct folio
>> to kvm_s390_wiggle_split_folio(); instead, I pass only a page and use
>> page_folio() at the beginning, and I use
>> split_huge_page_to_list_to_order() directly instead of split_folio().
>>
>> Unfortunately, the fix does not resolve the issue I'm seeing...
>>
>> But putting printks everywhere seems to make the issue go away, so it
>> looks like a race somewhere.
> 
> It also doesn't work with a single vCPU for me; the VM just gets stuck.
> 
> With two vCPUs (so one can report the lockup), I get:
> 
> [   62.645168] rcu: INFO: rcu_sched self-detected stall on CPU
> [   62.645181] rcu:     0-....: (5999 ticks this GP) idle=0104/1/0x4000000000000002 softirq=2/2 fqs=2997
> [   62.645186] rcu:     (t=6000 jiffies g=-1199 q=62 ncpus=2)
> [   62.645191] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-427.33.1.el9_4.s390x #1
> [   62.645194] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> [   62.645195] Krnl PSW : 0704c00180000000 0000000024b3e776 (set_memory_decrypted+0x66/0xa0)
> [   62.645206]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   62.645208] Krnl GPRS: 00000000ca004000 0000037f00000001 000000008092f000 0000000000000000
> [   62.645210]            0000037fffb1bbc0 0000000000000001 0000000025e75208 000000008092f000
> [   62.645211]            0000000080873808 0000037fffb1bcd8 0000000000001000 0000000025e75220
> [   62.645213]            0000000080281500 00000000258aa480 0000000024c0b17a 0000037fffb1bb20
> [   62.645220] Krnl Code: 0000000024b3e76a: a784000f            brc     8,0000000024b3e788
> [   62.645220]            0000000024b3e76e: a7210fff            tmll    %r2,4095
> [   62.645220]           #0000000024b3e772: a7740017            brc     7,0000000024b3e7a0
> [   62.645220]           >0000000024b3e776: b9a40034            uvc     %r3,%r4,0
> [   62.645220]            0000000024b3e77a: b2220010            ipm     %r1
> [   62.645220]            0000000024b3e77e: 8810001c            srl     %r1,28
> [   62.645220]            0000000024b3e782: ec12fffa017e        cij     %r1,1,2,0000000024b3e776
> [   62.645220]            0000000024b3e788: a72b1000            aghi    %r2,4096
> [   62.645232] Call Trace:
> [   62.645234]  [<0000000024b3e776>] set_memory_decrypted+0x66/0xa0
> [   62.645238]  [<0000000024c0b17a>] dma_direct_alloc+0x16a/0x2d0
> [   62.645242]  [<0000000024c09b92>] dma_alloc_attrs+0x62/0x80
> [   62.645243]  [<000000002546c950>] cio_gp_dma_create+0x60/0xa0
> [   62.645248]  [<0000000025ebb712>] css_bus_init+0x102/0x1b8
> [   62.645252]  [<0000000025ebb7ea>] channel_subsystem_init+0x22/0xf8
> [   62.645254]  [<0000000024b149ac>] do_one_initcall+0x3c/0x200
> [   62.645256]  [<0000000025e777be>] do_initcalls+0x11e/0x148
> [   62.645260]  [<0000000025e77a34>] kernel_init_freeable+0x1cc/0x208
> [   62.645262]  [<00000000254ad01e>] kernel_init+0x2e/0x170
> [   62.645264]  [<0000000024b16fdc>] __ret_from_fork+0x3c/0x60
> [   62.645266]  [<00000000254bb07a>] ret_from_fork+0xa/0x40
> 

My only guess is that it is related to the following: when we split a non-anon
folio, we unmap it from the page tables and don't remap it; the next
fault will do that. Maybe, for some reason, that behavior is incompatible with your changes.

I don't quite see how, though, because we should simply trigger another fault to look up
the page in gmap_make_secure()->gfn_to_page() when we re-enter gmap_make_secure() after a split.

> 
> The removed PTE lock would only explain it if we had a concurrent GUP etc.
> from QEMU I/O? Not sure.
> 
> To fix the wrong refcount freezing, doing exactly what folio splitting does
> (migration PTEs, locking the pagecache etc., freezing->converting,
> removing migration ptes) should work, but requires a bit of work.
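
The point of "freezing" is that the refcount is atomically swapped from the expected value to zero, so that speculative lookups (GUP-fast, pagecache lookups), which only take a reference if the count is non-zero, back off while the UVC runs. A minimal user-space model of that contract, assuming C11 atomics in place of the kernel's folio_ref_freeze()/folio_ref_unfreeze()/folio_try_get(); the model_* names are hypothetical and not kernel API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a folio's refcount. */
struct model_folio {
	atomic_int refcount;
};

/* Models folio_ref_freeze(): succeeds only if exactly @expected refs exist. */
static bool model_ref_freeze(struct model_folio *f, int expected)
{
	int old = expected;

	/* Atomically replace @expected with 0; fail if anybody else holds a ref. */
	return atomic_compare_exchange_strong(&f->refcount, &old, 0);
}

/* Models folio_ref_unfreeze(): restore the count saved at freeze time. */
static void model_ref_unfreeze(struct model_folio *f, int count)
{
	assert(atomic_load(&f->refcount) == 0);
	atomic_store(&f->refcount, count);
}

/* Models folio_try_get(): speculative lookups fail while the folio is frozen. */
static bool model_try_get(struct model_folio *f)
{
	int old = atomic_load(&f->refcount);

	do {
		if (old == 0)
			return false;	/* frozen (or being freed): back off */
	} while (!atomic_compare_exchange_weak(&f->refcount, &old, old + 1));
	return true;
}
```

The freeze only makes speculative lookups back off for its duration; preventing new mappings or new pagecache references from being created is the job of the unmap and rmap/pagecache locking steps listed above.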

I played with the following abomination to see if I could fix the refcount freezing somehow.

It doesn't work, because the UVC keeps failing; I assume that is because the UVC actually
needs the page to be mapped into that particular page table in order to complete.


To fix the refcount freezing with the folio still mapped, we'd have to make sure that
folio_mapcount() == 1 while we hold the PTL, and do something similar to the patch below,
except that the rmap/anon locking and the unmap/remap handling would not apply. The
pagecache would most likely have to be locked as well, to prevent new references from being
taken through it while we freeze the refcount.

If folio_mapcount() != 1 on an anon page, we would have to give up; but that is
impossible if the page is mapped writable, so it's no problem.

If folio_mapcount() != 1 on a pagecache page, we would have to
force an unmap of all page table mappings, e.g., using try_to_unmap(), and then
retry.

But in that case, holding the PTL seems unavoidable to prevent concurrent GUP-slow etc.,
so that we can safely freeze the refcount.
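
The case analysis above, condensed into a small decision function. This is only a sketch: the enum and function names are hypothetical (not kernel API), and it assumes the caller already holds the PTL (plus the pagecache lock for pagecache folios) and that the guest's own mapping guarantees a mapcount of at least 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one attempt; not an existing kernel enum. */
enum secure_action {
	SECURE_FREEZE,		/* mapped exactly once: freeze refcount under the PTL */
	SECURE_GIVE_UP,		/* anon page mapped more than once */
	SECURE_UNMAP_RETRY,	/* pagecache page: force unmap, then retry */
};

static enum secure_action choose_action(bool anon, int mapcount)
{
	/* Assumption: the guest mapping we are converting is always present. */
	assert(mapcount >= 1);

	if (mapcount == 1)
		return SECURE_FREEZE;
	/* A writable anon page cannot be shared, so this should not happen. */
	if (anon)
		return SECURE_GIVE_UP;
	/* e.g., try_to_unmap() all mappings, drop locks, and retry. */
	return SECURE_UNMAP_RETRY;
}
```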


 From c2555fc34801ca9ba49f93ee1249ecd25248377a Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Thu, 13 Feb 2025 09:49:54 +0100
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  arch/s390/kernel/uv.c | 139 +++++++++++++++++++++++++++++++++++-------
  include/linux/rmap.h  |  17 ++++++
  mm/internal.h         |  16 -----
  3 files changed, 133 insertions(+), 39 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 9f05df2da2f73..d6ea8951fa53b 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -15,6 +15,7 @@
  #include <linux/pagemap.h>
  #include <linux/swap.h>
  #include <linux/pagewalk.h>
+#include <linux/rmap.h>
  #include <asm/facility.h>
  #include <asm/sections.h>
  #include <asm/uv.h>
@@ -227,6 +228,45 @@ static int expected_folio_refs(struct folio *folio)
  	return res;
  }
  
+static void unmap_folio(struct folio *folio)
+{
+	enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SYNC |
+		TTU_BATCH_FLUSH;
+
+	VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
+
+	if (folio_test_pmd_mappable(folio))
+		ttu_flags |= TTU_SPLIT_HUGE_PMD;
+
+	/*
+	 * Anon pages need migration entries to preserve them, but file
+	 * pages can simply be left unmapped, then faulted back on demand.
+	 * If that is ever changed (perhaps for mlock), update remap_page().
+	 */
+	if (folio_test_anon(folio))
+		try_to_migrate(folio, ttu_flags);
+	else
+		try_to_unmap(folio, ttu_flags | TTU_IGNORE_MLOCK);
+
+	try_to_unmap_flush();
+}
+
+static void remap_page(struct folio *folio, unsigned long nr, int flags)
+{
+	int i = 0;
+
+	/* If unmap_folio() uses try_to_migrate() on file, remove this check */
+	if (!folio_test_anon(folio))
+		return;
+	for (;;) {
+		remove_migration_ptes(folio, folio, RMP_LOCKED | flags);
+		i += folio_nr_pages(folio);
+		if (i >= nr)
+			break;
+		folio = folio_next(folio);
+	}
+}
+
  /**
   * make_folio_secure() - make a folio secure
   * @folio: the folio to make secure
@@ -247,35 +287,88 @@ static int expected_folio_refs(struct folio *folio)
   */
  int make_folio_secure(struct folio *folio, struct uv_cb_header *uvcb)
  {
-	int expected, cc = 0;
+	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
+	struct address_space *mapping = NULL;
+	struct anon_vma *anon_vma = NULL;
+	int ret, cc = 0;
+	int expected;
  
  	if (folio_test_large(folio))
  		return -E2BIG;
  	if (folio_test_writeback(folio))
  		return -EBUSY;
-	expected = expected_folio_refs(folio) + 1;
-	if (!folio_ref_freeze(folio, expected))
+
+	/* Does it make sense to try at all? */
+	if (folio_ref_count(folio) != expected_folio_refs(folio) + 1)
  		return -EBUSY;
-	set_bit(PG_arch_1, &folio->flags);
-	/*
-	 * If the UVC does not succeed or fail immediately, we don't want to
-	 * loop for long, or we might get stall notifications.
-	 * On the other hand, this is a complex scenario and we are holding a lot of
-	 * locks, so we can't easily sleep and reschedule. We try only once,
-	 * and if the UVC returned busy or partial completion, we return
-	 * -EAGAIN and we let the callers deal with it.
-	 */
-	cc = __uv_call(0, (u64)uvcb);
-	folio_ref_unfreeze(folio, expected);
-	/*
-	 * Return -ENXIO if the folio was not mapped, -EINVAL for other errors.
-	 * If busy or partially completed, return -EAGAIN.
-	 */
-	if (cc == UVC_CC_OK)
-		return 0;
-	else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
-		return -EAGAIN;
-	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
+
+	/* See split_huge_page_to_list_to_order() on the nasty details. */
+	if (folio_test_anon(folio)) {
+		anon_vma = folio_get_anon_vma(folio);
+		if (!anon_vma)
+			return -EBUSY;
+		anon_vma_lock_write(anon_vma);
+	} else {
+		mapping = folio->mapping;
+		if (!mapping)
+			return -EBUSY;
+		/* Hmmm, do we need filemap_release_folio()? */
+		i_mmap_lock_read(mapping);
+	}
+
+	unmap_folio(folio);
+
+	local_irq_disable();
+	if (mapping) {
+		xas_lock(&xas);
+		xas_reset(&xas);
+		if (xas_load(&xas) != folio) {
+			ret = -EBUSY;
+			goto fail;
+		}
+	}
+
+	expected = expected_folio_refs(folio) + 1;
+	if (!folio_mapped(folio) &&
+	    folio_ref_freeze(folio, expected)) {
+		set_bit(PG_arch_1, &folio->flags);
+		/*
+		 * If the UVC does not succeed or fail immediately, we don't want to
+		 * loop for long, or we might get stall notifications.
+		 * On the other hand, this is a complex scenario and we are holding a lot of
+		 * locks, so we can't easily sleep and reschedule. We try only once,
+		 * and if the UVC returned busy or partial completion, we return
+		 * -EAGAIN and we let the callers deal with it.
+		 */
+		cc = __uv_call(0, (u64)uvcb);
+		folio_ref_unfreeze(folio, expected);
+		/*
+		 * Return -ENXIO if the folio was not mapped, -EINVAL for other errors.
+		 * If busy or partially completed, return -EAGAIN.
+		 */
+		if (cc == UVC_CC_OK)
+			ret = 0;
+		else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
+			ret = -EAGAIN;
+		else
+			ret = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
+	} else {
+		ret = -EBUSY;
+	}
+
+fail:
+	if (mapping)
+		xas_unlock(&xas);
+	local_irq_enable();
+	remap_page(folio, 1, 0);
+	if (anon_vma) {
+		anon_vma_unlock_write(anon_vma);
+		put_anon_vma(anon_vma);
+	}
+	if (mapping)
+		i_mmap_unlock_read(mapping);
+	xas_destroy(&xas);
+	return ret;
  }
  EXPORT_SYMBOL_GPL(make_folio_secure);
  
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 683a04088f3f2..2d241ab48bf08 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -663,6 +663,23 @@ int folio_referenced(struct folio *, int is_locked,
  void try_to_migrate(struct folio *folio, enum ttu_flags flags);
  void try_to_unmap(struct folio *, enum ttu_flags flags);
  
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+void try_to_unmap_flush(void);
+void try_to_unmap_flush_dirty(void);
+void flush_tlb_batched_pending(struct mm_struct *mm);
+#else
+static inline void try_to_unmap_flush(void)
+{
+}
+static inline void try_to_unmap_flush_dirty(void)
+{
+}
+static inline void flush_tlb_batched_pending(struct mm_struct *mm)
+{
+}
+#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
+
+
  int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
  				unsigned long end, struct page **pages,
  				void *arg);
diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11f..5338906163ca7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1202,22 +1202,6 @@ struct tlbflush_unmap_batch;
   */
  extern struct workqueue_struct *mm_percpu_wq;
  
-#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
-void try_to_unmap_flush(void);
-void try_to_unmap_flush_dirty(void);
-void flush_tlb_batched_pending(struct mm_struct *mm);
-#else
-static inline void try_to_unmap_flush(void)
-{
-}
-static inline void try_to_unmap_flush_dirty(void)
-{
-}
-static inline void flush_tlb_batched_pending(struct mm_struct *mm)
-{
-}
-#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
-
  extern const struct trace_print_flags pageflag_names[];
  extern const struct trace_print_flags vmaflag_names[];
  extern const struct trace_print_flags gfpflag_names[];
-- 
2.48.1


-- 
Cheers,

David / dhildenb