From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Date: Tue, 27 Feb 2024 15:59:37 +0100
Subject: Re: folio_mmapped
To: Matthew Wilcox, Fuad Tabba, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
 vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
 mail@maciej.szmigiero.name, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
 catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
 oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
 qperret@google.com, keirf@google.com, linux-mm@kvack.org
Message-ID: <925f8f5d-c356-4c20-a6a5-dd7efde5ee86@redhat.com>
In-Reply-To: <20240226105258596-0800.eberman@hu-eberman-lv.qualcomm.com>
References: <20240222161047.402609-1-tabba@google.com>
 <20240222141602976-0800.eberman@hu-eberman-lv.qualcomm.com>
 <40a8fb34-868f-4e19-9f98-7516948fc740@redhat.com>
 <20240226105258596-0800.eberman@hu-eberman-lv.qualcomm.com>

> Ah, this was something I hadn't thought about. I think both Fuad and I
> need to update our series to check the refcount rather than mapcount
> (kvm_is_gmem_mapped for Fuad, gunyah_folio_lend_safe for me).

An alternative might be !folio_mapped() && !folio_maybe_dma_pinned(). But
checking for any unexpected references might be better (there are still
some GUP users that don't use FOLL_PIN).

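To illustrate the difference between the two checks, here is a rough
userspace model (not kernel code). The struct and all function names are
made up for this sketch, and the counters are heavily simplified compared
to what struct folio really tracks; in particular, the pin accounting in
the kernel is more involved than a separate counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the per-folio counters. */
struct folio_model {
	int refcount;	/* all references, including the page cache's own */
	int mapcount;	/* number of page table mappings */
	int pincount;	/* references taken with FOLL_PIN only */
};

/* Models !folio_mapped() && !folio_maybe_dma_pinned(). */
static bool unmapped_and_unpinned(const struct folio_model *f)
{
	assert(f);
	return f->mapcount == 0 && f->pincount == 0;
}

/* Models "no unexpected references": only the expected ones remain. */
static bool no_unexpected_refs(const struct folio_model *f, int expected_refs)
{
	assert(f);
	return f->refcount == expected_refs;
}

/* A GUP user without FOLL_PIN: a plain reference, invisible as a pin. */
static void gup_get(struct folio_model *f)
{
	assert(f);
	f->refcount++;
}

/* A GUP user with FOLL_PIN: a reference that is also visible as a pin. */
static void gup_pin(struct folio_model *f)
{
	assert(f);
	f->refcount++;
	f->pincount++;
}
```

In this model, a folio that only went through gup_get() still passes
unmapped_and_unpinned() but fails no_unexpected_refs(), which is the case
the refcount-based check is meant to catch.
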
At least concurrent migration/swapout (which temporarily unmaps a folio
and can give you folio_mapped() "false negatives", and which both take a
temporary folio reference and hold the page lock) should not be a
concern because guest_memfd doesn't support that yet.

>> Now, regarding the original question (disallow mapping the page), I see the
>> following approaches:
>>
>> 1) SIGBUS during page fault. There are other cases that can trigger
>>    SIGBUS during page faults: hugetlb when we are out of free hugetlb
>>    pages, userfaultfd with UFFD_FEATURE_SIGBUS.
>>
>> -> Simple and should get the job done.
>>
>> 2) folio_mmapped() + preventing new mmaps covering that folio
>>
>> -> More complicated, requires an rmap walk on every conversion.
>>
>> 3) Disallow any mmaps of the file while any page is private
>>
>> -> Likely not what you want.
>>
>>
>> Why was 1) abandoned? It looks a lot easier and harder to mess up. Why are
>> you trying to avoid page faults? What's the use case?
>>
>
> We were chatting whether we could do better than the SIGBUS approach.
> SIGBUS/FAULT usually crashes userspace, so I was brainstorming ways to
> return errors early. One difference between hugetlb and this usecase is
> that running out of free hugetlb pages isn't something we could detect

With hugetlb reservation one can try detecting it at mmap() time. But as
reservations are not NUMA aware, it's not reliable.

> at mmap time. In guest_memfd usecase, we should be able to detect when
> SIGBUS becomes possible due to memory being lent to guest.
>
> I can't think of a reason why userspace would want/be able to resume
> operation after trying to access a page that it shouldn't be allowed, so
> SIGBUS is functional. The advantage of trying to avoid SIGBUS was
> better/easier reporting to userspace.

To me, it sounds conceptually easier and less error-prone to

1) Convert a page to private only if there are no unexpected
   references (no mappings, GUP pins, ...)
2) Disallow mapping private pages and fail the page fault.
3) Handle that small race window only (page lock?)

Instead of

1) Converting a page to private only if there are no unexpected
   references (no mappings, GUP pins, ...) and no VMAs covering it where
   we could fault it in later
2) Disallowing mmap when the range would contain any private page
3) Handling races between mmap and page conversion

-- 
Cheers,

David / dhildenb
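
For illustration only, the first (simpler) set of steps could be modeled
in userspace roughly like this. It is a single-threaded sketch, not
guest_memfd code: struct page_model, EXPECTED_REFS and all function names
are hypothetical, and the page lock from step 3 appears only as a comment.

```c
#include <assert.h>
#include <stdbool.h>

enum fault_result { FAULT_MAPPED, FAULT_SIGBUS };

/* Hypothetical model of one guest_memfd page. */
struct page_model {
	bool is_private;
	int refcount;	/* includes the page cache's own reference */
	int mapcount;	/* number of user mappings */
};

/* Assumption for this sketch: the page cache holds exactly one reference. */
#define EXPECTED_REFS 1

/* Step 1: convert only if there are no unexpected references. */
static bool convert_to_private(struct page_model *p)
{
	assert(p);
	/* Step 3: in the kernel, check and update would need to be
	 * serialized against the fault path, e.g. via the page lock. */
	if (p->mapcount != 0 || p->refcount != EXPECTED_REFS)
		return false;
	p->is_private = true;
	return true;
}

/* Step 2: faulting on a private page fails instead of mapping it. */
static enum fault_result handle_fault(struct page_model *p)
{
	assert(p);
	if (p->is_private)
		return FAULT_SIGBUS;
	p->mapcount++;
	p->refcount++;	/* each mapping holds a reference */
	return FAULT_MAPPED;
}

/* Tear down one user mapping and drop its reference. */
static void unmap_page(struct page_model *p)
{
	assert(p && p->mapcount > 0);
	p->mapcount--;
	p->refcount--;
}
```

With this model, a conversion attempt fails while the page is still
mapped, succeeds once the mapping is gone, and any later fault on the
private page reports FAULT_SIGBUS; no mmap-time tracking of VMAs is
needed.
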