From: Lokesh Gidra <lokeshgidra@google.com>
Date: Mon, 5 Feb 2024 13:54:43 -0800
Subject: Re: [PATCH v2 3/3] userfaultfd: use per-vma locks in userfaultfd operations
To: Suren Baghdasaryan
Cc: "Liam R. Howlett", akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org,
 kernel-team@android.com, aarcange@redhat.com, peterx@redhat.com,
 david@redhat.com, axelrasmussen@google.com, bgeffon@google.com,
 willy@infradead.org, jannh@google.com, kaleshsingh@google.com,
 ngeoffray@google.com, timmurray@google.com, rppt@kernel.org
References: <20240129193512.123145-1-lokeshgidra@google.com>
 <20240129193512.123145-4-lokeshgidra@google.com>
 <20240129203626.uq5tdic4z5qua5qy@revolver>
 <20240130025803.2go3xekza5qubxgz@revolver>
 <20240131214104.rgw3x5vuap43xubi@revolver>

On Mon, Feb 5, 2024 at 1:47 PM Suren Baghdasaryan wrote:
>
> On Wed, Jan 31, 2024 at 1:41 PM Liam R. Howlett wrote:
> >
> > * Lokesh Gidra [240130 21:49]:
> > > On Mon, Jan 29, 2024 at 6:58 PM Liam R. Howlett wrote:
> > > >
> > > > * Lokesh Gidra [240129 19:28]:
> > > > > On Mon, Jan 29, 2024 at 12:53 PM Suren Baghdasaryan wrote:
> > > > > >
> > > > > >
> > ...
> > > > > >
> > > > > Your suggestion is definitely simpler and easier to follow, but due to
> > > > > the overflow situation that Suren pointed out, I would still need to
> > > > > keep the locking/boolean dance, no? IIUC, even if I were to return
> > > > > EAGAIN to the userspace, there is no guarantee that subsequent ioctls
> > > > > on the same vma will succeed due to the same overflow, until someone
> > > > > acquires and releases mmap_lock in write-mode.
> > > > > Also, sometimes it seems insufficient whether we managed to lock vma
> > > > > or not. For instance, lock_vma_under_rcu() checks if anon_vma (for
> > > > > anonymous vma) exists. If not then it bails out.
> > > > > So it seems to me that we have to provide some fall back in
> > > > > userfaultfd operations which executes with mmap_lock in read-mode.
> > > >
> > > > Fair enough, what if we didn't use the sequence number and just locked
> > > > the vma directly?
> > >
> > > Looks good to me, unless someone else has any objections.
> >
> > > > /* This will wait on the vma lock, so once we return it's locked */
> > > > void vma_acquire_read_lock(struct vm_area_struct *vma)
> > > > {
> > > >         mmap_assert_locked(vma->vm_mm);
> > > >         down_read(&vma->vm_lock->lock);
> > > > }
> > > >
> > > > struct vm_area_struct *lock_vma(struct mm_struct *mm,
> > > >                 unsigned long addr)     /* or some better name.. */
> > > > {
> > > >         struct vm_area_struct *vma;
> > > >
> > > >         vma = lock_vma_under_rcu(mm, addr);
> > > >         if (vma)
> > > >                 return vma;
> > > >
> > > >         mmap_read_lock(mm);
> > > >         /* mm sequence cannot change, no mm writers anyways.
> > > >          * find_mergeable_anon_vma is only a concern in the page fault
> > > >          * path
> > > >          * start/end won't change under the mmap_lock
> > > >          * vma won't become detached as we have the mmap_lock in read
> > > >          * We are now sure no writes will change the VMA
> > > >          * So let's make sure no other context is isolating the vma
> > > >          */
> > > >         vma = lookup_vma(mm, addr);
> > > >         if (vma)
> > > We can take care of anon_vma as well here right? I can take a bool
> > > parameter ('prepare_anon' or something) and then:
> > >
> > >         if (vma) {
> > >                 if (prepare_anon && vma_is_anonymous(vma) &&
> > >                     anon_vma_prepare(vma)) {
> > >                         vma = ERR_PTR(-ENOMEM);
> > >                         goto out_unlock;
> > >                 }
> > > >                 vma_acquire_read_lock(vma);
> > >         }
> > > out_unlock:
> > > >         mmap_read_unlock(mm);
> > > >         return vma;
> > > > }
> >
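Just so we are reading the code the same way, here is the combined helper as
I understand it with the addition folded in. This is only an untested sketch:
prepare_anon is the hypothetical flag from above, and I am writing
vma_lookup() where the sketch says lookup_vma().

struct vm_area_struct *lock_vma(struct mm_struct *mm, unsigned long addr,
                                bool prepare_anon)
{
        struct vm_area_struct *vma;

        vma = lock_vma_under_rcu(mm, addr);
        if (vma)
                return vma;     /* per-VMA read lock already held */

        mmap_read_lock(mm);
        vma = vma_lookup(mm, addr);
        if (!vma)
                goto out_unlock;

        if (prepare_anon && vma_is_anonymous(vma) &&
            anon_vma_prepare(vma)) {
                vma = ERR_PTR(-ENOMEM);
                goto out_unlock;
        }

        /*
         * mmap_lock is held for read, so the VMA cannot be detached or
         * write-locked underneath us; take the per-VMA read lock here
         * and then drop mmap_lock below.
         */
        down_read(&vma->vm_lock->lock);

out_unlock:
        mmap_read_unlock(mm);
        return vma;
}
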
> > Do you need this? I didn't think this was happening in the code as
> > written? If you need it I would suggest making it happen always and
> > ditch the flag until a user needs this variant, but document what's
> > going on in here or even have a better name.
>
> I think yes, you do need this. I can see calls to anon_vma_prepare()
> under mmap_read_lock() protection in both mfill_atomic_hugetlb() and
> in mfill_atomic(). This means, just like in the pagefault path, we
> modify vma->anon_vma under mmap_read_lock protection which guarantees
> that adjacent VMAs won't change. This is important because
> __anon_vma_prepare() uses find_mergeable_anon_vma() that needs the
> neighboring VMAs to be stable. Per-VMA lock guarantees stability of
> the VMA we locked but not of its neighbors, therefore holding per-VMA
> lock while calling anon_vma_prepare() is not enough. The solution
> Lokesh suggests would call anon_vma_prepare() under mmap_read_lock and
> therefore would avoid the issue.
>

Thanks, Suren. anon_vma_prepare() is also called in validate_move_areas()
via move_pages().

>
> >
> > Thanks,
> > Liam
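
For completeness, a caller in the userfaultfd paths would then look roughly
like the sketch below; uffd_op_one_vma() is a made-up name, not something
from the posted patches. The point is that anon_vma_prepare() has already
run under mmap_read_lock() inside lock_vma(), so the part that runs under
only the per-VMA lock never needs to set up anon_vma itself.

static int uffd_op_one_vma(struct mm_struct *mm, unsigned long addr)
{
        struct vm_area_struct *vma;
        int err = 0;

        /* Returns a read-locked (per-VMA) vma, NULL, or an ERR_PTR. */
        vma = lock_vma(mm, addr, true);
        if (!vma)
                return -ENOENT;
        if (IS_ERR(vma))
                return PTR_ERR(vma);

        /* ... perform the copy/move on this one VMA ... */

        vma_end_read(vma);      /* drop the per-VMA read lock */
        return err;
}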