Date: Thu, 8 May 2025 18:47:20 +0300
From: Pantelis Antoniou
To: David Hildenbrand
CC: Andrew Morton, Artem Krupotkin, Charles Briere, Wade Farnsworth, Peter Xu
Subject: Re: [PATCH 1/1] Fix zero copy I/O on __get_user_pages allocated pages
Message-ID: <20250508184720.17bd1f62@sarc.samsung.com>
Organization: SARC
References: <20250507154105.763088-1-p.antoniou@partner.samsung.com> <20250507154105.763088-2-p.antoniou@partner.samsung.com> <99ed92b7-c1b2-4e12-a7ee-776a7f890b47@redhat.com> <20250508182356.45dbfd40@sarc.samsung.com>
Sender: owner-linux-mm@kvack.org

On Thu, 8 May 2025 17:37:04 +0200
David Hildenbrand wrote:

> On 08.05.25 17:23, Pantelis Antoniou wrote:
> > On Thu, 8 May 2025 17:03:46 +0200
> > David Hildenbrand wrote:
> >
> > Hi there,
> >
> >> On 07.05.25 17:41, Pantelis Antoniou wrote:
> >>
> >> Hi,
> >>
> >>> Recent updates to net filesystems enabled zero copy operations,
> >>> which require getting a user space page pinned.
> >>>
> >>> This does not work for pages that were allocated via
> >>> __get_user_pages and then mapped to user-space via remap_pfn_range.
> >>
> >> Right. Because the struct page of a VM_PFNMAP *must not be
> >> touched*. It has to be treated like it doesn't exist.
> >>
> >
> > Well, that's not exactly the case. For pages mapped to user space
> > via remap_pfn_range() the VM_PFNMAP bit is set even though the
> > pages do have a struct page.
>
> Yes. And VM_PFNMAP is the big flag that these pages shall not be
> touched. Even if it exists. Even if you say please. :)
>
> See the comment above vm_normal_page():
>
>   "Special" mappings do not wish to be associated with a "struct
>   page" (either it doesn't exist, or it exists but they don't
>   want to touch it)
>
> VM_MIXEDMAP could maybe be used for that purpose: possibly GUP also
> has to be updated to make use of that. (I was hoping we can get rid
> of VM_MIXEDMAP at some point)
>
>
> >
> > The details of how it happens are at the cover page of this patch
> > but let me paste the relevant bits here.
> >
> > "In our emulation environment we have noticed failing writes when
> > performing I/O from a userspace mapped DRM GEM buffer object.
> > The platform does not use VRAM, all graphics memory is regular DRAM
> > memory, allocated via __get_free_pages
> >
> > The same write was successful from a heap allocated bounce buffer.
> >
> > The sequence of events is as follows.
> >
> > 1. A BO (Buffer Object) is created, and its backing memory is
> > allocated via __get_user_pages()
>
> __get_user_pages() only allocates memory via page faults. Are you
> sure you meant __get_user_pages() here?
>

Oops, yeah, __get_free_pages(). Apologies for the confusion.

> >
> > 2. Userspace mmaps a BO (Buffer Object) via a mmap call on the
> > opened file handle of a DRM driver. The mapping is done via the
> > drm_gem_mmap_obj() call.
> >
> > 3. Userspace issues a write to a file copying the contents of the
> > BO.
> >
> > 3a. If the file is located on regular filesystem (like ext4), the
> > write completes successfully.
> >
> > 3b. If the file is located on a network filesystem, like 9p the
> > write fails.
> >
> > The write fails because v9fs_file_write_iter() will call
> > netfs_unbuffered_write_iter(), netfs_unbuffered_write_iter_locked()
> > which will call netfs_extract_user_iter()
> >
> > netfs_extract_user_iter() will in turn call
> > iov_iter_extract_pages() which for a user backed iterator will call
> > iov_iter_extract_user_pages which will call pin_user_pages_fast()
> > which finally will call __gup_longterm_locked().
> >
> > __gup_longterm_locked() will call __get_user_pages_locked() which
> > will fail because the VMA is marked with the VM_IO and VM_PFNMAP
> > flags."
>
> So, drm_gem_mmap_obj() ends up using remap_pfn_range()?
>

Yes.

> I can spot that drm_gem_mmap_obj() has a path where it explicitly sets
>
>   vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND |
>                VM_DONTDUMP);
>
> Which is a clear sign to core-MM (incl. GUP) to never mess with the
> mapped pages.
>

Well, let's just say this is not quite right for pages that are normally
allocated via __get_free_pages(). DRM has to handle both VRAM and
regular system memory maps, so maybe it's playing it safe here.

> >
> >>>
> >>> remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma
> >>> bits. VM_PFNMAP in particular mark the pages as not having
> >>> struct_page associated with them, which is not the case for
> >>> __get_user_pages()
> >>>
> >>> This in turn makes any attempt to lock a page fail, and breaking
> >>> I/O from that address range.
> >>>
> >>> This patch address it by special casing pages in those VMAs and
> >>> not calling vm_normal_page() for them.
> >>>
> >>> Signed-off-by: Pantelis Antoniou
> >>> ---
> >>>  mm/gup.c | 22 ++++++++++++++++++----
> >>>  1 file changed, 18 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/mm/gup.c b/mm/gup.c
> >>> index 84461d384ae2..e185c18c0c81 100644
> >>> --- a/mm/gup.c
> >>> +++ b/mm/gup.c
> >>> @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(pte_t
> >>> pte, struct page *page, return !userfaultfd_pte_wp(vma, pte);
> >>>  }
> >>>
> >>> +static struct page *gup_normal_page(struct vm_area_struct *vma,
> >>> +        unsigned long address, pte_t pte)
> >>> +{
> >>> +    unsigned long pfn;
> >>> +
> >>> +    if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) {
> >>> +        pfn = pte_pfn(pte);
> >>> +        if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn >
> >>> highest_memmap_pfn)
> >>> +            return NULL;
> >>> +        return pfn_to_page(pfn);
> >>> +    }
> >>> +    return vm_normal_page(vma, address, pte);
> >>
> >> I enjoy seeing vm_normal_page() checks in GUP code.
> >>
> >> I don't enjoy seeing what you added before that :)
> >>
> >> If vm_normal_page() tells you "this is not a normal", then we
> >> should not touch it. There is one exception: the shared zeropage.
> >>
> >>
> >> So, unfortunately, this is wrong.
> >>
> >
> > Well, lets talk about a proper fix then for the previously mentioned
> > user-space regression.
>
> You really have to find out the responsible commit. GUP has been
> behaving like that forever I'm afraid.
>
> And even the VM_PFNMAP was in drm_gem_mmap_obj() already at least in
> 2012 if I am not wrong.
>

There is no single responsible commit; it has been broken forever. It is
just that no one has tried to pin the pages to perform I/O before now.

Regards

-- Pantelis
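
For readers following the thread, the flag check being argued about can be
modelled in a few lines of user-space C. This is only a sketch under stated
assumptions: struct toy_vma, toy_check_vma_flags(), toy_drm_gem_mmap() and
toy_pin_user_page() are hypothetical names invented for illustration, the
TOY_VM_* constants are local stand-ins rather than values from kernel
headers, and the -EFAULT return value is an assumption of the model. Real
GUP code does far more than this. The sketch only shows why a mapping that
carries VM_IO | VM_PFNMAP is refused for pinning while an ordinary
heap-backed mapping is accepted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in flag bits for this model only; not taken from kernel headers. */
#define TOY_VM_PFNMAP     (1UL << 0)
#define TOY_VM_IO         (1UL << 1)
#define TOY_VM_DONTEXPAND (1UL << 2)
#define TOY_VM_DONTDUMP   (1UL << 3)

/* Hypothetical, heavily reduced stand-in for struct vm_area_struct. */
struct toy_vma {
	unsigned long vm_flags;
};

/*
 * Models the policy described in the thread: a VMA marked VM_IO or
 * VM_PFNMAP tells core-MM that the mapped pages must not be touched,
 * so pinning is refused. The -EFAULT value is an assumption of this model.
 */
static int toy_check_vma_flags(const struct toy_vma *vma)
{
	assert(vma != NULL);
	if (vma->vm_flags & (TOY_VM_IO | TOY_VM_PFNMAP))
		return -EFAULT;
	return 0;
}

/* Sets the same flag combination quoted from the drm_gem_mmap_obj() path. */
static void toy_drm_gem_mmap(struct toy_vma *vma)
{
	vma->vm_flags |= TOY_VM_IO | TOY_VM_PFNMAP |
			 TOY_VM_DONTEXPAND | TOY_VM_DONTDUMP;
}

/* Returns 1 if one page could be pinned, or a negative errno otherwise. */
static int toy_pin_user_page(const struct toy_vma *vma)
{
	int ret = toy_check_vma_flags(vma);

	return ret ? ret : 1;
}
```

In this model a zero-initialised struct toy_vma stands for the heap bounce
buffer and pins successfully, while a VMA passed through toy_drm_gem_mmap()
is rejected. That mirrors the bounce-buffer success and the 9p write
failure described above, and it also shows that the rejection depends only
on the VMA flags, not on whether the backing memory has a struct page.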