Date: Thu, 27 Jul 2023 08:58:03 -0300
From: Jason Gunthorpe
To: "Kasireddy, Vivek"
Cc: Alistair Popple, Gerd Hoffmann, "Kim, Dongwon", David Hildenbrand,
 "Chang, Junxiao", Hugh Dickins, Peter Xu, "linux-mm@kvack.org",
 "dri-devel@lists.freedesktop.org", Mike Kravetz
Subject: Re: [RFC v1 1/3] mm/mmu_notifier: Add a new notifier for mapping updates (new pages)
References: <87jzuwlkae.fsf@nvdebian.thelocal> <87pm4nj6s5.fsf@nvdebian.thelocal>

On Thu, Jul 27, 2023 at 07:34:30AM +0000, Kasireddy, Vivek wrote:
> Hi Jason,
>
> >
> > On Tue, Jul 25, 2023 at 10:44:09PM +0000, Kasireddy, Vivek wrote:
> > > > If you still need the memory mapped then you re-call hmm_range_fault
> > > > and re-obtain it. hmm_range_fault will resolve all the races and you
> > > > get new pages.
> >
> > > IIUC, for my udmabuf use-case, it looks like calling hmm_range_fault
> > > immediately after an invalidate (range notification) would preemptively
> > > fault in new pages before a write. The problem with that is if a read
> > > occurs on those new pages, then the data is incorrect as a write may not
> > > have happened yet.
> >
> > It cannot be, if you use hmm_range_fault correctly you cannot get
> > corruption no matter what is done to the mmap'd memfd. If there is
> > otherwise it is a hmm_range_fault bug plain and simple.
> >
> > > Ideally, what I am looking for is for getting new pages at the time of or after
> > > a write; until then, it is ok to use the old pages given my use-case.
> >
> > It is wrong, if you are synchronizing the vma then you must use the
> > latest copy. If your use case can tolerate it then keep a 'not
> > present' indication for the missing pages until you actually need
> > them, but dmabuf doesn't really provide an API for that.
> >
> > > I think the difference comes down to whether we (udmabuf driver) want to
> > > grab the new pages after getting notified about a PTE update because
> > > of a fault
> >
> > Why? You still haven't explained why you want this.
> Ok, let me explain using one of the udmabuf selftests (added in patch #3)
> to describe the problem (sorry, I'd have to use the terms memfd, hole, etc)
> I am trying to solve:
>         size = MEMFD_SIZE * page_size;
>         memfd = create_memfd_with_seals(size, false);
>         addr1 = mmap_fd(memfd, size);
>         write_to_memfd(addr1, size, 'a');
>         buf = create_udmabuf_list(devfd, memfd, size);
>         addr2 = mmap_fd(buf, NUM_PAGES * NUM_ENTRIES * getpagesize());
>         punch_hole(memfd, MEMFD_SIZE / 2);
>    -> At this point, if I were to read addr1, it'd still have "a" in relevant areas
>         because a new write hasn't happened yet. And, since this results in an
>         invalidation (notification) of the associated VMA range, I could register
>         a callback in udmabuf driver and get notified but I am not sure how or
>         why that would be useful.

When you get an invalidation you trigger dmabuf move, which revokes the
importer's use of the dmabuf because the underlying memory has
changed. This is exactly the same as a GPU driver migrating memory
to/from CPU memory.

>
>         write_to_memfd(addr1, size, 'b');
>    -> Here, the hole gets refilled as a result of the above writes which trigger
>         faults and the PTEs are updated to point to new pages. When this happens,
>         the udmabuf driver needs to be made aware of the new pages that were
>         faulted in because of the new writes.

You only need this because you are not processing the invalidate.

> a way to get notified when the hole is written to, the solution I came up
> with is to either add a new notifier or add calls to change_pte() when the
> PTEs do get updated.
> However, considering your suggestion to use
> hmm_range_fault(), it is not clear to me how it would help while the hole
> is being written to as the writes occur outside of the
> udmabuf driver.

You have the design backwards.

When a dmabuf importer asks for the dmabuf to be present you call
hmm_range_fault() and you get back whatever memory is appropriate. The
importer can then use it.

If the underlying memory changes then you get the invalidation and you
trigger move. The importer stops using the memory and the underlying
pages change.

Later the importer decides it needs the memory again so it again asks
for the dmabuf to be present, which does hmm_range_fault and gets
whatever is appropriate at the time.

Jason
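
For readers following the thread, the flow described above (fault on
demand, revoke on invalidation, fault again later) can be sketched as a
small userspace model. This is only an illustration under stated
assumptions: every name in it (struct backing, struct model_dmabuf,
model_make_present(), model_invalidate(), model_replace_pages(),
model_importer_read()) is hypothetical and is not a kernel API.
model_make_present() merely stands in for the role hmm_range_fault()
plays in the mail, and integer page ids stand in for real pages.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Stand-in for the memfd's current backing pages (one id per page). */
struct backing {
	int page_id[NPAGES];
	int next_id;
};

/* Stand-in for the exporter's view that is handed to the importer. */
struct model_dmabuf {
	struct backing *src;
	int pages[NPAGES];	/* snapshot, valid only while present */
	bool present;
	unsigned long moves;	/* times the importer's use was revoked */
};

static void backing_init(struct backing *b)
{
	b->next_id = 1;
	for (size_t i = 0; i < NPAGES; i++)
		b->page_id[i] = b->next_id++;
}

/*
 * Models "the importer asks for the dmabuf to be present": look up
 * whatever backs the range right now.
 */
static void model_make_present(struct model_dmabuf *d)
{
	for (size_t i = 0; i < NPAGES; i++)
		d->pages[i] = d->src->page_id[i];
	d->present = true;
}

/*
 * Models the invalidation callback: trigger "move", i.e. revoke the
 * importer's use. It does not try to chase the new pages.
 */
static void model_invalidate(struct model_dmabuf *d)
{
	if (d->present) {
		d->present = false;
		d->moves++;
	}
}

/*
 * Models a hole punch followed by refilling writes: the invalidation
 * fires first, then pages [first, last) get new backing.
 */
static void model_replace_pages(struct backing *b, struct model_dmabuf *d,
				size_t first, size_t last)
{
	assert(first <= last && last <= NPAGES);
	model_invalidate(d);
	for (size_t i = first; i < last; i++)
		b->page_id[i] = b->next_id++;
}

/* Importer-side access: re-request presence on demand, then read. */
static int model_importer_read(struct model_dmabuf *d, size_t idx)
{
	assert(idx < NPAGES);
	if (!d->present)
		model_make_present(d);
	return d->pages[idx];
}
```

The point of the model is that nothing has to notify the exporter when
the hole is refilled: the invalidation only drops the stale snapshot,
and the next request for presence picks up whatever pages exist at that
time.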