Date: Mon, 2 Sep 2024 14:50:40 +0200
From: Daniel Vetter
To: Christian König
Cc: Matthew Brost, Daniel Vetter, Thomas Hellström,
 intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 airlied@gmail.com, matthew.auld@intel.com, daniel@ffwll.ch,
 "Paneer Selvam, Arunpravin"
Subject: Re: [RFC PATCH 23/28] drm/xe: Add SVM VRAM migration
References: <20240828024901.2582335-1-matthew.brost@intel.com>
 <20240828024901.2582335-24-matthew.brost@intel.com>
 <368ee71bd5e39d4e26947de9cc417f4abe8d1f3b.camel@linux.intel.com>
 <5043c12a-e44a-416d-b2ce-70c07609f25e@amd.com>
In-Reply-To: <5043c12a-e44a-416d-b2ce-70c07609f25e@amd.com>

On Mon, Sep 02, 2024 at 01:01:45PM +0200, Christian König wrote:
> On 30.08.24 at 00:12, Matthew Brost wrote:
> > On Thu, Aug 29, 2024 at 01:02:54PM +0200, Daniel Vetter wrote:
> > > On Thu, Aug 29, 2024 at 11:53:58AM +0200, Thomas Hellström wrote:
> > > > But as Sima pointed out in private communication, exhaustive eviction
> > > > is not really needed for faulting to make (crawling) progress.
> > > > Watermarks and VRAM trylock shrinking should suffice, since we're
> > > > strictly only required to service a single gpu page granule at a time.
> > > >
> > > > However, ordinary bo-based jobs would still like to be able to
> > > > completely evict SVM vram. Whether that is important enough to strive
> > > > for is ofc up for discussion.
> > > My take is that you don't win anything for exhaustive eviction by having
> > > the dma_resv somewhere in there for svm allocations. Roughly for split lru
> > > world, where svm ignores bo/dma_resv:
> > >
> > > When evicting vram from the ttm side we'll fairly switch between selecting
> > > bo and throwing out svm pages. With drm_exec/ww_acquire_ctx selecting bo
> > > will eventually succeed in vacuuming up everything (with a few retries
> > > perhaps, if we're not yet at the head of the ww ticket queue).
> > >
> > > svm pages we need to try to evict anyway - there's no guarantee, because
> > > the core mm might be holding temporary page references (which block
> > Yea, but think you could kill the app then - not suggesting we
> > should but could. To me this is akin to a CPU fault and not being able
> > to migrate the device pages - the migration layer doc says when this
> > happens kick this to user space and segfault the app.
>
> That's most likely a bad idea. That the core holds a temporary page
> reference can happen any time without any bad doing from the application.
> E.g. for direct I/O, swapping etc...
>
> So you can't punish the application with a segfault if you happen to not be
> able to migrate a page because it has a reference.

See my other reply, it even happens as a direct consequence of a 2nd
thread trying to migrate the exact same page from vram to sram. And that
really is a core use case.

So yeah, we really can't SIGBUS on this case.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch