Date: Sun, 19 Jan 2025 20:12:32 -0700
From: Alex Williamson
To: Ankit Agrawal
Cc: Jason Gunthorpe, Yishai Hadas, "shameerali.kolothum.thodi@huawei.com",
 "kevin.tian@intel.com", Zhi Wang, Aniket Agashe, Neo Jia, Kirti Wankhede,
 "Tarun Gupta (SW-GPU)", Vikram Sethi, Andy Currid, Alistair Popple,
 John Hubbard, Dan Williams, "Anuj Aggarwal (SW-GPU)", Matt Ochs,
 "kvm@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v4 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C link status
Message-ID: <20250119201232.04af85b2.alex.williamson@redhat.com>
References: <20250117233704.3374-1-ankita@nvidia.com>
 <20250117233704.3374-4-ankita@nvidia.com>
 <20250117205232.37dbabe3.alex.williamson@redhat.com>
Organization: Red Hat
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 20 Jan 2025 02:24:14 +0000
Ankit Agrawal wrote:

> >> +EXPORT_SYMBOL_GPL(vfio_pci_memory_lock_and_enable);
> >>
> >>  void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, u16 cmd)
> >>  {
> >>        pci_write_config_word(vdev->pdev, PCI_COMMAND, cmd);
> >>        up_write(&vdev->memory_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(vfio_pci_memory_unlock_and_restore);
> >>
> >>  static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> >>  {
> >
> > The access is happening before the device is exposed to the user, the
> > above are for handling conditions while there may be races with user
> > access, this is totally unnecessary.
>
> Right. What I could do to reuse the code is to take out the part
> related to locking/unlocking as new functions and export that.
> The current vfio_pci_memory_lock_and_enable() would take the lock
> and call the new function. Same for vfio_pci_memory_unlock_and_restore().
> The nvgrace module could also call that new function. Does that sound
> reasonable?

No, this is standard PCI driver stuff, everything you need is already
there. Probably pci_enable_device() and some variant of
pci_request_regions().

> > Does this delay even need to happen in the probe function, or could it
> > happen in the open_device callback? That would still be before user
> > access, but if we expect it to generally work, it would allow the
> > training to happen in the background up until the user tries to open
> > the device. Thanks,
> >
> > Alex
>
> The thought process is that since it is purely bare metal coming to proper
> state while boot, the nvgrace module should probably wait for the startup
> to complete during probe() instead of delaying until open() time.

If the driver is statically loaded, that might mean you're willing to
stall boot for up to 30s. In practice is this ever actually going to
fail? Thanks,

Alex