Date: Sun, 19 Jan 2025 20:22:52 -0700
From: Alex Williamson
To: Ankit Agrawal
Cc: Jason Gunthorpe, Yishai Hadas,
 "shameerali.kolothum.thodi@huawei.com", "kevin.tian@intel.com",
 Zhi Wang, Aniket Agashe, Neo Jia, Kirti Wankhede,
 "Tarun Gupta (SW-GPU)", Vikram Sethi, Andy Currid, Alistair Popple,
 John Hubbard, Dan Williams, "Anuj Aggarwal (SW-GPU)", Matt Ochs,
 "kvm@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v4 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C link status
Message-ID: <20250119202252.4fcd2c49.alex.williamson@redhat.com>
In-Reply-To: <20250119201232.04af85b2.alex.williamson@redhat.com>
References: <20250117233704.3374-1-ankita@nvidia.com>
 <20250117233704.3374-4-ankita@nvidia.com>
 <20250117205232.37dbabe3.alex.williamson@redhat.com>
 <20250119201232.04af85b2.alex.williamson@redhat.com>
Organization: Red Hat
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 19 Jan 2025 20:12:32 -0700
Alex Williamson wrote:

> On Mon, 20 Jan 2025 02:24:14 +0000
> Ankit Agrawal wrote:
>
> > >> +EXPORT_SYMBOL_GPL(vfio_pci_memory_lock_and_enable);
> > >>
> > >>  void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, u16 cmd)
> > >>  {
> > >>        pci_write_config_word(vdev->pdev, PCI_COMMAND, cmd);
> > >>        up_write(&vdev->memory_lock);
> > >>  }
> > >> +EXPORT_SYMBOL_GPL(vfio_pci_memory_unlock_and_restore);
> > >>
> > >>  static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> > >>  {
> > >
> > > The access is happening before the device is exposed to the user, the
> > > above are for handling conditions while there may be races with user
> > > access, this is totally unnecessary.
> >
> > Right. What I could do to reuse the code is to take out the part
> > related to locking/unlocking as new functions and export that.
> > The current vfio_pci_memory_lock_and_enable() would take the lock
> > and call the new function. Same for vfio_pci_memory_unlock_and_restore().
> > The nvgrace module could also call that new function. Does that sound
> > reasonable?
>
> No, this is standard PCI driver stuff, everything you need is already
> there. Probably pci_enable_device() and some variant of
> pci_request_regions().
>
> > > Does this delay even need to happen in the probe function, or could it
> > > happen in the open_device callback? That would still be before user
> > > access, but if we expect it to generally work, it would allow the
> > > training to happen in the background up until the user tries to open
> > > the device. Thanks,
> > >
> > > Alex
> >
> > The thought process is that since it is purely bare metal coming to proper
> > state while boot, the nvgrace module should probably wait for the startup
> > to complete during probe() instead of delaying until open() time.
>
> If the driver is statically loaded, that might mean you're willing to
> stall boot for up to 30s. In practice is this ever actually going to
> fail? Thanks,

On second thought, I guess a vfio-pci variant driver can't
automatically bind to a device, whether statically built or not, so
maybe this isn't a concern. I'm not sure if there are other concerns
with busy waiting for up to 30s at driver probe. Thanks,

Alex
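
For illustration only, the bounded wait discussed above can be sketched in plain userspace C. This is not the patch's code and not kernel code: struct fake_dev, fake_dev_ready() and wait_ready() are hypothetical names, the 100 ms poll interval is an assumption (only the 30s upper bound comes from the thread), and elapsed time is simulated with a counter rather than an actual sleep.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a device that reports "ready" only after a
 * number of status polls. Not the nvgrace-gpu register interface.
 */
struct fake_dev {
	unsigned int polls_until_ready;
	unsigned int polls_done;
};

/* Count the poll and report whether the fake device is ready yet. */
static bool fake_dev_ready(struct fake_dev *dev)
{
	return dev->polls_done++ >= dev->polls_until_ready;
}

/*
 * Poll the device until it is ready or until timeout_ms of simulated
 * time has passed. Returns 0 on success, -1 on timeout.
 */
static int wait_ready(struct fake_dev *dev, unsigned int timeout_ms,
		      unsigned int interval_ms)
{
	unsigned int elapsed_ms = 0;

	assert(interval_ms > 0);

	for (;;) {
		if (fake_dev_ready(dev))
			return 0;
		if (elapsed_ms >= timeout_ms)
			return -1;
		/* A real driver would sleep for interval_ms here. */
		elapsed_ms += interval_ms;
	}
}
```

With wait_ready(&dev, 30000, 100), a device that never becomes ready is polled 301 times before the call gives up, which is the worst case the "stall for up to 30s" concern refers to. The readiness check comes before the timeout check so that a device turning ready on the final poll is still reported as a success.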