From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 525D4C77B72 for ; Fri, 14 Apr 2023 17:10:53 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7A49F10EE32; Fri, 14 Apr 2023 17:10:52 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8379D10EE32 for ; Fri, 14 Apr 2023 17:10:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1681492249; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=K34W6RnfQ6nKdpvXaQKTZxAcJlVcKYDwWEBLIMPWzBs=; b=IlTEwJOM6Xv1iF7kUiKEPFxjOZBNTrflypNvomNO3/xXxuVe2HGga2JD2OaPmctCq6AzQp 2sJRD9yFmphipzgEt4h+TPoNYlXOrbtutaPuTBBj93dV/jyOEqgXD0a55b8Mroz4thDRuc 5nyk8PejTw2FbFRWZX2lkPRY9Lk4sNs= Received: from mail-io1-f72.google.com (mail-io1-f72.google.com [209.85.166.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-194-OfNiJsNrO82R8w-piiTP_w-1; Fri, 14 Apr 2023 13:10:48 -0400 X-MC-Unique: OfNiJsNrO82R8w-piiTP_w-1 Received: by mail-io1-f72.google.com with SMTP id b190-20020a6bb2c7000000b00760a35f250aso4388770iof.4 for ; Fri, 14 Apr 2023 10:10:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681492247; x=1684084247; h=content-transfer-encoding:mime-version:organization:references :in-reply-to:message-id:subject:cc:to:from:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=K34W6RnfQ6nKdpvXaQKTZxAcJlVcKYDwWEBLIMPWzBs=; b=NDh7PWLSU+HuuDVAHffC2bptBxHlvw2BQO6m0j1TuKn2uG4Em4pi3cUYIJ8hH5vqEa trO3eSe2GdkLwCPHJz9pA22a2kHBbEF0GhK60ZcV5thCLA3O85Ysoz7aPLtFV8E4t+5b OgZZ0iCOvHkNkmexbiDEL2uuuYzF393phTYJdNDhdoYBxokVrkfIX0HhxCViCh7J17qw rthgnabIo6q4M+LbxSJeW7Et5FxVyyhCd4V2WcZsKG0Tz6X3S9dDY0vMxftklnr23VfN pqNmmFcI++6t344rtP7KGojGVgOOD3PQc0+KWLmdZeISkb3W1CYmr16mSWwqWi0gSJLZ 7vZg== X-Gm-Message-State: AAQBX9f0kDrCnmaTVmR9YIZQsr1o4Jipf2BUs6hYUboW1PPYe05IjIiF ShwQApkcyCkOhgh99IdMM802uid6N0ti1hZcwz4blK7bgo7Mg2qb98MwHtZke+zYt9r4ggHmjVv B0MgWj9Ow7KkZBNSF+JYf3rS3vC5f X-Received: by 2002:a05:6602:48b:b0:760:a0be:e63c with SMTP id y11-20020a056602048b00b00760a0bee63cmr7725254iov.4.1681492247470; Fri, 14 Apr 2023 10:10:47 -0700 (PDT) X-Google-Smtp-Source: AKy350ZcHLvlPpmCixL+MNuyf2o//jjEeGR7snNUTvyb2rQNazK5gJMh2gLv10tmHPumFeHdoAM1QA== X-Received: by 2002:a05:6602:48b:b0:760:a0be:e63c with SMTP id y11-20020a056602048b00b00760a0bee63cmr7725218iov.4.1681492247165; Fri, 14 Apr 2023 10:10:47 -0700 (PDT) Received: from redhat.com ([38.15.36.239]) by smtp.gmail.com with ESMTPSA id z22-20020a05660200d600b0074ca5ac5037sm1270271ioe.26.2023.04.14.10.10.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 14 Apr 2023 10:10:46 -0700 (PDT) Date: Fri, 14 Apr 2023 11:10:43 -0600 From: Alex Williamson To: "Liu, Yi L" Message-ID: <20230414111043.40c15dde.alex.williamson@redhat.com> In-Reply-To: References: <20230406115347.7af28448.alex.williamson@redhat.com> <20230411095417.240bac39.alex.williamson@redhat.com> <20230411111117.0766ad52.alex.williamson@redhat.com> <20230411155827.3489400a.alex.williamson@redhat.com> <20230412105045.79adc83d.alex.williamson@redhat.com> <20230413120712.3b9bf42d.alex.williamson@redhat.com> Organization: Red Hat MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "mjrosato@linux.ibm.com" , "jasowang@redhat.com" , "Hao, Xudong" , "Duan, Zhenzhong" , "peterx@redhat.com" , "Xu, Terrence" , "chao.p.peng@linux.intel.com" , "linux-s390@vger.kernel.org" , "kvm@vger.kernel.org" , "lulu@redhat.com" , "Jiang, Yanting" , "joro@8bytes.org" , "nicolinc@nvidia.com" , Jason Gunthorpe , "Zhao, Yan Y" , "intel-gfx@lists.freedesktop.org" , "eric.auger@redhat.com" , "intel-gvt-dev@lists.freedesktop.org" , "yi.y.sun@linux.intel.com" , "cohuck@redhat.com" , "shameerali.kolothum.thodi@huawei.com" , "suravee.suthikulpanit@amd.com" , "robin.murphy@arm.com" Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Fri, 14 Apr 2023 11:38:24 +0000 "Liu, Yi L" wrote: > > From: Tian, Kevin > > Sent: Friday, April 14, 2023 5:12 PM > > > > > From: Alex Williamson > > > Sent: Friday, April 14, 2023 2:07 AM > > > > > > We had already iterated a proposal where the group-id is replaced with > > > a dev-id in the existing ioctl and a flag indicates when the return > > > value is a dev-id vs group-id. This had a gap that userspace cannot > > > determine if a reset is available given this information since un-owned > > > devices report an invalid dev-id and userspace can't know if it has > > > implicit ownership. > > > > > > > > It seems cleaner to me though that we would could still re-use INFO in > > > a similar way, simply defining a new flag bit which is valid only in > > > the case of returning dev-ids and indicates if the reset is available. > > > Therefore in one ioctl, userspace knows if hot-reset is available > > > (based on a kernel determination) and can pull valid dev-ids from the > > Need to confirm the meaning of hot-reset available flag. I think it > should at least meet below two conditions to set this flag. Although > it may not mean hot-reset is for sure to succeed. (but should be > a high chance). > > 1) dev_set is resettable (all affected device are in dev_set) > 2) affected device are owned by the current user Per thread with Kevin, ownership can't always be known by the kernel. Beyond the group vs cdev discussion there, isn't it also possible (though perhaps not recommended) that a user can have multiple iommufd ctxs? So I think 2) becomes "ownership of the affected dev-set can be inferred from the iommufd_ctx of the calling device", iow, the null-array calling model is available and the flag is redefined to match. Reset may still be available via the proof-of-ownership model. > Also, we need to has assumption that below two cases are rare > if user encounters it, it just bad luck for them. I think the existing > _INFO and hot-reset already has such assumption. So cdev mode > can adopt it as well. > > a) physical topology change (e.g. new devices plugged to affected slot) > b) an affected device is unbound from vfio Yes, these are sufficiently rare that we can't do much about them. > > So the kernel needs to compare the group id between devices with > > valid dev-ids and devices with invalid dev-ids to decide the implicit > > ownership. For noiommu device which has no group_id when > > VFIO_GROUP is off then it's resettable only if having a valid dev_id. > > In cdev mode, noiommu device doesn't have dev_id as it is not > bound to valid iommufd. So if VFIO_GROUP is off, we may never > allow hot-reset for noiommu devices. But we don't want to have > regression with noiommu devices. Perhaps we may define the usage > of the resettable flag like this: > 1) if it is set, user does not need to own all the affected devices as > some of them may have been owned implicitly. Kernel should have > checked it. > 2) if the flag is not set, that means user needs to check ownership > by itself. It needs to own all the affected devices. If not, don't > do hot-reset. Exactly, the flag essentially indicates that the null-array approach is available, lack of the flag indicates proof-of-ownership is required. > This way we can still make noiommu devices support hot-reset > just like VFIO_GROUP is on. Because noiommu devices have fake > groups, such groups are all singleton. So checking all affected > devices are opened by user is just same as check all affected > groups. Yep. > > The only corner case with this option is when a user mixes group > > and cdev usages. iirc you mentioned it's a valid usage to be supported. > > In that case the kernel doesn't have sufficient knowledge to judge > > 'resettable' as it doesn't know which groups are opened by this user. > > > > Not sure whether we can leave it in a ugly way so INFO may not tell > > 'resettable' accurately in that weird scenario. > > This seems not easy to support. If above scenario is allowed there can be > three cases that returns invalid dev_id. > 1) devices not opened by user but owned implicitly The cdev approach has a hard time with this in general, it has no way to represent unopened devices. so any case where the nature of an unopened device block reset on the dev-set is rather opaque to the user. > 2) devices not owned by user (and presumable not owned) We still provide BDF. Not much difference from the group case here, being able to point to a BDF or group is about all we can do. > 3) devices opened via group but owned by user I think this still works in the proof-of-ownership, passing fds to hot-reset model. > User would require more info to tell the above cases from each other. Obviously we could be equivalent to the group model if IOMMU groups were exposed for a device and all devices had IOMMU groups, but reasons... > > > array to associate affected, owned devices, and still has the > > > equivalent information to know that one or more of the devices listed > > > with an invalid dev-id are preventing the hot-reset from being > > > available. > > > > > > Is that an option? Thanks, > > > > > > > This works for me if above corner case can be waived. > > One side check, perhaps already confirmed in prior email. @Alex, So > the reason for the prediction of hot-reset is to avoid the possible > vfio_pci_pre_reset() which does heavy operations like stop DMA and > copy config space. Is it? Any other special reason? Anyhow, this reason > is enough for this prediction per my understanding. It's not clear to me what "prediction" is referring to. As above, I think we can redefine the reset-available flag I proposed to more restrictively indicate that the null-array approach is available based on the dev-set group in relation to the iommufd_ctx of the calling device. Prediction of the affected devices seems like basic functionality to me, we can't assume the user's usage model, they must be able to make a well informed decision regarding affected devices. Thanks, Alex