Date: Thu, 26 Jun 2025 16:10:25 -0600
From: Alex Williamson
To: Aaron Lewis
Cc: Jason Gunthorpe, bhelgaas@google.com, dmatlack@google.com,
 vipinsh@google.com, kvm@vger.kernel.org, seanjc@google.com,
 jrhilke@google.com
Subject: Re: [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demonstrate a latency issue
Message-ID: <20250626161025.7c407e70.alex.williamson@redhat.com>
In-Reply-To:
References: <20250626180424.632628-1-aaronlewis@google.com>
 <20250626132659.62178b7d.alex.williamson@redhat.com>
 <20250626205608.GO167785@nvidia.com>
Organization: Red Hat

On Thu, 26 Jun 2025 14:58:54 -0700
Aaron Lewis wrote:

> On Thu, Jun 26, 2025 at 1:56 PM Jason Gunthorpe wrote:
> >
> > On Thu, Jun 26, 2025 at 01:26:59PM -0600, Alex Williamson wrote:
> > > On Thu, 26 Jun 2025 18:04:21 +0000
> > > Aaron Lewis wrote:
> > >
> > > > This series is being sent as an RFC to help brainstorm the best
> > > > way to fix a latency issue it uncovers.
> > > >
> > > > The crux of the issue is that when initializing multiple VFs from
> > > > the same PF, the devices are reset serially rather than in
> > > > parallel, regardless of whether they are initialized from
> > > > different threads.  That happens because a shared lock is acquired
> > > > when vfio_df_ioctl_bind_iommufd() is called, then a FLR (function
> > > > level reset) is done, which takes 100ms to complete.  That, in
> > > > combination with trying to initialize many devices at the same
> > > > time, results in a lot of wasted time.
> > > >
> > > > While the PCI spec does specify that a FLR requires 100ms to
> > > > ensure it has time to complete, I don't see anything indicating
> > > > that other VFs can't be reset at the same time.
> > > >
> > > > A couple of ideas on how to approach a fix are:
> > > >
> > > >   1. See if the lock preventing the second thread from making
> > > >      forward progress can be sharded to only include the VF it
> > > >      protects.
> > >
> > > I think we're talking about the dev_set mutex here, right?  I think
> > > this is just an oversight.  The original lock that dev_set replaced
> > > was devised to manage the set of devices affected by the same bus
> > > or slot reset.  I believe we've held the same semantics, though,
> > > and VFs just happen to fall through to the default of a bus-based
> > > dev_set.  Obviously we cannot do a bus or slot reset of a VF; we
> > > only have FLR, and it especially doesn't make sense that VFs on the
> > > same "bus" from different PFs share this mutex.
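For reference, the set_id keying in vfio_pci_core_register_device()
currently works roughly as below (a simplified sketch; error handling
and surrounding code omitted).  vfio_assign_device_set() groups every
device registered with the same set_id pointer into one
vfio_device_set, and that set's mutex is what serializes the
open/reset paths:

	void *set_id;

	if (pci_is_root_bus(pdev->bus))
		set_id = vdev;		/* per-device set: no sharing */
	else if (!pci_probe_reset_slot(pdev->slot))
		set_id = pdev->slot;	/* devices behind the same slot */
	else
		set_id = pdev->bus;	/* bus-wide fallback: where VFs
					 * land, so VFs sharing a "bus"
					 * also share the mutex and
					 * reset serially */

	ret = vfio_assign_device_set(&vdev->vdev, set_id);

A VF is neither on the root bus nor behind a resettable slot, so it
falls into the bus-keyed set alongside every other VF on that virtual
bus.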
> >
> > It certainly could be..  But I am feeling a bit wary and would want to
> > check this carefully.  We ended up using the devset for more things -
> > need to check where everything ended up.
> >
> > Off hand I don't recall any reason why the VF should be part of the
> > dev set..
> >
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > > index 6328c3a05bcd..261a6dc5a5fc 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -2149,7 +2149,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> > >  		return -EBUSY;
> > >  	}
> > >
> > > -	if (pci_is_root_bus(pdev->bus)) {
> > > +	if (pci_is_root_bus(pdev->bus) || pdev->is_virtfn) {
>
> That fixes the issue.  I applied this patch and reran the test, and
> both threads now finish in ~100ms!  We are getting fully parallelized
> resets!
>
> [0x7fd12afc0700] '0000:17:0c.2' initialized in 102.3ms.
> [0x7fd12b7c1700] '0000:17:0c.1' initialized in 102.4ms.
>
> > >  		ret = vfio_assign_device_set(&vdev->vdev, vdev);
> > >  	} else if (!pci_probe_reset_slot(pdev->slot)) {
> > >  		ret = vfio_assign_device_set(&vdev->vdev, pdev->slot);
> > >
> > > Does that allow fully parallelized resets?
> >
> > I forget all the details, but if we are sure the reset of a VF is only
> > the VF, then that does seem like the right direction.
> >
> > Jason
>
> Alex, would you like me to include your patch in this series so we
> have the fix and test together, or would you prefer to propose the fix
> yourself?  Either way works for me.

I think it's best not to queue the fix behind the selftests when
there's no code dependency.  I'll send out the patch, and hopefully
Jason can chew on whether I'm overlooking something.  This mutex has
basically become our de facto serialization mutex, but I think that
hot reset remains the only multi-device serialization problem.  Thanks,

Alex
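For context on the timing numbers quoted above, here is a minimal
sketch of the kind of two-thread harness that would surface the
serialization.  This is hypothetical, not the selftest from the
series: the real init path (opening the VFIO cdev and binding it to
an iommufd) is modeled as a plain 100ms sleep standing in for the
FLR, and the BDFs are the ones from Aaron's output.

#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for the real per-device init (open the vfio cdev, bind it
 * to an iommufd, which triggers the ~100ms FLR).  The stub always
 * sleeps concurrently; against a real kernel, a shared dev_set mutex
 * would make the second device take ~200ms end-to-end. */
static void init_vfio_device(const char *bdf)
{
	(void)bdf;
	usleep(100 * 1000);
}

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

static void *worker(void *arg)
{
	const char *bdf = arg;
	double start = now_ms();

	init_vfio_device(bdf);
	printf("[%#lx] '%s' initialized in %.1fms.\n",
	       (unsigned long)pthread_self(), bdf, now_ms() - start);
	return NULL;
}

int main(void)
{
	static const char *bdfs[] = { "0000:17:0c.1", "0000:17:0c.2" };
	pthread_t threads[2];
	int i;

	for (i = 0; i < 2; i++)
		pthread_create(&threads[i], NULL, worker, (void *)bdfs[i]);
	for (i = 0; i < 2; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

Build with "cc -pthread".  With per-VF device sets both threads report
~100ms, matching the parallel numbers above; with the pre-fix bus-keyed
set, the second init in the real path would queue behind the first.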