From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:49240)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bZQkP-0004jw-Lz
	for qemu-devel@nongnu.org; Mon, 15 Aug 2016 18:52:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bZQkL-0000Q5-Kg
	for qemu-devel@nongnu.org; Mon, 15 Aug 2016 18:52:44 -0400
Received: from mx1.redhat.com ([209.132.183.28]:44620)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <alex.williamson@redhat.com>) id 1bZQkL-0000Q0-C9
	for qemu-devel@nongnu.org; Mon, 15 Aug 2016 18:52:41 -0400
Date: Mon, 15 Aug 2016 16:52:39 -0600
From: Alex Williamson <alex.williamson@redhat.com>
Message-ID: <20160815165239.1a7e5357@t450s.home>
In-Reply-To: <20160815220930.GB14876@nvidia.com>
References: <1470251034-1555-1-git-send-email-kwankhede@nvidia.com>
	<1470251034-1555-2-git-send-email-kwankhede@nvidia.com>
	<20160809130022.67784f62@t450s.home>
	<47977462-4a68-f973-a2b8-4c40fbfc9451@nvidia.com>
	<20160812151608.0db20258@t450s.home>
	<361a1668-ca11-c46c-7bf6-19c36e703a2d@nvidia.com>
	<AADFC41AFE54684AB9EE6CBC0274A5D15F944124@SHSMSX101.ccr.corp.intel.com>
	<20160815095926.1d2d6cc2@t450s.home>
	<20160815220930.GB14876@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Neo Jia <cjia@nvidia.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>, Kirti Wankhede <kwankhede@nvidia.com>, "pbonzini@redhat.com" <pbonzini@redhat.com>, "kraxel@redhat.com" <kraxel@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "kvm@vger.kernel.org" <kvm@vger.kernel.org>, "Song, Jike" <jike.song@intel.com>, "bjsdjshi@linux.vnet.ibm.com" <bjsdjshi@linux.vnet.ibm.com>

On Mon, 15 Aug 2016 15:09:30 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Mon, Aug 15, 2016 at 09:59:26AM -0600, Alex Williamson wrote:
> > On Mon, 15 Aug 2016 09:38:52 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > > > Sent: Saturday, August 13, 2016 8:37 AM
> > > > 
> > > > 
> > > > 
> > > > On 8/13/2016 2:46 AM, Alex Williamson wrote:    
> > > > > On Sat, 13 Aug 2016 00:14:39 +0530
> > > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > >    
> > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote:    
> > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530
> > > > >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > >>>
> > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to get
> > > > >>> the parent_device so it can call the start and stop ops callbacks
> > > > >>> respectively.  That seems to imply that all of instances for a given
> > > > >>> uuid come from the same parent_device.  Where is that enforced?  I'm
> > > > >>> still having a hard time buying into the uuid+instance plan when it
> > > > >>> seems like each mdev_device should have an actual unique uuid.
> > > > >>> Userspace tools can figure out which uuids to start for a given user, I
> > > > >>> don't see much value in collecting them to instances within a uuid.
> > > > >>>    
> > > > >>
> > > > >> Initially we started discussion with VM_UUID+instance suggestion, where
> > > > >> instance was introduced to support multiple devices in a VM.    
> > > > >
> > > > > The instance number was never required in order to support multiple
> > > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA
> > > > > management tools which wanted to re-use the VM UUID by creating vGPU
> > > > > devices with that same UUID and therefore associate udev events to a
> > > > > given VM.  Only then does an instance number become necessary since the
> > > > > UUID needs to be static for a vGPUs within a VM.  This has always felt
> > > > > like a very dodgy solution when we should probably just be querying
> > > > > libvirt to give us a device to VM association.    
> > > 
> > > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID
> > > for mdev in the basic design. It's bound to NVIDIA management stack too tightly.
> > > 
> > > I'm OK to give enough flexibility for various upper level management stacks,
> > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better
> > > option where either UUID or STRING could be optional? Upper management 
> > > stack can choose its own policy to identify a mdev:
> > > 
> > > a) $UUID only, so each mdev is allocated with a unique UUID
> > > b) STRING only, which could be an index (0, 1, 2, ...), or any combination 
> > > (vgpu0, vgpu1, etc.)
> > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be
> > > a numeric index
> > >   
> > > > >    
> > > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of
> > > > >> all instances of similar devices assigned to VM.
> > > > >>
> > > > >> For example, to create 2 devices:
> > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> "$UUID-0" and "$UUID-1" devices are created.
> > > > >>
> > > > >> Commit resources for above devices with single 'mdev_start':
> > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Considering $UUID to be a unique UUID of a device, we don't need
> > > > >> 'instance', so 'mdev_create' would look like:
> > > > >>
> > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create
> > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create
> > > > >>
> > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params'
> > > > >> would be vendor specific parameters.
> > > > >>
> > > > >> Device nodes would be created as "$UUID1" and "$UUID"
> > > > >>
> > > > >> Then 'mdev_start' would be:
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
> > > > >>
> > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be:
> > > > >>
> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop    
> > > > >
> > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > probably want to start/stop them individually.  Actually, why is it
> > > > > that we can't use the mediated device being opened and released to
> > > > > automatically signal to the backend vendor driver to commit and release
> > > > > resources? I don't fully understand why userspace needs this interface.    
> > > 
> > > There is a meaningful use of start/stop interface, as required in live
> > > migration support. Such interface allows vendor driver to quiescent 
> > > mdev activity on source device before mdev hardware state is snapshot,
> > > and then resume mdev activity on dest device after its state is recovered.
> > > Intel has implemented experimental live migration support in KVMGT (soon
> > > to release), based on above two interfaces (plus another two to get/set
> > > mdev state).  
> > 
> > Ok, that's actually an interesting use case for start/stop.
> >   
> > > > 
> > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM in
> > > > one shot to commit resources of all vGPUs assigned to a VM along with
> > > > some common resources.    
> > > 
> > > Kirti, can you elaborate the background about above one-shot commit
> > > requirement? It's hard to understand such a requirement.   
> > 
> > Agree, I know NVIDIA isn't planning to support hotplug initially, but
> > this seems like we're precluding hotplug from the design.  I don't
> > understand what's driving this one-shot requirement.  
> 
> Hi Alex,
> 
> The requirement here is based on how our internal vGPU device model designed and
> with this we are able to pre-allocate resources required for multiple virtual
> devices within same domain.
> 
> And I don't think this syntax will stop us from supporting hotplug at all.
> 
> For example, you can always create a virtual mdev and then do
> 
> echo "mdev_UUID" > /sys/class/mdev/mdev_start
> 
> then use QEMU monitor to add the device for hotplug.

Hi Neo,

I'm still not understanding the advantage you get from the "one-shot"
approach then if we can always add more mdevs by starting them later.
Are the hotplug mdevs somehow less capable than the initial set of
mdevs added in a single shot?  If the initial set is allocated
from the "same domain", does that give them some sort of hardware
locality/resource benefit?  Thanks,

Alex