From: Kirti Wankhede <kwankhede@nvidia.com>
To: Parav Pandit <parav@mellanox.com>,
	Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Or Gerlitz <gerlitz.or@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"michal.lkml@markovi.net" <michal.lkml@markovi.net>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jiri Pirko <jiri@mellanox.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Date: Fri, 8 Mar 2019 00:34:25 +0530
Message-ID: <9dbc644f-4e4c-7119-8f99-99850fc67b73@nvidia.com>
In-Reply-To: <VI1PR0501MB22711C5925D09B9822382AC3D1730@VI1PR0501MB2271.eurprd05.prod.outlook.com>

CC += Alex

On 3/6/2019 11:12 AM, Parav Pandit wrote:
> Hi Kirti,
> 
>> -----Original Message-----
>> From: Kirti Wankhede <kwankhede@nvidia.com>
>> Sent: Tuesday, March 5, 2019 9:51 PM
>> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
>> <jakub.kicinski@netronome.com>
>> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org;
>> linux-kernel@vger.kernel.org; michal.lkml@markovi.net; davem@davemloft.net;
>> gregkh@linuxfoundation.org; Jiri Pirko <jiri@mellanox.com>
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> On 3/6/2019 6:14 AM, Parav Pandit wrote:
>>> Hi Greg, Kirti,
>>>
>>>> -----Original Message-----
>>>> From: Parav Pandit
>>>> Sent: Tuesday, March 5, 2019 5:45 PM
>>>> To: Parav Pandit <parav@mellanox.com>; Kirti Wankhede
>>>> <kwankhede@nvidia.com>; Jakub Kicinski <jakub.kicinski@netronome.com>
>>>> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org;
>>>> linux-kernel@vger.kernel.org; michal.lkml@markovi.net;
>>>> davem@davemloft.net;
>>>> gregkh@linuxfoundation.org; Jiri Pirko <jiri@mellanox.com>
>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: linux-kernel-owner@vger.kernel.org
>>>>> <linux-kernel-owner@vger.kernel.org> On Behalf Of Parav Pandit
>>>>> Sent: Tuesday, March 5, 2019 5:17 PM
>>>>> To: Kirti Wankhede <kwankhede@nvidia.com>; Jakub Kicinski
>>>>> <jakub.kicinski@netronome.com>
>>>>> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org;
>>>>> linux-kernel@vger.kernel.org; michal.lkml@markovi.net;
>>>>> davem@davemloft.net; gregkh@linuxfoundation.org; Jiri Pirko
>>>>> <jiri@mellanox.com>
>>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>>> extension
>>>>>
>>>>> Hi Kirti,
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Kirti Wankhede <kwankhede@nvidia.com>
>>>>>> Sent: Tuesday, March 5, 2019 4:40 PM
>>>>>> To: Parav Pandit <parav@mellanox.com>; Jakub Kicinski
>>>>>> <jakub.kicinski@netronome.com>
>>>>>> Cc: Or Gerlitz <gerlitz.or@gmail.com>; netdev@vger.kernel.org;
>>>>>> linux-kernel@vger.kernel.org; michal.lkml@markovi.net;
>>>>>> davem@davemloft.net; gregkh@linuxfoundation.org; Jiri Pirko
>>>>>> <jiri@mellanox.com>
>>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>>>> extension
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I am novice at mdev level too. mdev or vfio mdev.
>>>>>>> Currently by default we bind to same vendor driver, but when it
>>>>>>> was created as passthrough device, vendor driver won't create
>>>>>>> netdevice or rdma device for it.
>>>>>>> And vfio/mdev or whatever mature available driver would bind at
>>>>>>> that point.
>>>>>>>
>>>>>>
>>>>>> Using mdev framework, if you want to partition a physical device
>>>>>> into multiple logic devices, you can bind those devices to same
>>>>>> vendor driver through vfio-mdev, where as if you want to
>>>>>> passthrough the device bind it to vfio-pci. If I understand
>>>>>> correctly, that is what you are looking for.
>>>>>>
>>>>>>
>>>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
>>>>> PCI device has existing protocol devices on it such as netdevs and
>>>>> rdma dev.
>>>>> This device is partitioned while those protocol devices exist and
>>>>> mlx5_core, mlx5_ib drivers are loaded on it.
>>>>> And we also need to connect these objects rightly to eswitch exposed
>>>>> by devlink interface (net/core/devlink.c) that supports eswitch
>>>>> binding, health, registers, parameters, ports support.
>>>>> It also supports existing PCI VFs.
>>>>>
>>>>> I don’t think we want to replicate all of this again in mdev subsystem [1].
>>>>>
>>>>> [1]
>>>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
>>>>>
>>>>> So devlink interface to migrate users from managing VFs to non_VF
>>>>> sub device is natural progression.
>>>>>
>>>>> However, in future, I believe we would be creating mediated devices
>>>>> on user request, to use mdev modules and map them to VM.
>>>>>
>>>>> Also 'mdev_bus' is created as a class and not as a bus. This limits
>>>>> to not use devlink interface whose handle is bus+device name.
>>>>>
>>>>> So one option is to change mdev from class to bus.
>>>>> devlink will create mdevs on the bus, mdev driver can probe these
>>>>> devices on host system by default.
>>>>> And if told to do passthrough, a different driver exposes them to VM.
>>>>> How feasible is this?
>>>>>
>>>> Wait, I do see a mdev bus and mdevs are created on this bus using
>>>> mdev_device_create().
>>>> So how about we create mdevs on this bus using devlink, instead of sysfs?
>>>> And driver side on host gets the mdev_register_driver()->probe()?
>>>>
>>>
>>> Thinking more and reviewing more mdev code, I believe mdev fits this
>>> need a lot better than new subdev bus, mfd, platform device, or devlink
>>> subport.
>>> For coming future, to map this sub device (mdev) to VM will also be
>>> easier by using mdev bus.
>>>
>>
>> Thanks for taking close look at mdev code.
>>
>> Assigning mdev to VM support is already in place, QEMU and libvirt have
>> support to assign mdev device to VM.
>>
>>> I also believe we can use the sysfs interface for mdev life cycle.
>>> Here when mdev are created it will register as devlink instance and
>>> will be able to query/config parameters before driver probe the device.
>>> (instead of having life cycle via devlink)
>>>
>>> Few enhancements would be needed for mdev side.
>>> 1. making iommu optional.
>>
>> Currently mdev devices are not IOMMU aware, vendor driver is responsible
>> for programming IOMMU for mdev device, if required.
>> IOMMU aware mdev device patch set is almost reviewed and ready to get
>> pulled. This is optional, vendor driver have to decide whether mdev device
>> should be associated with its parents IOMMU or not. I'm testing it and I
>> think Alex is on vacation and this will get pulled when Alex will be back from
>> vacation.
>> https://lwn.net/Articles/779650/
>>
>>> 2. configuring mdev device parameters during creation time
>>>
>>
>> Mdev framework provides a way to define multiple types for creation
>> through sysfs. You can define multiple types rather than having creation
>> time parameter and on creation accordingly update 'available_instances'.
>> Mdev also provides a way to provide vendor-specific-attributes for parent
>> physical device as well as for created mdev device. You can add sysfs
>> interface to get input parameters for a mdev device which can be used by
>> vendor driver when open() on that mdev device is called.
>>
>> Thanks,
>> Kirti
> 
> Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon.
> Will wait for a day to receive more comments/views from Greg and others.
> 
> As I explained in this cover-letter and discussion,
> First use case is to create and use mdevs in the host (and not in VM).
> Later on, I am sure once we have mdevs available, VM users will likely use it.
> 
> So, mlx5_core driver will have two components as starting point.
> 
> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> This is mdev device life cycle driver which will do, mdev_register_device() and implements mlx5_mdev_ops.
> 
Ok. I would suggest not using mdev.c as the file name; maybe add the
device name, something like mlx_mdev.c or vfio_mlx.c.

> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> This is mdev device driver which does mdev_register_driver() 
> and probe() creates netdev by heavily reusing existing code of the PF device.
> These drivers will not be placed under drivers/vfio/mdev, because this is not a vfio driver.
> This is fine, right?
>

I'm not too familiar with netdev, but can you create the netdev in the
open() call on the mlx mdev device? Then you don't have to write a
separate mdev device driver.
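
To make the open() idea concrete, here is a rough userspace sketch, not
kernel code. All fake_* names are made up for illustration and are not mdev
or mlx5 APIs. It assumes the netdev can be brought up on the first open()
and torn down on the last release():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a netdev; only tracks whether it is registered. */
struct fake_netdev {
	bool registered;
};

/* Hypothetical stand-in for one mdev instance. */
struct fake_mdev {
	struct fake_netdev netdev;
	int open_count;
};

/* Callback table, loosely modelled on an open()/release() pair in a
 * parent device's ops; the real kernel structure has more callbacks. */
struct fake_mdev_ops {
	int (*open)(struct fake_mdev *mdev);
	void (*release)(struct fake_mdev *mdev);
};

/* First open() brings the netdev up; later opens only take a reference. */
static int fake_open(struct fake_mdev *mdev)
{
	assert(mdev != NULL);
	if (mdev->open_count++ == 0)
		mdev->netdev.registered = true;
	return 0;
}

/* Last release() tears the netdev down again. */
static void fake_release(struct fake_mdev *mdev)
{
	assert(mdev != NULL && mdev->open_count > 0);
	if (--mdev->open_count == 0)
		mdev->netdev.registered = false;
}

static const struct fake_mdev_ops fake_ops = {
	.open = fake_open,
	.release = fake_release,
};
```

The reference count is only there so that repeated opens share one netdev.
The sketch does not say who would call open() in the host-only use case.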


> Given that this is net driver, we will be submitting patches,
> through netdev mailing list through Dave Miller's net-next tree.
> And CC kvm@vger.kernel.org, you and others as usual.
> Are you ok, merging code this way as mdev device creator and mdev driver.
> Yes?
> 

Keep Alex and me in the loop.

Thanks,
Kirti
