Netdev List
* [net-next 1/2] net: netcp: add api to support set rx mode in netcp modules
From: Murali Karicheri @ 2018-04-02 16:17 UTC (permalink / raw)
  To: w-kwok2, linux-kernel, davem, netdev, nsekhar, grygorii.strashko
In-Reply-To: <1522685839-9497-1-git-send-email-m-karicheri2@ti.com>

From: WingMan Kwok <w-kwok2@ti.com>

This patch adds an API to support setting the rx mode in
netcp modules.  If a netcp module needs to be notified
when the upper layer transitions from one rx mode to
another, and to react accordingly, the module implements
the new set_rx_mode API added in this patch.  Currently
the supported rx modes are PROMISCUOUS and NON_PROMISCUOUS.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
---
 drivers/net/ethernet/ti/netcp.h      |  1 +
 drivers/net/ethernet/ti/netcp_core.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/ti/netcp.h b/drivers/net/ethernet/ti/netcp.h
index 416f732..c4ffdf4 100644
--- a/drivers/net/ethernet/ti/netcp.h
+++ b/drivers/net/ethernet/ti/netcp.h
@@ -214,6 +214,7 @@ struct netcp_module {
 	int	(*add_vid)(void *intf_priv, int vid);
 	int	(*del_vid)(void *intf_priv, int vid);
 	int	(*ioctl)(void *intf_priv, struct ifreq *req, int cmd);
+	int	(*set_rx_mode)(void *intf_priv, bool promisc);
 
 	/* used internally */
 	struct list_head	module_list;
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 736f6f7..e40aa3e 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1509,6 +1509,24 @@ static void netcp_addr_sweep_add(struct netcp_intf *netcp)
 	}
 }
 
+static int netcp_set_promiscuous(struct netcp_intf *netcp, bool promisc)
+{
+	struct netcp_intf_modpriv *priv;
+	struct netcp_module *module;
+	int error;
+
+	for_each_module(netcp, priv) {
+		module = priv->netcp_module;
+		if (!module->set_rx_mode)
+			continue;
+
+		error = module->set_rx_mode(priv->module_priv, promisc);
+		if (error)
+			return error;
+	}
+	return 0;
+}
+
 static void netcp_set_rx_mode(struct net_device *ndev)
 {
 	struct netcp_intf *netcp = netdev_priv(ndev);
@@ -1538,6 +1556,7 @@ static void netcp_set_rx_mode(struct net_device *ndev)
 	/* finally sweep and callout into modules */
 	netcp_addr_sweep_del(netcp);
 	netcp_addr_sweep_add(netcp);
+	netcp_set_promiscuous(netcp, promisc);
 	spin_unlock(&netcp->lock);
 }
 
-- 
1.9.1
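
The callback flow described in the commit message can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: the demo_* types and functions are hypothetical stand-ins for struct netcp_module and its per-interface private data, not the driver's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct netcp_module. */
struct demo_module {
	const char *name;
	/* Optional hook; NULL when the module does not care about rx mode. */
	int (*set_rx_mode)(void *intf_priv, bool promisc);
	void *intf_priv;
};

/* Hypothetical per-module state, e.g. one port's receive configuration. */
struct demo_port {
	bool promisc;
	int calls;
};

/* Example hook: record the requested mode. A real module would program hardware. */
int demo_port_set_rx_mode(void *intf_priv, bool promisc)
{
	struct demo_port *port = intf_priv;

	assert(port != NULL);
	port->promisc = promisc;
	port->calls++;
	return 0;
}

/*
 * Same shape as the loop in netcp_set_promiscuous(): skip modules
 * that do not implement the hook and stop at the first error.
 */
int demo_set_promiscuous(struct demo_module *mods, size_t n, bool promisc)
{
	for (size_t i = 0; i < n; i++) {
		int error;

		if (!mods[i].set_rx_mode)
			continue;

		error = mods[i].set_rx_mode(mods[i].intf_priv, promisc);
		if (error)
			return error;
	}
	return 0;
}
```

Note that netcp_set_rx_mode() returns void, so in the patch as posted the value returned by netcp_set_promiscuous() is not propagated to the caller.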


* [net-next 0/2] Add promiscuous mode support in k2g network driver
From: Murali Karicheri @ 2018-04-02 16:17 UTC (permalink / raw)
  To: w-kwok2, linux-kernel, davem, netdev, nsekhar, grygorii.strashko

This series adds support for promiscuous mode in the network driver for
the K2G SoC. It depends on v3 of my series at
https://www.spinics.net/lists/kernel/msg2765942.html

I plan to fold this into the above series and submit again when the
net-next merge window opens. For now, please review and let me know if it
looks good or needs any rework. I would like to get this ready so that it
can be merged along with the above series.

The boot and promiscuous mode test logs are at
https://pastebin.ubuntu.com/p/XQCvFS3QZb/

WingMan Kwok (2):
  net: netcp: add api to support set rx mode in netcp modules
  net: netcp: ethss: k2g: add promiscuous mode support

 drivers/net/ethernet/ti/netcp.h       |  1 +
 drivers/net/ethernet/ti/netcp_core.c  | 19 ++++++++++++
 drivers/net/ethernet/ti/netcp_ethss.c | 56 +++++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+)

-- 
1.9.1


* Re: [PATCH 0/4] RFC: Realtek 83xx SMI driver core
From: Carl-Daniel Hailfinger @ 2018-04-02 16:10 UTC (permalink / raw)
  To: linus.walleij; +Cc: Linux Netdev List
In-Reply-To: <20171105231909.5599-1-linus.walleij@linaro.org>

Hi Linus,

did you make any progress with this?
I noticed that the Vodafone Easybox 904xdsl/904lte models both make use
of the RTL8367 switch. About one million of these routers have been
deployed in Germany.
There is an OpenWrt fork at
https://github.com/Quallenauge/Easybox-904-XDSL/commits/master-lede
which depends on the out-of-tree patches which seem to be the basis for
your Realtek 83xx driver patches.

Having your Realtek 83xx patches in the upstream Linux kernel would help
tremendously in getting support for those router models merged in OpenWrt.

Regards,
Carl-Daniel


* RE: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute max supported link bandwidth
From: Keller, Jacob E @ 2018-04-02 16:00 UTC (permalink / raw)
  To: Tal Gilboa, Bjorn Helgaas
  Cc: Tariq Toukan, Ariel Elior, Ganesh Goudar, Kirsher, Jeffrey T,
	everest-linux-l2@cavium.com, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
In-Reply-To: <3decbba0-74a2-c906-b5b4-a04f261860eb@mellanox.com>

> -----Original Message-----
> From: Tal Gilboa [mailto:talgi@mellanox.com]
> Sent: Monday, April 02, 2018 7:34 AM
> To: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>; Ariel Elior <ariel.elior@cavium.com>; Ganesh
> Goudar <ganeshgr@chelsio.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; everest-linux-l2@cavium.com; intel-wired-
> lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-pci@vger.kernel.org
> Subject: Re: [PATCH v5 03/14] PCI: Add pcie_bandwidth_capable() to compute
> max supported link bandwidth
> 
> On 4/2/2018 5:05 PM, Bjorn Helgaas wrote:
> > On Mon, Apr 02, 2018 at 10:34:58AM +0300, Tal Gilboa wrote:
> >> On 4/2/2018 3:40 AM, Bjorn Helgaas wrote:
> >>> On Sun, Apr 01, 2018 at 11:38:53PM +0300, Tal Gilboa wrote:
> >>>> On 3/31/2018 12:05 AM, Bjorn Helgaas wrote:
> >>>>> From: Tal Gilboa <talgi@mellanox.com>
> >>>>>
> >>>>> Add pcie_bandwidth_capable() to compute the max link bandwidth
> supported by
> >>>>> a device, based on the max link speed and width, adjusted by the
> encoding
> >>>>> overhead.
> >>>>>
> >>>>> The maximum bandwidth of the link is computed as:
> >>>>>
> >>>>>      max_link_speed * max_link_width * (1 - encoding_overhead)
> >>>>>
> >>>>> The encoding overhead is about 20% for 2.5 and 5.0 GT/s links using
> 8b/10b
> >>>>> encoding, and about 1.5% for 8 GT/s or higher speed links using 128b/130b
> >>>>> encoding.
> >>>>>
> >>>>> Signed-off-by: Tal Gilboa <talgi@mellanox.com>
> >>>>> [bhelgaas: adjust for pcie_get_speed_cap() and pcie_get_width_cap()
> >>>>> signatures, don't export outside drivers/pci]
> >>>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> >>>>> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> >>>>> ---
> >>>>>     drivers/pci/pci.c |   21 +++++++++++++++++++++
> >>>>>     drivers/pci/pci.h |    9 +++++++++
> >>>>>     2 files changed, 30 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>> index 43075be79388..9ce89e254197 100644
> >>>>> --- a/drivers/pci/pci.c
> >>>>> +++ b/drivers/pci/pci.c
> >>>>> @@ -5208,6 +5208,27 @@ enum pcie_link_width
> pcie_get_width_cap(struct pci_dev *dev)
> >>>>>     	return PCIE_LNK_WIDTH_UNKNOWN;
> >>>>>     }
> >>>>> +/**
> >>>>> + * pcie_bandwidth_capable - calculates a PCI device's link bandwidth
> capability
> >>>>> + * @dev: PCI device
> >>>>> + * @speed: storage for link speed
> >>>>> + * @width: storage for link width
> >>>>> + *
> >>>>> + * Calculate a PCI device's link bandwidth by querying for its link speed
> >>>>> + * and width, multiplying them, and applying encoding overhead.
> >>>>> + */
> >>>>> +u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed
> *speed,
> >>>>> +			   enum pcie_link_width *width)
> >>>>> +{
> >>>>> +	*speed = pcie_get_speed_cap(dev);
> >>>>> +	*width = pcie_get_width_cap(dev);
> >>>>> +
> >>>>> +	if (*speed == PCI_SPEED_UNKNOWN || *width ==
> PCIE_LNK_WIDTH_UNKNOWN)
> >>>>> +		return 0;
> >>>>> +
> >>>>> +	return *width * PCIE_SPEED2MBS_ENC(*speed);
> >>>>> +}
> >>>>> +
> >>>>>     /**
> >>>>>      * pci_select_bars - Make BAR mask from the type of resource
> >>>>>      * @dev: the PCI device for which BAR mask is made
> >>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>> index 66738f1050c0..2a50172b9803 100644
> >>>>> --- a/drivers/pci/pci.h
> >>>>> +++ b/drivers/pci/pci.h
> >>>>> @@ -261,8 +261,17 @@ void pci_disable_bridge_window(struct pci_dev
> *dev);
> >>>>>     	 (speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
> >>>>>     	 "Unknown speed")
> >>>>> +/* PCIe speed to Mb/s with encoding overhead: 20% for gen2, ~1.5% for
> gen3 */
> >>>>> +#define PCIE_SPEED2MBS_ENC(speed) \
> >>>>
> >>>> Missing gen4.
> >>>
> >>> I made it "gen3+".  I think that's accurate, isn't it?  The spec
> >>> doesn't seem to actually use "gen3" as a specific term, but sec 4.2.2
> >>> says rates of 8 GT/s or higher (which I think includes gen3 and gen4)
> >>> use 128b/130b encoding.
> >>>
> >>
> >> I meant that PCIE_SPEED_16_0GT will return 0 from this macro since it wasn't
> >> added. Need to return 15754.
> >
> > Oh, duh, of course!  Sorry for being dense.  What about the following?
> > I included the calculation as opposed to just the magic numbers to try
> > to make it clear how they're derived.  This has the disadvantage of
> > truncating the result instead of rounding, but I doubt that's
> > significant in this context.  If it is, we could use the magic numbers
> > and put the computation in a comment.
> 
> We can always use DIV_ROUND_UP((speed * enc_nominator),
> enc_denominator). I think this is confusing, and since this introduces a
> bandwidth limit I would prefer to give a wider limit than a wrong one,
> even if it is by less than 1 Mb/s. My vote is for leaving it as you wrote below.
> 
> >
> > Another question: we currently deal in Mb/s, not MB/s.  Mb/s has the
> > advantage of sort of corresponding to the GT/s numbers, but using MB/s
> > would have the advantage of smaller numbers that match the table here:
> > https://en.wikipedia.org/wiki/PCI_Express#History_and_revisions,
> > but I don't know what's most typical in user-facing situations.
> > What's better?
> 
> I don't know what's better, but for network devices we measure bandwidth
> in Gb/s, so presenting bandwidth in MB/s would mean additional
> calculations. The truth is I would have preferred to use Gb/s instead of
> Mb/s, but again, I don't want to lose up to 1 Gb/s.
> 

I prefer this version with the calculation in line since it makes the derivation clear. Keeping them in Mb/s makes it easier to convert to Gb/s, which is what most people would expect.

Thanks,
Jake

> >
> >
> > commit 946435491b35b7782157e9a4d1bd73071fba7709
> > Author: Tal Gilboa <talgi@mellanox.com>
> > Date:   Fri Mar 30 08:32:03 2018 -0500
> >
> >      PCI: Add pcie_bandwidth_capable() to compute max supported link
> bandwidth
> >
> >      Add pcie_bandwidth_capable() to compute the max link bandwidth
> supported by
> >      a device, based on the max link speed and width, adjusted by the encoding
> >      overhead.
> >
> >      The maximum bandwidth of the link is computed as:
> >
> >        max_link_width * max_link_speed * (1 - encoding_overhead)
> >
> >      2.5 and 5.0 GT/s links use 8b/10b encoding, which reduces the raw
> bandwidth
> >      available by 20%; 8.0 GT/s and faster links use 128b/130b encoding, which
> >      reduces it by about 1.5%.
> >
> >      The result is in Mb/s, i.e., megabits/second, of raw bandwidth.
> >
> >      Signed-off-by: Tal Gilboa <talgi@mellanox.com>
> >      [bhelgaas: add 16 GT/s, adjust for pcie_get_speed_cap() and
> >      pcie_get_width_cap() signatures, don't export outside drivers/pci]
> >      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> >      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 43075be79388..ff1e72060952 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5208,6 +5208,28 @@ enum pcie_link_width pcie_get_width_cap(struct
> pci_dev *dev)
> >   	return PCIE_LNK_WIDTH_UNKNOWN;
> >   }
> >
> > +/**
> > + * pcie_bandwidth_capable - calculate a PCI device's link bandwidth capability
> > + * @dev: PCI device
> > + * @speed: storage for link speed
> > + * @width: storage for link width
> > + *
> > + * Calculate a PCI device's link bandwidth by querying for its link speed
> > + * and width, multiplying them, and applying encoding overhead.  The result
> > + * is in Mb/s, i.e., megabits/second of raw bandwidth.
> > + */
> > +u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed
> *speed,
> > +			   enum pcie_link_width *width)
> > +{
> > +	*speed = pcie_get_speed_cap(dev);
> > +	*width = pcie_get_width_cap(dev);
> > +
> > +	if (*speed == PCI_SPEED_UNKNOWN || *width ==
> PCIE_LNK_WIDTH_UNKNOWN)
> > +		return 0;
> > +
> > +	return *width * PCIE_SPEED2MBS_ENC(*speed);
> > +}
> > +
> >   /**
> >    * pci_select_bars - Make BAR mask from the type of resource
> >    * @dev: the PCI device for which BAR mask is made
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 66738f1050c0..37f9299ed623 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -261,8 +261,18 @@ void pci_disable_bridge_window(struct pci_dev *dev);
> >   	 (speed) == PCIE_SPEED_2_5GT ? "2.5 GT/s" : \
> >   	 "Unknown speed")
> >
> > +/* PCIe speed to Mb/s reduced by encoding overhead */
> > +#define PCIE_SPEED2MBS_ENC(speed) \
> > +	((speed) == PCIE_SPEED_16_0GT ? (16000*128/130) : \
> > +	 (speed) == PCIE_SPEED_8_0GT  ?  (8000*128/130) : \
> > +	 (speed) == PCIE_SPEED_5_0GT  ?  (5000*8/10) : \
> > +	 (speed) == PCIE_SPEED_2_5GT  ?  (2500*8/10) : \
> > +	 0)
> > +	 0)
> > +
> >   enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
> >   enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
> > +u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed
> *speed,
> > +			   enum pcie_link_width *width);
> >
> >   /* Single Root I/O Virtualization */
> >   struct pci_sriov {
> >
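
The truncation-versus-rounding question discussed above can be checked with a small userspace sketch. The demo_* helpers are hypothetical and only mirror the arithmetic; DEMO_DIV_ROUND_UP has the same shape as the kernel's DIV_ROUND_UP(). One detail worth keeping in mind is evaluation order: in C integer arithmetic the multiplication has to happen before the division, because 128/130 on its own evaluates to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division, same shape as the kernel's DIV_ROUND_UP(). */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Raw rate in MT/s scaled by num/den, truncating toward zero. */
uint32_t demo_mbs_trunc(uint32_t mts, uint32_t num, uint32_t den)
{
	assert(den != 0);
	/* Multiply first: (num / den) alone is 0 when num < den. */
	return mts * num / den;
}

/* Same scaling, but rounding up instead of truncating. */
uint32_t demo_mbs_round_up(uint32_t mts, uint32_t num, uint32_t den)
{
	assert(den != 0);
	return DEMO_DIV_ROUND_UP(mts * num, den);
}

/* Device capability in Mb/s: number of lanes times per-lane rate. */
uint32_t demo_bandwidth(uint32_t mts, uint32_t num, uint32_t den,
			uint32_t width)
{
	return width * demo_mbs_trunc(mts, num, den);
}
```

For a 16 GT/s lane, demo_mbs_trunc(16000, 128, 130) gives 15753 and demo_mbs_round_up(16000, 128, 130) gives 15754, so the two approaches differ by at most 1 Mb/s per lane. The 8b/10b cases (5000*8/10 and 2500*8/10) divide exactly, so truncation does not affect them.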


* RE: [PATCH v5 12/14] fm10k: Report PCIe link properties with pcie_print_link_status()
From: Keller, Jacob E @ 2018-04-02 15:56 UTC (permalink / raw)
  To: Bjorn Helgaas, Tal Gilboa
  Cc: Tariq Toukan, Ariel Elior, Ganesh Goudar, Kirsher, Jeffrey T,
	everest-linux-l2@cavium.com, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
In-Reply-To: <152244397259.135666.12655029447932912161.stgit@bhelgaas-glaptop.roam.corp.google.com>

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Friday, March 30, 2018 2:06 PM
> To: Tal Gilboa <talgi@mellanox.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>; Ariel Elior <ariel.elior@cavium.com>; Ganesh
> Goudar <ganeshgr@chelsio.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; everest-linux-l2@cavium.com; intel-wired-
> lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-pci@vger.kernel.org
> Subject: [PATCH v5 12/14] fm10k: Report PCIe link properties with
> pcie_print_link_status()
> 
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> Use pcie_print_link_status() to report PCIe link speed and possible
> limitations instead of implementing this in the driver itself.
> 
> Note that pcie_get_minimum_link() can return misleading information because
> it finds the slowest link and the narrowest link without considering the
> total bandwidth of the link.  If the path contains a 16 GT/s x1 link and a
> 2.5 GT/s x16 link, pcie_get_minimum_link() returns 2.5 GT/s x1, which
> corresponds to 250 MB/s of bandwidth, not the actual available bandwidth of
> about 2000 MB/s for a 16 GT/s x1 link.

This comment is about what's being fixed, so it would have been easier to parse if it were written to more clearly indicate that we're removing (and not adding) this behavior.

Aside from the commit message (which I don't feel strongly enough about to ask for a re-send of the patch), this looks good to me.

Acked-by: Jacob Keller <jacob.e.keller@intel.com>

Thanks Bjorn and Tal for fixing this!

> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |   87 --------------------------
>  1 file changed, 1 insertion(+), 86 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> index a434fecfdfeb..aa05fb534942 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> @@ -2120,91 +2120,6 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
>  	return 0;
>  }
> 
> -static void fm10k_slot_warn(struct fm10k_intfc *interface)
> -{
> -	enum pcie_link_width width = PCIE_LNK_WIDTH_UNKNOWN;
> -	enum pci_bus_speed speed = PCI_SPEED_UNKNOWN;
> -	struct fm10k_hw *hw = &interface->hw;
> -	int max_gts = 0, expected_gts = 0;
> -
> -	if (pcie_get_minimum_link(interface->pdev, &speed, &width) ||
> -	    speed == PCI_SPEED_UNKNOWN || width ==
> PCIE_LNK_WIDTH_UNKNOWN) {
> -		dev_warn(&interface->pdev->dev,
> -			 "Unable to determine PCI Express bandwidth.\n");
> -		return;
> -	}
> -
> -	switch (speed) {
> -	case PCIE_SPEED_2_5GT:
> -		/* 8b/10b encoding reduces max throughput by 20% */
> -		max_gts = 2 * width;
> -		break;
> -	case PCIE_SPEED_5_0GT:
> -		/* 8b/10b encoding reduces max throughput by 20% */
> -		max_gts = 4 * width;
> -		break;
> -	case PCIE_SPEED_8_0GT:
> -		/* 128b/130b encoding has less than 2% impact on throughput */
> -		max_gts = 8 * width;
> -		break;
> -	default:
> -		dev_warn(&interface->pdev->dev,
> -			 "Unable to determine PCI Express bandwidth.\n");
> -		return;
> -	}
> -
> -	dev_info(&interface->pdev->dev,
> -		 "PCI Express bandwidth of %dGT/s available\n",
> -		 max_gts);
> -	dev_info(&interface->pdev->dev,
> -		 "(Speed:%s, Width: x%d, Encoding Loss:%s, Payload:%s)\n",
> -		 (speed == PCIE_SPEED_8_0GT ? "8.0GT/s" :
> -		  speed == PCIE_SPEED_5_0GT ? "5.0GT/s" :
> -		  speed == PCIE_SPEED_2_5GT ? "2.5GT/s" :
> -		  "Unknown"),
> -		 hw->bus.width,
> -		 (speed == PCIE_SPEED_2_5GT ? "20%" :
> -		  speed == PCIE_SPEED_5_0GT ? "20%" :
> -		  speed == PCIE_SPEED_8_0GT ? "<2%" :
> -		  "Unknown"),
> -		 (hw->bus.payload == fm10k_bus_payload_128 ? "128B" :
> -		  hw->bus.payload == fm10k_bus_payload_256 ? "256B" :
> -		  hw->bus.payload == fm10k_bus_payload_512 ? "512B" :
> -		  "Unknown"));
> -
> -	switch (hw->bus_caps.speed) {
> -	case fm10k_bus_speed_2500:
> -		/* 8b/10b encoding reduces max throughput by 20% */
> -		expected_gts = 2 * hw->bus_caps.width;
> -		break;
> -	case fm10k_bus_speed_5000:
> -		/* 8b/10b encoding reduces max throughput by 20% */
> -		expected_gts = 4 * hw->bus_caps.width;
> -		break;
> -	case fm10k_bus_speed_8000:
> -		/* 128b/130b encoding has less than 2% impact on throughput */
> -		expected_gts = 8 * hw->bus_caps.width;
> -		break;
> -	default:
> -		dev_warn(&interface->pdev->dev,
> -			 "Unable to determine expected PCI Express
> bandwidth.\n");
> -		return;
> -	}
> -
> -	if (max_gts >= expected_gts)
> -		return;
> -
> -	dev_warn(&interface->pdev->dev,
> -		 "This device requires %dGT/s of bandwidth for optimal
> performance.\n",
> -		 expected_gts);
> -	dev_warn(&interface->pdev->dev,
> -		 "A %sslot with x%d lanes is suggested.\n",
> -		 (hw->bus_caps.speed == fm10k_bus_speed_2500 ? "2.5GT/s " :
> -		  hw->bus_caps.speed == fm10k_bus_speed_5000 ? "5.0GT/s " :
> -		  hw->bus_caps.speed == fm10k_bus_speed_8000 ? "8.0GT/s " :
> ""),
> -		 hw->bus_caps.width);
> -}
> -
>  /**
>   * fm10k_probe - Device Initialization Routine
>   * @pdev: PCI device information struct
> @@ -2326,7 +2241,7 @@ static int fm10k_probe(struct pci_dev *pdev, const
> struct pci_device_id *ent)
>  	mod_timer(&interface->service_timer, (HZ * 2) + jiffies);
> 
>  	/* print warning for non-optimal configurations */
> -	fm10k_slot_warn(interface);
> +	pcie_print_link_status(interface->pdev);
> 
>  	/* report MAC address for logging */
>  	dev_info(&pdev->dev, "%pM\n", netdev->dev_addr);
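
The 16 GT/s x1 versus 2.5 GT/s x16 example from the commit message can be worked through with a short sketch. It only illustrates the idea of comparing whole-link bandwidth; it is not the implementation of pcie_get_minimum_link() or pcie_print_link_status(), and the demo_* names are hypothetical. Per-lane rates are in Mb/s after encoding overhead (15753 for 16 GT/s, 2000 for 2.5 GT/s).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one link on the path to the device. */
struct demo_link {
	uint32_t lane_mbs;	/* usable Mb/s per lane, after encoding overhead */
	uint32_t width;		/* number of lanes */
};

/* Slowest speed and narrowest width, each taken independently. */
uint32_t demo_min_speed_times_min_width(const struct demo_link *path, size_t n)
{
	uint32_t speed, width;

	assert(n > 0);
	speed = path[0].lane_mbs;
	width = path[0].width;
	for (size_t i = 1; i < n; i++) {
		if (path[i].lane_mbs < speed)
			speed = path[i].lane_mbs;
		if (path[i].width < width)
			width = path[i].width;
	}
	return speed * width;
}

/* Bandwidth-aware: the link with the smallest speed * width product. */
uint32_t demo_min_link_bandwidth(const struct demo_link *path, size_t n)
{
	uint32_t best;

	assert(n > 0);
	best = path[0].lane_mbs * path[0].width;
	for (size_t i = 1; i < n; i++) {
		uint32_t bw = path[i].lane_mbs * path[i].width;

		if (bw < best)
			best = bw;
	}
	return best;
}
```

For a path of { 15753, 1 } and { 2000, 16 }, the first function yields 2000 Mb/s (250 MB/s), while the second yields 15753 Mb/s (roughly 1969 MB/s), which is the "about 2000 MB/s" figure from the commit message.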



* Re: [bpf-next PATCH 4/4] bpf: sockmap, add hash map support
From: Alexei Starovoitov @ 2018-04-02 15:54 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, davem
In-Reply-To: <20180401150109.24727.86658.stgit@john-Precision-Tower-5810>

On Sun, Apr 01, 2018 at 08:01:10AM -0700, John Fastabend wrote:
> Sockmap is currently backed by an array and enforces keys to be
> four bytes. This works well for many use cases and was originally
> modeled after devmap which also uses four bytes keys. However,
> this has become limiting in larger use cases where a hash would
> be more appropriate. For example users may want to use the 5-tuple
> of the socket as the lookup key.
> 
> To support this add hash support.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

The API looks good, but I think it came a bit too late for this release.
You don't need the _nulls part for this hash. A few other nits:

> +static void htab_elem_free_rcu(struct rcu_head *head)
> +{
> +	struct htab_elem *l = container_of(head, struct htab_elem, rcu);
> +
> +	/* must increment bpf_prog_active to avoid kprobe+bpf triggering while
> +	 * we're calling kfree, otherwise deadlock is possible if kprobes
> +	 * are placed somewhere inside of slub
> +	 */
> +	preempt_disable();
> +	__this_cpu_inc(bpf_prog_active);
> +	kfree(l);
> +	__this_cpu_dec(bpf_prog_active);
> +	preempt_enable();

I don't think it's necessary.

> +static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
> +{
> +	struct bpf_htab *htab;
> +	int i, err;
> +	u64 cost;
> +
> +	if (!capable(CAP_NET_ADMIN))
> +		return ERR_PTR(-EPERM);
> +
> +	/* check sanity of attributes */
> +	if (attr->max_entries == 0 ||
> +	    attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (attr->value_size > KMALLOC_MAX_SIZE)
> +		return ERR_PTR(-E2BIG);

doesn't seem to match
+	u32 fd = *(u32 *)value;
that is done later.

> +static struct htab_elem *lookup_elem_raw(struct hlist_nulls_head *head,
> +					 u32 hash, void *key, u32 key_size)
> +{
> +	struct hlist_nulls_node *n;
> +	struct htab_elem *l;
> +
> +	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
> +		if (l->hash == hash && !memcmp(&l->key, key, key_size))
> +			return l;

if nulls is needed, there gotta be a comment explaining it.

please add tests for all methods.

> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index f95fa67..2fa4cbb 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -67,6 +67,7 @@
>  	[BPF_MAP_TYPE_DEVMAP]		= "devmap",
>  	[BPF_MAP_TYPE_SOCKMAP]		= "sockmap",
>  	[BPF_MAP_TYPE_CPUMAP]		= "cpumap",
> +	[BPF_MAP_TYPE_SOCKHASH]		= "sockhash",
>  };
>  
>  static unsigned int get_possible_cpus(void)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 9d07465..1a19450 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -115,6 +115,7 @@ enum bpf_map_type {
>  	BPF_MAP_TYPE_DEVMAP,
>  	BPF_MAP_TYPE_SOCKMAP,
>  	BPF_MAP_TYPE_CPUMAP,
> +	BPF_MAP_TYPE_SOCKHASH,

tools/* updates should be in separate commit.


* Re: [PATCH net-next] net: ipv6/gre: Add GRO support
From: Tariq Toukan @ 2018-04-02 15:45 UTC (permalink / raw)
  To: Eric Dumazet, Eran Ben Elisha
  Cc: David S. Miller, Linux Netdev List, Eran Ben Elisha, Eric Dumazet
In-Reply-To: <e6b0bbf7-559e-e86a-aaf7-c64b713dfaa8@gmail.com>



On 02/04/2018 6:19 PM, Eric Dumazet wrote:
> 
> 
> On 04/02/2018 08:00 AM, Eran Ben Elisha wrote:
>>>>> Seems good, but why isn't this handled directly in GRO native layer ?
>>>> ip6_tunnel and ip6_gre do not share initialization flow functions (unlike ipv4).
>>>> Changing the ipv6 init infrastructure should not be part of this
>>>> patch. we prefer to keep this one minimal, simple and safe.
>>>
>>>
>>>
>>> Looking at gre_gro_receive() and gre_gro_complete() I could not see why they
>>> could not be copied/pasted to IPv6.
>>
>> These functions to handle GRO over GRE are already assigned in
>> gre_offload_init() (in net/ipv4/gre_offload.c under CONFIG_IPV6).
>> However without initializing the gro_cells, the receive path will not
>> go via napi_gro_receive path, but directly to netif_rx.
>> So AFAIU, only gcells->cells was missing for gro_cells_receive to
>> really go via GRO flow.
>>
>>>
>>> Maybe give more details on the changelog, it is really not obvious.
>> Hopefully the above filled this request.
>>>
> 
> Not really :/
> 

So you're referring to native interface. We thought you meant native IP 
module.


> gro_cells_receive() is not really useful with native GRO, since packet is already
> a GRO packet by the time it reaches ip_tunnel_rcv() or __ip6_tnl_rcv()
> 

Right. If GRO on native interface is ON, our patch doesn't help much.
The case we improve here is:
Native has GRO OFF, GRE has GRO ON.

Before this patch there were no GRO packets at all in this case, only 
MTU packets went up the stack.

> Sure, it might be useful if native GRO (happening on eth0 if you prefer) did not
> handle a particular encapsulation.
> 

Or it is turned OFF.

> gro_cell was a work around before we extended GRO to be able to decap some tunnel headers.
> 
> It seems we have to extend this to also support GRE6.
> 


* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
From: Kirill Tkhai @ 2018-04-02 15:36 UTC (permalink / raw)
  To: Oliver Hartkopp, mkl; +Cc: linux-can, netdev, dev
In-Reply-To: <377fbd8a-cd7e-2650-8efd-907cea0a0aee@hartkopp.net>

Hi, Oliver,

On 02.04.2018 18:28, Oliver Hartkopp wrote:
> Hi Kirill, Marc,
> 
> I checked the code once more and added some debug output to the other parts in CAN notifier code.
> 
> In fact the code pointed to by both of you seems to be obsolete, as I only wanted to be 'really sure' that no leftovers of the CAN filters remain at module unloading.
> 
> 
>> Yes, this one looks good:
>> https://marc.info/?l=linux-can&m=150169589119335&w=2
>>
>> Regards,
>> Kirill
>>
> 
> I was obviously too cautious ;-)
> 
> All tests I made resulted in the function iterating through all the CAN netdevices doing exactly nothing.
> 
> I'm fine with removing that stuff - but I'm not sure whether it's worth pushing that patch to stable 4.12+ or even before 4.12 (without namespace support - and removing rcu_barrier() too).
> 
> Any opinions?

I think the same -- it's not needed for stable, as it is just an iteration over an empty list, i.e., a no-op.

Kirill


* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
From: Oliver Hartkopp @ 2018-04-02 15:28 UTC (permalink / raw)
  To: Kirill Tkhai, mkl; +Cc: linux-can, netdev, dev
In-Reply-To: <d97f5629-6de5-0d24-8a48-01a612a39bc9@virtuozzo.com>

Hi Kirill, Marc,

I checked the code once more and added some debug output to the other 
parts in CAN notifier code.

In fact the code pointed to by both of you seems to be obsolete, as I
only wanted to be 'really sure' that no leftovers of the CAN filters
remain at module unloading.


> Yes, this one looks good:
> https://marc.info/?l=linux-can&m=150169589119335&w=2
> 
> Regards,
> Kirill
> 

I was obviously too cautious ;-)

All tests I made resulted in the function iterating through all the CAN 
netdevices doing exactly nothing.

I'm fine with removing that stuff - but I'm not sure whether it's worth
pushing that patch to stable 4.12+ or even before 4.12 (without
namespace support - and removing rcu_barrier() too).

Any opinions?

Best regards,
Oliver


* Re: [PATCH net-next] bridge: Allow max MTU when multiple VLANs present
From: Chas Williams @ 2018-04-02 15:26 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Toshiaki Makita, David Miller, netdev, Stephen Hemminger,
	Nikolay Aleksandrov
In-Reply-To: <CAJieiUjcUGjab0RMz-+cMgrvPooUopX6PbB0wzzSviyqj1jjKg@mail.gmail.com>

On Mon, Apr 2, 2018 at 11:08 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On Fri, Mar 30, 2018 at 12:54 PM, Chas Williams <3chas3@gmail.com> wrote:
>> On Thu, Mar 29, 2018 at 9:02 PM, Toshiaki Makita
>> <makita.toshiaki@lab.ntt.co.jp> wrote:
>>> On 2018/03/30 1:49, Roopa Prabhu wrote:
>>>> On Thu, Mar 22, 2018 at 9:53 PM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>>>>> On Thu, Mar 22, 2018 at 8:34 AM, Chas Williams <3chas3@gmail.com> wrote:
>>>>>> If the bridge is allowing multiple VLANs, some VLANs may have
>>>>>> different MTUs.  Instead of choosing the minimum MTU for the
>>>>>> bridge interface, choose the maximum MTU of the bridge members.
>>>>>> With this the user only needs to set a larger MTU on the member
>>>>>> ports that are participating in the large MTU VLANS.
>>>>>>
>>>>>> Signed-off-by: Chas Williams <3chas3@gmail.com>
>>>>>> ---
>>>>>
>>>>> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>>
>>>>> This or an equivalent fix is necessary: as stated above, today the
>>>>> bridge mtu capped at min port mtu limits all
>>>>> vlan devices on top of the vlan filtering bridge to min port mtu.
>>>>
>>>>
>>>> On further thought, since this patch changes default behavior, it may
>>>> upset people. ie with this patch, a vlan device
>>>> on the bridge by default will now use the  bridge max mtu and that
>>>> could cause unexpected drops in the bridge driver
>>>> if the xmit port had a lower mtu. This may surprise users.
>>
>> It only changes the default behavior when you are using VLAN aware bridges.
>> The behavior remains the same otherwise.  I don't know if VLAN aware bridges
>> are that popular yet, so there probably isn't any particular expectation
>> from those bridges.
>
> They are popular; in fact, they are the default bridge mode on our
> network switches.
> And they have been around for too long now to ignore their users.
> Plus it is not right to change default mtu behavior for one mode of the bridge
> and not the others (bridge mtu handling from user-space is complex enough today
> due to dynamic mtu changes on port enslave/deslave).

I don't see the issue with one mode of the bridge behaving differently
from another mode.  The VLAN behavior between the two bridge modes is
completely different, so having a different MTU behavior doesn't seem
that surprising.

You are potentially mixing differently sized VLANs on the same bridge.
The only sane choice is to pick the largest MTU for the bridge.  This
lets you have whatever MTU is appropriate on the child VLAN interfaces
of the bridge.  If you attempt to forward from a port with a larger MTU
to a port with a smaller MTU, you get the expected behavior.

Forcing the end user to configure all the ports to the maximum MTU of
all the VLANs on the bridge is wrong IMHO.  You then risk attempting to
forward oversize packets on a network that can't support that.

>
>>
>> I don't think those drops are unexpected.  If a user has misconfigured
>> the bridge, we can't be expected to fix that for them.  It is the user's
>> responsibility to ensure that the ports on the VLAN have a size
>> consistent with the traffic they expect to pass.
>>
>
> By default they are not expected today. The problem is changing the bridge
> to max mtu changes 'all' the vlan devices on top of the vlan aware bridge to
> max mtu by default which makes drops at the bridge driver more common if the
> user had mixed mtu on its ports.

That's not been my experience.  The MTU on the VLAN devices is only
limited by the bridge's MTU.  Setting the bridge MTU doesn't change the
MTUs of the child VLAN devices.
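
The two policies under discussion can be compared with a small sketch. It is illustrative only and not the bridge driver's code: demo_bridge_mtu() and demo_can_forward() are hypothetical helpers, and the vlan_aware flag simply selects between the existing minimum-of-ports rule and the maximum-of-ports rule proposed in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy helper: derive a bridge MTU from its member ports. */
unsigned int demo_bridge_mtu(const unsigned int *port_mtu, size_t n,
			     bool vlan_aware)
{
	unsigned int mtu;

	assert(n > 0);
	mtu = port_mtu[0];
	for (size_t i = 1; i < n; i++) {
		/* VLAN aware: take the maximum; otherwise take the minimum. */
		if (vlan_aware ? port_mtu[i] > mtu : port_mtu[i] < mtu)
			mtu = port_mtu[i];
	}
	return mtu;
}

/* A frame can only leave through a port whose MTU is large enough. */
bool demo_can_forward(unsigned int frame_len, unsigned int egress_port_mtu)
{
	return frame_len <= egress_port_mtu;
}
```

With ports at 1500, 9000 and 1500, the minimum rule gives a bridge MTU of 1500 and the maximum rule gives 9000. In the second case a 9000-byte frame still cannot be forwarded out of a 1500-byte port, which is the drop scenario both sides of the thread are describing.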


* [RFC] vhost: introduce mdev based hardware vhost backend
From: Tiwei Bie @ 2018-04-02 15:23 UTC (permalink / raw)
  To: mst, jasowang, alex.williamson, ddutile, alexander.h.duyck
  Cc: virtio-dev, linux-kernel, kvm, virtualization, netdev, dan.daly,
	cunming.liang, zhihong.wang, jianfeng.tan, xiao.w.wang, tiwei.bie

This patch introduces an mdev (mediated device) based hardware
vhost backend. This backend is an abstraction of the various
hardware vhost accelerators (potentially any device that uses
a virtio ring can be used as a vhost accelerator). Some generic
mdev parent ops are provided for accelerator drivers to support
generating mdev instances.

What's this
===========

The idea is that we can set up a virtio ring compatible device
with the messages available at the vhost-backend. Originally,
these messages were used to implement a software vhost backend,
but now we will use them to set up a virtio ring compatible
hardware device. The hardware device will then be able to work
with the guest virtio driver in the VM just as the software
backend does. That is to say, we can implement a hardware based
vhost backend in QEMU, and any virtio ring compatible device can
potentially be used with this backend.
(We also call it vDPA -- vhost Data Path Acceleration).

One problem is that different virtio ring compatible devices
may have different device interfaces, which means we would
need a different driver in QEMU for each of them. That would be
troublesome, and it is what this patch tries to fix. The idea
behind this patch is very simple: mdev is a standard way to
emulate a device in the kernel. So we define a standard device
based on mdev, which is able to accept vhost messages. When the
mdev emulation code (i.e. the generic mdev parent ops provided
by this patch) gets vhost messages, it parses them and delivers
them to accelerator drivers. Drivers can use these messages to
set up accelerators.

That is to say, the generic mdev parent ops (e.g. read()/write()/
ioctl()/...) will be provided for accelerator drivers to register
accelerators as mdev parent devices. And each accelerator device
will support generating standard mdev instance(s).

With this standard device interface, we will be able to just
develop one userspace driver to implement the hardware based
vhost backend in QEMU.

Difference between vDPA and PCI passthru
========================================

The key difference between vDPA and PCI passthru is that in
vDPA only the data path of the device (e.g. DMA ring, notify
region and queue interrupt) is passed through to the VM, while
the device control path (e.g. PCI configuration space and MMIO
regions) is still defined and emulated by QEMU.

The benefits of keeping virtio device emulation in QEMU compared
with virtio device PCI passthru include (but are not limited to):

- consistent device interface for guest OS in the VM;
- max flexibility on the hardware design, especially the
  accelerator for each vhost backend doesn't have to be a
  full PCI device;
- leveraging the existing virtio live-migration framework;

The interface of this mdev based device
=======================================

1. BAR0

The MMIO region described by BAR0 is the main control
interface. Messages will be written to or read from
this region.

The message type is determined by the `request` field
in message header. The message size is encoded in the
message header too. The message format looks like this:

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
		struct vhost_memory memory;
	} payload;
};

The existing vhost-kernel ioctl cmds are reused as
the message requests in the above structure.
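
As a rough userspace illustration of how the header size relates
to this layout (a sketch, not part of the patch): the header is
everything that precedes `payload`, so its size can be taken with
offsetof(). The struct below is a simplified stand-in that models
only the u64 payload member; `demo_vfio_op`, `DEMO_OP_HDR_SIZE`
and `demo_msg_len` are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct vhost_vfio_op (assumption: only
 * the u64 payload member is modelled here). */
struct demo_vfio_op {
	uint64_t request;
	uint32_t flags;
	uint32_t size;
	union {
		uint64_t u64;
	} payload;
};

/* Header = everything that precedes the payload. */
#define DEMO_OP_HDR_SIZE offsetof(struct demo_vfio_op, payload)

/* Total number of bytes to transfer for a given payload size. */
static size_t demo_msg_len(uint32_t payload_size)
{
	return DEMO_OP_HDR_SIZE + payload_size;
}
```

With this layout a header-only message is DEMO_OP_HDR_SIZE bytes
long, and a message carrying a u64 is that plus sizeof(uint64_t).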

Each message will be written to or read from this
region at offset 0:

int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
	int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
	struct vhost_vfio *vfio = dev->opaque;
	int ret;

	ret = pwrite64(vfio->device_fd, op, count, vfio->bar0_offset);
	if (ret != count)
		return -1;

	return 0;
}

int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
	int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
	struct vhost_vfio *vfio = dev->opaque;
	uint64_t request = op->request;
	int ret;

	ret = pread64(vfio->device_fd, op, count, vfio->bar0_offset);
	if (ret != count || request != op->request)
		return -1;

	return 0;
}

Setting a value on the device is straightforward:
just write the message to the device directly:

int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
{
	struct vhost_vfio_op op;

	op.request = VHOST_SET_FEATURES;
	op.flags = 0;
	op.size = sizeof(features);
	op.payload.u64 = features;

	return vhost_vfio_write(dev, &op);
}

To get things from the device, two steps are needed.
Take VHOST_GET_FEATURES as an example:

int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
{
	struct vhost_vfio_op op;
	int ret;

	op.request = VHOST_GET_FEATURES;
	op.flags = VHOST_VFIO_NEED_REPLY;
	op.size = 0;

	/* Just need to write the header */
	ret = vhost_vfio_write(dev, &op);
	if (ret != 0)
		goto out;

	/* `op` wasn't changed during write */
	op.flags = 0;
	op.size = sizeof(*features);

	ret = vhost_vfio_read(dev, &op);
	if (ret != 0)
		goto out;

	*features = op.payload.u64;
out:
	return ret;
}

2. BAR1 (mmap-able)

The MMIO region described by BAR1 will be used to notify the
device.

Each queue will have a page for notification. If the hardware
supports it, this page can be mapped into the VM, so that the
virtio driver in the VM can notify the device directly.

The MMIO region described by BAR1 is also write-able. If the
accelerator's notification register(s) cannot be mapped to the
VM, write() can also be used to notify the device. Something
like this:

void notify_relay(void *opaque)
{
	......
	offset = 0x1000 * queue_idx; /* XXX assume page size is 4K here. */

	ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
			vfio->bar1_offset + offset);
	......
}
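
The offset computation above could be sketched without the
hard-coded 4K assumption as follows. This is only an illustration:
`demo_notify_offset` and `demo_notify_offset_sys` are hypothetical
helpers, and it assumes that the userspace page size reported by
sysconf() matches the page size the kernel side uses for BAR1.

```c
#include <stdint.h>
#include <unistd.h>

/* Offset of a queue's notification page inside BAR1, given the
 * region's base offset and the page size in bytes. */
static uint64_t demo_notify_offset(uint64_t bar1_offset, uint32_t queue_idx,
				   uint64_t page_size)
{
	return bar1_offset + (uint64_t)queue_idx * page_size;
}

/* Same as above, but asks the system for the page size. */
static uint64_t demo_notify_offset_sys(uint64_t bar1_offset,
				       uint32_t queue_idx)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0)
		page_size = 4096; /* fall back to the 4K assumption */

	return demo_notify_offset(bar1_offset, queue_idx,
				  (uint64_t)page_size);
}
```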

Other BARs are reserved.

3. VFIO interrupt ioctl API

The VFIO interrupt ioctl API is used to set up device interrupts.
IRQ-bypass will also be supported.

Currently, only VFIO_PCI_MSIX_IRQ_INDEX is supported.

The API for drivers to provide mdev instances
=============================================

The read()/write()/ioctl()/mmap()/open()/release() mdev
parent ops have been provided for accelerators' drivers
to provide mdev instances.

ssize_t vdpa_read(struct mdev_device *mdev, char __user *buf,
		  size_t count, loff_t *ppos);
ssize_t vdpa_write(struct mdev_device *mdev, const char __user *buf,
		   size_t count, loff_t *ppos);
long vdpa_ioctl(struct mdev_device *mdev, unsigned int cmd, unsigned long arg);
int vdpa_mmap(struct mdev_device *mdev, struct vm_area_struct *vma);
int vdpa_open(struct mdev_device *mdev);
void vdpa_close(struct mdev_device *mdev);

Each accelerator driver just needs to implement its own
create()/remove() ops and provide a set of vdpa device ops,
which will be called by the generic mdev emulation code.

Currently, the vdpa device ops are defined as:

typedef int (*vdpa_start_device_t)(struct vdpa_dev *vdpa);
typedef int (*vdpa_stop_device_t)(struct vdpa_dev *vdpa);
typedef int (*vdpa_dma_map_t)(struct vdpa_dev *vdpa);
typedef int (*vdpa_dma_unmap_t)(struct vdpa_dev *vdpa);
typedef int (*vdpa_set_eventfd_t)(struct vdpa_dev *vdpa, int vector, int fd);
typedef u64 (*vdpa_supported_features_t)(struct vdpa_dev *vdpa);
typedef void (*vdpa_notify_device_t)(struct vdpa_dev *vdpa, int qid);
typedef u64 (*vdpa_get_notify_addr_t)(struct vdpa_dev *vdpa, int qid);

struct vdpa_device_ops {
	vdpa_start_device_t		start;
	vdpa_stop_device_t		stop;
	vdpa_dma_map_t			dma_map;
	vdpa_dma_unmap_t		dma_unmap;
	vdpa_set_eventfd_t		set_eventfd;
	vdpa_supported_features_t	supported_features;
	vdpa_notify_device_t		notify;
	vdpa_get_notify_addr_t		get_notify_addr;
};

struct vdpa_dev {
	struct mdev_device *mdev;
	struct mutex ops_lock;
	u8 vconfig[VDPA_CONFIG_SIZE];
	int nr_vring;
	u64 features;
	u64 state;
	struct vhost_memory *mem_table;
	bool pending_reply;
	struct vhost_vfio_op pending;
	const struct vdpa_device_ops *ops;
	void *private;
	int max_vrings;
	struct vdpa_vring_info vring_info[0];
};

struct vdpa_dev *vdpa_alloc(struct mdev_device *mdev, void *private,
			    int max_vrings);
void vdpa_free(struct vdpa_dev *vdpa);
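
To give a feel for what an accelerator driver would provide, here
is a userspace mock of a subset of the ops table. It is only a
sketch under stated assumptions: the `demo_*` types and functions
are hypothetical, the "hardware" is a plain struct, and the driver
is assumed to set the ops pointer itself, since vdpa_alloc() as
posted does not take an ops argument.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_vdpa_dev;

/* Userspace mirror of a subset of struct vdpa_device_ops. */
struct demo_vdpa_device_ops {
	int (*start)(struct demo_vdpa_dev *vdpa);
	int (*stop)(struct demo_vdpa_dev *vdpa);
	uint64_t (*supported_features)(struct demo_vdpa_dev *vdpa);
	void (*notify)(struct demo_vdpa_dev *vdpa, int qid);
};

/* Only the two fields the callbacks below need. */
struct demo_vdpa_dev {
	const struct demo_vdpa_device_ops *ops;
	void *private;
};

/* Hypothetical per-device state of an imaginary accelerator. */
struct demo_accel {
	int running;
	int last_notified_qid;
};

static int demo_start(struct demo_vdpa_dev *vdpa)
{
	struct demo_accel *hw = vdpa->private;

	hw->running = 1;
	return 0;
}

static int demo_stop(struct demo_vdpa_dev *vdpa)
{
	struct demo_accel *hw = vdpa->private;

	hw->running = 0;
	return 0;
}

static uint64_t demo_supported_features(struct demo_vdpa_dev *vdpa)
{
	(void)vdpa;
	return 1ULL << 32; /* bit 32 is VIRTIO_F_VERSION_1 */
}

static void demo_notify(struct demo_vdpa_dev *vdpa, int qid)
{
	struct demo_accel *hw = vdpa->private;

	hw->last_notified_qid = qid; /* a real driver would kick the queue */
}

static const struct demo_vdpa_device_ops demo_ops = {
	.start			= demo_start,
	.stop			= demo_stop,
	.supported_features	= demo_supported_features,
	.notify			= demo_notify,
};
```

The generic code then only ever goes through the ops pointer, e.g.
dev.ops->start(&dev), and never touches the driver's private state.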

A simple example
================

# Query the number of available mdev instances
$ cat /sys/class/mdev_bus/0000:06:00.2/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/available_instances

# Create an mdev instance
$ echo $UUID > /sys/class/mdev_bus/0000:06:00.2/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/create

# Launch QEMU with a virtio-net device
$ qemu \
	...... \
	-netdev type=vhost-vfio,sysfsdev=/sys/bus/mdev/devices/$UUID,id=$ID \
	-device virtio-net-pci,netdev=$ID

-------- END --------

Most of the above text will be refined and moved to a doc in
the formal patch. In this RFC, all introductions and code
are gathered in this patch to make it easier to find all the
relevant information. Anyone who wants to comment can reply
inline and keep just the relevant parts. Sorry for the big
RFC patch..

This patch is just an RFC for now, and some things are still
missing or need to be refined. But it's never too early
to hear the thoughts from the community. So any comments
would be appreciated! Thanks! :-)

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/vhost/Makefile     |   3 +
 drivers/vhost/vdpa.c       | 805 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vdpa_mdev.h  |  76 +++++
 include/uapi/linux/vhost.h |  26 ++
 4 files changed, 910 insertions(+)
 create mode 100644 drivers/vhost/vdpa.c
 create mode 100644 include/linux/vdpa_mdev.h

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..7d185e083140 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -11,3 +11,6 @@ vhost_vsock-y := vsock.o
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
 obj-$(CONFIG_VHOST)	+= vhost.o
+
+obj-m += vhost_vdpa.o  # FIXME: add an option
+vhost_vdpa-y := vdpa.o
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
new file mode 100644
index 000000000000..aa19c266ea19
--- /dev/null
+++ b/drivers/vhost/vdpa.c
@@ -0,0 +1,805 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Intel Corporation.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/vfio.h>
+#include <linux/vhost.h>
+#include <linux/mdev.h>
+#include <linux/vdpa_mdev.h>
+
+#define VDPA_BAR0_SIZE		0x1000000 // TBD
+
+#define VDPA_VFIO_PCI_OFFSET_SHIFT	40
+#define VDPA_VFIO_PCI_OFFSET_MASK \
+		((1ULL << VDPA_VFIO_PCI_OFFSET_SHIFT) - 1)
+#define VDPA_VFIO_PCI_OFFSET_TO_INDEX(offset) \
+		((offset) >> VDPA_VFIO_PCI_OFFSET_SHIFT)
+#define VDPA_VFIO_PCI_INDEX_TO_OFFSET(index) \
+		((u64)(index) << VDPA_VFIO_PCI_OFFSET_SHIFT)
+#define VDPA_VFIO_PCI_BAR_OFFSET(offset) \
+		((offset) & VDPA_VFIO_PCI_OFFSET_MASK)
+
+#define STORE_LE16(addr, val)	(*(u16 *)(addr) = cpu_to_le16(val))
+#define STORE_LE32(addr, val)	(*(u32 *)(addr) = cpu_to_le32(val))
+
+static void vdpa_create_config_space(struct vdpa_dev *vdpa)
+{
+	/* PCI device ID / vendor ID */
+	STORE_LE32(&vdpa->vconfig[0x0], 0xffffffff); // FIXME TBD
+
+	/* Programming interface class */
+	vdpa->vconfig[0x9] = 0x00;
+
+	/* Sub class */
+	vdpa->vconfig[0xa] = 0x00;
+
+	/* Base class */
+	vdpa->vconfig[0xb] = 0x02;
+
+	// FIXME TBD
+}
+
+struct vdpa_dev *vdpa_alloc(struct mdev_device *mdev, void *private,
+			    int max_vrings)
+{
+	struct vdpa_dev *vdpa;
+	size_t size;
+
+	size = sizeof(struct vdpa_dev) + max_vrings *
+			sizeof(struct vdpa_vring_info);
+
+	vdpa = kzalloc(size, GFP_KERNEL);
+	if (vdpa == NULL)
+		return NULL;
+
+	mutex_init(&vdpa->ops_lock);
+
+	vdpa->mdev = mdev;
+	vdpa->private = private;
+	vdpa->max_vrings = max_vrings;
+
+	vdpa_create_config_space(vdpa);
+
+	return vdpa;
+}
+EXPORT_SYMBOL(vdpa_alloc);
+
+void vdpa_free(struct vdpa_dev *vdpa)
+{
+	struct mdev_device *mdev;
+
+	mdev = vdpa->mdev;
+
+	vdpa->ops->stop(vdpa);
+	vdpa->ops->dma_unmap(vdpa);
+
+	mdev_set_drvdata(mdev, NULL);
+
+	mutex_destroy(&vdpa->ops_lock);
+
+	kfree(vdpa->mem_table);
+	kfree(vdpa);
+}
+EXPORT_SYMBOL(vdpa_free);
+
+static ssize_t vdpa_handle_pcicfg_read(struct mdev_device *mdev,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct vdpa_dev *vdpa;
+	loff_t pos = *ppos;
+	loff_t offset;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	offset = VDPA_VFIO_PCI_BAR_OFFSET(pos);
+
+	if (count + offset > VDPA_CONFIG_SIZE)
+		return -EINVAL;
+
+	if (copy_to_user(buf, (vdpa->vconfig + offset), count))
+		return -EFAULT;
+
+	return count;
+}
+
+static ssize_t vdpa_handle_bar0_read(struct mdev_device *mdev,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct vdpa_dev *vdpa;
+	struct vhost_vfio_op *op = NULL;
+	loff_t pos = *ppos;
+	loff_t offset;
+	int ret;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	offset = VDPA_VFIO_PCI_BAR_OFFSET(pos);
+	if (offset != 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!vdpa->pending_reply) {
+		ret = 0;
+		goto out;
+	}
+
+	vdpa->pending_reply = false;
+
+	op = kzalloc(VHOST_VFIO_OP_HDR_SIZE + VHOST_VFIO_OP_PAYLOAD_MAX_SIZE,
+		     GFP_KERNEL);
+	if (op == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	op->request = vdpa->pending.request;
+
+	switch (op->request) {
+	case VHOST_GET_VRING_BASE:
+		op->payload.state = vdpa->pending.payload.state;
+		op->size = sizeof(op->payload.state);
+		break;
+	case VHOST_GET_FEATURES:
+		op->payload.u64 = vdpa->pending.payload.u64;
+		op->size = sizeof(op->payload.u64);
+		break;
+	default:
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	if (op->size + VHOST_VFIO_OP_HDR_SIZE != count) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	if (copy_to_user(buf, op, count)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	ret = count;
+
+out_free:
+	kfree(op);
+out:
+	return ret;
+}
+
+ssize_t vdpa_read(struct mdev_device *mdev, char __user *buf,
+		  size_t count, loff_t *ppos)
+{
+	int done = 0;
+	unsigned int index;
+	loff_t pos = *ppos;
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	mutex_lock(&vdpa->ops_lock);
+
+	index = VDPA_VFIO_PCI_OFFSET_TO_INDEX(pos);
+
+	switch (index) {
+	case VFIO_PCI_CONFIG_REGION_INDEX:
+		done = vdpa_handle_pcicfg_read(mdev, buf, count, ppos);
+		break;
+	case VFIO_PCI_BAR0_REGION_INDEX:
+		done = vdpa_handle_bar0_read(mdev, buf, count, ppos);
+		break;
+	}
+
+	if (done > 0)
+		*ppos += done;
+
+	mutex_unlock(&vdpa->ops_lock);
+
+	return done;
+}
+EXPORT_SYMBOL(vdpa_read);
+
+static ssize_t vdpa_handle_pcicfg_write(struct mdev_device *mdev,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	return count;
+}
+
+static int vhost_set_mem_table(struct mdev_device *mdev,
+		struct vhost_memory *mem)
+{
+	struct vdpa_dev *vdpa;
+	struct vhost_memory *mem_table;
+	size_t size;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	// FIXME fix this
+	if (vdpa->state != VHOST_DEVICE_S_STOPPED)
+		return -EBUSY;
+
+	size = sizeof(*mem) + mem->nregions * sizeof(*mem->regions);
+
+	mem_table = kzalloc(size, GFP_KERNEL);
+	if (mem_table == NULL)
+		return -ENOMEM;
+
+	memcpy(mem_table, mem, size);
+
+	kfree(vdpa->mem_table);
+
+	vdpa->mem_table = mem_table;
+
+	vdpa->ops->dma_unmap(vdpa);
+	vdpa->ops->dma_map(vdpa);
+
+	return 0;
+}
+
+static int vhost_set_vring_addr(struct mdev_device *mdev,
+		struct vhost_vring_addr *addr)
+{
+	struct vdpa_dev *vdpa;
+	int qid = addr->index;
+	struct vdpa_vring_info *vring;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (qid >= vdpa->max_vrings)
+		return -EINVAL;
+
+	/* FIXME to be fixed */
+	if (qid >= vdpa->nr_vring)
+		vdpa->nr_vring = qid + 1;
+
+	vring = &vdpa->vring_info[qid];
+
+	vring->desc_user_addr = addr->desc_user_addr;
+	vring->used_user_addr = addr->used_user_addr;
+	vring->avail_user_addr = addr->avail_user_addr;
+	vring->log_guest_addr = addr->log_guest_addr;
+
+	return 0;
+}
+
+static int vhost_set_vring_num(struct mdev_device *mdev,
+		struct vhost_vring_state *num)
+{
+	struct vdpa_dev *vdpa;
+	int qid = num->index;
+	struct vdpa_vring_info *vring;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (qid >= vdpa->max_vrings)
+		return -EINVAL;
+
+	vring = &vdpa->vring_info[qid];
+
+	vring->size = num->num;
+
+	return 0;
+}
+
+static int vhost_set_vring_base(struct mdev_device *mdev,
+		struct vhost_vring_state *base)
+{
+	struct vdpa_dev *vdpa;
+	int qid = base->index;
+	struct vdpa_vring_info *vring;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (qid >= vdpa->max_vrings)
+		return -EINVAL;
+
+	vring = &vdpa->vring_info[qid];
+
+	vring->base = base->num;
+
+	return 0;
+}
+
+static int vhost_get_vring_base(struct mdev_device *mdev,
+		struct vhost_vring_state *base)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	vdpa->pending_reply = true;
+	vdpa->pending.request = VHOST_GET_VRING_BASE;
+	vdpa->pending.payload.state.index = base->index;
+
+	// FIXME to be implemented
+
+	return 0;
+}
+
+static int vhost_set_features(struct mdev_device *mdev, u64 *features)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	vdpa->features = *features;
+
+	return 0;
+}
+
+static int vhost_get_features(struct mdev_device *mdev, u64 *features)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	vdpa->pending_reply = true;
+	vdpa->pending.request = VHOST_GET_FEATURES;
+	vdpa->pending.payload.u64 =
+		vdpa->ops->supported_features(vdpa);
+
+	return 0;
+}
+
+static int vhost_set_owner(struct mdev_device *mdev)
+{
+	return 0;
+}
+
+static int vhost_reset_owner(struct mdev_device *mdev)
+{
+	return 0;
+}
+
+static int vhost_set_state(struct mdev_device *mdev, u64 *state)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (*state >= VHOST_DEVICE_S_MAX)
+		return -EINVAL;
+
+	if (vdpa->state == *state)
+		return 0;
+
+	vdpa->state = *state;
+
+	switch (vdpa->state) {
+	case VHOST_DEVICE_S_RUNNING:
+		vdpa->ops->start(vdpa);
+		break;
+	case VHOST_DEVICE_S_STOPPED:
+		vdpa->ops->stop(vdpa);
+		break;
+	}
+
+	return 0;
+}
+
+static ssize_t vdpa_handle_bar0_write(struct mdev_device *mdev,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct vhost_vfio_op *op = NULL;
+	loff_t pos = *ppos;
+	loff_t offset;
+	int ret;
+
+	offset = VDPA_VFIO_PCI_BAR_OFFSET(pos);
+	if (offset != 0) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (count < VHOST_VFIO_OP_HDR_SIZE) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	op = kzalloc(VHOST_VFIO_OP_HDR_SIZE + VHOST_VFIO_OP_PAYLOAD_MAX_SIZE,
+		     GFP_KERNEL);
+	if (op == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(op, buf, VHOST_VFIO_OP_HDR_SIZE)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	if (op->size > VHOST_VFIO_OP_PAYLOAD_MAX_SIZE ||
+	    op->size + VHOST_VFIO_OP_HDR_SIZE != count) {
+		ret = -EINVAL;
+		goto out_free;
+	}
+
+	if (copy_from_user(&op->payload, buf + VHOST_VFIO_OP_HDR_SIZE,
+			   op->size)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	switch (op->request) {
+	case VHOST_SET_LOG_BASE:
+		break;
+	case VHOST_SET_MEM_TABLE:
+		vhost_set_mem_table(mdev, &op->payload.memory);
+		break;
+	case VHOST_SET_VRING_ADDR:
+		vhost_set_vring_addr(mdev, &op->payload.addr);
+		break;
+	case VHOST_SET_VRING_NUM:
+		vhost_set_vring_num(mdev, &op->payload.state);
+		break;
+	case VHOST_SET_VRING_BASE:
+		vhost_set_vring_base(mdev, &op->payload.state);
+		break;
+	case VHOST_GET_VRING_BASE:
+		vhost_get_vring_base(mdev, &op->payload.state);
+		break;
+	case VHOST_SET_FEATURES:
+		vhost_set_features(mdev, &op->payload.u64);
+		break;
+	case VHOST_GET_FEATURES:
+		vhost_get_features(mdev, &op->payload.u64);
+		break;
+	case VHOST_SET_OWNER:
+		vhost_set_owner(mdev);
+		break;
+	case VHOST_RESET_OWNER:
+		vhost_reset_owner(mdev);
+		break;
+	case VHOST_DEVICE_SET_STATE:
+		vhost_set_state(mdev, &op->payload.u64);
+		break;
+	default:
+		break;
+	}
+
+	ret = count;
+
+out_free:
+	kfree(op);
+out:
+	return ret;
+}
+
+static ssize_t vdpa_handle_bar1_write(struct mdev_device *mdev,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct vdpa_dev *vdpa;
+	int qid;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (count < sizeof(qid))
+		return -EINVAL;
+
+	if (copy_from_user(&qid, buf, sizeof(qid)))
+		return -EFAULT;
+
+	vdpa->ops->notify(vdpa, qid);
+
+	return count;
+}
+
+ssize_t vdpa_write(struct mdev_device *mdev, const char __user *buf,
+		   size_t count, loff_t *ppos)
+{
+	int done = 0;
+	unsigned int index;
+	loff_t pos = *ppos;
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	mutex_lock(&vdpa->ops_lock);
+
+	index = VDPA_VFIO_PCI_OFFSET_TO_INDEX(pos);
+
+	switch (index) {
+	case VFIO_PCI_CONFIG_REGION_INDEX:
+		done = vdpa_handle_pcicfg_write(mdev, buf, count, ppos);
+		break;
+	case VFIO_PCI_BAR0_REGION_INDEX:
+		done = vdpa_handle_bar0_write(mdev, buf, count, ppos);
+		break;
+	case VFIO_PCI_BAR1_REGION_INDEX:
+		done = vdpa_handle_bar1_write(mdev, buf, count, ppos);
+		break;
+	}
+
+	if (done > 0)
+		*ppos += done;
+
+	mutex_unlock(&vdpa->ops_lock);
+
+	return done;
+}
+EXPORT_SYMBOL(vdpa_write);
+
+static int vdpa_get_region_info(struct mdev_device *mdev,
+				struct vfio_region_info *region_info,
+				u16 *cap_type_id, void **cap_type)
+{
+	struct vdpa_dev *vdpa;
+	u32 bar_index;
+	u64 size = 0;
+
+	if (!mdev)
+		return -EINVAL;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -EINVAL;
+
+	bar_index = region_info->index;
+	if (bar_index >= VFIO_PCI_NUM_REGIONS)
+		return -EINVAL;
+
+	mutex_lock(&vdpa->ops_lock);
+
+	switch (bar_index) {
+	case VFIO_PCI_CONFIG_REGION_INDEX:
+		size = VDPA_CONFIG_SIZE;
+		break;
+	case VFIO_PCI_BAR0_REGION_INDEX:
+		size = VDPA_BAR0_SIZE;
+		break;
+	case VFIO_PCI_BAR1_REGION_INDEX:
+		size = (u64)vdpa->max_vrings << PAGE_SHIFT;
+		break;
+	default:
+		size = 0;
+		break;
+	}
+
+	// FIXME: mark BAR1 as mmap-able (VFIO_REGION_INFO_FLAG_MMAP)
+	region_info->size = size;
+	region_info->offset = VDPA_VFIO_PCI_INDEX_TO_OFFSET(bar_index);
+	region_info->flags = VFIO_REGION_INFO_FLAG_READ |
+		VFIO_REGION_INFO_FLAG_WRITE;
+	mutex_unlock(&vdpa->ops_lock);
+	return 0;
+}
+
+static int vdpa_reset(struct mdev_device *mdev)
+{
+	struct vdpa_dev *vdpa;
+
+	if (!mdev)
+		return -EINVAL;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int vdpa_get_device_info(struct mdev_device *mdev,
+				struct vfio_device_info *dev_info)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	dev_info->flags = VFIO_DEVICE_FLAGS_PCI;
+	dev_info->num_regions = VFIO_PCI_NUM_REGIONS;
+	dev_info->num_irqs = vdpa->max_vrings;
+
+	return 0;
+}
+
+static int vdpa_get_irq_info(struct mdev_device *mdev,
+			     struct vfio_irq_info *info)
+{
+	struct vdpa_dev *vdpa;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	if (info->index != VFIO_PCI_MSIX_IRQ_INDEX)
+		return -ENOTSUPP;
+
+	info->flags = VFIO_IRQ_INFO_EVENTFD;
+	info->count = vdpa->max_vrings;
+
+	return 0;
+}
+
+static int vdpa_set_irqs(struct mdev_device *mdev, uint32_t flags,
+			 unsigned int index, unsigned int start,
+			 unsigned int count, void *data)
+{
+	struct vdpa_dev *vdpa;
+	int *fd = data, i;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -EINVAL;
+
+	if (index != VFIO_PCI_MSIX_IRQ_INDEX)
+		return -ENOTSUPP;
+
+	for (i = 0; i < count; i++)
+		vdpa->ops->set_eventfd(vdpa, start + i,
+			(flags & VFIO_IRQ_SET_DATA_EVENTFD) ? fd[i] : -1);
+
+	return 0;
+}
+
+long vdpa_ioctl(struct mdev_device *mdev, unsigned int cmd, unsigned long arg)
+{
+	int ret = 0;
+	unsigned long minsz;
+	struct vdpa_dev *vdpa;
+
+	if (!mdev)
+		return -EINVAL;
+
+	vdpa = mdev_get_drvdata(mdev);
+	if (!vdpa)
+		return -ENODEV;
+
+	switch (cmd) {
+	case VFIO_DEVICE_GET_INFO:
+	{
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vdpa_get_device_info(mdev, &info);
+		if (ret)
+			return ret;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			return -EFAULT;
+
+		return 0;
+	}
+	case VFIO_DEVICE_GET_REGION_INFO:
+	{
+		struct vfio_region_info info;
+		u16 cap_type_id = 0;
+		void *cap_type = NULL;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vdpa_get_region_info(mdev, &info, &cap_type_id,
+					   &cap_type);
+		if (ret)
+			return ret;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			return -EFAULT;
+
+		return 0;
+	}
+	case VFIO_DEVICE_GET_IRQ_INFO:
+	{
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz || info.index >= vdpa->max_vrings)
+			return -EINVAL;
+
+		ret = vdpa_get_irq_info(mdev, &info);
+		if (ret)
+			return ret;
+
+		if (copy_to_user((void __user *)arg, &info, minsz))
+			return -EFAULT;
+
+		return 0;
+	}
+	case VFIO_DEVICE_SET_IRQS:
+	{
+		struct vfio_irq_set hdr;
+		size_t data_size = 0;
+		u8 *data = NULL;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		ret = vfio_set_irqs_validate_and_prepare(&hdr, vdpa->max_vrings,
+							 VFIO_PCI_NUM_IRQS,
+							 &data_size);
+		if (ret)
+			return ret;
+
+		if (data_size) {
+			data = memdup_user((void __user *)(arg + minsz),
+					   data_size);
+			if (IS_ERR(data))
+				return PTR_ERR(data);
+		}
+
+		ret = vdpa_set_irqs(mdev, hdr.flags, hdr.index, hdr.start,
+				hdr.count, data);
+
+		kfree(data);
+		return ret;
+	}
+	case VFIO_DEVICE_RESET:
+		return vdpa_reset(mdev);
+	}
+	return -ENOTTY;
+}
+EXPORT_SYMBOL(vdpa_ioctl);
+
+int vdpa_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
+{
+	// FIXME: to be implemented
+
+	return 0;
+}
+EXPORT_SYMBOL(vdpa_mmap);
+
+int vdpa_open(struct mdev_device *mdev)
+{
+	return 0;
+}
+EXPORT_SYMBOL(vdpa_open);
+
+void vdpa_close(struct mdev_device *mdev)
+{
+}
+EXPORT_SYMBOL(vdpa_close);
+
+MODULE_VERSION("0.0.0");
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Hardware virtio accelerator abstraction");
diff --git a/include/linux/vdpa_mdev.h b/include/linux/vdpa_mdev.h
new file mode 100644
index 000000000000..8414e86ba4b8
--- /dev/null
+++ b/include/linux/vdpa_mdev.h
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Intel Corporation.
+ */
+
+#ifndef VDPA_MDEV_H
+#define VDPA_MDEV_H
+
+#define VDPA_CONFIG_SIZE 0xff
+
+struct mdev_device;
+struct vdpa_dev;
+
+/*
+ * XXX: Any comments about the vDPA API design for drivers
+ *      would be appreciated!
+ */
+
+typedef int (*vdpa_start_device_t)(struct vdpa_dev *vdpa);
+typedef int (*vdpa_stop_device_t)(struct vdpa_dev *vdpa);
+typedef int (*vdpa_dma_map_t)(struct vdpa_dev *vdpa);
+typedef int (*vdpa_dma_unmap_t)(struct vdpa_dev *vdpa);
+typedef int (*vdpa_set_eventfd_t)(struct vdpa_dev *vdpa, int vector, int fd);
+typedef u64 (*vdpa_supported_features_t)(struct vdpa_dev *vdpa);
+typedef void (*vdpa_notify_device_t)(struct vdpa_dev *vdpa, int qid);
+typedef u64 (*vdpa_get_notify_addr_t)(struct vdpa_dev *vdpa, int qid);
+
+struct vdpa_device_ops {
+	vdpa_start_device_t		start;
+	vdpa_stop_device_t		stop;
+	vdpa_dma_map_t			dma_map;
+	vdpa_dma_unmap_t		dma_unmap;
+	vdpa_set_eventfd_t		set_eventfd;
+	vdpa_supported_features_t	supported_features;
+	vdpa_notify_device_t		notify;
+	vdpa_get_notify_addr_t		get_notify_addr;
+};
+
+struct vdpa_vring_info {
+	u64 desc_user_addr;
+	u64 used_user_addr;
+	u64 avail_user_addr;
+	u64 log_guest_addr;
+	u16 size;
+	u16 base;
+};
+
+struct vdpa_dev {
+	struct mdev_device *mdev;
+	struct mutex ops_lock;
+	u8 vconfig[VDPA_CONFIG_SIZE];
+	int nr_vring;
+	u64 features;
+	u64 state;
+	struct vhost_memory *mem_table;
+	bool pending_reply;
+	struct vhost_vfio_op pending;
+	const struct vdpa_device_ops *ops;
+	void *private;
+	int max_vrings;
+	struct vdpa_vring_info vring_info[0];
+};
+
+struct vdpa_dev *vdpa_alloc(struct mdev_device *mdev, void *private,
+			    int max_vrings);
+void vdpa_free(struct vdpa_dev *vdpa);
+ssize_t vdpa_read(struct mdev_device *mdev, char __user *buf,
+		  size_t count, loff_t *ppos);
+ssize_t vdpa_write(struct mdev_device *mdev, const char __user *buf,
+		   size_t count, loff_t *ppos);
+long vdpa_ioctl(struct mdev_device *mdev, unsigned int cmd, unsigned long arg);
+int vdpa_mmap(struct mdev_device *mdev, struct vm_area_struct *vma);
+int vdpa_open(struct mdev_device *mdev);
+void vdpa_close(struct mdev_device *mdev);
+
+#endif /* VDPA_MDEV_H */
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index c51f8e5cc608..92a1ca0b5fe1 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -207,4 +207,30 @@ struct vhost_scsi_target {
 #define VHOST_VSOCK_SET_GUEST_CID	_IOW(VHOST_VIRTIO, 0x60, __u64)
 #define VHOST_VSOCK_SET_RUNNING		_IOW(VHOST_VIRTIO, 0x61, int)
 
+/* VHOST_DEVICE specific defines */
+
+#define VHOST_DEVICE_SET_STATE _IOW(VHOST_VIRTIO, 0x70, __u64)
+
+#define VHOST_DEVICE_S_STOPPED 0
+#define VHOST_DEVICE_S_RUNNING 1
+#define VHOST_DEVICE_S_MAX     2
+
+struct vhost_vfio_op {
+	__u64 request;
+	__u32 flags;
+	/* Flag values: */
+#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
+	__u32 size;
+	union {
+		__u64 u64;
+		struct vhost_vring_state state;
+		struct vhost_vring_addr addr;
+		struct vhost_memory memory;
+	} payload;
+};
+
+#define VHOST_VFIO_OP_HDR_SIZE \
+		((unsigned long)&((struct vhost_vfio_op *)NULL)->payload)
+#define VHOST_VFIO_OP_PAYLOAD_MAX_SIZE 1024 /* FIXME TBD */
+
 #endif
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH v3 2/2] net: usb: asix88179_178a: de-duplicate code
From: Alexander Kurz @ 2018-04-02 15:21 UTC (permalink / raw)
  To: David Miller; +Cc: marc.zyngier, afd, linux-usb, netdev, freddy
In-Reply-To: <20180402.101410.2105582698262762699.davem@davemloft.net>

Hi Marc, David,
with the v2 patch ("net: usb: asix88179_178a: de-duplicate code")
I made an embarrassingly stupid mistake of removing the wrong function.
The v2 patch accidentally changed ax88179_link_reset() instead of
ax88179_reset(). Hunk 6 of v2 ("net: usb: asix88179_178a: de-duplicate
code") is just utterly wrong.

ax88179_bind() and ax88179_reset() were the correct targets to be 
de-duplicated, as done in the v3 patch.

Sorry for this, Alexander

On Mon, 2 Apr 2018, David Miller wrote:

> From: Marc Zyngier <marc.zyngier@arm.com>
> Date: Mon, 02 Apr 2018 10:45:40 +0100
> 
> > What has changed between this patch and the previous one? Having a bit
> > of a change-log would certainly help. Also, I would have appreciated a
> > reply to the questions I had on v2 before you posted a third version.
> 
> Agreed, and I'm not applying these patches until this is sorted out
> and explained properly.
> 

^ permalink raw reply

* Re: [PATCH net-next] net: ipv6/gre: Add GRO support
From: Eric Dumazet @ 2018-04-02 15:19 UTC (permalink / raw)
  To: Eran Ben Elisha, Eric Dumazet
  Cc: Tariq Toukan, David S. Miller, Linux Netdev List, Eran Ben Elisha,
	Eric Dumazet
In-Reply-To: <CAKHjkjm8JE_N_v5HCmqokmzxmfYDd=T4cwqaBSRAXNezg-hNYg@mail.gmail.com>



On 04/02/2018 08:00 AM, Eran Ben Elisha wrote:
>>>> Seems good, but why isn't this handled directly in GRO native layer ?
>>> ip6_tunnel and ip6_gre do not share initialization flow functions (unlike ipv4).
>>> Changing the ipv6 init infrastructure should not be part of this
>>> patch. we prefer to keep this one minimal, simple and safe.
>>
>>
>>
>> Looking at gre_gro_receive() and gre_gro_complete() I could not see why they
>> could not be copied/pasted to IPv6.
> 
> These functions to handle GRO over GRE are already assigned in
> gre_offload_init() (in net/ipv4/gre_offload.c under CONFIG_IPV6).
> However without initializing the gro_cells, the receive path will not
> go via napi_gro_receive path, but directly to netif_rx.
> So AFAIU, only gcells->cells was missing for gro_cells_receive to
> really go via GRO flow.
> 
>>
>> Maybe give more details on the changelog, it is really not obvious.
> Hopefully the above filled this request.
>>

Not really :/

gro_cells_receive() is not really useful with native GRO, since packet is already
a GRO packet by the time it reaches ip_tunnel_rcv() or __ip6_tnl_rcv()

Sure, it might be useful if native GRO (happening on eth0 if you prefer) did not
handle a particular encapsulation.

gro_cells was a workaround before we extended GRO to be able to decap some tunnel headers.

It seems we have to extend this to also support GRE6.

^ permalink raw reply

* Re: [PATCH] connector: add parent pid and tgid to coredump and exit events
From: Stefan Strogin @ 2018-04-02 15:18 UTC (permalink / raw)
  To: David Miller
  Cc: zbr, netdev, linux-kernel, xe-linux-external, jderehag,
	matt.helsley, minipli
In-Reply-To: <20180330.125921.653839794312978457.davem@davemloft.net>

Hi David,

I don't see how it breaks UAPI. The point is that structures
coredump_proc_event and exit_proc_event are members of *union*
event_data, thus position of the existing data in the structure is
unchanged. Furthermore, this change won't increase size of struct
proc_event, because comm_proc_event (also a member of event_data) is
of bigger size than the changed structures.

If I'm wrong, could you please explain what exactly will the change
break in UAPI?
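
The layout argument above can be checked with a small user-space sketch.
This is only a simplified model, not the real header: the model_* names
are hypothetical, __kernel_pid_t is assumed to be a 32-bit int, comm[]
is assumed to be 16 bytes, and only three of the union members are
reproduced. It covers size and field offsets only, not any other
compatibility concern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for __kernel_pid_t; assumed to be a 32-bit int here. */
typedef int32_t model_pid_t;

/* Simplified "before" layout: only three of the union members. */
struct model_event_old {
	union {
		struct {
			model_pid_t process_pid, process_tgid;
			char comm[16];	/* assumed size */
		} comm;
		struct {
			model_pid_t process_pid, process_tgid;
		} coredump;
		struct {
			model_pid_t process_pid, process_tgid;
			uint32_t exit_code, exit_signal;
		} exit;
	} event_data;
};

/* Simplified "after" layout: parent_pid/parent_tgid appended at the end. */
struct model_event_new {
	union {
		struct {
			model_pid_t process_pid, process_tgid;
			char comm[16];	/* assumed size */
		} comm;
		struct {
			model_pid_t process_pid, process_tgid;
			model_pid_t parent_pid, parent_tgid;
		} coredump;
		struct {
			model_pid_t process_pid, process_tgid;
			uint32_t exit_code, exit_signal;
			model_pid_t parent_pid, parent_tgid;
		} exit;
	} event_data;
};

/* Returns 1 if the total size and the offsets of the pre-existing
 * fields are the same in both modeled layouts, 0 otherwise. */
int layout_is_compatible(void)
{
	return sizeof(struct model_event_new) ==
	       sizeof(struct model_event_old) &&
	       offsetof(struct model_event_new, event_data.exit.exit_code) ==
	       offsetof(struct model_event_old, event_data.exit.exit_code) &&
	       offsetof(struct model_event_new, event_data.exit.exit_signal) ==
	       offsetof(struct model_event_old, event_data.exit.exit_signal) &&
	       offsetof(struct model_event_new, event_data.coredump.process_tgid) ==
	       offsetof(struct model_event_old, event_data.coredump.process_tgid);
}
```

In this model the new fields are appended after the existing ones, so a
reader built against the old layout still finds process_pid,
process_tgid, exit_code and exit_signal where it expects them.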


On 30/03/18 19:59, David Miller wrote:
> From: Stefan Strogin <sstrogin@cisco.com>
> Date: Thu, 29 Mar 2018 17:12:47 +0300
> 
>> diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h
>> index 68ff25414700..db210625cee8 100644
>> --- a/include/uapi/linux/cn_proc.h
>> +++ b/include/uapi/linux/cn_proc.h
>> @@ -116,12 +116,16 @@ struct proc_event {
>>  		struct coredump_proc_event {
>>  			__kernel_pid_t process_pid;
>>  			__kernel_pid_t process_tgid;
>> +			__kernel_pid_t parent_pid;
>> +			__kernel_pid_t parent_tgid;
>>  		} coredump;
>>  
>>  		struct exit_proc_event {
>>  			__kernel_pid_t process_pid;
>>  			__kernel_pid_t process_tgid;
>>  			__u32 exit_code, exit_signal;
>> +			__kernel_pid_t parent_pid;
>> +			__kernel_pid_t parent_tgid;
>>  		} exit;
>>  
>>  	} event_data;
> 
> I don't think you can add these members without breaking UAPI.
> 

^ permalink raw reply

* Re: [PATCH net v5 2/3] ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()
From: David Miller @ 2018-04-02 15:17 UTC (permalink / raw)
  To: alexey.kodanev; +Cc: netdev, edumazet, kafai
In-Reply-To: <1522677635-5364-3-git-send-email-alexey.kodanev@oracle.com>

From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Mon,  2 Apr 2018 17:00:34 +0300

> +++ b/net/ipv6/ip6_output.c
> @@ -1105,23 +1105,32 @@ struct dst_entry *ip6_dst_lookup_flow(const struct sock *sk, struct flowi6 *fl6,
>   *	@sk: socket which provides the dst cache and route info
>   *	@fl6: flow to lookup
>   *	@final_dst: final destination address for ipsec lookup
> + *	@connected: whether @sk is connected or not
 ...
>  struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
> -					 const struct in6_addr *final_dst)
> +					 const struct in6_addr *final_dst,
> +					 int connected)

Please use type 'bool' and true/false for this new parameter.

Thank you.

^ permalink raw reply

* Re: [PATCH v3 0/2] net: mvneta: improve suspend/resume
From: David Miller @ 2018-04-02 15:14 UTC (permalink / raw)
  To: Jisheng.Zhang
  Cc: thomas.petazzoni, linux, linux-arm-kernel, netdev, linux-kernel
In-Reply-To: <20180402112229.508e1feb@xhacker.debian>

From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Date: Mon, 2 Apr 2018 11:22:29 +0800

> This series tries to optimize the mvneta's suspend/resume
> implementation by only taking necessary actions.
> 
> Since v2:
>  - keep rtnl lock when calling mvneta_start_dev() and mvneta_stop_dev()
>    Thank Russell for pointing this out
> 
> Since v1:
>  - unify ret check
>  - try best to keep the suspend/resume behavior
>  - split txq deinit into sw/hw parts as well
>  - adjust mvneta_stop_dev() location

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] bridge: Allow max MTU when multiple VLANs present
From: Roopa Prabhu @ 2018-04-02 15:08 UTC (permalink / raw)
  To: Chas Williams
  Cc: Toshiaki Makita, David Miller, netdev, Stephen Hemminger,
	Nikolay Aleksandrov
In-Reply-To: <CAG2-Gk=F+nJ97v1s8Jz3Jz8NJxmNPG4P1HLejfGu5SROcbNAug@mail.gmail.com>

On Fri, Mar 30, 2018 at 12:54 PM, Chas Williams <3chas3@gmail.com> wrote:
> On Thu, Mar 29, 2018 at 9:02 PM, Toshiaki Makita
> <makita.toshiaki@lab.ntt.co.jp> wrote:
>> On 2018/03/30 1:49, Roopa Prabhu wrote:
>>> On Thu, Mar 22, 2018 at 9:53 PM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>>>> On Thu, Mar 22, 2018 at 8:34 AM, Chas Williams <3chas3@gmail.com> wrote:
>>>>> If the bridge is allowing multiple VLANs, some VLANs may have
>>>>> different MTUs.  Instead of choosing the minimum MTU for the
>>>>> bridge interface, choose the maximum MTU of the bridge members.
>>>>> With this the user only needs to set a larger MTU on the member
>>>>> ports that are participating in the large MTU VLANS.
>>>>>
>>>>> Signed-off-by: Chas Williams <3chas3@gmail.com>
>>>>> ---
>>>>
>>>> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> This or an equivalent fix is necessary: as stated above, today the
>>>> bridge mtu capped at min port mtu limits all
>>>> vlan devices on top of the vlan filtering bridge to min port mtu.
>>>
>>>
>>> On further thought, since this patch changes default behavior, it may
>>> upset people. ie with this patch, a vlan device
>>> on the bridge by default will now use the  bridge max mtu and that
>>> could cause unexpected drops in the bridge driver
>>> if the xmit port had a lower mtu. This may surprise users.
>
> It only changes the default behavior when you are using VLAN aware bridges.
> The behavior remains the same otherwise.  I don't know if VLAN aware bridges
> are that popular yet so there probably isn't any particular
> expectation from those
> bridges.

They are popular; in fact, they are the default bridge mode on our
network switches.
And they have been around long enough now that we cannot ignore their users.
Plus it is not right to change default mtu behavior for one mode of the bridge
and not the others (bridge mtu handling from user-space is complex enough today
due to dynamic mtu changes on port enslave/deslave).


>
> I don't think those drops are unexpected.  If a user has misconfigured
> the bridge
> we can't be expected to fix that for them.  It is the user's
> responsibility to ensure
> that the ports on the VLAN have a size consistent with the traffic
> they expect to
> pass.
>

By default they are not expected today. The problem is changing the bridge
to max mtu changes 'all' the vlan devices on top of the vlan aware bridge to
max mtu by default which makes drops at the bridge driver more common if the
user had mixed mtu on its ports.
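
A minimal sketch of the concern, as a user-space model rather than
bridge code: the function names are hypothetical, and the 1500-byte
fallback for a bridge with no ports is an assumption of this sketch. It
compares the two policies for deriving the bridge MTU and shows when a
frame sized to the bridge MTU would not fit an egress port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DEFAULT_MTU 1500u	/* assumed fallback when there are no ports */

/* Current policy: bridge MTU is the smallest port MTU. */
unsigned int bridge_mtu_min(const unsigned int *port_mtu, size_t n)
{
	unsigned int mtu;
	size_t i;

	if (n == 0)
		return MODEL_DEFAULT_MTU;
	mtu = port_mtu[0];
	for (i = 1; i < n; i++)
		if (port_mtu[i] < mtu)
			mtu = port_mtu[i];
	return mtu;
}

/* Proposed policy: bridge MTU is the largest port MTU. */
unsigned int bridge_mtu_max(const unsigned int *port_mtu, size_t n)
{
	unsigned int mtu;
	size_t i;

	if (n == 0)
		return MODEL_DEFAULT_MTU;
	mtu = port_mtu[0];
	for (i = 1; i < n; i++)
		if (port_mtu[i] > mtu)
			mtu = port_mtu[i];
	return mtu;
}

/* A frame larger than the egress port MTU cannot be sent on that port. */
bool egress_would_drop(unsigned int frame_len, unsigned int port_mtu)
{
	return frame_len > port_mtu;
}
```

With ports at 1500 and 9000, the min policy yields 1500, so a vlan
device that inherits the bridge MTU never produces a frame too large for
either port. The max policy yields 9000, and a 9000-byte frame headed
for the 1500 port is dropped.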

^ permalink raw reply

* Re: [net-next PATCH v3 00/11] Add support for netcp driver on K2G SoC
From: Murali Karicheri @ 2018-04-02 15:07 UTC (permalink / raw)
  To: David Miller
  Cc: robh+dt, mark.rutland, ssantosh, malat, w-kwok2, devicetree,
	linux-kernel, linux-arm-kernel, netdev
In-Reply-To: <20180402.104009.2227758352325722680.davem@davemloft.net>

On 04/02/2018 10:40 AM, David Miller wrote:
> 
> The net-next tree is closed, please resubmit this after the merge window and
> the net-next tree is open back up again.
> 
Ok. Will do. Thanks

-- 
Murali Karicheri
Linux Kernel, Keystone

^ permalink raw reply

* Re: [PATCH net-next] net: ipv6/gre: Add GRO support
From: Eran Ben Elisha @ 2018-04-02 15:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tariq Toukan, David S. Miller, Linux Netdev List, Eran Ben Elisha,
	Eric Dumazet
In-Reply-To: <e0b35a89-2642-d83e-2704-0a76804acb34@gmail.com>

>>> Seems good, but why isn't this handled directly in GRO native layer ?
>> ip6_tunnel and ip6_gre do not share initialization flow functions (unlike ipv4).
>> Changing the ipv6 init infrastructure should not be part of this
>> patch. we prefer to keep this one minimal, simple and safe.
>
>
>
> Looking at gre_gro_receive() and gre_gro_complete() I could not see why they
> could not be copied/pasted to IPv6.

These functions to handle GRO over GRE are already assigned in
gre_offload_init() (in net/ipv4/gre_offload.c under CONFIG_IPV6).
However without initializing the gro_cells, the receive path will not
go via napi_gro_receive path, but directly to netif_rx.
So AFAIU, only gcells->cells was missing for gro_cells_receive to
really go via GRO flow.
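
The path selection described above can be modeled in a few lines. This
is a user-space sketch, not the kernel function: the model_* names and
the enum are hypothetical, and the real gro_cells_receive() may check
further conditions that are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Which receive path a decapsulated packet takes in this model. */
enum model_rx_path {
	MODEL_RX_NETIF_RX,	/* no GRO in the tunnel device */
	MODEL_RX_NAPI_GRO,	/* goes through the GRO flow */
};

/* Stand-in for struct gro_cells: cells stays NULL until initialized. */
struct model_gro_cells {
	void *cells;
};

/* Without initialized cells the packet bypasses GRO entirely. */
enum model_rx_path model_gro_cells_receive(const struct model_gro_cells *gcells)
{
	if (!gcells->cells)
		return MODEL_RX_NETIF_RX;
	return MODEL_RX_NAPI_GRO;
}
```

In this model, a tunnel device that never initializes its cells always
takes the netif_rx path, which matches the behavior described for
ip6_gre before the patch.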

>
> Maybe give more details on the changelog, it is really not obvious.
Hopefully the above filled this request.
>

^ permalink raw reply

* Re: [PATCH net-next 0/5] virtio-net:  Add SCTP checksum offload support
From: Marcelo Ricardo Leitner @ 2018-04-02 14:47 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: netdev, linux-sctp, virtualization, mst, jasowang, nhorman,
	Vladislav Yasevich
In-Reply-To: <20180402134006.10111-1-vyasevic@redhat.com>

On Mon, Apr 02, 2018 at 09:40:01AM -0400, Vladislav Yasevich wrote:
> Now that we have SCTP offload capabilities in the kernel, we can add
> them to virtio as well.  First step is SCTP checksum.

Thanks.

> As for GSO, the way sctp GSO is currently implemented buys us nothing
> in added support to virtio.  To add true GSO, would require a lot of
> re-work inside of SCTP and would require extensions to the virtio
> net header to carry extra sctp data.

Can you please elaborate more on this? Is this because SCTP GSO relies
on the gso skb format for knowing how to segment it instead of having
a list of sizes?

  Marcelo

^ permalink raw reply

* Re: [PATCH net-next] net: ipv6/gre: Add GRO support
From: Eric Dumazet @ 2018-04-02 14:42 UTC (permalink / raw)
  To: Eran Ben Elisha
  Cc: Tariq Toukan, David S. Miller, Linux Netdev List, Eran Ben Elisha,
	Eric Dumazet
In-Reply-To: <CAKHjkjkk4ktJbh+3NJvS+PF+V1ZCKGE8ACfrOj0SE19m3-1oeA@mail.gmail.com>



On 04/02/2018 05:40 AM, Eran Ben Elisha wrote:
> On Sun, Apr 1, 2018 at 7:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>> On 04/01/2018 06:17 AM, Tariq Toukan wrote:
>>> From: Eran Ben Elisha <eranbe@mellanox.com>
>>>
>>> Add GRO capability for IPv6 GRE tunnel and ip6erspan tap, via gro_cells
>>> infrastructure.
>>>
>>> Performance testing: 55% higher bandwidth.
>>> Measuring bandwidth of 1 thread IPv4 TCP traffic over IPv6 GRE tunnel
>>> while GRO on the physical interface is disabled.
>>> CPU: Intel Xeon E312xx (Sandy Bridge)
>>> NIC: Mellanox Technologies MT27700 Family [ConnectX-4]
>>> Before (GRO not working in tunnel) : 2.47 Gbits/sec
>>> After  (GRO working in tunnel)     : 3.85 Gbits/sec
>>>
>>> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
>>> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>>> CC: Eric Dumazet <edumazet@google.com>
>>> ---
>>
>>
>> Seems good, but why isn't this handled directly in GRO native layer ?
> ip6_tunnel and ip6_gre do not share initialization flow functions (unlike ipv4).
> Changing the ipv6 init infrastructure should not be part of this
> patch. we prefer to keep this one minimal, simple and safe.



Looking at gre_gro_receive() and gre_gro_complete() I could not see why they
could not be copied/pasted to IPv6.

Maybe give more details on the changelog, it is really not obvious.

^ permalink raw reply

* Re: [PATCH net-next V2 0/4] Introduce adaptive TX interrupt moderation to net DIM
From: Tal Gilboa @ 2018-04-02 14:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, tariqt, saeedm, f.fainelli
In-Reply-To: <20180402.102745.1490063389768603049.davem@davemloft.net>

On 4/2/2018 5:27 PM, David Miller wrote:
> From: Tal Gilboa <talgi@mellanox.com>
> Date: Mon,  2 Apr 2018 16:59:30 +0300
> 
>> Net DIM is a library designed for dynamic interrupt moderation. It was
>> implemented and optimized with receive side interrupts in mind, since these
>> are usually the CPU expensive ones. This patch-set introduces adaptive transmit
>> interrupt moderation to net DIM, complete with a usage in the mlx5e driver.
>> Using adaptive TX behavior would reduce interrupt rate for multiple scenarios.
>> Furthermore, it is essential for increasing bandwidth on cases where payload
>> aggregation is required.
>>
>> v2: Rebased over proper tree.
>>
>> v1: Fix compilation issues due to missed function renaming.
> 
> This series still needs fixes, and the net-next tree has closed meanwhile.
> 
> And to be honest, handling this series has been very painful for me so far.
> The patches either didn't apply or didn't even compile.
> 
> Please do not resubmit this until the merge window is over and the net-next
> tree opens up again.
> 
> Thank you.
> 
Ack.

^ permalink raw reply

* Re: [PATCH] net: implement IP_RECVHDRS option to get full headers through recvmsg cmsg.
From: David Miller @ 2018-04-02 14:42 UTC (permalink / raw)
  To: zenczykowski; +Cc: maze, netdev, lrizzo, edumazet
In-Reply-To: <20180401054314.33578-1-zenczykowski@gmail.com>

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Sat, 31 Mar 2018 22:43:14 -0700

> From: Luigi Rizzo <lrizzo@google.com>
> 
> We have all sorts of different ways to fetch pre-UDP payload metadata:
>   IP_RECVTOS
>   IP_RECVTTL
>   IP_RECVOPTS
>   IP_RETOPTS
> 
> But nothing generic which simply allows you to receive the entire packet header.
> 
> This is in similar vein to TCP_SAVE_SYN but for UDP and other datagram sockets.
> 
> This is envisioned as a way to get GUE extension metadata for encapsulated
> packets, but implemented in a way to be much more future proof.
> 
> (Implemented by Luigi, who asked me to send it upstream)
> 
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Luigi Rizzo <lrizzo@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

This is an ipv4 level socket option, so why are you copying in the MAC
header(s)?

That part I don't like at all.

First of all, you have no idea what the link level protocol is for that
MAC header, therefore how could you even begin to interpret its contents
correctly?

Second of all, MAC level details do not belong in AF_INET socket interfaces.

Thank you.
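
For context, the existing per-field options listed in the quoted patch
description deliver their data as control messages on recvmsg(). A
minimal sketch for IP_RECVTTL on Linux follows; recv_with_ttl() and
loopback_ttl_demo() are hypothetical helper names, and the demo assumes
a working loopback interface.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram and report its IPv4 TTL from the IP_TTL cmsg.
 * Returns the payload length or -1; *ttl stays -1 if no cmsg arrived. */
ssize_t recv_with_ttl(int fd, void *buf, size_t len, int *ttl)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} control;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	*ttl = -1;
	n = recvmsg(fd, &msg, 0);
	if (n < 0)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg))
		if (cmsg->cmsg_level == IPPROTO_IP && cmsg->cmsg_type == IP_TTL)
			memcpy(ttl, CMSG_DATA(cmsg), sizeof(*ttl));
	return n;
}

/* Send one datagram to ourselves over loopback; return its TTL or -1. */
int loopback_ttl_demo(void)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char buf[16];
	int on = 1, ttl = -1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a port */
	if (setsockopt(fd, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &alen) < 0 ||
	    sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) != 4 ||
	    recv_with_ttl(fd, buf, sizeof(buf), &ttl) != 4)
		ttl = -1;
	close(fd);
	return ttl;
}
```

Each of these options exposes one IP-level field. None of them reaches
below the IP header, which is the boundary the objection above is about.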

^ permalink raw reply

* Re: [net-next PATCH v3 00/11] Add support for netcp driver on K2G SoC
From: David Miller @ 2018-04-02 14:40 UTC (permalink / raw)
  To: m-karicheri2
  Cc: robh+dt, mark.rutland, ssantosh, malat, w-kwok2, devicetree,
	linux-kernel, linux-arm-kernel, netdev
In-Reply-To: <1522679881-25643-1-git-send-email-m-karicheri2@ti.com>


The net-next tree is closed, please resubmit this after the merge window and
the net-next tree is open back up again.

^ permalink raw reply

* [net-next PATCH v3 11/11] net: netcp: support probe deferral
From: Murali Karicheri @ 2018-04-02 14:38 UTC (permalink / raw)
  To: robh+dt, mark.rutland, ssantosh, malat, w-kwok2, devicetree,
	linux-kernel, linux-arm-kernel, davem, netdev
In-Reply-To: <1522679881-25643-1-git-send-email-m-karicheri2@ti.com>

The netcp driver shouldn't proceed until the knav qmss and dma
devices are ready. So return -EPROBE_DEFER if these devices are not
ready.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
---
 drivers/net/ethernet/ti/netcp_core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 9c51b25..736f6f7 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -2158,6 +2158,10 @@ static int netcp_probe(struct platform_device *pdev)
 	struct netcp_module *module;
 	int ret;
 
+	if (!knav_dma_device_ready() ||
+	    !knav_qmss_device_ready())
+		return -EPROBE_DEFER;
+
 	if (!node) {
 		dev_err(dev, "could not find device info\n");
 		return -ENODEV;
-- 
1.9.1

^ permalink raw reply related

