Netdev List
* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: John Fastabend @ 2016-11-28  2:26 UTC (permalink / raw)
  To: Roi Dayan, Daniel Borkmann, Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko
In-Reply-To: <583A7D67.50003@mellanox.com>

On 16-11-26 10:29 PM, Roi Dayan wrote:
> 
> 
> On 27/11/2016 06:47, Roi Dayan wrote:
>>
>>
>> On 27/11/2016 02:33, Daniel Borkmann wrote:
>>> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
>>>> On 11/26/2016 07:46 AM, Cong Wang wrote:
>>>>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann
>>>>> <daniel@iogearbox.net> wrote:
>>> [...]
>>>>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>>>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>>>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>>>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>>>>> Outstanding readers should either bail out due to if (!cl) or can
>>>>>> still
>>>>>> process the chain until read section ends, but during that time,
>>>>>> cl->q
>>>>>> resp. bstats should be good. Do you happen to know what's at address
>>>>>> ffff880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(),
>>>>>> but
>>>>>> at least on ingress (netif_receive_skb_internal()) we hold
>>>>>> rcu_read_lock()
>>>>>> here. The KASAN report is reliably happening at this location, right?
>>>>>
>>>>> I am confused as well, I don't see how it could be related to my
>>>>> patch yet.
>>>>> I will take a deep look in the weekend.
>>
>>
>>
>> Hi Cong,
>>
>> When reported the new trace I didn't mean it's related to your patch,
>> I just wanted to point it out it exposed something. I should have been
>> clear about it.
>>
>>
>>>>
>>>> Ok, I'm currently on the run. Got too late yesterday night, but I'll
>>>> write what I found in the evening today, not related to ingress though.
>>>
>>> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
>>> respect
>>> rcu grace period on cls destruction". My conclusion is that both
>>> issues are
>>> actually separate, and that one is small enough where we could route
>>> it via
>>> net actually. Perhaps this at the same time shrinks your "[PATCH
>>> net-next]
>>> net_sched: move the empty tp check from ->destroy() to ->delete()" to a
>>> reasonable size that it's suitable to net as well. Your
>>> ->delete()/->destroy()
>>> one is definitely needed, too. The tp->root one is independent of
>>> ->delete()/
>>> ->destroy() as they are different races and tp->root could also
>>> happen when
>>> you just destroy the whole tp directly. I think that seems like a
>>> good path
>>> forward to me.
>>>
>>> Thanks,
>>> Daniel
>>
>>
>>
>> Hi Daniel,
>>
>> As for the tainted kernel. I was in old (week or two) net-next tree
>> and only cherry-picked from latest net-next related patches to
>> Mellanox HCA, cls_api, cls_flower, devlink. so those are the tainted
>> modules.
>> I have the issue reproducing in that tree so wanted it to check it
>> with Cong's patch instead of latest net-next.
>> I'll try running reproducing the issue with your new patch and later
>> try latest net-next as well.
>>
>> Thanks,
>> Roi
>>
> 
> Hi,
> 
> I tested "[PATCH net] net, sched: respect rcu grace period on cls
> destruction" and could not reproduce my original issue.

Hi Roi,

Just so I'm 100% clear: there is no issue with just the above "respect rcu grace
period on cls destruction" patch applied, per the above statement?

> I rebased "[Patch net-next] net_sched: move the empty tp check from
> ->destroy() to ->delete()" over to test it in the same tree and got into
> a new trace in fl_delete.

In this case, did you test with "net_sched: move the empty tp check from
->destroy() to ->delete()" _only_, or did this include both patches when
you saw the error below?

From my inspection we really need both patches to get correct behavior.

Thanks!
John

> 
> [35659.012123] BUG: KASAN: wild-memory-access on address 1ffffffff803ca31
> [35659.020042] Write of size 1 by task ovs-vswitchd/20135
> [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted:
> G           O    4.9.0-rc3+ #18
> [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
> [35659.043730] Call Trace:
> [35659.046619]  [<ffffffff95b6dc42>] dump_stack+0x63/0x81
> [35659.052456]  [<ffffffff955fbbf8>] kasan_report_error+0x408/0x4e0
> [35659.059402]  [<ffffffff955fc2e8>] kasan_report+0x58/0x60
> [35659.065428]  [<ffffffff952d5e8d>] ? call_rcu_sched+0x1d/0x20
> [35659.072119]  [<ffffffffc01e0701>] ? fl_destroy_filter+0x21/0x30
> [cls_flower]
> [35659.080217]  [<ffffffffc01e1ccf>] ? fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.087580]  [<ffffffff955fa4ca>] __asan_store1+0x4a/0x50
> [35659.093697]  [<ffffffffc01e1ccf>] fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.100870]  [<ffffffff9653ecba>] tc_ctl_tfilter+0x10da/0x1b90
> 
> 
> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
> 800             struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
> 801
> 802             rhashtable_remove_fast(&head->ht, &f->ht_node,
> 803                                    head->ht_params);
> 804             __fl_delete(tp, f);
> 805             *last = list_empty(&head->filters);
> 806             return 0;
> 807     }
> 
> 
> Thanks,
> Roi


* RE: [net,v2] neigh: fix the loop index error in neigh dump
From: 张胜举 @ 2016-11-28  2:34 UTC (permalink / raw)
  To: 'David Ahern', netdev
In-Reply-To: <caa11887-00b9-d2f1-7c0b-5b5096b42f56@cumulusnetworks.com>

> -----Original Message-----
> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 10:10 AM
> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> 
> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> > Loop index in neigh dump function is not updated correctly under some
> > circumstances, this patch will fix it.
> 
> What's an example?

If dev is filtered out, the original code continues to the next iteration
without updating the loop index 'idx'.
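
For readers following the idx discussion, here is a minimal sketch of a resumable dump over one bucket in which the index advances for every entry, filtered or not, which is the convention the patch moves to. The names (sketch_entry, sketch_dump_bucket) and the array-based bucket are hypothetical stand-ins and not the neighbour.c code. Whether the original convention actually loses or repeats entries is the open question in this thread, so the sketch only shows the mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a neighbour entry; only the device index matters. */
struct sketch_entry {
	int ifindex;
};

/*
 * Emit the positions of entries matching filter_ifindex (0 = no filter)
 * into out[], at most 'room' per call. *s_idx holds the position to resume
 * from and is updated on return. idx advances for every entry, so the
 * resume position does not depend on which entries were filtered out.
 */
static size_t sketch_dump_bucket(const struct sketch_entry *e, size_t n,
				 int filter_ifindex, size_t *s_idx,
				 int *out, size_t room)
{
	size_t idx, emitted = 0;

	assert(e != NULL && s_idx != NULL && out != NULL);

	for (idx = 0; idx < n; idx++) {
		if (idx < *s_idx)
			continue;	/* already handled by an earlier call */
		if (filter_ifindex && e[idx].ifindex != filter_ifindex)
			continue;	/* filtered out, idx still advances */
		if (emitted == room)
			break;		/* buffer full, resume here next time */
		out[emitted++] = (int)idx;
	}
	*s_idx = idx;
	return emitted;
}
```

Calling it repeatedly with the same s_idx variable walks the bucket in chunks of at most 'room' entries, the way a netlink dump is resumed across several messages.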

> 
> >
> > Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by
> > device index")
> > Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by
> > master device")
> >
> > Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
> > ---
> >  net/core/neighbour.c | 39 ++++++++++++++++++---------------------
> >  1 file changed, 18 insertions(+), 21 deletions(-)
> >
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index 2ae929f..ce32e9c 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
> >  	return false;
> >  }
> >
> > +static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
> > +		int filter_master_idx)
> > +{
> > +	if (neigh_ifindex_filtered(dev, filter_idx) ||
> > +	    neigh_master_filtered(dev, filter_master_idx))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> >  static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> >  			    struct netlink_callback *cb)
> >  {
> > @@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> >  	rcu_read_lock_bh();
> >  	nht = rcu_dereference_bh(tbl->nht);
> >
> > -	for (h = s_h; h < (1 << nht->hash_shift); h++) {
> > -		if (h > s_h)
> > -			s_idx = 0;
> > +	for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
> >  		for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
> >  		     n != NULL;
> > -		     n = rcu_dereference_bh(n->next)) {
> > -			if (!net_eq(dev_net(n->dev), net))
> > -				continue;
> > -			if (neigh_ifindex_filtered(n->dev, filter_idx))
> > +		     n = rcu_dereference_bh(n->next), idx++) {
> > +			if (idx < s_idx || !net_eq(dev_net(n->dev), net))
> >  				continue;
> > -			if (neigh_master_filtered(n->dev, filter_master_idx))
> > +			if (neigh_dump_filtered(n->dev, filter_idx,
> > +						filter_master_idx))
> >  				continue;
> > -			if (idx < s_idx)
> > -				goto next;
> >  			if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
> >  					    cb->nlh->nlmsg_seq,
> >  					    RTM_NEWNEIGH,
> > @@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> >  				rc = -1;
> >  				goto out;
> >  			}
> > -next:
> > -			idx++;
> >  		}
> >  	}
> >  	rc = skb->len;
> > @@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> >
> >  	read_lock_bh(&tbl->lock);
> >
> > -	for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
> > -		if (h > s_h)
> > -			s_idx = 0;
> > -		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
> > -			if (pneigh_net(n) != net)
> > +	for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
> > +		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
> > +			if (idx < s_idx || pneigh_net(n) != net)
> >  				continue;
> > -			if (idx < s_idx)
> > -				goto next;
> >  			if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
> >  					    cb->nlh->nlmsg_seq,
> >  					    RTM_NEWNEIGH,
> > @@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
> >  				rc = -1;
> >  				goto out;
> >  			}
> > -		next:
> > -			idx++;
> >  		}
> >  	}
> >
> 
> This fix is way too complicated to be fixing anything related to 16660f0bd9
> or 21fdd092ac. Both of those commits added a continue:
> 
>                         if (neigh_ifindex_filtered(n->dev, filter_idx))
>                                 continue;
>                         if (neigh_master_filtered(n->dev, filter_master_idx))
>                                 continue;
> 
> At best the continue is replaced by 'goto next;' and I am not convinced
> that is right.
> 
> You are completely rewriting the dump loops.

I moved 'idx++' into the for loop, so I replaced 'goto' with 'continue'. The
other changes are style related.

Thanks,
Zhang Shengju


* Re: [net,v2] neigh: fix the loop index error in neigh dump
From: David Ahern @ 2016-11-28  2:39 UTC (permalink / raw)
  To: 张胜举, netdev
In-Reply-To: <001c01d2491f$fcd53250$f67f96f0$@cmss.chinamobile.com>

On 11/27/16 7:34 PM, 张胜举 wrote:
>> -----Original Message-----
>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>> Sent: Monday, November 28, 2016 10:10 AM
>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
>> netdev@vger.kernel.org
>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>
>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>> Loop index in neigh dump function is not updated correctly under some
>>> circumstances, this patch will fix it.
>>
>> What's an example?
> 
> If dev is filtered out, the original code continues to the next iteration
> without updating the loop index 'idx'.

And you have a use case with missing or redundant data? Or is your comment based on a review of code only?


>> You are completely rewriting the dump loops.
> 
> I moved 'idx++' into the for loop, so I replaced 'goto' with 'continue'. The
> other changes are style related.

A "fixes" should not include 'style related' changes.


* [PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation
From: Florian Fainelli @ 2016-11-28  2:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet, Florian Fainelli

Hi all,

This patch series addresses discussions and feedback recently received on the
mailing list in the areas of flow control/pause frames and the interpretation
of phy_interface_t, and finally adds links to useful standards documents.

Changes in v3:

- add Timur's feedback into patch 3

Changes in v2:

- clarify a few things in the RGMII section, add a paragraph about common issues
  with RGMII delay mismatches

Florian Fainelli (4):
  Documentation: net: phy: remove description of function pointers
  Documentation: net: phy: Add a paragraph about pause frames/flow
    control
  Documentation: net: phy: Add blurb about RGMII
  Documentation: net: phy: Add links to several standards documents

 Documentation/networking/phy.txt | 140 +++++++++++++++++++++++++++++----------
 1 file changed, 105 insertions(+), 35 deletions(-)

-- 
2.9.3


* [PATCH net-next v3 1/4] Documentation: net: phy: remove description of function pointers
From: Florian Fainelli @ 2016-11-28  2:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>

Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation in two
different locations just does not work, and the code is less likely to
be outdated.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/phy.txt | 35 ++---------------------------------
 1 file changed, 2 insertions(+), 33 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 7ab9404a8412..4b25c0f24201 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -251,39 +251,8 @@ Writing a PHY driver
  PHY_BASIC_FEATURES, but you can look in include/mii.h for other
  features.
 
- Each driver consists of a number of function pointers:
-
-   soft_reset: perform a PHY software reset
-   config_init: configures PHY into a sane state after a reset.
-     For instance, a Davicom PHY requires descrambling disabled.
-   probe: Allocate phy->priv, optionally refuse to bind.
-   PHY may not have been reset or had fixups run yet.
-   suspend/resume: power management
-   config_aneg: Changes the speed/duplex/negotiation settings
-   aneg_done: Determines the auto-negotiation result
-   read_status: Reads the current speed/duplex/negotiation settings
-   ack_interrupt: Clear a pending interrupt
-   did_interrupt: Checks if the PHY generated an interrupt
-   config_intr: Enable or disable interrupts
-   remove: Does any driver take-down
-   ts_info: Queries about the HW timestamping status
-   match_phy_device: used for Clause 45 capable PHYs to match devices
-   in package and ensure they are compatible
-   hwtstamp: Set the PHY HW timestamping configuration
-   rxtstamp: Requests a receive timestamp at the PHY level for a 'skb'
-   txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
-   set_wol: Enable Wake-on-LAN at the PHY level
-   get_wol: Get the Wake-on-LAN status at the PHY level
-   link_change_notify: called to inform the core is about to change the
-   link state, can be used to work around bogus PHY between state changes
-   read_mmd_indirect: Read PHY MMD indirect register
-   write_mmd_indirect: Write PHY MMD indirect register
-   module_info: Get the size and type of an EEPROM contained in an plug-in
-   module
-   module_eeprom: Get EEPROM information of a plug-in module
-   get_sset_count: Get number of strings sets that get_strings will count
-   get_strings: Get strings from requested objects (statistics)
-   get_stats: Get the extended statistics from the PHY device
+ Each driver consists of a number of function pointers, documented
+ in include/linux/phy.h under the phy_driver structure.
 
  Of these, only config_aneg and read_status are required to be
  assigned by the driver code.  The rest are optional.  Also, it is
-- 
2.9.3


* [PATCH net-next v3 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control
From: Florian Fainelli @ 2016-11-28  2:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>

Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/phy.txt | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 4b25c0f24201..9a42a9414cea 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything
  values pruned from them which don't make sense for your controller (a 10/100
  controller may be connected to a gigabit capable PHY, so you would need to
  mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
- for these bitfields. Note that you should not SET any bits, or the PHY may
- get put into an unsupported state.
+ for these bitfields. Note that you should not SET any bits, except the
+ SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
+ put into an unsupported state.
 
  Lastly, once the controller is ready to handle network traffic, you call
  phy_start(phydev).  This tells the PAL that you are ready, and configures the
@@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything
  When you want to disconnect from the network (even if just briefly), you call
  phy_stop(phydev).
 
+Pause frames / flow control
+
+ The PHY does not participate directly in flow control/pause frames except by
+ making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
+ MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
+ controller supports such a thing. Since flow control/pause frames generation
+ involves the Ethernet MAC driver, it is recommended that this driver takes care
+ of properly indicating advertisement and support for such features by setting
+ the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
+ either before or after phy_connect() and/or as a result of implementing the
+ ethtool::set_pauseparam feature.
+
+
 Keeping Close Tabs on the PAL
 
  It is possible that the PAL's built-in state machine needs a little help to
-- 
2.9.3
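
As a rough illustration of the paragraph this patch adds, the sketch below shows a MAC driver declaring pause support by setting only the two pause bits and leaving every other link-mode bit untouched. Everything here is a stand-in: sketch_phydev, SKETCH_PAUSE, SKETCH_ASYM_PAUSE and their bit positions are assumptions made for the example, not the definitions from include/linux/ethtool.h or struct phy_device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the ethtool pause bits; names and values are assumptions. */
#define SKETCH_PAUSE      (UINT32_C(1) << 13)
#define SKETCH_ASYM_PAUSE (UINT32_C(1) << 14)

/* Hypothetical, reduced view of a PHY device: two link-mode bitfields. */
struct sketch_phydev {
	uint32_t supported;
	uint32_t advertising;
};

/*
 * Called by the (hypothetical) MAC driver, which knows whether its hardware
 * can do symmetric and/or asymmetric pause. Only the pause bits are changed;
 * speed/duplex bits are left as the PHY layer set them.
 */
static void sketch_mac_set_pause(struct sketch_phydev *phydev, int sym, int asym)
{
	const uint32_t mask = SKETCH_PAUSE | SKETCH_ASYM_PAUSE;
	uint32_t bits = 0;

	assert(phydev != NULL);

	if (sym)
		bits |= SKETCH_PAUSE;
	if (asym)
		bits |= SKETCH_ASYM_PAUSE;

	phydev->supported = (phydev->supported & ~mask) | bits;
	phydev->advertising = (phydev->advertising & ~mask) | bits;
}
```

A real driver would call something like this around phy_connect() or from its set_pauseparam handler, as the new paragraph suggests; the exact hook is left open here.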


* [PATCH net-next v3 3/4] Documentation: net: phy: Add blurb about RGMII
From: Florian Fainelli @ 2016-11-28  2:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>

RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what PHYLIB expects and what options
exist.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/phy.txt | 77 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 9a42a9414cea..c7ba84b5d912 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -65,6 +65,83 @@ The MDIO bus
  drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
  for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
 
+(RG)MII/electrical interface considerations
+
+ The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
+ electrical signal interface using a synchronous 125MHz clock signal and several
+ data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+ between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+ sink) have enough setup and hold times to sample the data lines correctly. The
+ PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+ the PHY driver and optionally the MAC driver, implement the required delay. The
+ values of phy_interface_t must be understood from the perspective of the PHY
+ device itself, leading to the following:
+
+ * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+   internal delay by itself, it assumes that either the Ethernet MAC (if capable)
+   or the PCB traces insert the correct 1.5-2ns delay
+
+ * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
+   for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
+   for the receive data lines (RXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
+   both transmit AND receive data lines from/to the PHY device
+
+ Whenever possible, use the PHY side RGMII delay for these reasons:
+
+ * PHY devices may offer sub-nanosecond granularity in how they allow a
+   receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+   precision may be required to account for differences in PCB trace lengths
+
+ * PHY devices are typically qualified for a large range of applications
+   (industrial, medical, automotive...), and they provide a constant and
+   reliable delay across temperature/pressure/voltage ranges
+
+ * PHY device drivers in PHYLIB being reusable by nature, being able to
+   configure correctly a specified delay enables more designs with similar delay
+   requirements to operate correctly
+
+ For cases where the PHY is not capable of providing this delay, but the
+ Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
+ should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+ configured correctly in order to provide the required transmit and/or receive
+ side delay from the perspective of the PHY device. Conversely, if the Ethernet
+ MAC driver looks at the phy_interface_t value, for any other mode but
+ PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+ disabled.
+
+ In case neither the Ethernet MAC, nor the PHY are capable of providing the
+ required delays, as defined per the RGMII standard, several options may be
+ available:
+
+ * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+   set of pins' strength, delays, and voltage; and it may be a suitable
+   option to insert the expected 2ns RGMII delay.
+
+ * Modifying the PCB design to include a fixed delay (e.g: using a specifically
+   designed serpentine), which may not require software configuration at all.
+
+Common problems with RGMII delay mismatch
+
+ When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
+ will most likely result in the clock and data line signals being unstable when
+ the PHY or MAC take a snapshot of these signals to translate them into logical
+ 1 or 0 states and reconstruct the data being transmitted/received. Typical
+ symptoms include:
+
+ * Transmission/reception partially works, and there is frequent or occasional
+   packet loss observed
+
+ * Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
+   or just discard them all
+
+ * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+   (since there is enough setup/hold time in that case)
+
+
 Connecting to a PHY
 
  Sometime during startup, the network driver needs to establish a connection
-- 
2.9.3
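
The division of labour described in the RGMII section can be sketched as two small decision functions: which delays the PHY inserts for each mode, and when the MAC should enable its own delay. The enum and function names below are hypothetical stand-ins for phy_interface_t and driver code, and the rule that the MAC delays only in plain RGMII mode is taken from the text of this patch, not from any particular driver.

```c
#include <assert.h>

/* Hypothetical mirror of the four PHY_INTERFACE_MODE_RGMII* values. */
enum sketch_phy_mode {
	SKETCH_RGMII,		/* PHY adds no delay */
	SKETCH_RGMII_ID,	/* PHY delays both RX and TX */
	SKETCH_RGMII_RXID,	/* PHY delays RX only */
	SKETCH_RGMII_TXID,	/* PHY delays TX only */
};

struct sketch_delays {
	int rx;	/* non-zero: delay on the receive data lines */
	int tx;	/* non-zero: delay on the transmit data lines */
};

/* Delays the PHY is expected to insert, seen from the PHY's perspective. */
static struct sketch_delays sketch_phy_delays(enum sketch_phy_mode mode)
{
	struct sketch_delays d = { 0, 0 };

	switch (mode) {
	case SKETCH_RGMII:
		break;
	case SKETCH_RGMII_ID:
		d.rx = 1;
		d.tx = 1;
		break;
	case SKETCH_RGMII_RXID:
		d.rx = 1;
		break;
	case SKETCH_RGMII_TXID:
		d.tx = 1;
		break;
	default:
		assert(!"unknown RGMII mode");
	}
	return d;
}

/*
 * The MAC enables its own delay only in plain RGMII mode, and only if the
 * hardware can do it; for every other mode MAC-level delays stay disabled.
 */
static int sketch_mac_should_delay(enum sketch_phy_mode mode, int mac_can_delay)
{
	return mode == SKETCH_RGMII && mac_can_delay;
}
```

If neither side can delay in plain RGMII mode, the sketch returns no delay anywhere, which corresponds to the case where the pin controller or the PCB traces have to provide it.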


* [PATCH net-next v3 4/4] Documentation: net: phy: Add links to several standards documents
From: Florian Fainelli @ 2016-11-28  2:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>

Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/phy.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index c7ba84b5d912..e017d933d530 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -407,3 +407,13 @@ Board Fixups
  The stubs set one of the two matching criteria, and set the other one to
  match anything.
 
+Standards
+
+ IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
+ http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
+
+ RGMII v1.3:
+ http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
+
+ RGMII v2.0:
+ http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
-- 
2.9.3


* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: John Fastabend @ 2016-11-28  2:51 UTC (permalink / raw)
  To: Roi Dayan, Daniel Borkmann, Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko
In-Reply-To: <583B95CE.7080309@gmail.com>

On 16-11-27 06:26 PM, John Fastabend wrote:
> On 16-11-26 10:29 PM, Roi Dayan wrote:
>>
>>
>> On 27/11/2016 06:47, Roi Dayan wrote:
>>>
>>>
>>> On 27/11/2016 02:33, Daniel Borkmann wrote:
>>>> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
>>>>> On 11/26/2016 07:46 AM, Cong Wang wrote:
>>>>>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann
>>>>>> <daniel@iogearbox.net> wrote:
>>>> [...]
>>>>>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>>>>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>>>>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>>>>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>>>>>> Outstanding readers should either bail out due to if (!cl) or can
>>>>>>> still
>>>>>>> process the chain until read section ends, but during that time,
>>>>>>> cl->q
>>>>>>> resp. bstats should be good. Do you happen to know what's at address
>>>>>>> ffff880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(),
>>>>>>> but
>>>>>>> at least on ingress (netif_receive_skb_internal()) we hold
>>>>>>> rcu_read_lock()
>>>>>>> here. The KASAN report is reliably happening at this location, right?
>>>>>>
>>>>>> I am confused as well, I don't see how it could be related to my
>>>>>> patch yet.
>>>>>> I will take a deep look in the weekend.
>>>
>>>
>>>
>>> Hi Cong,
>>>
>>> When reported the new trace I didn't mean it's related to your patch,
>>> I just wanted to point it out it exposed something. I should have been
>>> clear about it.
>>>
>>>
>>>>>
>>>>> Ok, I'm currently on the run. Got too late yesterday night, but I'll
>>>>> write what I found in the evening today, not related to ingress though.
>>>>
>>>> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
>>>> respect
>>>> rcu grace period on cls destruction". My conclusion is that both
>>>> issues are
>>>> actually separate, and that one is small enough where we could route
>>>> it via
>>>> net actually. Perhaps this at the same time shrinks your "[PATCH
>>>> net-next]
>>>> net_sched: move the empty tp check from ->destroy() to ->delete()" to a
>>>> reasonable size that it's suitable to net as well. Your
>>>> ->delete()/->destroy()
>>>> one is definitely needed, too. The tp->root one is independent of
>>>> ->delete()/
>>>> ->destroy() as they are different races and tp->root could also
>>>> happen when
>>>> you just destroy the whole tp directly. I think that seems like a
>>>> good path
>>>> forward to me.
>>>>
>>>> Thanks,
>>>> Daniel
>>>
>>>
>>>
>>> Hi Daniel,
>>>
>>> As for the tainted kernel. I was in old (week or two) net-next tree
>>> and only cherry-picked from latest net-next related patches to
>>> Mellanox HCA, cls_api, cls_flower, devlink. so those are the tainted
>>> modules.
>>> I have the issue reproducing in that tree so wanted it to check it
>>> with Cong's patch instead of latest net-next.
>>> I'll try running reproducing the issue with your new patch and later
>>> try latest net-next as well.
>>>
>>> Thanks,
>>> Roi
>>>
>>
>> Hi,
>>
>> I tested "[PATCH net] net, sched: respect rcu grace period on cls
>> destruction" and could not reproduce my original issue.
> 
> Hi Roi,
> 
> Just so I'm 100% clear. No issue with just the above "respect rcu grace
> period on cls destruction" per above statement.
> 
>> I rebased "[Patch net-next] net_sched: move the empty tp check from
>> ->destroy() to ->delete()" over to test it in the same tree and got into
>> a new trace in fl_delete.
> 
> In this case did you test with "net_sched: move the empty tp check from
> ->destroy() to ->delete()" _only_ or did this include both patches when
> you see the error below.
> 
> From my inspection we really need both patches to get correct behavior.
> 
> Thanks!
> John

Ah dang nevermind I just read both patches in detail and applying them
both at the same time is nonsense. Let me reply with comments directly
to the patches.

Thanks. sorry for the noise.

> 
>>
>> [35659.012123] BUG: KASAN: wild-memory-access on address 1ffffffff803ca31
>> [35659.020042] Write of size 1 by task ovs-vswitchd/20135
>> [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted:
>> G           O    4.9.0-rc3+ #18
>> [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
>> [35659.043730] Call Trace:
>> [35659.046619]  [<ffffffff95b6dc42>] dump_stack+0x63/0x81
>> [35659.052456]  [<ffffffff955fbbf8>] kasan_report_error+0x408/0x4e0
>> [35659.059402]  [<ffffffff955fc2e8>] kasan_report+0x58/0x60
>> [35659.065428]  [<ffffffff952d5e8d>] ? call_rcu_sched+0x1d/0x20
>> [35659.072119]  [<ffffffffc01e0701>] ? fl_destroy_filter+0x21/0x30
>> [cls_flower]
>> [35659.080217]  [<ffffffffc01e1ccf>] ? fl_delete+0x1df/0x2e0 [cls_flower]
>> [35659.087580]  [<ffffffff955fa4ca>] __asan_store1+0x4a/0x50
>> [35659.093697]  [<ffffffffc01e1ccf>] fl_delete+0x1df/0x2e0 [cls_flower]
>> [35659.100870]  [<ffffffff9653ecba>] tc_ctl_tfilter+0x10da/0x1b90
>>
>>
>> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
>> 800             struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
>> 801
>> 802             rhashtable_remove_fast(&head->ht, &f->ht_node,
>> 803                                    head->ht_params);
>> 804             __fl_delete(tp, f);
>> 805             *last = list_empty(&head->filters);
>> 806             return 0;
>> 807     }
>>
>>
>> Thanks,
>> Roi
> 


* [PATCH] net: handle no dst on skb in icmp6_send
From: David Ahern @ 2016-11-28  2:52 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, David Ahern

Andrey reported the following while fuzzing the kernel with syzkaller:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800666d4200 task.stack: ffff880067348000
RIP: 0010:[<ffffffff833617ec>]  [<ffffffff833617ec>]
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:ffff88006734f2c0  EFLAGS: 00010206
RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
FS:  00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
Stack:
 ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
 ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
 ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
Call Trace:
 [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
 [<     inline     >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
 [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
 [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
 [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
 ...

icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
cases the dst->dev should be preferred for determining the L3 domain
if the dst has been set on the skb. Fall back to skb->dev if it has
not. This covers the case reported here where icmp6_send is invoked on
Rx before the route lookup.

Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv6/icmp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 7370ad2e693a..2772004ba5a1 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -447,8 +447,10 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 
 	if (__ipv6_addr_needs_scope_id(addr_type))
 		iif = skb->dev->ifindex;
-	else
-		iif = l3mdev_master_ifindex(skb_dst(skb)->dev);
+	else {
+		dst = skb_dst(skb);
+		iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev);
+	}
 
 	/*
 	 *	Must not send error if the source does not uniquely
-- 
2.1.4
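
For readers following the fix outside the kernel tree, the fallback can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: the struct layouts are hypothetical, heavily simplified
stand-ins for the kernel types, and master_ifindex_of() and pick_iif() are
made-up names standing in for l3mdev_master_ifindex() and the relevant part
of icmp6_send().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct net_device { int ifindex; int master_ifindex; };
struct dst_entry  { struct net_device *dev; };
struct sk_buff    { struct net_device *dev; struct dst_entry *dst; };

/* Stand-in for l3mdev_master_ifindex(); returns 0 for a NULL device. */
static int master_ifindex_of(const struct net_device *dev)
{
	return dev ? dev->master_ifindex : 0;
}

/*
 * Pick the interface index the way the patched code does: use skb->dev
 * when the address needs a scope id, otherwise prefer the dst device
 * and fall back to skb->dev when no dst is attached yet.
 */
static int pick_iif(const struct sk_buff *skb, int needs_scope_id)
{
	assert(skb && skb->dev);

	if (needs_scope_id)
		return skb->dev->ifindex;

	const struct dst_entry *dst = skb->dst;

	return master_ifindex_of(dst ? dst->dev : skb->dev);
}
```

The point of the sketch is the NULL check: on the Rx path before the route
lookup there is no dst, so dereferencing it unconditionally is what the
reported trace shows.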


* RE: [net,v2] neigh: fix the loop index error in neigh dump
From: 张胜举 @ 2016-11-28  2:53 UTC (permalink / raw)
  To: 'David Ahern', netdev
In-Reply-To: <6d1a324e-e29e-e2dd-a6fc-1e9b4455cb3d@cumulusnetworks.com>



> -----Original Message-----
> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 10:39 AM
> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> 
> On 11/27/16 7:34 PM, 张胜举 wrote:
> >> -----Original Message-----
> >> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> >> Sent: Monday, November 28, 2016 10:10 AM
> >> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
> >> netdev@vger.kernel.org
> >> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> >>
> >> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> >>> Loop index in neigh dump function is not updated correctly under
> >>> some circumstances, this patch will fix it.
> >>
> >> What's an example?
> >
> > If dev is filtered out, the original code goes to next loop without
> > updating loop index 'idx'.
> 
> And you have a use case with missing or redundant data? Or is your
> comment based on a review of code only?
It's based on my code review. There is no use case currently; this rarely happens.


> 
> >> You are completely rewriting the dump loops.
> >
> > I moved 'idx++' into the for loop, so I replaced 'goto' with 'continue'.
> > The other change is style related.
> 
> A "fixes" should not include 'style related' changes.
Okay, I will send another version without style changes.


* Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
From: John Fastabend @ 2016-11-28  2:55 UTC (permalink / raw)
  To: Daniel Borkmann, davem; +Cc: xiyou.wangcong, roid, ast, hannes, jiri, netdev
In-Reply-To: <0d6d89f885033f1739e97f7f3372ae6e1db72892.1480204343.git.daniel@iogearbox.net>

On 16-11-26 04:18 PM, Daniel Borkmann wrote:
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
> this doesn't respect RCU grace period for them, and as a result, still
> outstanding readers from tc_classify() will try to blindly dereference
> a NULL tp->root.
> 
> The tp->root object is strictly private to the classifier implementation
> and holds internal data the core such as tc_ctl_tfilter() doesn't know
> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
> is only checked for NULL in ->get() callback, but nowhere else. This is
> misleading and seemed to be copied from old classifier code that was not
> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
> fix NULL pointer dereference") moved tp->root initialization into ->init()
> routine, where before it was part of ->change(), so ->get() had to deal
> with tp->root being NULL back then, so that was indeed a valid case, after
> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
> in packet classifiers"); but the NULLifying was reintroduced with the
> RCUification, but it's not correct for every classifier implementation.
> 
> In the cases that are fixed here with one exception of cls_cgroup, tp->root
> object is allocated and initialized inside ->init() callback, which is always
> performed at a point in time after we allocate a new tp, which means tp and
> thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
> handler, same for the tp which is kfree_rcu()'ed right when we return
> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
> for such classifiers is always tied to the tp lifetime. The RCU callback
> invocation for the two kfree_rcu() could be out of order, but that's fine
> since both are independent.
> 
> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
> means that 1) we don't need a useless NULL check in the fast path, and 2)
> outstanding readers of that tp in tc_classify() can still run to completion
> within the RCU grace period, as is actually expected.
> 
> Things that haven't been touched here: cls_fw and cls_route. They each
> handle tp->root being NULL in ->classify() path for historic reasons, so
> their ->destroy() implementation can stay as is. If someone actually
> cares, they could get cleaned up at some point to avoid the test in fast
> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
> !head should anyone actually be using/testing it, so it at least aligns with
> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
> destruction (to a sleepable context) after RCU grace period as concurrent
> readers might still access it. (Note that in this case we need to hold module
> reference to keep work callback address intact, since we only wait on module
> unload for all call_rcu()s to finish.)
> 
> This fixes one race to bring RCU grace period guarantees back. Next step
> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
> proto tp when all filters are gone") to get the order of unlinking the tp
> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
> removal be done through the prior ->delete() callback. Both are independent
> issues. Once we have that right, we can then clean tp->root up for a number
> of classifiers by not making them RCU pointers, which requires a new callback
> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
> tp->root from there.

Thanks, this looks good to me, and I appreciate the detailed commit message.

Acked-by: John Fastabend <john.r.fastabend@intel.com>

> 
> Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
> Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
> Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
> Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
> Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
> Reported-by: Roi Dayan <roid@mellanox.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Roi Dayan <roid@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
> ---


* Re: [net,v2] neigh: fix the loop index error in neigh dump
From: David Ahern @ 2016-11-28  2:56 UTC (permalink / raw)
  To: 张胜举, netdev
In-Reply-To: <001d01d24922$9a16b1e0$ce4415a0$@cmss.chinamobile.com>

On 11/27/16 7:53 PM, 张胜举 wrote:
> 
> 
>> -----Original Message-----
>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>> Sent: Monday, November 28, 2016 10:39 AM
>> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
>> netdev@vger.kernel.org
>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>
>> On 11/27/16 7:34 PM, 张胜举 wrote:
>>>> -----Original Message-----
>>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>>>> Sent: Monday, November 28, 2016 10:10 AM
>>>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
>>>> netdev@vger.kernel.org
>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>>
>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>>>> Loop index in neigh dump function is not updated correctly under
>>>>> some circumstances, this patch will fix it.
>>>>
>>>> What's an example?
>>>
>>> If dev is filtered out, the original code goes to next loop without
>>> updating loop index 'idx'.
>>
>> And you have a use case with missing or redundant data? Or is your
>> comment based on a review of code only?
> It's based on my code review. There is no use case currently; this rarely happens.
> 
> 
>>
>>>> You are completely rewriting the dump loops.
>>>
>>> I moved 'idx++' into the for loop, so I replaced 'goto' with 'continue'.
>>> The other change is style related.
>>
>> A "fixes" should not include 'style related' changes.
> Okay, I will send another version without style changes.
> 

Personally, I think you need to produce a use case that fails before sending another patch. I have not seen a problem with this code.


* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: John Fastabend @ 2016-11-28  2:57 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: roid, jiri, Daniel Borkmann
In-Reply-To: <1479952708-26763-1-git-send-email-xiyou.wangcong@gmail.com>

On 16-11-23 05:58 PM, Cong Wang wrote:
> Roi reported we could have a race condition where in ->classify() path
> we dereference tp->root and meanwhile a parallel ->destroy() makes it
> a NULL.
> 
> This is possible because ->destroy() could be called when deleting
> a filter to check if we are the last one in tp, this tp is still
> linked and visible at that time.
> 
> The root cause of this problem is the semantics of ->destroy(): it
> does two things (for the non-force case):
> 
> 1) check if tp is empty
> 2) if tp is empty we could really destroy it
> 
> and its caller, if it cares, needs to check its return value to see if
> it is really destroyed. Therefore we can't unlink tp unless we know
> it is empty.
> 
> As suggested by Daniel, we could actually move the test logic to ->delete()
> so that we can safely unlink tp after ->delete() tells us the last one is
> just deleted and before ->destroy().
> 
> What's more, even if we unlink it before ->destroy(), it could still have
> readers since we don't wait for a grace period here, so we should not modify
> tp->root in ->destroy() either.
> 
> Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
> Reported-by: Roi Dayan <roid@mellanox.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---

Hi Cong,

Thanks a lot for doing this. Can you rebase it on top of Daniel's patch
though,

 [PATCH net] net, sched: respect rcu grace period on cls destruction

And then push the NULL pointer work for the cls_fw and cls_route
classifiers into another patch.

Then I believe the last thing to make this correct is to convert the
call_rcu() paths to call_rcu_bh().

.John
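
The ordering described in the quoted patch (->delete() reports whether the
last filter is gone, the caller unlinks tp, and only then ->destroy() runs)
can be sketched in userspace C. This is a rough sketch, not the kernel code:
struct tcf_proto_sketch and the sketch_*() functions are hypothetical names,
the chain is a plain singly linked list, and there is no RCU here, so the
comments only mark where the kernel would publish the pointer and defer the
free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a classifier instance (tp). */
struct tcf_proto_sketch {
	struct tcf_proto_sketch *next;
	int nfilters;
	bool destroyed;
};

/* ->delete() analogue: drop one filter and report whether it was the last. */
static int sketch_delete(struct tcf_proto_sketch *tp, bool *last)
{
	if (tp->nfilters == 0)
		return -1;
	tp->nfilters--;
	*last = (tp->nfilters == 0);
	return 0;
}

/* ->destroy() analogue: no emptiness check here anymore. */
static void sketch_destroy(struct tcf_proto_sketch *tp)
{
	/* The kernel would defer the actual free past an RCU grace period. */
	tp->destroyed = true;
}

/* Caller: unlink tp from the chain first, destroy it afterwards. */
static int sketch_del_filter(struct tcf_proto_sketch **chain,
			     struct tcf_proto_sketch *tp)
{
	bool last = false;
	int err = sketch_delete(tp, &last);

	if (err || !last)
		return err;

	for (struct tcf_proto_sketch **pp = chain; *pp; pp = &(*pp)->next) {
		if (*pp == tp) {
			*pp = tp->next;	/* kernel: RCU_INIT_POINTER() */
			break;
		}
	}
	sketch_destroy(tp);
	return 0;
}
```

With this split, a tp that still holds filters is never touched by the
destroy path, and an empty one is no longer reachable from the chain by the
time it is torn down.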


* Re: [net,v2] neigh: fix the loop index error in neigh dump
From: David Ahern @ 2016-11-28  3:09 UTC (permalink / raw)
  To: 张胜举, netdev
In-Reply-To: <a01fcd54-c0ed-9d05-743a-27592d845c56@cumulusnetworks.com>

On 11/27/16 7:56 PM, David Ahern wrote:
> On 11/27/16 7:53 PM, 张胜举 wrote:
>>
>>
>>> -----Original Message-----
>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>>> Sent: Monday, November 28, 2016 10:39 AM
>>> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
>>> netdev@vger.kernel.org
>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>
>>> On 11/27/16 7:34 PM, 张胜举 wrote:
>>>>> -----Original Message-----
>>>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>>>>> Sent: Monday, November 28, 2016 10:10 AM
>>>>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
>>>>> netdev@vger.kernel.org
>>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>>>
>>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>>>>> Loop index in neigh dump function is not updated correctly under
>>>>>> some circumstances, this patch will fix it.
>>>>>
>>>>> What's an example?
>>>>
>>>> If dev is filtered out, the original code goes to next loop without
>>>> updating loop index 'idx'.
>>>
>>> And you have a use case with missing or redundant data? Or is your
>>> comment based on a review of code only?
>> It's based on my code review. There is no use case currently; this rarely happens.
>>
>>
>>>
>>>>> You are completely rewriting the dump loops.
>>>>
>>>> I moved 'idx++' into the for loop, so I replaced 'goto' with 'continue'.
>>>> The other change is style related.
>>>
>>> A "fixes" should not include 'style related' changes.
>> Okay, I will send another version without style changes.
>>
> 
> Personally, I think you need to produce a use case that fails before sending another patch. I have not seen a problem with this code.
> 

And looking back at 3f0ae05d6f I should not have acked it (reviewed it too quickly while on PTO). Your change is a no-op because of what idx represents - the position in the hash list for devices relevant for the dump request. Same goes for the neigh dump so this patch is not needed.
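
One way to read the point about idx is the userspace sketch below. It is
not the neigh dump code: dump_matching(), struct dump_cb and the flat array
of ifindex values are all hypothetical, and it assumes the list and the
filter do not change between calls. idx counts only entries that pass the
filter, so skipping a filtered entry without advancing idx still resumes at
the right place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resume state kept between dump calls. */
struct dump_cb { int s_idx; };

/*
 * Write the positions of entries whose ifindex matches 'filter' into
 * 'out', at most 'room' per call, resuming after the cb->s_idx matching
 * entries already dumped. A filter of 0 matches everything.
 */
static size_t dump_matching(const int *ifindex, size_t n, int filter,
			    struct dump_cb *cb, int *out, size_t room)
{
	size_t written = 0;
	int idx = 0;

	for (size_t i = 0; i < n; i++) {
		if (filter && ifindex[i] != filter)
			continue;	/* filtered out: idx is not advanced */
		if (idx >= cb->s_idx) {
			if (written == room)
				break;	/* buffer full: resume here next call */
			out[written++] = (int)i;
		}
		idx++;
	}
	cb->s_idx = idx;
	return written;
}
```

Because every call counts the same set of entries, advancing idx for
filtered entries as well would only shift the numbering; under the stated
assumptions the set of entries dumped stays the same.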


* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
From: Michael S. Tsirkin @ 2016-11-28  3:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <5838ABF3.8060308@gmail.com>

On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> > On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>
> >>>> This adds XDP support to virtio_net. Some requirements must be
> >>>> met for XDP to be enabled depending on the mode. First it will
> >>>> only be supported with LRO disabled so that data is not pushed
> >>>> across multiple buffers. The MTU must be less than a page size
> >>>> to avoid having to handle XDP across multiple pages.
> >>>>
> >>>> If mergeable receive is enabled this first series only supports
> >>>> the case where header and data are in the same buf which we can
> >>>> check when a packet is received by looking at num_buf. If the
> >>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>> does not happen and both header and data are put in a single buffer
> >>>> as expected so we check this when XDP programs are loaded. Note I
> >>>> have only tested this with Linux vhost backend.
> >>>>
> >>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>> met then XDP is allowed.
> >>>>
> >>>> A follow on patch can be generated to solve the mergeable receive
> >>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>> be handled as easily.
> >>>
> >>>
> >>> I would very much prefer support for other layouts without drops
> >>> before merging this.
> >>> header by itself can certainly be handled by skipping it.
> >>> People wanted to use that e.g. for zero copy.
> >>
> >> OK fair enough I'll do this now rather than push it out.
> >>
> 
> Hi Michael,
> 
> The header skip logic however complicates the xmit handling a fair
> amount. Specifically when we release the buffers after xmit then
> both the hdr and data portions need to be released which requires
> some tracking.

I thought you disable all checksum offloads so why not discard the
header immediately?

> Is the header split logic actually in use somewhere today? It looks
> like it's not being used in the Linux case. And zero copy RX is currently as
> best I can tell not supported anywhere so I would prefer not to
> complicate the XDP path at the moment with a possible future feature.

Well it's part of the documented interface so we never
know who implemented it. Normally if we want to make
restrictions we would do the reverse and add a feature.

We can do this easily, but I'd like to first look into
just handling all possible inputs as the spec asks us to.
I'm a bit too busy with other stuff next week but will
look into this a week after that if you don't beat me to it.

> >>>
> >>> Anything else can be handled by copying the packet.
> 
> Any idea how to test this? At the moment I have some code to linearize
> the data in all cases with more than a single buffer. But wasn't clear
> to me which features I could negotiate with vhost/qemu to get more than
> a single buffer in the receive path.
> 
> Thanks,
> John

ATM you need to hack qemu. Here's a hack to make header completely
separate.


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..4866144 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
             offset = n->host_hdr_len;
             total += n->guest_hdr_len;
             guest_offset = n->guest_hdr_len;
+            continue;
         } else {
             guest_offset = 0;
         }



here's one that should cap the 1st s/g to 100 bytes:


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..7943004 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
             offset = n->host_hdr_len;
             total += n->guest_hdr_len;
             guest_offset = n->guest_hdr_len;
+            sg.iov_len = MIN(sg.iov_len, 100);
         } else {
             guest_offset = 0;
         }


* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
From: John Fastabend @ 2016-11-28  3:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <20161128052211-mutt-send-email-mst@kernel.org>

On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
>> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
>>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
>>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
>>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
>>>>>>
>>>>>> This adds XDP support to virtio_net. Some requirements must be
>>>>>> met for XDP to be enabled depending on the mode. First it will
>>>>>> only be supported with LRO disabled so that data is not pushed
>>>>>> across multiple buffers. The MTU must be less than a page size
>>>>>> to avoid having to handle XDP across multiple pages.
>>>>>>
>>>>>> If mergeable receive is enabled this first series only supports
>>>>>> the case where header and data are in the same buf which we can
>>>>>> check when a packet is received by looking at num_buf. If the
>>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
>>>>>> is dropped and a warning is thrown. When any_header_sg is set this
>>>>>> does not happen and both header and data are put in a single buffer
>>>>>> as expected so we check this when XDP programs are loaded. Note I
>>>>>> have only tested this with Linux vhost backend.
>>>>>>
>>>>>> If big packets mode is enabled and MTU/LRO conditions above are
>>>>>> met then XDP is allowed.
>>>>>>
>>>>>> A follow on patch can be generated to solve the mergeable receive
>>>>>> case with num_bufs equal to 2. Buffers greater than two may not
>>>>>> be handled as easily.
>>>>>
>>>>>
>>>>> I would very much prefer support for other layouts without drops
>>>>> before merging this.
>>>>> header by itself can certainly be handled by skipping it.
>>>>> People wanted to use that e.g. for zero copy.
>>>>
>>>> OK fair enough I'll do this now rather than push it out.
>>>>
>>
>> Hi Michael,
>>
>> The header skip logic however complicates the xmit handling a fair
>> amount. Specifically when we release the buffers after xmit then
>> both the hdr and data portions need to be released which requires
>> some tracking.
> 
> I thought you disable all checksum offloads so why not discard the
> header immediately?

Well in the "normal" case where the header is part of the same buffer
we keep it to use the same space for the header on the TX path.

If we discard it in the header split case we have to push the header
somewhere else. In the skb case the cb[] region is used it looks like.
In our case I guess free space at the end of the page could be used.

My thinking is if we handle the general case of more than one buffer
being used with a copy we can handle the case above using the same
logic and no need to handle it as a special case. It seems to be an odd
case that doesn't really exist anyways. At least not in qemu/Linux. I
have not tested anything else.

> 
>> Is the header split logic actually in use somewhere today? It looks
>> like it's not being used in the Linux case. And zero copy RX is currently as
>> best I can tell not supported anywhere so I would prefer not to
>> complicate the XDP path at the moment with a possible future feature.
> 
> Well it's part of the documented interface so we never
> know who implemented it. Normally if we want to make
> restrictions we would do the reverse and add a feature.
> 
> We can do this easily, but I'd like to first look into
> just handling all possible inputs as the spec asks us to.
> I'm a bit too busy with other stuff next week but will
> look into this a week after that if you don't beat me to it.
> 

Well I've almost got it working now with some logic to copy everything
into a single page if we hit this case so should be OK but slow. I'll
finish testing this and send it out hopefully in the next few days.
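
The copy fallback mentioned above could look roughly like the sketch below.
It is plain userspace C written under assumptions, not the virtio_net
implementation: struct buf_seg, SKETCH_PAGE_SIZE and linearize_into_page()
are hypothetical names, and a fixed 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical descriptor for one received buffer segment. */
struct buf_seg { const void *data; size_t len; };

/*
 * Copy all segments back to back into a single page-sized buffer.
 * Returns the total number of bytes copied, or -1 if the packet does
 * not fit in one page (earlier segments may already have been copied).
 */
static long linearize_into_page(const struct buf_seg *segs, size_t nsegs,
				unsigned char *page)
{
	size_t off = 0;

	for (size_t i = 0; i < nsegs; i++) {
		if (segs[i].len > SKETCH_PAGE_SIZE - off)
			return -1;
		memcpy(page + off, segs[i].data, segs[i].len);
		off += segs[i].len;
	}
	return (long)off;
}
```

A caller would hand the linearized page to the XDP program only when the
return value is non-negative, and drop the packet otherwise. The extra copy
is why this path is expected to be correct but slow.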

>>>>>
>>>>> Anything else can be handled by copying the packet.
>>
>> Any idea how to test this? At the moment I have some code to linearize
>> the data in all cases with more than a single buffer. But wasn't clear
>> to me which features I could negotiate with vhost/qemu to get more than
>> a single buffer in the receive path.
>>
>> Thanks,
>> John
> 
> ATM you need to hack qemu. Here's a hack to make header completely
> separate.
> 

Perfect! Hacking qemu for testing is no problem. This helps a lot, thanks,
and saves me time trying to figure out how to get qemu to do this.

> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..4866144 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>              offset = n->host_hdr_len;
>              total += n->guest_hdr_len;
>              guest_offset = n->guest_hdr_len;
> +            continue;
>          } else {
>              guest_offset = 0;
>          }
> 
> 
> 
> here's one that should cap the 1st s/g to 100 bytes:
> 
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..7943004 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>              offset = n->host_hdr_len;
>              total += n->guest_hdr_len;
>              guest_offset = n->guest_hdr_len;
> +            sg.iov_len = MIN(sg.iov_len, 100);
>          } else {
>              guest_offset = 0;
>          }
> 


* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
From: Michael S. Tsirkin @ 2016-11-28  4:07 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <583BAAD9.4080408@gmail.com>

On Sun, Nov 27, 2016 at 07:56:09PM -0800, John Fastabend wrote:
> On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> >> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>>>
> >>>>>> This adds XDP support to virtio_net. Some requirements must be
> >>>>>> met for XDP to be enabled depending on the mode. First it will
> >>>>>> only be supported with LRO disabled so that data is not pushed
> >>>>>> across multiple buffers. The MTU must be less than a page size
> >>>>>> to avoid having to handle XDP across multiple pages.
> >>>>>>
> >>>>>> If mergeable receive is enabled this first series only supports
> >>>>>> the case where header and data are in the same buf which we can
> >>>>>> check when a packet is received by looking at num_buf. If the
> >>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>>>> does not happen and both header and data are put in a single buffer
> >>>>>> as expected so we check this when XDP programs are loaded. Note I
> >>>>>> have only tested this with Linux vhost backend.
> >>>>>>
> >>>>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>>>> met then XDP is allowed.
> >>>>>>
> >>>>>> A follow on patch can be generated to solve the mergeable receive
> >>>>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>>>> be handled as easily.
> >>>>>
> >>>>>
> >>>>> I would very much prefer support for other layouts without drops
> >>>>> before merging this.
> >>>>> header by itself can certainly be handled by skipping it.
> >>>>> People wanted to use that e.g. for zero copy.
> >>>>
> >>>> OK fair enough I'll do this now rather than push it out.
> >>>>
> >>
> >> Hi Michael,
> >>
> >> The header skip logic however complicates the xmit handling a fair
> >> amount. Specifically when we release the buffers after xmit then
> >> both the hdr and data portions need to be released which requires
> >> some tracking.
> > 
> > I thought you disable all checksum offloads so why not discard the
> > header immediately?
> 
> Well in the "normal" case where the header is part of the same buffer
> we keep it to use the same space for the header on the TX path.
> 
> If we discard it in the header split case we have to push the header
> somewhere else. In the skb case the cb[] region is used it looks like.
> In our case I guess free space at the end of the page could be used.

You don't have to put start of page in a buffer, you
can put an offset there. Will result in some waste in the
common case, but it's just several bytes so likely not a big deal.

> My thinking is if we handle the general case of more than one buffer
> being used with a copy we can handle the case above using the same
> logic and no need to handle it as a special case. It seems to be an odd
> case that doesn't really exist anyways. At least not in qemu/Linux. I
> have not tested anything else.

OK

> > 
> >> Is the header split logic actually in use somewhere today? It looks
> >> like it's not being used in the Linux case. And zero copy RX is currently as
> >> best I can tell not supported anywhere so I would prefer not to
> >> complicate the XDP path at the moment with a possible future feature.
> > 
> > Well it's part of the documented interface so we never
> > know who implemented it. Normally if we want to make
> > restrictions we would do the reverse and add a feature.
> > 
> > We can do this easily, but I'd like to first look into
> > just handling all possible inputs as the spec asks us to.
> > I'm a bit too busy with other stuff next week but will
> > look into this a week after that if you don't beat me to it.
> > 
> 
> Well I've almost got it working now with some logic to copy everything
> into a single page if we hit this case so should be OK but slow. I'll
> finish testing this and send it out hopefully in the next few days.
> 
> >>>>>
> >>>>> Anything else can be handled by copying the packet.
> >>
> >> Any idea how to test this? At the moment I have some code to linearize
> >> the data in all cases with more than a single buffer. But wasn't clear
> >> to me which features I could negotiate with vhost/qemu to get more than
> >> a single buffer in the receive path.
> >>
> >> Thanks,
> >> John
> > 
> > ATM you need to hack qemu. Here's a hack to make header completely
> > separate.
> > 
> 
> Perfect! Hacking qemu for testing is no problem. This helps a lot, thanks,
> and saves me time trying to figure out how to get qemu to do this.

Please note I didn't try this at all, so it might not work, but it should
give you the idea.

> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..4866144 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> >              offset = n->host_hdr_len;
> >              total += n->guest_hdr_len;
> >              guest_offset = n->guest_hdr_len;
> > +            continue;
> >          } else {
> >              guest_offset = 0;
> >          }
> > 
> > 
> > 
> > here's one that should cap the 1st s/g to 100 bytes:
> > 
> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..7943004 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> >              offset = n->host_hdr_len;
> >              total += n->guest_hdr_len;
> >              guest_offset = n->guest_hdr_len;
> > +            sg.iov_len = MIN(sg.iov_len, 100);
> >          } else {
> >              guest_offset = 0;
> >          }
> > 


* Re: [RFC net-next 2/3] net: dsa: Propagate VLAN add/del to CPU port(s)
From: Florian Fainelli @ 2016-11-28  4:30 UTC (permalink / raw)
  To: Vivien Didelot, netdev; +Cc: davem, bridge, stephen, andrew, jiri, idosch
In-Reply-To: <87eg23zcf5.fsf@ketchup.i-did-not-set--mail-host-address--so-tickle-me>



On 11/22/2016 08:50 AM, Vivien Didelot wrote:
> Hi Florian,
> 
> Open question: will we need to do the same for FDB and MDB objects?

(I overlooked that question earlier this week.) I do expect that this could
be helpful for FDB and MDB objects as well, yes.

> 
> Florian Fainelli <f.fainelli@gmail.com> writes:
> 
>> Now that the bridge layer can call into switchdev to signal programming
>> requests targeting the bridge master device itself, allow the switch
>> drivers to implement separate programming of downstream and
>> upstream/management ports.
>>
>> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  net/dsa/slave.c | 45 +++++++++++++++++++++++++++++++++------------
>>  1 file changed, 33 insertions(+), 12 deletions(-)
>>
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index d0c7bce88743..18288261b964 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -223,35 +223,30 @@ static int dsa_slave_set_mac_address(struct net_device *dev, void *a)
>>  	return 0;
>>  }
>>  
>> -static int dsa_slave_port_vlan_add(struct net_device *dev,
>> +static int dsa_slave_port_vlan_add(struct dsa_switch *ds, int port,
>>  				   const struct switchdev_obj_port_vlan *vlan,
>>  				   struct switchdev_trans *trans)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> -	struct dsa_switch *ds = p->parent;
>>  
> 
> Extra newline ^.
> 
>>  	if (switchdev_trans_ph_prepare(trans)) {
>>  		if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
>>  			return -EOPNOTSUPP;
>>  
>> -		return ds->ops->port_vlan_prepare(ds, p->port, vlan, trans);
>> +		return ds->ops->port_vlan_prepare(ds, port, vlan, trans);
>>  	}
>>  
>> -	ds->ops->port_vlan_add(ds, p->port, vlan, trans);
>> +	ds->ops->port_vlan_add(ds, port, vlan, trans);
>>  
>>  	return 0;
>>  }
>>  
>> -static int dsa_slave_port_vlan_del(struct net_device *dev,
>> +static int dsa_slave_port_vlan_del(struct dsa_switch *ds, int port,
>>  				   const struct switchdev_obj_port_vlan *vlan)
>>  {
>> -	struct dsa_slave_priv *p = netdev_priv(dev);
>> -	struct dsa_switch *ds = p->parent;
>> -
>>  	if (!ds->ops->port_vlan_del)
>>  		return -EOPNOTSUPP;
>>  
>> -	return ds->ops->port_vlan_del(ds, p->port, vlan);
>> +	return ds->ops->port_vlan_del(ds, port, vlan);
>>  }
>>  
>>  static int dsa_slave_port_vlan_dump(struct net_device *dev,
>> @@ -465,8 +460,21 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>>  				  const struct switchdev_obj *obj,
>>  				  struct switchdev_trans *trans)
>>  {
>> +	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	struct dsa_switch *ds = p->parent;
>> +	int port = p->port;
>>  	int err;
>>  
>> +	/* Here we may be called with an orig_dev which is different from dev,
>> +	 * on purpose, to receive request coming from e.g the bridge master
>> +	 * device. Although there are no network device associated with CPU/DSA
>> +	 * ports, we may still have programming operation for these ports.
>> +	 */
>> +	if (obj->orig_dev == p->bridge_dev) {
>> +		ds = ds->dst->ds[0];
>> +		port = ds->dst->cpu_port;
>> +	}
>> +
>>  	/* For the prepare phase, ensure the full set of changes is feasable in
>>  	 * one go in order to signal a failure properly. If an operation is not
>>  	 * supported, return -EOPNOTSUPP.
>> @@ -483,7 +491,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>>  					     trans);
>>  		break;
>>  	case SWITCHDEV_OBJ_ID_PORT_VLAN:
>> -		err = dsa_slave_port_vlan_add(dev,
>> +		err = dsa_slave_port_vlan_add(ds, port,
>>  					      SWITCHDEV_OBJ_PORT_VLAN(obj),
>>  					      trans);
> 
> Note that dsa_slave_port_vlan_add() will be called N times, N being the
> number of bridge ports. This is not an issue for the moment though.
> Programming it only once requires caching, so leave it for an eventual
> future patch.
> 
> When issuing the following command (lan0 being a member of br0):
> 
>     # bridge vlan add vid 42 dev lan0
> 
> the CPU port is also programmed as tagged in VLAN 42. Is that expected?

The first time a VLAN ID is programmed on either lan0 or br0, if it did
not exist prior to that call, it also gets populated into the bridge
VLAN database, which is why both the lan0 interface and the CPU port get
programmed.
-- 
Florian

^ permalink raw reply

* [PATCH net-next] bpf: samples: Fix compile of test_lru_dist.c
From: David Ahern @ 2016-11-28  4:32 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern, Martin KaFai Lau

Build of samples/bpf on debian/jessie fails with:

  HOSTCC  /home/dsa/kernel-3.git/samples/bpf/test_lru_dist.o
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c: In function ‘main’:
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: error: variable ‘r’ has initializer but incomplete type
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
         ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: error: ‘RLIM_INFINITY’ undeclared (first use in this function)
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                     ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: note: each undeclared identifier is reported only once for each function it appears in
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess elements in struct initializer
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
         ^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess elements in struct initializer
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:16: error: storage size of ‘r’ isn’t known
  struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};

Add sys/resource.h to the include list.

Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Cc: Martin KaFai Lau <kafai@fb.com>
---
 samples/bpf/test_lru_dist.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 2859977b7f37..bc4a2142eb91 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -16,6 +16,7 @@
 #include <sched.h>
 #include <sys/wait.h>
 #include <sys/stat.h>
+#include <sys/resource.h>
 #include <fcntl.h>
 #include <stdlib.h>
 #include <time.h>
-- 
2.1.4
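
For reference, a minimal standalone sketch of the construct that failed to build. `struct rlimit`, `RLIM_INFINITY` and `setrlimit()` are declared by `<sys/resource.h>`, so the initializer compiles once that header is included. The helper name `raise_memlock_limit` is hypothetical, and the choice of `RLIMIT_MEMLOCK` is an assumption made for illustration; the patch itself only adds the include.

```c
#include <stdio.h>
#include <sys/resource.h>	/* struct rlimit, RLIM_INFINITY, setrlimit() */

/* Hypothetical helper (not part of the patch): try to lift the
 * locked-memory limit. Returns 0 on success, -1 if setrlimit() fails,
 * for example when the caller may not raise the hard limit.
 */
static int raise_memlock_limit(void)
{
	/* Same initializer as the line the compiler rejected */
	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};

	if (setrlimit(RLIMIT_MEMLOCK, &r) != 0) {
		perror("setrlimit(RLIMIT_MEMLOCK)");
		return -1;
	}
	return 0;
}
```

Without the include, `struct rlimit` is an incomplete type at that point, which is what produces the "has initializer but incomplete type" and "storage size of 'r' isn't known" errors quoted above.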

^ permalink raw reply related

* RE: [net,v2] neigh: fix the loop index error in neigh dump
From: 张胜举 @ 2016-11-28  4:50 UTC (permalink / raw)
  To: 'David Ahern', netdev
In-Reply-To: <6859d40b-0049-513a-6dc4-162d383ef7b9@cumulusnetworks.com>



> -----Original Message-----
> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 11:10 AM
> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> 
> On 11/27/16 7:56 PM, David Ahern wrote:
> > On 11/27/16 7:53 PM, 张胜举 wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> >>> Sent: Monday, November 28, 2016 10:39 AM
> >>> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
> >>> netdev@vger.kernel.org
> >>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> >>>
> >>> On 11/27/16 7:34 PM, 张胜举 wrote:
> >>>>> -----Original Message-----
> >>>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> >>>>> Sent: Monday, November 28, 2016 10:10 AM
> >>>>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
> >>>>> netdev@vger.kernel.org
> >>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh
> >>>>> dump
> >>>>>
> >>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> >>>>>> Loop index in neigh dump function is not updated correctly under
> >>>>>> some circumstances, this patch will fix it.
> >>>>>
> >>>>> What's an example?
> >>>>
> >>>> If dev is filtered out, the original code goes to next loop without
> >>>> updating loop index 'idx'.
> >>>
> >>> And you have a use case with missing or redundant data? Or is your
> >>> comment based on a review of code only?
> >> It's on my code review. No use case currently, this is uncommon to
> >> happen.
> >>
> >>
> >>>
> >>>>> You are completely rewriting the dump loops.
> >>>>
> >>>> I put 'idx++' into for loop,  so I replace 'goto' with 'continue'.
> >>>> The other change is style related.
> >>>
> >>> A "fixes" should not include 'style related' changes.
> >> Okay, I will send another version without style changes.
> >>
> >
> > Personally, I think you need to produce a use case that fails before
> > sending another patch. I have not seen a problem with this code.
> >
> 
> And looking back at 3f0ae05d6f I should not have acked it (reviewed it too
> quickly while on PTO). Your change is a no-op because of what idx represents
> - the position in the hash list for devices relevant for the dump request.
> Same goes for the neigh dump so this patch is not needed.
> 
No. When a dump request has to be processed by multiple 'recv/recvmsg'
system calls, idx stores which dev/neigh the previous call has processed,
so that the next call will scan from the right place.

So no matter whether the dev/neigh is filtered, idx should be increased
anyway.

It's hard to produce a use case, because we mostly have only one entity in
the hash list. Even with multiple entities, we also need the function to
exit right at the place where a dev/neigh is filtered out.

All other dump functions for RT netlink keep this logic; you can refer to
inet_dump_ifaddr() if you wish.
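
The resume behaviour described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct entry`, `dump_some()` and the `resume` argument are hypothetical names standing in for the hash-list entries, the dump function and the saved index; it is not the kernel code, and it does not settle which counting convention the neigh dump should use.

```c
#include <stddef.h>

struct entry {
	int id;
	int filtered;	/* non-zero: entry does not match the dump filter */
};

/* Hypothetical model of a resumable dump. *resume plays the role of
 * the saved index: it records where the previous call stopped.
 * Returns the number of ids written to out (at most cap).
 */
static size_t dump_some(const struct entry *tbl, size_t n,
			size_t *resume, int *out, size_t cap)
{
	size_t s_idx = *resume;
	size_t idx, filled = 0;

	/* idx advances for every entry, filtered or not */
	for (idx = 0; idx < n; idx++) {
		if (idx < s_idx)
			continue;	/* handled by an earlier call */
		if (tbl[idx].filtered)
			continue;	/* skipped, but still counted */
		if (filled == cap)
			break;		/* buffer full: resume here next time */
		out[filled++] = tbl[idx].id;
	}
	*resume = idx;
	return filled;
}
```

Calling `dump_some()` repeatedly with a small `cap` and the same `resume` variable walks the table in several passes, each pass starting at the index where the previous one ran out of room.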

^ permalink raw reply

* [PATCH net-next 0/9] liquidio VF operations
From: Raghu Vatsavayi @ 2016-11-28  4:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, Raghu Vatsavayi

Hi Dave,

The following patches add support for VF device-specific operations
such as mailbox, queues and register access. Please apply the
patches in the following order, as they depend on each other.

Thanks


Raghu Vatsavayi (9):
  liquidio CN23XX: VF register definitions
  liquidio CN23XX: VF registration
  liquidio CN23XX: VF config setup
  liquidio CN23XX: VF queue setup
  liquidio CN23XX: VF register access
  liquidio CN23XX: init VF softcommand queues
  liquidio CN23XX: VF mailbox
  liquidio CN23XX: VF interrupt
  liquidio CN23XX: VF init and destroy

 drivers/net/ethernet/cavium/Kconfig                |  12 +
 drivers/net/ethernet/cavium/liquidio/Makefile      |  22 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    | 701 +++++++++++++++++++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |  48 ++
 .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h  | 274 ++++++++
 drivers/net/ethernet/cavium/liquidio/lio_core.c    |   7 -
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |   6 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 614 ++++++++++++++++++
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  58 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   9 +-
 .../net/ethernet/cavium/liquidio/request_manager.c |  11 +-
 11 files changed, 1751 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net-next 1/9] liquidio CN23XX: VF register definitions
From: Raghu Vatsavayi @ 2016-11-28  4:51 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480308702-6261-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for CN23xx VF registers.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h  | 274 +++++++++++++++++++++
 1 file changed, 274 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
new file mode 100644
index 0000000..d33dd8f
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
@@ -0,0 +1,274 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ *          Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***********************************************************************/
+/*! \file cn23xx_vf_regs.h
+ * \brief Host Driver: Register Address and Register Mask values for
+ * Octeon CN23XX vf functions.
+ */
+
+#ifndef __CN23XX_VF_REGS_H__
+#define __CN23XX_VF_REGS_H__
+
+#define     CN23XX_CONFIG_XPANSION_BAR             0x38
+
+#define     CN23XX_CONFIG_PCIE_CAP                 0x70
+#define     CN23XX_CONFIG_PCIE_DEVCAP              0x74
+#define     CN23XX_CONFIG_PCIE_DEVCTL              0x78
+#define     CN23XX_CONFIG_PCIE_LINKCAP             0x7C
+#define     CN23XX_CONFIG_PCIE_LINKCTL             0x80
+#define     CN23XX_CONFIG_PCIE_SLOTCAP             0x84
+#define     CN23XX_CONFIG_PCIE_SLOTCTL             0x88
+
+#define     CN23XX_CONFIG_PCIE_FLTMSK              0x720
+
+/* The input jabber is used to determine the TSO max size.
+ * Due to a H/W limitation, this needs to be reduced to 60000
+ * in order to do H/W TSO and avoid the WQE malformation
+ * PKO_BUG_24989_WQE_LEN
+ */
+#define    CN23XX_DEFAULT_INPUT_JABBER             0xEA60 /*60000*/
+
+/* ##############  BAR0 Registers ################ */
+
+/* Each Input Queue register is at a 16-byte Offset in BAR0 */
+#define    CN23XX_VF_IQ_OFFSET                     0x20000
+
+/*###################### REQUEST QUEUE #########################*/
+
+/* 64 registers for Input Queue Instr Count - SLI_PKT_IN_DONE0_CNTS */
+#define    CN23XX_VF_SLI_IQ_INSTR_COUNT_START64     0x10040
+
+/* 64 registers for Input Queues Start Addr - SLI_PKT0_INSTR_BADDR */
+#define    CN23XX_VF_SLI_IQ_BASE_ADDR_START64       0x10010
+
+/* 64 registers for Input Doorbell - SLI_PKT0_INSTR_BAOFF_DBELL */
+#define    CN23XX_VF_SLI_IQ_DOORBELL_START          0x10020
+
+/* 64 registers for Input Queue size - SLI_PKT0_INSTR_FIFO_RSIZE */
+#define    CN23XX_VF_SLI_IQ_SIZE_START              0x10030
+
+/* 64 registers (64-bit) - ES, RO, NS, Arbitration for Input Queue Data &
+ * gather list fetches. SLI_PKT(0..63)_INPUT_CONTROL.
+ */
+#define    CN23XX_VF_SLI_IQ_PKT_CONTROL_START64     0x10000
+
+/*------- Request Queue Macros ---------*/
+#define CN23XX_VF_SLI_IQ_PKT_CONTROL64(iq)		\
+	(CN23XX_VF_SLI_IQ_PKT_CONTROL_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_BASE_ADDR64(iq)		\
+	(CN23XX_VF_SLI_IQ_BASE_ADDR_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_SIZE(iq)			\
+	(CN23XX_VF_SLI_IQ_SIZE_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_DOORBELL(iq)			\
+	(CN23XX_VF_SLI_IQ_DOORBELL_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_INSTR_COUNT64(iq)		\
+	(CN23XX_VF_SLI_IQ_INSTR_COUNT_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+/*------------------ Masks ----------------*/
+#define    CN23XX_PKT_INPUT_CTL_VF_NUM                  BIT_ULL(32)
+#define    CN23XX_PKT_INPUT_CTL_MAC_NUM                 BIT(29)
+/* Number of instructions to be read in one MAC read request.
+ * setting to Max value(4)
+ */
+#define    CN23XX_PKT_INPUT_CTL_RDSIZE                  (3 << 25)
+#define    CN23XX_PKT_INPUT_CTL_IS_64B                  BIT(24)
+#define    CN23XX_PKT_INPUT_CTL_RST                     BIT(23)
+#define    CN23XX_PKT_INPUT_CTL_QUIET                   BIT(28)
+#define    CN23XX_PKT_INPUT_CTL_RING_ENB                BIT(22)
+#define    CN23XX_PKT_INPUT_CTL_DATA_NS                 BIT(8)
+#define    CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP        BIT(6)
+#define    CN23XX_PKT_INPUT_CTL_DATA_RO                 BIT(5)
+#define    CN23XX_PKT_INPUT_CTL_USE_CSR                 BIT(4)
+#define    CN23XX_PKT_INPUT_CTL_GATHER_NS               BIT(3)
+#define    CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP      (2)
+#define    CN23XX_PKT_INPUT_CTL_GATHER_RO               (1)
+
+/** Rings per Virtual Function [RO] **/
+#define    CN23XX_PKT_INPUT_CTL_RPVF_MASK               (0x3F)
+#define    CN23XX_PKT_INPUT_CTL_RPVF_POS                (48)
+/* These bits[47:44][RO] give the Physical function number info within the MAC*/
+#define    CN23XX_PKT_INPUT_CTL_PF_NUM_MASK             (0x7)
+#define    CN23XX_PKT_INPUT_CTL_PF_NUM_POS              (45)
+/** These bits[43:32][RO] give the virtual function number info within the PF*/
+#define    CN23XX_PKT_INPUT_CTL_VF_NUM_MASK             (0x1FFF)
+#define    CN23XX_PKT_INPUT_CTL_VF_NUM_POS              (32)
+#define    CN23XX_PKT_INPUT_CTL_MAC_NUM_MASK            (0x3)
+#define    CN23XX_PKT_INPUT_CTL_MAC_NUM_POS             (29)
+#define    CN23XX_PKT_IN_DONE_WMARK_MASK                (0xFFFFULL)
+#define    CN23XX_PKT_IN_DONE_WMARK_BIT_POS             (32)
+#define    CN23XX_PKT_IN_DONE_CNT_MASK                  (0x00000000FFFFFFFFULL)
+
+#ifdef __LITTLE_ENDIAN_BITFIELD
+#define CN23XX_PKT_INPUT_CTL_MASK			\
+	(CN23XX_PKT_INPUT_CTL_RDSIZE			\
+	 | CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP	\
+	 | CN23XX_PKT_INPUT_CTL_USE_CSR)
+#else
+#define CN23XX_PKT_INPUT_CTL_MASK			\
+	(CN23XX_PKT_INPUT_CTL_RDSIZE			\
+	 | CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP	\
+	 | CN23XX_PKT_INPUT_CTL_USE_CSR			\
+	 | CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP)
+#endif
+
+/** Masks for SLI_PKT_IN_DONE(0..63)_CNTS Register */
+#define    CN23XX_IN_DONE_CNTS_PI_INT               BIT_ULL(62)
+#define    CN23XX_IN_DONE_CNTS_CINT_ENB             BIT_ULL(48)
+
+/*############################ OUTPUT QUEUE #########################*/
+
+/* 64 registers for Output queue control - SLI_PKT(0..63)_OUTPUT_CONTROL */
+#define    CN23XX_VF_SLI_OQ_PKT_CONTROL_START       0x10050
+
+/* 64 registers for Output queue buffer and info size - SLI_PKT0_OUT_SIZE */
+#define    CN23XX_VF_SLI_OQ0_BUFF_INFO_SIZE         0x10060
+
+/* 64 registers for Output Queue Start Addr - SLI_PKT0_SLIST_BADDR */
+#define    CN23XX_VF_SLI_OQ_BASE_ADDR_START64       0x10070
+
+/* 64 registers for Output Queue Packet Credits - SLI_PKT0_SLIST_BAOFF_DBELL */
+#define    CN23XX_VF_SLI_OQ_PKT_CREDITS_START       0x10080
+
+/* 64 registers for Output Queue size - SLI_PKT0_SLIST_FIFO_RSIZE */
+#define    CN23XX_VF_SLI_OQ_SIZE_START              0x10090
+
+/* 64 registers for Output Queue Packet Count - SLI_PKT0_CNTS */
+#define    CN23XX_VF_SLI_OQ_PKT_SENT_START          0x100B0
+
+/* 64 registers for Output Queue INT Levels - SLI_PKT0_INT_LEVELS */
+#define    CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64  0x100A0
+
+/* Each Output Queue register is at a 16-byte Offset in BAR0 */
+#define    CN23XX_VF_OQ_OFFSET                      0x20000
+
+/*------- Output Queue Macros ---------*/
+
+#define CN23XX_VF_SLI_OQ_PKT_CONTROL(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_CONTROL_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_BASE_ADDR64(oq)		\
+	(CN23XX_VF_SLI_OQ_BASE_ADDR_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_SIZE(oq)			\
+	(CN23XX_VF_SLI_OQ_SIZE_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_BUFF_INFO_SIZE(oq)		\
+	(CN23XX_VF_SLI_OQ0_BUFF_INFO_SIZE + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKTS_SENT(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_SENT_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKTS_CREDIT(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_CREDITS_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+/* Macros for accessing CNT and TIME separately from INT_LEVELS */
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_CNT(oq)	\
+	(CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_TIME(oq)	\
+	(CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 +	\
+	 ((oq) * CN23XX_VF_OQ_OFFSET) + 4)
+
+/*------------------ Masks ----------------*/
+#define    CN23XX_PKT_OUTPUT_CTL_TENB                  BIT(13)
+#define    CN23XX_PKT_OUTPUT_CTL_CENB                  BIT(12)
+#define    CN23XX_PKT_OUTPUT_CTL_IPTR                  BIT(11)
+#define    CN23XX_PKT_OUTPUT_CTL_ES                    BIT(9)
+#define    CN23XX_PKT_OUTPUT_CTL_NSR                   BIT(8)
+#define    CN23XX_PKT_OUTPUT_CTL_ROR                   BIT(7)
+#define    CN23XX_PKT_OUTPUT_CTL_DPTR                  BIT(6)
+#define    CN23XX_PKT_OUTPUT_CTL_BMODE                 BIT(5)
+#define    CN23XX_PKT_OUTPUT_CTL_ES_P                  BIT(3)
+#define    CN23XX_PKT_OUTPUT_CTL_NSR_P                 BIT(2)
+#define    CN23XX_PKT_OUTPUT_CTL_ROR_P                 BIT(1)
+#define    CN23XX_PKT_OUTPUT_CTL_RING_ENB              BIT(0)
+
+/*######################### Mailbox Reg Macros ########################*/
+#define    CN23XX_VF_SLI_PKT_MBOX_INT_START            0x10210
+#define    CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START         0x10200
+
+#define    CN23XX_SLI_MBOX_OFFSET                      0x20000
+#define    CN23XX_SLI_MBOX_SIG_IDX_OFFSET              0x8
+
+#define CN23XX_VF_SLI_PKT_MBOX_INT(q)	\
+	(CN23XX_VF_SLI_PKT_MBOX_INT_START + ((q) * CN23XX_SLI_MBOX_OFFSET))
+
+#define CN23XX_SLI_PKT_PF_VF_MBOX_SIG(q, idx)		\
+	(CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START +		\
+	 ((q) * CN23XX_SLI_MBOX_OFFSET +		\
+	  (idx) * CN23XX_SLI_MBOX_SIG_IDX_OFFSET))
+
+/*######################## INTERRUPTS #########################*/
+
+#define    CN23XX_VF_SLI_INT_SUM_START		  0x100D0
+
+#define CN23XX_VF_SLI_INT_SUM(q)			\
+	(CN23XX_VF_SLI_INT_SUM_START + ((q) * CN23XX_VF_IQ_OFFSET))
+
+/*------------------ Interrupt Masks ----------------*/
+
+#define    CN23XX_INTR_PO_INT                   BIT_ULL(63)
+#define    CN23XX_INTR_PI_INT                   BIT_ULL(62)
+#define    CN23XX_INTR_MBOX_INT                 BIT_ULL(61)
+#define    CN23XX_INTR_RESEND                   BIT_ULL(60)
+
+#define    CN23XX_INTR_CINT_ENB                 BIT_ULL(48)
+#define    CN23XX_INTR_MBOX_ENB                 BIT(0)
+
+/*############################ MIO #########################*/
+#define    CN23XX_MIO_PTP_CLOCK_CFG       0x0001070000000f00ULL
+#define    CN23XX_MIO_PTP_CLOCK_LO        0x0001070000000f08ULL
+#define    CN23XX_MIO_PTP_CLOCK_HI        0x0001070000000f10ULL
+#define    CN23XX_MIO_PTP_CLOCK_COMP      0x0001070000000f18ULL
+#define    CN23XX_MIO_PTP_TIMESTAMP       0x0001070000000f20ULL
+#define    CN23XX_MIO_PTP_EVT_CNT         0x0001070000000f28ULL
+#define    CN23XX_MIO_PTP_CKOUT_THRESH_LO 0x0001070000000f30ULL
+#define    CN23XX_MIO_PTP_CKOUT_THRESH_HI 0x0001070000000f38ULL
+#define    CN23XX_MIO_PTP_CKOUT_HI_INCR   0x0001070000000f40ULL
+#define    CN23XX_MIO_PTP_CKOUT_LO_INCR   0x0001070000000f48ULL
+#define    CN23XX_MIO_PTP_PPS_THRESH_LO   0x0001070000000f50ULL
+#define    CN23XX_MIO_PTP_PPS_THRESH_HI   0x0001070000000f58ULL
+#define    CN23XX_MIO_PTP_PPS_HI_INCR     0x0001070000000f60ULL
+#define    CN23XX_MIO_PTP_PPS_LO_INCR     0x0001070000000f68ULL
+
+/*############################ RST #########################*/
+#define    CN23XX_RST_BOOT                0x0001180006001600ULL
+
+/*######################## MSIX TABLE #########################*/
+
+#define    CN23XX_MSIX_TABLE_ADDR_START    0x0
+#define    CN23XX_MSIX_TABLE_DATA_START    0x8
+
+#define    CN23XX_MSIX_TABLE_SIZE          0x10
+#define    CN23XX_MSIX_TABLE_ENTRIES       0x41
+
+#define    CN23XX_MSIX_ENTRY_VECTOR_CTL    BIT_ULL(32)
+
+#define CN23XX_MSIX_TABLE_ADDR(idx)		\
+	(CN23XX_MSIX_TABLE_ADDR_START + ((idx) * CN23XX_MSIX_TABLE_SIZE))
+
+#define CN23XX_MSIX_TABLE_DATA(idx)		\
+	(CN23XX_MSIX_TABLE_DATA_START + ((idx) * CN23XX_MSIX_TABLE_SIZE))
+
+#endif
-- 
1.8.3.1
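
As a worked example of how the per-queue macros in this header resolve, here is a small userspace sketch. The `#define` values are copied from the patch above; the helper `input_ctl_vf_num()` is hypothetical and assumes the `_POS`/`_MASK` pairs are meant to be applied as shift-then-mask on a 64-bit register value, which the header does not state explicitly.

```c
#include <stdint.h>

/* Values copied from cn23xx_vf_regs.h in the patch above */
#define CN23XX_VF_IQ_OFFSET               0x20000
#define CN23XX_VF_SLI_IQ_DOORBELL_START   0x10020
#define CN23XX_VF_SLI_IQ_DOORBELL(iq)			\
	(CN23XX_VF_SLI_IQ_DOORBELL_START + ((iq) * CN23XX_VF_IQ_OFFSET))

#define CN23XX_MSIX_TABLE_DATA_START      0x8
#define CN23XX_MSIX_TABLE_SIZE            0x10
#define CN23XX_MSIX_TABLE_DATA(idx)		\
	(CN23XX_MSIX_TABLE_DATA_START + ((idx) * CN23XX_MSIX_TABLE_SIZE))

#define CN23XX_PKT_INPUT_CTL_VF_NUM_MASK  (0x1FFF)
#define CN23XX_PKT_INPUT_CTL_VF_NUM_POS   (32)

/* Hypothetical helper: extract the VF number field from a 64-bit
 * input-control register value using the POS/MASK pair.
 */
static uint64_t input_ctl_vf_num(uint64_t reg_val)
{
	return (reg_val >> CN23XX_PKT_INPUT_CTL_VF_NUM_POS) &
	       CN23XX_PKT_INPUT_CTL_VF_NUM_MASK;
}
```

With these definitions, the doorbell register of input queue 1 sits at 0x10020 + 1 * 0x20000 = 0x30020, and the data word of MSI-X table entry 2 sits at 0x8 + 2 * 0x10 = 0x28.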

^ permalink raw reply related

* [PATCH net-next 2/9] liquidio CN23XX: VF registration
From: Raghu Vatsavayi @ 2016-11-28  4:51 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480308702-6261-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for cn23xx VF probe and registration.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/Kconfig                |  12 +++
 drivers/net/ethernet/cavium/liquidio/Makefile      |  21 ++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |  34 ++++++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 120 +++++++++++++++++++++
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   4 +
 5 files changed, 191 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c

diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
index 92f411c..c0679c2 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -74,4 +74,16 @@ config OCTEON_MGMT_ETHERNET
 	  port on Cavium Networks' Octeon CN57XX, CN56XX, CN55XX,
 	  CN54XX, CN52XX, and CN6XXX chips.
 
+config LIQUIDIO_VF
+	tristate "Cavium LiquidIO VF support"
+	depends on 64BIT && PCI_MSI
+	select PTP_1588_CLOCK
+	---help---
+	  This driver supports Cavium LiquidIO Intelligent Server Adapter
+	  based on CN23XX chips.
+
+	  To compile this driver as a module, choose M here: The module
+	  will be called liquidio_vf. MSI-X interrupt support is required
+	  for this driver to work correctly
+
 endif # NET_VENDOR_CAVIUM
diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile
index 14958de..69d23fc 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -17,3 +17,24 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \
 			octeon_nic.o
 
 liquidio-objs := lio_main.o octeon_console.o $(liquidio-y)
+
+obj-$(CONFIG_LIQUIDIO_VF) += liquidio_vf.o
+
+ifeq ($(CONFIG_LIQUIDIO)$(CONFIG_LIQUIDIO_VF), yy)
+	liquidio_vf-objs := lio_vf_main.o
+else
+liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \
+			lio_core.o         \
+			request_manager.o  \
+			response_manager.o \
+			octeon_device.o    \
+			cn66xx_device.o    \
+			cn68xx_device.o    \
+			cn23xx_pf_device.o \
+			octeon_mailbox.o   \
+			octeon_mem_ops.o   \
+			octeon_droq.o      \
+			octeon_nic.o
+
+liquidio_vf-objs := lio_vf_main.o $(liquidio_vf-y)
+endif
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
new file mode 100644
index 0000000..015b6d4
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -0,0 +1,34 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ *          Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***********************************************************************/
+/*! \file  cn23xx_vf_device.h
+ * \brief Host Driver: Routines that perform CN23XX specific operations.
+ */
+
+#ifndef __CN23XX_VF_DEVICE_H__
+#define __CN23XX_VF_DEVICE_H__
+
+#include "cn23xx_vf_regs.h"
+
+/* Register address and configuration for CN23XX devices.
+ * If device specific changes need to be made then add a struct to include
+ * device specific fields as shown in the commented section
+ */
+struct octeon_cn23xx_vf {
+	struct octeon_config *conf;
+};
+#endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
new file mode 100644
index 0000000..d1b1a24
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -0,0 +1,120 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ *          Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***********************************************************************/
+#include <linux/pci.h>
+#include <net/vxlan.h>
+#include "liquidio_common.h"
+#include "octeon_droq.h"
+#include "octeon_iq.h"
+#include "response_manager.h"
+#include "octeon_device.h"
+
+MODULE_AUTHOR("Cavium Networks, <support@cavium.com>");
+MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Virtual Function Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(LIQUIDIO_VERSION);
+
+struct octeon_device_priv {
+	/* Tasklet structures for this device. */
+	struct tasklet_struct droq_tasklet;
+	unsigned long napi_mask;
+};
+
+static int
+liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
+static void liquidio_vf_remove(struct pci_dev *pdev);
+
+static const struct pci_device_id liquidio_vf_pci_tbl[] = {
+	{
+		PCI_VENDOR_ID_CAVIUM, OCTEON_CN23XX_VF_VID,
+		PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0
+	},
+	{
+		0, 0, 0, 0, 0, 0, 0
+	}
+};
+MODULE_DEVICE_TABLE(pci, liquidio_vf_pci_tbl);
+
+static struct pci_driver liquidio_vf_pci_driver = {
+	.name		= "LiquidIO_VF",
+	.id_table	= liquidio_vf_pci_tbl,
+	.probe		= liquidio_vf_probe,
+	.remove		= liquidio_vf_remove,
+};
+
+/**
+ * \brief PCI probe handler
+ * @param pdev PCI device structure
+ * @param ent unused
+ */
+static int
+liquidio_vf_probe(struct pci_dev *pdev,
+		  const struct pci_device_id *ent __attribute__((unused)))
+{
+	struct octeon_device *oct_dev = NULL;
+
+	oct_dev = octeon_allocate_device(pdev->device,
+					 sizeof(struct octeon_device_priv));
+
+	if (!oct_dev) {
+		dev_err(&pdev->dev, "Unable to allocate device\n");
+		return -ENOMEM;
+	}
+
+	dev_info(&pdev->dev, "Initializing device %x:%x.\n",
+		 (u32)pdev->vendor, (u32)pdev->device);
+
+	/* Assign octeon_device for this device to the private data area. */
+	pci_set_drvdata(pdev, oct_dev);
+
+	/* set linux specific device pointer */
+	oct_dev->pci_dev = (void *)pdev;
+
+	return 0;
+}
+
+/**
+ * \brief Cleans up resources at unload time
+ * @param pdev PCI device structure
+ */
+static void liquidio_vf_remove(struct pci_dev *pdev)
+{
+	struct octeon_device *oct_dev = pci_get_drvdata(pdev);
+
+	dev_dbg(&oct_dev->pci_dev->dev, "Stopping device\n");
+
+	/* This octeon device has been removed. Update the global
+	 * data structure to reflect this. Free the device structure.
+	 */
+	octeon_free_device_mem(oct_dev);
+}
+
+static int __init liquidio_vf_init(void)
+{
+	octeon_init_device_list(0);
+	return pci_register_driver(&liquidio_vf_pci_driver);
+}
+
+static void __exit liquidio_vf_exit(void)
+{
+	pci_unregister_driver(&liquidio_vf_pci_driver);
+
+	pr_info("LiquidIO_VF network module is now unloaded\n");
+}
+
+module_init(liquidio_vf_init);
+module_exit(liquidio_vf_exit);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 79c8875..05bb0fd 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -28,6 +28,7 @@
 #include "cn66xx_regs.h"
 #include "cn66xx_device.h"
 #include "cn23xx_pf_device.h"
+#include "cn23xx_vf_device.h"
 
 /** Default configuration
  *  for CN66XX OCTEON Models.
@@ -672,6 +673,9 @@ static struct octeon_device *octeon_allocate_device_mem(u32 pci_id,
 	case OCTEON_CN23XX_PF_VID:
 		configsize = sizeof(struct octeon_cn23xx_pf);
 		break;
+	case OCTEON_CN23XX_VF_VID:
+		configsize = sizeof(struct octeon_cn23xx_vf);
+		break;
 	default:
 		pr_err("%s: Unknown PCI Device: 0x%x\n",
 		       __func__,
-- 
1.8.3.1


* [PATCH net-next 3/9] liquidio CN23XX: VF config setup
From: Raghu Vatsavayi @ 2016-11-28  4:51 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480308702-6261-1-git-send-email-rvatsavayi@caviumnetworks.com>

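Add the initial device setup for the CN23XX VF: enable the PCI device,
set the 64-bit DMA mask, map BAR 0, and look up the default CN23XX
configuration. Track progress in oct->status so that probe failure and
remove release only the resources that were actually acquired.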

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/liquidio/Makefile      |   1 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    |  44 +++++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |   2 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 136 +++++++++++++++++++++
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   3 +
 5 files changed, 186 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
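
For readers following the init/teardown flow in lio_vf_main.c: the
user-space sketch below models how octeon_device_init() records each
completed stage in oct->status, and how octeon_destroy_resources() falls
through its switch to undo only those stages. All sketch_* names, the
bool flags, and the fail_* knobs are hypothetical and do not exist in
the driver; the real code calls the PCI helpers shown in the diff rather
than toggling flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's OCT_DEV_* states. */
enum sketch_state {
	SKETCH_BEGIN_STATE,
	SKETCH_PCI_ENABLE_DONE,
	SKETCH_PCI_MAP_DONE,
};

/* Hypothetical device: the flags record which resources are held. */
struct sketch_dev {
	enum sketch_state status;
	bool enabled;
	bool mapped;
	bool fail_enable;	/* knobs that force a stage to fail */
	bool fail_map;
};

/* Bring the device up stage by stage, recording each completed stage. */
static int sketch_init(struct sketch_dev *dev)
{
	assert(dev);
	dev->status = SKETCH_BEGIN_STATE;

	if (dev->fail_enable)
		return 1;
	dev->enabled = true;
	dev->status = SKETCH_PCI_ENABLE_DONE;

	if (dev->fail_map)
		return 1;
	dev->mapped = true;
	dev->status = SKETCH_PCI_MAP_DONE;

	return 0;
}

/* Undo only the stages that completed, newest first. */
static void sketch_destroy(struct sketch_dev *dev)
{
	assert(dev);
	switch (dev->status) {
	case SKETCH_PCI_MAP_DONE:
		dev->mapped = false;
		/* fallthrough */
	case SKETCH_PCI_ENABLE_DONE:
		dev->enabled = false;
		/* fallthrough */
	case SKETCH_BEGIN_STATE:
		break;
	}
}
```

Forcing fail_map in the sketch leaves status at the enable stage, so
sketch_destroy() only clears the enable flag. This mirrors the case
where liquidio_vf_probe() calls liquidio_vf_remove() after a partial
octeon_device_init().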

diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile
index 69d23fc..cca903a 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -31,6 +31,7 @@ liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \
 			cn66xx_device.o    \
 			cn68xx_device.o    \
 			cn23xx_pf_device.o \
+			cn23xx_vf_device.o \
 			octeon_mailbox.o   \
 			octeon_mem_ops.o   \
 			octeon_droq.o      \
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
new file mode 100644
index 0000000..d683bda
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -0,0 +1,44 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ *          Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***********************************************************************/
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+#include "liquidio_common.h"
+#include "octeon_droq.h"
+#include "octeon_iq.h"
+#include "response_manager.h"
+#include "octeon_device.h"
+#include "cn23xx_vf_device.h"
+#include "octeon_main.h"
+
+int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
+{
+	struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
+
+	if (octeon_map_pci_barx(oct, 0, 0))
+		return 1;
+
+	cn23xx->conf  = oct_get_config_info(oct, LIO_23XX);
+	if (!cn23xx->conf) {
+		dev_err(&oct->pci_dev->dev, "%s No Config found for CN23XX\n",
+			__func__);
+		octeon_unmap_pci_barx(oct, 0);
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 015b6d4..9e4fb50 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -31,4 +31,6 @@
 struct octeon_cn23xx_vf {
 	struct octeon_config *conf;
 };
+
+int cn23xx_setup_octeon_vf_device(struct octeon_device *oct);
 #endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index d1b1a24..721ee66 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -22,6 +22,8 @@
 #include "octeon_iq.h"
 #include "response_manager.h"
 #include "octeon_device.h"
+#include "octeon_main.h"
+#include "cn23xx_vf_device.h"
 
 MODULE_AUTHOR("Cavium Networks, <support@cavium.com>");
 MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Virtual Function Driver");
@@ -37,6 +39,7 @@ struct octeon_device_priv {
 static int
 liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
 static void liquidio_vf_remove(struct pci_dev *pdev);
+static int octeon_device_init(struct octeon_device *oct);
 
 static const struct pci_device_id liquidio_vf_pci_tbl[] = {
 	{
@@ -84,10 +87,78 @@ struct octeon_device_priv {
 	/* set linux specific device pointer */
 	oct_dev->pci_dev = (void *)pdev;
 
+	if (octeon_device_init(oct_dev)) {
+		liquidio_vf_remove(pdev);
+		return -ENOMEM;
+	}
+
+	dev_dbg(&oct_dev->pci_dev->dev, "Device is ready\n");
+
 	return 0;
 }
 
 /**
+ * \brief PCI FLR for each Octeon device.
+ * @param oct octeon device
+ */
+static void octeon_pci_flr(struct octeon_device *oct)
+{
+	u16 status;
+
+	pci_save_state(oct->pci_dev);
+
+	pci_cfg_access_lock(oct->pci_dev);
+
+	/* Quiesce the device completely */
+	pci_write_config_word(oct->pci_dev, PCI_COMMAND,
+			      PCI_COMMAND_INTX_DISABLE);
+
+	/* Wait for the Transactions Pending bit to clear */
+	msleep(100);
+	pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA, &status);
+	if (status & PCI_EXP_DEVSTA_TRPND) {
+		dev_info(&oct->pci_dev->dev, "Function reset incomplete after 100ms, sleeping for 5 seconds\n");
+		ssleep(5);
+		pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA,
+					  &status);
+		if (status & PCI_EXP_DEVSTA_TRPND)
+			dev_info(&oct->pci_dev->dev, "Function reset still incomplete after 5s, reset anyway\n");
+	}
+	pcie_capability_set_word(oct->pci_dev, PCI_EXP_DEVCTL,
+				 PCI_EXP_DEVCTL_BCR_FLR);
+	mdelay(100);
+
+	pci_cfg_access_unlock(oct->pci_dev);
+
+	pci_restore_state(oct->pci_dev);
+}
+
+/**
+ * \brief Destroy resources associated with octeon device
+ * @param oct octeon device; resources are released according to
+ *            its current oct->status
+ */
+static void octeon_destroy_resources(struct octeon_device *oct)
+{
+	switch (atomic_read(&oct->status)) {
+	case OCT_DEV_PCI_MAP_DONE:
+		octeon_unmap_pci_barx(oct, 0);
+		octeon_unmap_pci_barx(oct, 1);
+
+	/* fallthrough */
+	case OCT_DEV_PCI_ENABLE_DONE:
+		pci_clear_master(oct->pci_dev);
+		/* Disable the device, releasing the PCI INT */
+		pci_disable_device(oct->pci_dev);
+
+	/* fallthrough */
+	case OCT_DEV_BEGIN_STATE:
+		/* Nothing to be done here */
+		break;
+	}
+}
+
+/**
  * \brief Cleans up resources at unload time
  * @param pdev PCI device structure
  */
@@ -97,12 +168,77 @@ static void liquidio_vf_remove(struct pci_dev *pdev)
 
 	dev_dbg(&oct_dev->pci_dev->dev, "Stopping device\n");
 
+	/* Reset the octeon device and cleanup all memory allocated for
+	 * the octeon device by driver.
+	 */
+	octeon_destroy_resources(oct_dev);
+
+	dev_info(&oct_dev->pci_dev->dev, "Device removed\n");
+
 	/* This octeon device has been removed. Update the global
 	 * data structure to reflect this. Free the device structure.
 	 */
 	octeon_free_device_mem(oct_dev);
 }
 
+/**
+ * \brief PCI initialization for each Octeon device.
+ * @param oct octeon device
+ */
+static int octeon_pci_os_setup(struct octeon_device *oct)
+{
+#ifdef CONFIG_PCI_IOV
+	/* setup PCI stuff first */
+	if (!oct->pci_dev->physfn)
+		octeon_pci_flr(oct);
+#endif
+
+	if (pci_enable_device(oct->pci_dev)) {
+		dev_err(&oct->pci_dev->dev, "pci_enable_device failed\n");
+		return 1;
+	}
+
+	if (dma_set_mask_and_coherent(&oct->pci_dev->dev, DMA_BIT_MASK(64))) {
+		dev_err(&oct->pci_dev->dev, "Unexpected DMA device capability\n");
+		pci_disable_device(oct->pci_dev);
+		return 1;
+	}
+
+	/* Enable PCI DMA Master. */
+	pci_set_master(oct->pci_dev);
+
+	return 0;
+}
+
+/**
+ * \brief Device initialization for each Octeon device that is probed
+ * @param oct  octeon device
+ */
+static int octeon_device_init(struct octeon_device *oct)
+{
+	u32 rev_id;
+
+	atomic_set(&oct->status, OCT_DEV_BEGIN_STATE);
+
+	/* Enable access to the octeon device and make its DMA capability
+	 * known to the OS.
+	 */
+	if (octeon_pci_os_setup(oct))
+		return 1;
+	atomic_set(&oct->status, OCT_DEV_PCI_ENABLE_DONE);
+
+	oct->chip_id = OCTEON_CN23XX_VF_VID;
+	pci_read_config_dword(oct->pci_dev, 8, &rev_id);
+	oct->rev_id = rev_id & 0xff;
+
+	if (cn23xx_setup_octeon_vf_device(oct))
+		return 1;
+
+	atomic_set(&oct->status, OCT_DEV_PCI_MAP_DONE);
+
+	return 0;
+}
+
 static int __init liquidio_vf_init(void)
 {
 	octeon_init_device_list(0);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 05bb0fd..b4c8ee4 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -581,6 +581,8 @@ static void *__retrieve_octeon_config_info(struct octeon_device *oct,
 			ret =  (void *)&default_cn68xx_conf;
 		} else if (oct->chip_id == OCTEON_CN23XX_PF_VID) {
 			ret =  (void *)&default_cn23xx_conf;
+		} else if (oct->chip_id == OCTEON_CN23XX_VF_VID) {
+			ret =  (void *)&default_cn23xx_conf;
 		}
 		break;
 	default:
@@ -596,6 +598,7 @@ static int __verify_octeon_config_info(struct octeon_device *oct, void *conf)
 	case OCTEON_CN68XX:
 		return lio_validate_cn6xxx_config_info(oct, conf);
 	case OCTEON_CN23XX_PF_VID:
+	case OCTEON_CN23XX_VF_VID:
 		return 0;
 	default:
 		break;
-- 
1.8.3.1
