* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: John Fastabend @ 2016-11-28 2:26 UTC (permalink / raw)
To: Roi Dayan, Daniel Borkmann, Cong Wang
Cc: Linux Kernel Network Developers, Jiri Pirko
In-Reply-To: <583A7D67.50003@mellanox.com>
On 16-11-26 10:29 PM, Roi Dayan wrote:
>
>
> On 27/11/2016 06:47, Roi Dayan wrote:
>>
>>
>> On 27/11/2016 02:33, Daniel Borkmann wrote:
>>> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
>>>> On 11/26/2016 07:46 AM, Cong Wang wrote:
>>>>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann
>>>>> <daniel@iogearbox.net> wrote:
>>> [...]
>>>>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>>>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>>>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>>>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>>>>> Outstanding readers should either bail out due to if (!cl) or can
>>>>>> still
>>>>>> process the chain until read section ends, but during that time,
>>>>>> cl->q
>>>>>> resp. bstats should be good. Do you happen to know what's at address
>>>>>> ffff880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(),
>>>>>> but
>>>>>> at least on ingress (netif_receive_skb_internal()) we hold
>>>>>> rcu_read_lock()
>>>>>> here. The KASAN report is reliably happening at this location, right?
>>>>>
>>>>> I am confused as well, I don't see how it could be related to my
>>>>> patch yet.
>>>>> I will take a deep look in the weekend.
>>
>>
>>
>> Hi Cong,
>>
>> When reported the new trace I didn't mean it's related to your patch,
>> I just wanted to point it out it exposed something. I should have been
>> clear about it.
>>
>>
>>>>
>>>> Ok, I'm currently on the run. Got too late yesterday night, but I'll
>>>> write what I found in the evening today, not related to ingress though.
>>>
>>> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
>>> respect
>>> rcu grace period on cls destruction". My conclusion is that both
>>> issues are
>>> actually separate, and that one is small enough where we could route
>>> it via
>>> net actually. Perhaps this at the same time shrinks your "[PATCH
>>> net-next]
>>> net_sched: move the empty tp check from ->destroy() to ->delete()" to a
>>> reasonable size that it's suitable to net as well. Your
>>> ->delete()/->destroy()
>>> one is definitely needed, too. The tp->root one is independant of
>>> ->delete()/
>>> ->destroy() as they are different races and tp->root could also
>>> happen when
>>> you just destroy the whole tp directly. I think that seems like a
>>> good path
>>> forward to me.
>>>
>>> Thanks,
>>> Daniel
>>
>>
>>
>> Hi Daniel,
>>
>> As for the tainted kernel. I was in old (week or two) net-next tree
>> and only cherry-picked from latest net-next related patches to
>> Mellanox HCA, cls_api, cls_flower, devlink. so those are the tainted
>> modules.
>> I have the issue reproducing in that tree so wanted it to check it
>> with Cong's patch instead of latest net-next.
>> I'll try running reproducing the issue with your new patch and later
>> try latest net-next as well.
>>
>> Thanks,
>> Roi
>>
>
> Hi,
>
> I tested "[PATCH net] net, sched: respect rcu grace period on cls
> destruction" and could not reproduce my original issue.
Hi Roi,
Just so I'm 100% clear. No issue with just the above "respect rcu grace
period on cls destruction" per above statement.
> I rebased "[Patch net-next] net_sched: move the empty tp check from
> ->destroy() to ->delete()" over to test it in the same tree and got into
> a new trace in fl_delete.
In this case did you test with "net_sched: move the empty tp check from
->destroy() to ->delete()" _only_ or did this include both patches when
you see the error below.
>From my inspection we really need both patches to get correct behavior.
Thanks!
John
>
> [35659.012123] BUG: KASAN: wild-memory-access on address 1ffffffff803ca31
> [35659.020042] Write of size 1 by task ovs-vswitchd/20135
> [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted:
> G O 4.9.0-rc3+ #18
> [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
> [35659.043730] Call Trace:
> [35659.046619] [<ffffffff95b6dc42>] dump_stack+0x63/0x81
> [35659.052456] [<ffffffff955fbbf8>] kasan_report_error+0x408/0x4e0
> [35659.059402] [<ffffffff955fc2e8>] kasan_report+0x58/0x60
> [35659.065428] [<ffffffff952d5e8d>] ? call_rcu_sched+0x1d/0x20
> [35659.072119] [<ffffffffc01e0701>] ? fl_destroy_filter+0x21/0x30
> [cls_flower]
> [35659.080217] [<ffffffffc01e1ccf>] ? fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.087580] [<ffffffff955fa4ca>] __asan_store1+0x4a/0x50
> [35659.093697] [<ffffffffc01e1ccf>] fl_delete+0x1df/0x2e0 [cls_flower]
> [35659.100870] [<ffffffff9653ecba>] tc_ctl_tfilter+0x10da/0x1b90
>
>
> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
> 800 struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
> 801
> 802 rhashtable_remove_fast(&head->ht, &f->ht_node,
> 803 head->ht_params);
> 804 __fl_delete(tp, f);
> 805 *last = list_empty(&head->filters);
> 806 return 0;
> 807 }
>
>
> Thanks,
> Roi
^ permalink raw reply
* RE: [net,v2] neigh: fix the loop index error in neigh dump
From: 张胜举 @ 2016-11-28 2:34 UTC (permalink / raw)
To: 'David Ahern', netdev
In-Reply-To: <caa11887-00b9-d2f1-7c0b-5b5096b42f56@cumulusnetworks.com>
> -----Original Message-----
> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 10:10 AM
> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>
> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> > Loop index in neigh dump function is not updated correctly under some
> > circumstances, this patch will fix it.
>
> What's an example?
If dev is filtered out, the original code goes to next loop without updating
loop index 'idx'.
>
> >
> > Fixes: 16660f0bd9 ("net: Add support for filtering neigh dump by
> > device index")
> > Fixes: 21fdd092ac ("net: Add support for filtering neigh dump by
> > master device")
> >
> > Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
> > ---
> > net/core/neighbour.c | 39 ++++++++++++++++++---------------------
> > 1 file changed, 18 insertions(+), 21 deletions(-)
> >
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c index
> > 2ae929f..ce32e9c 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct
> net_device *dev, int filter_idx)
> > return false;
> > }
> >
> > +static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
> > + int filter_master_idx)
> > +{
> > + if (neigh_ifindex_filtered(dev, filter_idx) ||
> > + neigh_master_filtered(dev, filter_master_idx))
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff
*skb,
> > struct netlink_callback *cb)
> > {
> > @@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table
> *tbl, struct sk_buff *skb,
> > rcu_read_lock_bh();
> > nht = rcu_dereference_bh(tbl->nht);
> >
> > - for (h = s_h; h < (1 << nht->hash_shift); h++) {
> > - if (h > s_h)
> > - s_idx = 0;
> > + for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
> > for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
> > n != NULL;
> > - n = rcu_dereference_bh(n->next)) {
> > - if (!net_eq(dev_net(n->dev), net))
> > - continue;
> > - if (neigh_ifindex_filtered(n->dev, filter_idx))
> > + n = rcu_dereference_bh(n->next), idx++) {
> > + if (idx < s_idx || !net_eq(dev_net(n->dev), net))
> > continue;
> > - if (neigh_master_filtered(n->dev,
filter_master_idx))
> > + if (neigh_dump_filtered(n->dev, filter_idx,
> > + filter_master_idx))
> > continue;
> > - if (idx < s_idx)
> > - goto next;
> > if (neigh_fill_info(skb, n, NETLINK_CB(cb-
> >skb).portid,
> > cb->nlh->nlmsg_seq,
> > RTM_NEWNEIGH,
> > @@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table
> *tbl, struct sk_buff *skb,
> > rc = -1;
> > goto out;
> > }
> > -next:
> > - idx++;
> > }
> > }
> > rc = skb->len;
> > @@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct
> > neigh_table *tbl, struct sk_buff *skb,
> >
> > read_lock_bh(&tbl->lock);
> >
> > - for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
> > - if (h > s_h)
> > - s_idx = 0;
> > - for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
> > - if (pneigh_net(n) != net)
> > + for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
> > + for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next,
idx++)
> {
> > + if (idx < s_idx || pneigh_net(n) != net)
> > continue;
> > - if (idx < s_idx)
> > - goto next;
> > if (pneigh_fill_info(skb, n, NETLINK_CB(cb-
> >skb).portid,
> > cb->nlh->nlmsg_seq,
> > RTM_NEWNEIGH,
> > @@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table
> *tbl, struct sk_buff *skb,
> > rc = -1;
> > goto out;
> > }
> > - next:
> > - idx++;
> > }
> > }
> >
>
> This fix is way to be complicated to be fixing anything related to
16660f0bd9
> or 21fdd092ac. Both of those commits added a continue:
>
> if (neigh_ifindex_filtered(n->dev, filter_idx))
> continue;
> if (neigh_master_filtered(n->dev,
filter_master_idx))
> continue;
>
> At best the continue is replaced by 'goto next;' and I am not convinced
that is
> right.
>
> You are completely rewriting the dump loops.
I put 'idx++' into for loop, so I replace 'goto' with 'continue'. The
other change is style related.
Thanks,
Zhang Shengju
^ permalink raw reply
* Re: [net,v2] neigh: fix the loop index error in neigh dump
From: David Ahern @ 2016-11-28 2:39 UTC (permalink / raw)
To: 张胜举, netdev
In-Reply-To: <001c01d2491f$fcd53250$f67f96f0$@cmss.chinamobile.com>
On 11/27/16 7:34 PM, 张胜举 wrote:
>> -----Original Message-----
>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>> Sent: Monday, November 28, 2016 10:10 AM
>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
>> netdev@vger.kernel.org
>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>
>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>> Loop index in neigh dump function is not updated correctly under some
>>> circumstances, this patch will fix it.
>>
>> What's an example?
>
> If dev is filtered out, the original code goes to next loop without updating
> loop index 'idx'.
And you have a use case with missing or redundant data? Or is your comment based on a review of code only?
>> You are completely rewriting the dump loops.
>
> I put 'idx++' into for loop, so I replace 'goto' with 'continue'. The
> other change is style related.
A "fixes" should not include 'style related' changes.
^ permalink raw reply
* [PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation
From: Florian Fainelli @ 2016-11-28 2:45 UTC (permalink / raw)
To: netdev
Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
peppe.cavallaro, timur, jbrunet, Florian Fainelli
Hi all,
This patch series addresses discussions and feedback that was recently received
on the mailing-list in the area of: flow control/pause frames, interpretation of
phy_interface_t and finally add some links to useful standards documents.
Changes in v3:
- add Timur's feedback into patch 3
Changes in v2:
- clarify a few things in the RGMII section, add a paragraph about common issues
with RGMII delay mismatches
Florian Fainelli (4):
Documentation: net: phy: remove description of function pointers
Documentation: net: phy: Add a paragraph about pause frames/flow
control
Documentation: net: phy: Add blurb about RGMII
Documentation: net: phy: Add links to several standards documents
Documentation/networking/phy.txt | 140 +++++++++++++++++++++++++++++----------
1 file changed, 105 insertions(+), 35 deletions(-)
--
2.9.3
^ permalink raw reply
* [PATCH net-next v3 1/4] Documentation: net: phy: remove description of function pointers
From: Florian Fainelli @ 2016-11-28 2:45 UTC (permalink / raw)
To: netdev
Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>
Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation about two
different locations just does not work, but the code is less likely to
be outdated.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Documentation/networking/phy.txt | 35 ++---------------------------------
1 file changed, 2 insertions(+), 33 deletions(-)
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 7ab9404a8412..4b25c0f24201 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -251,39 +251,8 @@ Writing a PHY driver
PHY_BASIC_FEATURES, but you can look in include/mii.h for other
features.
- Each driver consists of a number of function pointers:
-
- soft_reset: perform a PHY software reset
- config_init: configures PHY into a sane state after a reset.
- For instance, a Davicom PHY requires descrambling disabled.
- probe: Allocate phy->priv, optionally refuse to bind.
- PHY may not have been reset or had fixups run yet.
- suspend/resume: power management
- config_aneg: Changes the speed/duplex/negotiation settings
- aneg_done: Determines the auto-negotiation result
- read_status: Reads the current speed/duplex/negotiation settings
- ack_interrupt: Clear a pending interrupt
- did_interrupt: Checks if the PHY generated an interrupt
- config_intr: Enable or disable interrupts
- remove: Does any driver take-down
- ts_info: Queries about the HW timestamping status
- match_phy_device: used for Clause 45 capable PHYs to match devices
- in package and ensure they are compatible
- hwtstamp: Set the PHY HW timestamping configuration
- rxtstamp: Requests a receive timestamp at the PHY level for a 'skb'
- txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
- set_wol: Enable Wake-on-LAN at the PHY level
- get_wol: Get the Wake-on-LAN status at the PHY level
- link_change_notify: called to inform the core is about to change the
- link state, can be used to work around bogus PHY between state changes
- read_mmd_indirect: Read PHY MMD indirect register
- write_mmd_indirect: Write PHY MMD indirect register
- module_info: Get the size and type of an EEPROM contained in an plug-in
- module
- module_eeprom: Get EEPROM information of a plug-in module
- get_sset_count: Get number of strings sets that get_strings will count
- get_strings: Get strings from requested objects (statistics)
- get_stats: Get the extended statistics from the PHY device
+ Each driver consists of a number of function pointers, documented
+ in include/linux/phy.h under the phy_driver structure.
Of these, only config_aneg and read_status are required to be
assigned by the driver code. The rest are optional. Also, it is
--
2.9.3
^ permalink raw reply related
* [PATCH net-next v3 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control
From: Florian Fainelli @ 2016-11-28 2:45 UTC (permalink / raw)
To: netdev
Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>
Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Documentation/networking/phy.txt | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 4b25c0f24201..9a42a9414cea 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything
values pruned from them which don't make sense for your controller (a 10/100
controller may be connected to a gigabit capable PHY, so you would need to
mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
- for these bitfields. Note that you should not SET any bits, or the PHY may
- get put into an unsupported state.
+ for these bitfields. Note that you should not SET any bits, except the
+ SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
+ put into an unsupported state.
Lastly, once the controller is ready to handle network traffic, you call
phy_start(phydev). This tells the PAL that you are ready, and configures the
@@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything
When you want to disconnect from the network (even if just briefly), you call
phy_stop(phydev).
+Pause frames / flow control
+
+ The PHY does not participate directly in flow control/pause frames except by
+ making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
+ MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
+ controller supports such a thing. Since flow control/pause frames generation
+ involves the Ethernet MAC driver, it is recommended that this driver takes care
+ of properly indicating advertisement and support for such features by setting
+ the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
+ either before or after phy_connect() and/or as a result of implementing the
+ ethtool::set_pauseparam feature.
+
+
Keeping Close Tabs on the PAL
It is possible that the PAL's built-in state machine needs a little help to
--
2.9.3
^ permalink raw reply related
* [PATCH net-next v3 3/4] Documentation: net: phy: Add blurb about RGMII
From: Florian Fainelli @ 2016-11-28 2:45 UTC (permalink / raw)
To: netdev
Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>
RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Documentation/networking/phy.txt | 77 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 9a42a9414cea..c7ba84b5d912 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -65,6 +65,83 @@ The MDIO bus
drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
+(RG)MII/electrical interface considerations
+
+ The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
+ electrical signal interface using a synchronous 125Mhz clock signal and several
+ data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
+ between the clock line (RXC or TXC) and the data lines to let the PHY (clock
+ sink) have enough setup and hold times to sample the data lines correctly. The
+ PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
+ the PHY driver and optionally the MAC driver, implement the required delay. The
+ values of phy_interface_t must be understood from the perspective of the PHY
+ device itself, leading to the following:
+
+ * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
+ internal delay by itself, it assumes that either the Ethernet MAC (if capable
+ or the PCB traces) insert the correct 1.5-2ns delay
+
+ * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
+ for the transmit data lines (TXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
+ for the receive data lines (RXD[3:0]) processed by the PHY device
+
+ * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
+ both transmit AND receive data lines from/to the PHY device
+
+ Whenever possible, use the PHY side RGMII delay for these reasons:
+
+ * PHY devices may offer sub-nanosecond granularity in how they allow a
+ receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
+ precision may be required to account for differences in PCB trace lengths
+
+ * PHY devices are typically qualified for a large range of applications
+ (industrial, medical, automotive...), and they provide a constant and
+ reliable delay across temperature/pressure/voltage ranges
+
+ * PHY device drivers in PHYLIB being reusable by nature, being able to
+ configure correctly a specified delay enables more designs with similar delay
+ requirements to be operate correctly
+
+ For cases where the PHY is not capable of providing this delay, but the
+ Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
+ should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
+ configured correctly in order to provide the required transmit and/or receive
+ side delay from the perspective of the PHY device. Conversely, if the Ethernet
+ MAC driver looks at the phy_interface_t value, for any other mode but
+ PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
+ disabled.
+
+ In case neither the Ethernet MAC, nor the PHY are capable of providing the
+ required delays, as defined per the RGMII standard, several options may be
+ available:
+
+ * Some SoCs may offer a pin pad/mux/controller capable of configuring a given
+ set of pins'strength, delays, and voltage; and it may be a suitable
+ option to insert the expected 2ns RGMII delay.
+
+ * Modifying the PCB design to include a fixed delay (e.g: using a specifically
+ designed serpentine), which may not require software configuration at all.
+
+Common problems with RGMII delay mismatch
+
+ When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
+ will most likely result in the clock and data line signals to be unstable when
+ the PHY or MAC take a snapshot of these signals to translate them into logical
+ 1 or 0 states and reconstruct the data being transmitted/received. Typical
+ symptoms include:
+
+ * Transmission/reception partially works, and there is frequent or occasional
+ packet loss observed
+
+ * Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
+ or just discard them all
+
+ * Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
+ (since there is enough setup/hold time in that case)
+
+
Connecting to a PHY
Sometime during startup, the network driver needs to establish a connection
--
2.9.3
^ permalink raw reply related
* [PATCH net-next v3 4/4] Documentation: net: phy: Add links to several standards documents
From: Florian Fainelli @ 2016-11-28 2:45 UTC (permalink / raw)
To: netdev
Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
peppe.cavallaro, timur, jbrunet, Florian Fainelli
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Documentation/networking/phy.txt | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index c7ba84b5d912..e017d933d530 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -407,3 +407,13 @@ Board Fixups
The stubs set one of the two matching criteria, and set the other one to
match anything.
+Standards
+
+ IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
+ http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
+
+ RGMII v1.3:
+ http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
+
+ RGMII v2.0:
+ http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf
--
2.9.3
^ permalink raw reply related
* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: John Fastabend @ 2016-11-28 2:51 UTC (permalink / raw)
To: Roi Dayan, Daniel Borkmann, Cong Wang
Cc: Linux Kernel Network Developers, Jiri Pirko
In-Reply-To: <583B95CE.7080309@gmail.com>
On 16-11-27 06:26 PM, John Fastabend wrote:
> On 16-11-26 10:29 PM, Roi Dayan wrote:
>>
>>
>> On 27/11/2016 06:47, Roi Dayan wrote:
>>>
>>>
>>> On 27/11/2016 02:33, Daniel Borkmann wrote:
>>>> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
>>>>> On 11/26/2016 07:46 AM, Cong Wang wrote:
>>>>>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann
>>>>>> <daniel@iogearbox.net> wrote:
>>>> [...]
>>>>>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>>>>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>>>>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>>>>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>>>>>> Outstanding readers should either bail out due to if (!cl) or can
>>>>>>> still
>>>>>>> process the chain until read section ends, but during that time,
>>>>>>> cl->q
>>>>>>> resp. bstats should be good. Do you happen to know what's at address
>>>>>>> ffff880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(),
>>>>>>> but
>>>>>>> at least on ingress (netif_receive_skb_internal()) we hold
>>>>>>> rcu_read_lock()
>>>>>>> here. The KASAN report is reliably happening at this location, right?
>>>>>>
>>>>>> I am confused as well, I don't see how it could be related to my
>>>>>> patch yet.
>>>>>> I will take a deep look in the weekend.
>>>
>>>
>>>
>>> Hi Cong,
>>>
>>> When reported the new trace I didn't mean it's related to your patch,
>>> I just wanted to point it out it exposed something. I should have been
>>> clear about it.
>>>
>>>
>>>>>
>>>>> Ok, I'm currently on the run. Got too late yesterday night, but I'll
>>>>> write what I found in the evening today, not related to ingress though.
>>>>
>>>> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
>>>> respect
>>>> rcu grace period on cls destruction". My conclusion is that both
>>>> issues are
>>>> actually separate, and that one is small enough where we could route
>>>> it via
>>>> net actually. Perhaps this at the same time shrinks your "[PATCH
>>>> net-next]
>>>> net_sched: move the empty tp check from ->destroy() to ->delete()" to a
>>>> reasonable size that it's suitable to net as well. Your
>>>> ->delete()/->destroy()
>>>> one is definitely needed, too. The tp->root one is independant of
>>>> ->delete()/
>>>> ->destroy() as they are different races and tp->root could also
>>>> happen when
>>>> you just destroy the whole tp directly. I think that seems like a
>>>> good path
>>>> forward to me.
>>>>
>>>> Thanks,
>>>> Daniel
>>>
>>>
>>>
>>> Hi Daniel,
>>>
>>> As for the tainted kernel. I was in old (week or two) net-next tree
>>> and only cherry-picked from latest net-next related patches to
>>> Mellanox HCA, cls_api, cls_flower, devlink. so those are the tainted
>>> modules.
>>> I have the issue reproducing in that tree so wanted it to check it
>>> with Cong's patch instead of latest net-next.
>>> I'll try running reproducing the issue with your new patch and later
>>> try latest net-next as well.
>>>
>>> Thanks,
>>> Roi
>>>
>>
>> Hi,
>>
>> I tested "[PATCH net] net, sched: respect rcu grace period on cls
>> destruction" and could not reproduce my original issue.
>
> Hi Roi,
>
> Just so I'm 100% clear. No issue with just the above "respect rcu grace
> period on cls destruction" per above statement.
>
>> I rebased "[Patch net-next] net_sched: move the empty tp check from
>> ->destroy() to ->delete()" over to test it in the same tree and got into
>> a new trace in fl_delete.
>
> In this case did you test with "net_sched: move the empty tp check from
> ->destroy() to ->delete()" _only_ or did this include both patches when
> you see the error below.
>
> From my inspection we really need both patches to get correct behavior.
>
> Thanks!
> John
Ah dang nevermind I just read both patches in detail and applying them
both at the same time is nonsense. Let me reply with comments directly
to the patches.
Thanks. sorry for the noise.
>
>>
>> [35659.012123] BUG: KASAN: wild-memory-access on address 1ffffffff803ca31
>> [35659.020042] Write of size 1 by task ovs-vswitchd/20135
>> [35659.025878] CPU: 19 PID: 20135 Comm: ovs-vswitchd Tainted:
>> G O 4.9.0-rc3+ #18
>> [35659.035948] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
>> [35659.043730] Call Trace:
>> [35659.046619] [<ffffffff95b6dc42>] dump_stack+0x63/0x81
>> [35659.052456] [<ffffffff955fbbf8>] kasan_report_error+0x408/0x4e0
>> [35659.059402] [<ffffffff955fc2e8>] kasan_report+0x58/0x60
>> [35659.065428] [<ffffffff952d5e8d>] ? call_rcu_sched+0x1d/0x20
>> [35659.072119] [<ffffffffc01e0701>] ? fl_destroy_filter+0x21/0x30
>> [cls_flower]
>> [35659.080217] [<ffffffffc01e1ccf>] ? fl_delete+0x1df/0x2e0 [cls_flower]
>> [35659.087580] [<ffffffff955fa4ca>] __asan_store1+0x4a/0x50
>> [35659.093697] [<ffffffffc01e1ccf>] fl_delete+0x1df/0x2e0 [cls_flower]
>> [35659.100870] [<ffffffff9653ecba>] tc_ctl_tfilter+0x10da/0x1b90
>>
>>
>> 0x1d02 is in fl_delete (net/sched/cls_flower.c:805).
>> 800 struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
>> 801
>> 802 rhashtable_remove_fast(&head->ht, &f->ht_node,
>> 803 head->ht_params);
>> 804 __fl_delete(tp, f);
>> 805 *last = list_empty(&head->filters);
>> 806 return 0;
>> 807 }
>>
>>
>> Thanks,
>> Roi
>
^ permalink raw reply
* [PATCH] net: handle no dst on skb in icmp6_send
From: David Ahern @ 2016-11-28 2:52 UTC (permalink / raw)
To: netdev; +Cc: andreyknvl, David Ahern
Andrey reported the following while fuzzing the kernel with syzkaller:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800666d4200 task.stack: ffff880067348000
RIP: 0010:[<ffffffff833617ec>] [<ffffffff833617ec>]
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:ffff88006734f2c0 EFLAGS: 00010206
RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
FS: 00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
Stack:
ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
Call Trace:
[<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
[< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
[<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
[<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
[<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
...
icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
cases the dst->dev should be preferred for determining the L3 domain
if the dst has been set on the skb. Fallback to the skb->dev if it has
not. This covers the case reported here where icmp6_send is invoked on
Rx before the route lookup.
Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
net/ipv6/icmp.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 7370ad2e693a..2772004ba5a1 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -447,8 +447,10 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
if (__ipv6_addr_needs_scope_id(addr_type))
iif = skb->dev->ifindex;
- else
- iif = l3mdev_master_ifindex(skb_dst(skb)->dev);
+ else {
+ dst = skb_dst(skb);
+ iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev);
+ }
/*
* Must not send error if the source does not uniquely
--
2.1.4
^ permalink raw reply related
* RE: [net,v2] neigh: fix the loop index error in neigh dump
From: 张胜举 @ 2016-11-28 2:53 UTC (permalink / raw)
To: 'David Ahern', netdev
In-Reply-To: <6d1a324e-e29e-e2dd-a6fc-1e9b4455cb3d@cumulusnetworks.com>
> -----Original Message-----
> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 10:39 AM
> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>
> On 11/27/16 7:34 PM, 张胜举 wrote:
> >> -----Original Message-----
> >> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> >> Sent: Monday, November 28, 2016 10:10 AM
> >> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
> >> netdev@vger.kernel.org
> >> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> >>
> >> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> >>> Loop index in neigh dump function is not updated correctly under
> >>> some circumstances, this patch will fix it.
> >>
> >> What's an example?
> >
> > If dev is filtered out, the original code goes to next loop without
> > updating loop index 'idx'.
>
> And you have a use case with missing or redundant data? Or is your
> comment based on a review of code only?
It's on my code review. No use case currently, this is uncommon to happen.
>
> >> You are completely rewriting the dump loops.
> >
> > I put 'idx++' into for loop, so I replace 'goto' with 'continue'.
> > The other change is style related.
>
> A "fixes" should not include 'style related' changes.
Okay, I will send another version without style changes.
^ permalink raw reply
* Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
From: John Fastabend @ 2016-11-28 2:55 UTC (permalink / raw)
To: Daniel Borkmann, davem; +Cc: xiyou.wangcong, roid, ast, hannes, jiri, netdev
In-Reply-To: <0d6d89f885033f1739e97f7f3372ae6e1db72892.1480204343.git.daniel@iogearbox.net>
On 16-11-26 04:18 PM, Daniel Borkmann wrote:
> Roi reported a crash in flower where tp->root was NULL in ->classify()
> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
> this doesn't respect RCU grace period for them, and as a result, still
> outstanding readers from tc_classify() will try to blindly dereference
> a NULL tp->root.
>
> The tp->root object is strictly private to the classifier implementation
> and holds internal data the core such as tc_ctl_tfilter() doesn't know
> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
> is only checked for NULL in ->get() callback, but nowhere else. This is
> misleading and seemed to be copied from old classifier code that was not
> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
> fix NULL pointer dereference") moved tp->root initialization into ->init()
> routine, where before it was part of ->change(), so ->get() had to deal
> with tp->root being NULL back then, so that was indeed a valid case, after
> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
> in packet classifiers"); but the NULLifying was reintroduced with the
> RCUification, but it's not correct for every classifier implementation.
>
> In the cases that are fixed here with one exception of cls_cgroup, tp->root
> object is allocated and initialized inside ->init() callback, which is always
> performed at a point in time after we allocate a new tp, which means tp and
> thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
> handler, same for the tp which is kfree_rcu()'ed right when we return
> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
> for such classifiers is always tied to the tp lifetime. The RCU callback
> invocation for the two kfree_rcu() could be out of order, but that's fine
> since both are independent.
>
> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
> means that 1) we don't need a useless NULL check in fast-path and, 2) that
> outstanding readers of that tp in tc_classify() can still execute under
> respect with RCU grace period as it is actually expected.
>
> Things that haven't been touched here: cls_fw and cls_route. They each
> handle tp->root being NULL in ->classify() path for historic reasons, so
> their ->destroy() implementation can stay as is. If someone actually
> cares, they could get cleaned up at some point to avoid the test in fast
> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
> !head should anyone actually be using/testing it, so it at least aligns with
> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
> destruction (to a sleepable context) after RCU grace period as concurrent
> readers might still access it. (Note that in this case we need to hold module
> reference to keep work callback address intact, since we only wait on module
> unload for all call_rcu()s to finish.)
>
> This fixes one race to bring RCU grace period guarantees back. Next step
> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
> proto tp when all filters are gone") to get the order of unlinking the tp
> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
> removal be done through the prior ->delete() callback. Both are independant
> issues. Once we have that right, we can then clean tp->root up for a number
> of classifiers by not making them RCU pointers, which requires a new callback
> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
> tp->root from there.
Thanks looks good to me and appreciate the detailed commit message.
Acked-by: John Fastabend <john.r.fastabend@intel.com>
>
> Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
> Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
> Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
> Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
> Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU")
> Reported-by: Roi Dayan <roid@mellanox.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Roi Dayan <roid@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
> ---
^ permalink raw reply
* Re: [net,v2] neigh: fix the loop index error in neigh dump
From: David Ahern @ 2016-11-28 2:56 UTC (permalink / raw)
To: 张胜举, netdev
In-Reply-To: <001d01d24922$9a16b1e0$ce4415a0$@cmss.chinamobile.com>
On 11/27/16 7:53 PM, 张胜举 wrote:
>
>
>> -----Original Message-----
>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>> Sent: Monday, November 28, 2016 10:39 AM
>> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
>> netdev@vger.kernel.org
>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>
>> On 11/27/16 7:34 PM, 张胜举 wrote:
>>>> -----Original Message-----
>>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>>>> Sent: Monday, November 28, 2016 10:10 AM
>>>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
>>>> netdev@vger.kernel.org
>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>>
>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>>>> Loop index in neigh dump function is not updated correctly under
>>>>> some circumstances, this patch will fix it.
>>>>
>>>> What's an example?
>>>
>>> If dev is filtered out, the original code goes to next loop without
>>> updating loop index 'idx'.
>>
>> And you have a use case with missing or redundant data? Or is your
>> comment based on a review of code only?
> It's on my code review. No use case currently, this is uncommon to happen.
>
>
>>
>>>> You are completely rewriting the dump loops.
>>>
>>> I put 'idx++' into for loop, so I replace 'goto' with 'continue'.
>>> The other change is style related.
>>
>> A "fixes" should not include 'style related' changes.
> Okay, I will send another version without style changes.
>
Personally, I think you need to produce a use case that fails before sending another patch. I have not seen a problem with this code.
^ permalink raw reply
* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: John Fastabend @ 2016-11-28 2:57 UTC (permalink / raw)
To: Cong Wang, netdev; +Cc: roid, jiri, Daniel Borkmann
In-Reply-To: <1479952708-26763-1-git-send-email-xiyou.wangcong@gmail.com>
On 16-11-23 05:58 PM, Cong Wang wrote:
> Roi reported we could have a race condition where in ->classify() path
> we dereference tp->root and meanwhile a parallel ->destroy() makes it
> a NULL.
>
> This is possible because ->destroy() could be called when deleting
> a filter to check if we are the last one in tp, this tp is still
> linked and visible at that time.
>
> The root cause of this problem is the semantic of ->destroy(), it
> does two things (for non-force case):
>
> 1) check if tp is empty
> 2) if tp is empty we could really destroy it
>
> and its caller, if cares, needs to check its return value to see if
> it is really destroyed. Therefore we can't unlink tp unless we know
> it is empty.
>
> As suggested by Daniel, we could actually move the test logic to ->delete()
> so that we can safely unlink tp after ->delete() tells us the last one is
> just deleted and before ->destroy().
>
> What's more, even we unlink it before ->destroy(), it could still have
> readers since we don't wait for a grace period here, we should not modify
> tp->root in ->destroy() either.
>
> Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
> Reported-by: Roi Dayan <roid@mellanox.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
Hi Cong,
Thanks a lot for doing this. Can you rebase it on top of Daniel's patch
though,
[PATCH net] net, sched: respect rcu grace period on cls destruction
And then push the NULL pointer work for the cls_fw and cls_route
classifiers into another patch.
Then I believe the last thing to make this correct is to convert the
call_rcu() paths to call_rcu_bh().
.John
^ permalink raw reply
* Re: [net,v2] neigh: fix the loop index error in neigh dump
From: David Ahern @ 2016-11-28 3:09 UTC (permalink / raw)
To: 张胜举, netdev
In-Reply-To: <a01fcd54-c0ed-9d05-743a-27592d845c56@cumulusnetworks.com>
On 11/27/16 7:56 PM, David Ahern wrote:
> On 11/27/16 7:53 PM, 张胜举 wrote:
>>
>>
>>> -----Original Message-----
>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>>> Sent: Monday, November 28, 2016 10:39 AM
>>> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
>>> netdev@vger.kernel.org
>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>
>>> On 11/27/16 7:34 PM, 张胜举 wrote:
>>>>> -----Original Message-----
>>>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
>>>>> Sent: Monday, November 28, 2016 10:10 AM
>>>>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
>>>>> netdev@vger.kernel.org
>>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>>>>>
>>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
>>>>>> Loop index in neigh dump function is not updated correctly under
>>>>>> some circumstances, this patch will fix it.
>>>>>
>>>>> What's an example?
>>>>
>>>> If dev is filtered out, the original code goes to next loop without
>>>> updating loop index 'idx'.
>>>
>>> And you have a use case with missing or redundant data? Or is your
>>> comment based on a review of code only?
>> It's on my code review. No use case currently, this is uncommon to happen.
>>
>>
>>>
>>>>> You are completely rewriting the dump loops.
>>>>
>>>> I put 'idx++' into for loop, so I replace 'goto' with 'continue'.
>>>> The other change is style related.
>>>
>>> A "fixes" should not include 'style related' changes.
>> Okay, I will send another version without style changes.
>>
>
> Personally, I think you need to produce a use case that fails before sending another patch. I have not seen a problem with this code.
>
And looking back at 3f0ae05d6f I should not have acked it (reviewed it too quickly while on PTO). Your change is a no-op because of what idx represents - the position in the hash list for devices relevant for the dump request. Same goes for the neigh dump so this patch is not needed.
^ permalink raw reply
* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
From: Michael S. Tsirkin @ 2016-11-28 3:36 UTC (permalink / raw)
To: John Fastabend
Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <5838ABF3.8060308@gmail.com>
On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> > On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>
> >>>> This adds XDP support to virtio_net. Some requirements must be
> >>>> met for XDP to be enabled depending on the mode. First it will
> >>>> only be supported with LRO disabled so that data is not pushed
> >>>> across multiple buffers. The MTU must be less than a page size
> >>>> to avoid having to handle XDP across multiple pages.
> >>>>
> >>>> If mergeable receive is enabled this first series only supports
> >>>> the case where header and data are in the same buf which we can
> >>>> check when a packet is received by looking at num_buf. If the
> >>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>> does not happen and both header and data is put in a single buffer
> >>>> as expected so we check this when XDP programs are loaded. Note I
> >>>> have only tested this with Linux vhost backend.
> >>>>
> >>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>> met then XDP is allowed.
> >>>>
> >>>> A follow on patch can be generated to solve the mergeable receive
> >>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>> be handled has easily.
> >>>
> >>>
> >>> I would very much prefer support for other layouts without drops
> >>> before merging this.
> >>> header by itself can certainly be handled by skipping it.
> >>> People wanted to use that e.g. for zero copy.
> >>
> >> OK fair enough I'll do this now rather than push it out.
> >>
>
> Hi Michael,
>
> The header skip logic however complicates the xmit handling a fair
> amount. Specifically when we release the buffers after xmit then
> both the hdr and data portions need to be released which requires
> some tracking.
I thought you disable all checksum offloads so why not discard the
header immediately?
> Is the header split logic actually in use somewhere today? It looks
> like its not being used in Linux case. And zero copy RX is currently as
> best I can tell not supported anywhere so I would prefer not to
> complicate the XDP path at the moment with a possible future feature.
Well it's part of the documented interface so we never
know who implemented it. Normally if we want to make
restrictions we would do the reverse and add a feature.
We can do this easily, but I'd like to first look into
just handling all possible inputs as the spec asks us to.
I'm a bit too busy with other stuff next week but will
look into this a week after that if you don't beat me to it.
> >>>
> >>> Anything else can be handled by copying the packet.
>
> Any idea how to test this? At the moment I have some code to linearize
> the data in all cases with more than a single buffer. But wasn't clear
> to me which features I could negotiate with vhost/qemu to get more than
> a single buffer in the receive path.
>
> Thanks,
> John
ATM you need to hack qemu. Here's a hack to make header completely
separate.
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..4866144 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
offset = n->host_hdr_len;
total += n->guest_hdr_len;
guest_offset = n->guest_hdr_len;
+ continue;
} else {
guest_offset = 0;
}
here's one that should cap the 1st s/g to 100 bytes:
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..7943004 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
offset = n->host_hdr_len;
total += n->guest_hdr_len;
guest_offset = n->guest_hdr_len;
+ sg.iov_len = MIN(sg.iov_len, 100);
} else {
guest_offset = 0;
}
^ permalink raw reply related
* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
From: John Fastabend @ 2016-11-28 3:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <20161128052211-mutt-send-email-mst@kernel.org>
On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
>> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
>>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
>>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
>>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
>>>>>>
>>>>>> This adds XDP support to virtio_net. Some requirements must be
>>>>>> met for XDP to be enabled depending on the mode. First it will
>>>>>> only be supported with LRO disabled so that data is not pushed
>>>>>> across multiple buffers. The MTU must be less than a page size
>>>>>> to avoid having to handle XDP across multiple pages.
>>>>>>
>>>>>> If mergeable receive is enabled this first series only supports
>>>>>> the case where header and data are in the same buf which we can
>>>>>> check when a packet is received by looking at num_buf. If the
>>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
>>>>>> is dropped and a warning is thrown. When any_header_sg is set this
>>>>>> does not happen and both header and data is put in a single buffer
>>>>>> as expected so we check this when XDP programs are loaded. Note I
>>>>>> have only tested this with Linux vhost backend.
>>>>>>
>>>>>> If big packets mode is enabled and MTU/LRO conditions above are
>>>>>> met then XDP is allowed.
>>>>>>
>>>>>> A follow on patch can be generated to solve the mergeable receive
>>>>>> case with num_bufs equal to 2. Buffers greater than two may not
>>>>>> be handled has easily.
>>>>>
>>>>>
>>>>> I would very much prefer support for other layouts without drops
>>>>> before merging this.
>>>>> header by itself can certainly be handled by skipping it.
>>>>> People wanted to use that e.g. for zero copy.
>>>>
>>>> OK fair enough I'll do this now rather than push it out.
>>>>
>>
>> Hi Michael,
>>
>> The header skip logic however complicates the xmit handling a fair
>> amount. Specifically when we release the buffers after xmit then
>> both the hdr and data portions need to be released which requires
>> some tracking.
>
> I thought you disable all checksum offloads so why not discard the
> header immediately?
Well in the "normal" case where the header is part of the same buffer
we keep it to use the same space for the header on the TX path.
If we discard it in the header split case we have to push the header
somewhere else. In the skb case the cb[] region is used it looks like.
In our case I guess free space at the end of the page could be used.
My thinking is if we handle the general case of more than one buffer
being used with a copy we can handle the case above using the same
logic and no need to handle it as a special case. It seems to be an odd
case that doesn't really exist anyways. At least not in qemu/Linux. I
have not tested anything else.
>
>> Is the header split logic actually in use somewhere today? It looks
>> like its not being used in Linux case. And zero copy RX is currently as
>> best I can tell not supported anywhere so I would prefer not to
>> complicate the XDP path at the moment with a possible future feature.
>
> Well it's part of the documented interface so we never
> know who implemented it. Normally if we want to make
> restrictions we would do the reverse and add a feature.
>
> We can do this easily, but I'd like to first look into
> just handling all possible inputs as the spec asks us to.
> I'm a bit too busy with other stuff next week but will
> look into this a week after that if you don't beat me to it.
>
Well I've almost got it working now with some logic to copy everything
into a single page if we hit this case so should be OK but slow. I'll
finish testing this and send it out hopefully in the next few days.
>>>>>
>>>>> Anything else can be handled by copying the packet.
>>
>> Any idea how to test this? At the moment I have some code to linearize
>> the data in all cases with more than a single buffer. But wasn't clear
>> to me which features I could negotiate with vhost/qemu to get more than
>> a single buffer in the receive path.
>>
>> Thanks,
>> John
>
> ATM you need to hack qemu. Here's a hack to make header completely
> separate.
>
Perfect! hacking qemu for testing is no problem this helps a lot thanks
and saves me time trying to figure out how to get qemu to do this.
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..4866144 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> offset = n->host_hdr_len;
> total += n->guest_hdr_len;
> guest_offset = n->guest_hdr_len;
> + continue;
> } else {
> guest_offset = 0;
> }
>
>
>
> here's one that should cap the 1st s/g to 100 bytes:
>
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..7943004 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> offset = n->host_hdr_len;
> total += n->guest_hdr_len;
> guest_offset = n->guest_hdr_len;
> + sg.iov_len = MIN(sg.iov_len, 100);
> } else {
> guest_offset = 0;
> }
>
^ permalink raw reply
* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
From: Michael S. Tsirkin @ 2016-11-28 4:07 UTC (permalink / raw)
To: John Fastabend
Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <583BAAD9.4080408@gmail.com>
On Sun, Nov 27, 2016 at 07:56:09PM -0800, John Fastabend wrote:
> On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> >> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>>>
> >>>>>> This adds XDP support to virtio_net. Some requirements must be
> >>>>>> met for XDP to be enabled depending on the mode. First it will
> >>>>>> only be supported with LRO disabled so that data is not pushed
> >>>>>> across multiple buffers. The MTU must be less than a page size
> >>>>>> to avoid having to handle XDP across multiple pages.
> >>>>>>
> >>>>>> If mergeable receive is enabled this first series only supports
> >>>>>> the case where header and data are in the same buf which we can
> >>>>>> check when a packet is received by looking at num_buf. If the
> >>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>>>> does not happen and both header and data is put in a single buffer
> >>>>>> as expected so we check this when XDP programs are loaded. Note I
> >>>>>> have only tested this with Linux vhost backend.
> >>>>>>
> >>>>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>>>> met then XDP is allowed.
> >>>>>>
> >>>>>> A follow on patch can be generated to solve the mergeable receive
> >>>>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>>>> be handled has easily.
> >>>>>
> >>>>>
> >>>>> I would very much prefer support for other layouts without drops
> >>>>> before merging this.
> >>>>> header by itself can certainly be handled by skipping it.
> >>>>> People wanted to use that e.g. for zero copy.
> >>>>
> >>>> OK fair enough I'll do this now rather than push it out.
> >>>>
> >>
> >> Hi Michael,
> >>
> >> The header skip logic however complicates the xmit handling a fair
> >> amount. Specifically when we release the buffers after xmit then
> >> both the hdr and data portions need to be released which requires
> >> some tracking.
> >
> > I thought you disable all checksum offloads so why not discard the
> > header immediately?
>
> Well in the "normal" case where the header is part of the same buffer
> we keep it to use the same space for the header on the TX path.
>
> If we discard it in the header split case we have to push the header
> somewhere else. In the skb case the cb[] region is used it looks like.
> In our case I guess free space at the end of the page could be used.
You don't have to put start of page in a buffer, you
can put an offset there. Will result in some waste in the
common case, but it's just several bytes so likely not a big deal.
> My thinking is if we handle the general case of more than one buffer
> being used with a copy we can handle the case above using the same
> logic and no need to handle it as a special case. It seems to be an odd
> case that doesn't really exist anyways. At least not in qemu/Linux. I
> have not tested anything else.
OK
> >
> >> Is the header split logic actually in use somewhere today? It looks
> >> like its not being used in Linux case. And zero copy RX is currently as
> >> best I can tell not supported anywhere so I would prefer not to
> >> complicate the XDP path at the moment with a possible future feature.
> >
> > Well it's part of the documented interface so we never
> > know who implemented it. Normally if we want to make
> > restrictions we would do the reverse and add a feature.
> >
> > We can do this easily, but I'd like to first look into
> > just handling all possible inputs as the spec asks us to.
> > I'm a bit too busy with other stuff next week but will
> > look into this a week after that if you don't beat me to it.
> >
>
> Well I've almost got it working now with some logic to copy everything
> into a single page if we hit this case so should be OK but slow. I'll
> finish testing this and send it out hopefully in the next few days.
>
> >>>>>
> >>>>> Anything else can be handled by copying the packet.
> >>
> >> Any idea how to test this? At the moment I have some code to linearize
> >> the data in all cases with more than a single buffer. But wasn't clear
> >> to me which features I could negotiate with vhost/qemu to get more than
> >> a single buffer in the receive path.
> >>
> >> Thanks,
> >> John
> >
> > ATM you need to hack qemu. Here's a hack to make header completely
> > separate.
> >
>
> Perfect! hacking qemu for testing is no problem this helps a lot thanks
> and saves me time trying to figure out how to get qemu to do this.
Pls note I didn't try this at all, so might not work, but should
give you the idea.
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..4866144 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> > offset = n->host_hdr_len;
> > total += n->guest_hdr_len;
> > guest_offset = n->guest_hdr_len;
> > + continue;
> > } else {
> > guest_offset = 0;
> > }
> >
> >
> >
> > here's one that should cap the 1st s/g to 100 bytes:
> >
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..7943004 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> > offset = n->host_hdr_len;
> > total += n->guest_hdr_len;
> > guest_offset = n->guest_hdr_len;
> > + sg.iov_len = MIN(sg.iov_len, 100);
> > } else {
> > guest_offset = 0;
> > }
> >
^ permalink raw reply
* Re: [RFC net-next 2/3] net: dsa: Propagate VLAN add/del to CPU port(s)
From: Florian Fainelli @ 2016-11-28 4:30 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: davem, bridge, stephen, andrew, jiri, idosch
In-Reply-To: <87eg23zcf5.fsf@ketchup.i-did-not-set--mail-host-address--so-tickle-me>
On 11/22/2016 08:50 AM, Vivien Didelot wrote:
> Hi Florian,
>
> Open question: will we need to do the same for FDB and MDB objects?
(overlooked that question early this week), I do expect that this could
be helpful for FDB and MBD objects as well, yes.
>
> Florian Fainelli <f.fainelli@gmail.com> writes:
>
>> Now that the bridge layer can call into switchdev to signal programming
>> requests targeting the bridge master device itself, allow the switch
>> drivers to implement separate programming of downstream and
>> upstream/management ports.
>>
>> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> net/dsa/slave.c | 45 +++++++++++++++++++++++++++++++++------------
>> 1 file changed, 33 insertions(+), 12 deletions(-)
>>
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index d0c7bce88743..18288261b964 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -223,35 +223,30 @@ static int dsa_slave_set_mac_address(struct net_device *dev, void *a)
>> return 0;
>> }
>>
>> -static int dsa_slave_port_vlan_add(struct net_device *dev,
>> +static int dsa_slave_port_vlan_add(struct dsa_switch *ds, int port,
>> const struct switchdev_obj_port_vlan *vlan,
>> struct switchdev_trans *trans)
>> {
>> - struct dsa_slave_priv *p = netdev_priv(dev);
>> - struct dsa_switch *ds = p->parent;
>>
>
> Extra newline ^.
>
>> if (switchdev_trans_ph_prepare(trans)) {
>> if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
>> return -EOPNOTSUPP;
>>
>> - return ds->ops->port_vlan_prepare(ds, p->port, vlan, trans);
>> + return ds->ops->port_vlan_prepare(ds, port, vlan, trans);
>> }
>>
>> - ds->ops->port_vlan_add(ds, p->port, vlan, trans);
>> + ds->ops->port_vlan_add(ds, port, vlan, trans);
>>
>> return 0;
>> }
>>
>> -static int dsa_slave_port_vlan_del(struct net_device *dev,
>> +static int dsa_slave_port_vlan_del(struct dsa_switch *ds, int port,
>> const struct switchdev_obj_port_vlan *vlan)
>> {
>> - struct dsa_slave_priv *p = netdev_priv(dev);
>> - struct dsa_switch *ds = p->parent;
>> -
>> if (!ds->ops->port_vlan_del)
>> return -EOPNOTSUPP;
>>
>> - return ds->ops->port_vlan_del(ds, p->port, vlan);
>> + return ds->ops->port_vlan_del(ds, port, vlan);
>> }
>>
>> static int dsa_slave_port_vlan_dump(struct net_device *dev,
>> @@ -465,8 +460,21 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>> const struct switchdev_obj *obj,
>> struct switchdev_trans *trans)
>> {
>> + struct dsa_slave_priv *p = netdev_priv(dev);
>> + struct dsa_switch *ds = p->parent;
>> + int port = p->port;
>> int err;
>>
>> + /* Here we may be called with an orig_dev which is different from dev,
>> + * on purpose, to receive request coming from e.g the bridge master
>> + * device. Although there are no network device associated with CPU/DSA
>> + * ports, we may still have programming operation for these ports.
>> + */
>> + if (obj->orig_dev == p->bridge_dev) {
>> + ds = ds->dst->ds[0];
>> + port = ds->dst->cpu_port;
>> + }
>> +
>> /* For the prepare phase, ensure the full set of changes is feasable in
>> * one go in order to signal a failure properly. If an operation is not
>> * supported, return -EOPNOTSUPP.
>> @@ -483,7 +491,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>> trans);
>> break;
>> case SWITCHDEV_OBJ_ID_PORT_VLAN:
>> - err = dsa_slave_port_vlan_add(dev,
>> + err = dsa_slave_port_vlan_add(ds, port,
>> SWITCHDEV_OBJ_PORT_VLAN(obj),
>> trans);
>
> Note that dsa_slave_port_vlan_add() will be called N times, N being the
> number of bridge ports. This is not an issue for the moment though.
> Programming it only once requires caching, so leave it for an eventual
> future patch.
>
> When issuing the following command (lan0 being a member of br0):
>
> # bridge vlan add vid 42 dev lan0
>
> the CPU port is also programmed as tagged in VLAN 42. Is that expected?
The first time the VLAN id is programmed to either lan0 or br0, and it
did not exist prior to that call, it also gets populated into the bridge
VLAN database, which is why both the lan0 interface and the CPU port get
programmed.
--
Florian
^ permalink raw reply
* [PATCH net-next] bpf: samples: Fix compile of test_lru_dist.c
From: David Ahern @ 2016-11-28 4:32 UTC (permalink / raw)
To: netdev; +Cc: David Ahern, Martin KaFai Lau
Build of samples/bpf on debian/jessie fails with:
HOSTCC /home/dsa/kernel-3.git/samples/bpf/test_lru_dist.o
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c: In function ‘main’:
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: error: variable ‘r’ has initializer but incomplete type
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: error: ‘RLIM_INFINITY’ undeclared (first use in this function)
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:21: note: each undeclared identifier is reported only once for each function it appears in
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess elements in struct initializer
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
^
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: excess elements in struct initializer
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:9: warning: (near initialization for ‘r’)
/home/dsa/kernel-3.git/samples/bpf/test_lru_dist.c:490:16: error: storage size of ‘r’ isn’t known
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
Add sys/resource.h to the include list
Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Cc: Martin KaFai Lau <kafai@fb.com>
---
samples/bpf/test_lru_dist.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 2859977b7f37..bc4a2142eb91 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -16,6 +16,7 @@
#include <sched.h>
#include <sys/wait.h>
#include <sys/stat.h>
+#include <sys/resource.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
--
2.1.4
^ permalink raw reply related
* RE: [net,v2] neigh: fix the loop index error in neigh dump
From: 张胜举 @ 2016-11-28 4:50 UTC (permalink / raw)
To: 'David Ahern', netdev
In-Reply-To: <6859d40b-0049-513a-6dc4-162d383ef7b9@cumulusnetworks.com>
> -----Original Message-----
> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> Sent: Monday, November 28, 2016 11:10 AM
> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
> netdev@vger.kernel.org
> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
>
> On 11/27/16 7:56 PM, David Ahern wrote:
> > On 11/27/16 7:53 PM, 张胜举 wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> >>> Sent: Monday, November 28, 2016 10:39 AM
> >>> To: 张胜举 <zhangshengju@cmss.chinamobile.com>;
> >>> netdev@vger.kernel.org
> >>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh dump
> >>>
> >>> On 11/27/16 7:34 PM, 张胜举 wrote:
> >>>>> -----Original Message-----
> >>>>> From: David Ahern [mailto:dsa@cumulusnetworks.com]
> >>>>> Sent: Monday, November 28, 2016 10:10 AM
> >>>>> To: Zhang Shengju <zhangshengju@cmss.chinamobile.com>;
> >>>>> netdev@vger.kernel.org
> >>>>> Subject: Re: [net,v2] neigh: fix the loop index error in neigh
> >>>>> dump
> >>>>>
> >>>>> On 11/27/16 6:32 PM, Zhang Shengju wrote:
> >>>>>> Loop index in neigh dump function is not updated correctly under
> >>>>>> some circumstances, this patch will fix it.
> >>>>>
> >>>>> What's an example?
> >>>>
> >>>> If dev is filtered out, the original code goes to next loop without
> >>>> updating loop index 'idx'.
> >>>
> >>> And you have a use case with missing or redundant data? Or is your
> >>> comment based on a review of code only?
> >> It's on my code review. No use case currently, this is uncommon to
> happen.
> >>
> >>
> >>>
> >>>>> You are completely rewriting the dump loops.
> >>>>
> >>>> I put 'idx++' into for loop, so I replace 'goto' with 'continue'.
> >>>> The other change is style related.
> >>>
> >>> A "fixes" should not include 'style related' changes.
> >> Okay, I will send another version without style changes.
> >>
> >
> > Personally, I think you need to produce a use case that fails before
sending
> another patch. I have not seen a problem with this code.
> >
>
> And looking back at 3f0ae05d6f I should not have acked it (reviewed it too
> quickly while on PTO). Your change is a no-op because of what idx
represents
> - the position in the hash list for devices relevant for the dump request.
> Same goes for the neigh dump so this patch is not needed.
>
No, when dump request must be processed by multiple 'recv/recvmsg' system
calls,
idx stores which dev/neigh the previous call have processed, so that next
call will scan
from the right place.
So no matter whether the dev/neigh is filtered, the idx should be increased
anyway.
It's hard to produce a use case, because we mostly have only one entity in
hash list. Even with
multiple entities, we also need the function to exit right at the place
where dev/neigh is filter out.
All other dump functiones for RT netlink keep this logic, you can refer
inet_dump_ifaddr() if you wish.
^ permalink raw reply
* [PATCH net-next 0/9] liquidio VF operations
From: Raghu Vatsavayi @ 2016-11-28 4:51 UTC (permalink / raw)
To: davem; +Cc: netdev, Raghu Vatsavayi
Hi Dave,
Following patches add support for VF device specific operations
like mailbox, queues and register access. Please apply the
patches in following order as these patches depend on each other.
Thanks
Raghu Vatsavayi (9):
liquidio CN23XX: VF register definitions
liquidio CN23XX: VF registration
liquidio CN23XX: VF config setup
liquidio CN23XX: VF queue setup
liquidio CN23XX: VF register access
liquidio CN23XX: init VF softcommand queues
liquidio CN23XX: VF mailbox
liquidio CN23XX: VF interrupt
liquidio CN23XX: VF init and destroy
drivers/net/ethernet/cavium/Kconfig | 12 +
drivers/net/ethernet/cavium/liquidio/Makefile | 22 +
.../ethernet/cavium/liquidio/cn23xx_vf_device.c | 701 +++++++++++++++++++++
.../ethernet/cavium/liquidio/cn23xx_vf_device.h | 48 ++
.../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h | 274 ++++++++
drivers/net/ethernet/cavium/liquidio/lio_core.c | 7 -
drivers/net/ethernet/cavium/liquidio/lio_main.c | 6 +-
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 614 ++++++++++++++++++
.../net/ethernet/cavium/liquidio/octeon_device.c | 58 +-
.../net/ethernet/cavium/liquidio/octeon_device.h | 9 +-
.../net/ethernet/cavium/liquidio/request_manager.c | 11 +-
11 files changed, 1751 insertions(+), 11 deletions(-)
create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next 1/9] liquidio CN23XX: VF register definitions
From: Raghu Vatsavayi @ 2016-11-28 4:51 UTC (permalink / raw)
To: davem
Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
Satanand Burla, Felix Manlunas
In-Reply-To: <1480308702-6261-1-git-send-email-rvatsavayi@caviumnetworks.com>
Adds support for CN23xx VF registers.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
.../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h | 274 +++++++++++++++++++++
1 file changed, 274 insertions(+)
create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
new file mode 100644
index 0000000..d33dd8f
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
@@ -0,0 +1,274 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ * Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT. See the GNU General Public License for more details.
+ ***********************************************************************/
+/*! \file cn23xx_vf_regs.h
+ * \brief Host Driver: Register Address and Register Mask values for
+ * Octeon CN23XX vf functions.
+ */
+
+#ifndef __CN23XX_VF_REGS_H__
+#define __CN23XX_VF_REGS_H__
+
+#define CN23XX_CONFIG_XPANSION_BAR 0x38
+
+#define CN23XX_CONFIG_PCIE_CAP 0x70
+#define CN23XX_CONFIG_PCIE_DEVCAP 0x74
+#define CN23XX_CONFIG_PCIE_DEVCTL 0x78
+#define CN23XX_CONFIG_PCIE_LINKCAP 0x7C
+#define CN23XX_CONFIG_PCIE_LINKCTL 0x80
+#define CN23XX_CONFIG_PCIE_SLOTCAP 0x84
+#define CN23XX_CONFIG_PCIE_SLOTCTL 0x88
+
+#define CN23XX_CONFIG_PCIE_FLTMSK 0x720
+
+/* The input jabber is used to determine the TSO max size.
+ * Due to H/W limitation, this need to be reduced to 60000
+ * in order to to H/W TSO and avoid the WQE malfarmation
+ * PKO_BUG_24989_WQE_LEN
+ */
+#define CN23XX_DEFAULT_INPUT_JABBER 0xEA60 /*60000*/
+
+/* ############## BAR0 Registers ################ */
+
+/* Each Input Queue register is at a 16-byte Offset in BAR0 */
+#define CN23XX_VF_IQ_OFFSET 0x20000
+
+/*###################### REQUEST QUEUE #########################*/
+
+/* 64 registers for Input Queue Instr Count - SLI_PKT_IN_DONE0_CNTS */
+#define CN23XX_VF_SLI_IQ_INSTR_COUNT_START64 0x10040
+
+/* 64 registers for Input Queues Start Addr - SLI_PKT0_INSTR_BADDR */
+#define CN23XX_VF_SLI_IQ_BASE_ADDR_START64 0x10010
+
+/* 64 registers for Input Doorbell - SLI_PKT0_INSTR_BAOFF_DBELL */
+#define CN23XX_VF_SLI_IQ_DOORBELL_START 0x10020
+
+/* 64 registers for Input Queue size - SLI_PKT0_INSTR_FIFO_RSIZE */
+#define CN23XX_VF_SLI_IQ_SIZE_START 0x10030
+
+/* 64 registers (64-bit) - ES, RO, NS, Arbitration for Input Queue Data &
+ * gather list fetches. SLI_PKT(0..63)_INPUT_CONTROL.
+ */
+#define CN23XX_VF_SLI_IQ_PKT_CONTROL_START64 0x10000
+
+/*------- Request Queue Macros ---------*/
+#define CN23XX_VF_SLI_IQ_PKT_CONTROL64(iq) \
+ (CN23XX_VF_SLI_IQ_PKT_CONTROL_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_BASE_ADDR64(iq) \
+ (CN23XX_VF_SLI_IQ_BASE_ADDR_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_SIZE(iq) \
+ (CN23XX_VF_SLI_IQ_SIZE_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_DOORBELL(iq) \
+ (CN23XX_VF_SLI_IQ_DOORBELL_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_INSTR_COUNT64(iq) \
+ (CN23XX_VF_SLI_IQ_INSTR_COUNT_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+/*------------------ Masks ----------------*/
+#define CN23XX_PKT_INPUT_CTL_VF_NUM BIT_ULL(32)
+#define CN23XX_PKT_INPUT_CTL_MAC_NUM BIT(29)
+/* Number of instructions to be read in one MAC read request.
+ * setting to Max value(4)
+ */
+#define CN23XX_PKT_INPUT_CTL_RDSIZE (3 << 25)
+#define CN23XX_PKT_INPUT_CTL_IS_64B BIT(24)
+#define CN23XX_PKT_INPUT_CTL_RST BIT(23)
+#define CN23XX_PKT_INPUT_CTL_QUIET BIT(28)
+#define CN23XX_PKT_INPUT_CTL_RING_ENB BIT(22)
+#define CN23XX_PKT_INPUT_CTL_DATA_NS BIT(8)
+#define CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP BIT(6)
+#define CN23XX_PKT_INPUT_CTL_DATA_RO BIT(5)
+#define CN23XX_PKT_INPUT_CTL_USE_CSR BIT(4)
+#define CN23XX_PKT_INPUT_CTL_GATHER_NS BIT(3)
+#define CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP (2)
+#define CN23XX_PKT_INPUT_CTL_GATHER_RO (1)
+
+/** Rings per Virtual Function [RO] **/
+#define CN23XX_PKT_INPUT_CTL_RPVF_MASK (0x3F)
+#define CN23XX_PKT_INPUT_CTL_RPVF_POS (48)
+/* These bits[47:44][RO] give the Physical function number info within the MAC*/
+#define CN23XX_PKT_INPUT_CTL_PF_NUM_MASK (0x7)
+#define CN23XX_PKT_INPUT_CTL_PF_NUM_POS (45)
+/** These bits[43:32][RO] give the virtual function number info within the PF*/
+#define CN23XX_PKT_INPUT_CTL_VF_NUM_MASK (0x1FFF)
+#define CN23XX_PKT_INPUT_CTL_VF_NUM_POS (32)
+#define CN23XX_PKT_INPUT_CTL_MAC_NUM_MASK (0x3)
+#define CN23XX_PKT_INPUT_CTL_MAC_NUM_POS (29)
+#define CN23XX_PKT_IN_DONE_WMARK_MASK (0xFFFFULL)
+#define CN23XX_PKT_IN_DONE_WMARK_BIT_POS (32)
+#define CN23XX_PKT_IN_DONE_CNT_MASK (0x00000000FFFFFFFFULL)
+
+#ifdef __LITTLE_ENDIAN_BITFIELD
+#define CN23XX_PKT_INPUT_CTL_MASK \
+ (CN23XX_PKT_INPUT_CTL_RDSIZE \
+ | CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP \
+ | CN23XX_PKT_INPUT_CTL_USE_CSR)
+#else
+#define CN23XX_PKT_INPUT_CTL_MASK \
+ (CN23XX_PKT_INPUT_CTL_RDSIZE \
+ | CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP \
+ | CN23XX_PKT_INPUT_CTL_USE_CSR \
+ | CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP)
+#endif
+
+/** Masks for SLI_PKT_IN_DONE(0..63)_CNTS Register */
+#define CN23XX_IN_DONE_CNTS_PI_INT BIT_ULL(62)
+#define CN23XX_IN_DONE_CNTS_CINT_ENB BIT_ULL(48)
+
+/*############################ OUTPUT QUEUE #########################*/
+
+/* 64 registers for Output queue control - SLI_PKT(0..63)_OUTPUT_CONTROL */
+#define CN23XX_VF_SLI_OQ_PKT_CONTROL_START 0x10050
+
+/* 64 registers for Output queue buffer and info size - SLI_PKT0_OUT_SIZE */
+#define CN23XX_VF_SLI_OQ0_BUFF_INFO_SIZE 0x10060
+
+/* 64 registers for Output Queue Start Addr - SLI_PKT0_SLIST_BADDR */
+#define CN23XX_VF_SLI_OQ_BASE_ADDR_START64 0x10070
+
+/* 64 registers for Output Queue Packet Credits - SLI_PKT0_SLIST_BAOFF_DBELL */
+#define CN23XX_VF_SLI_OQ_PKT_CREDITS_START 0x10080
+
+/* 64 registers for Output Queue size - SLI_PKT0_SLIST_FIFO_RSIZE */
+#define CN23XX_VF_SLI_OQ_SIZE_START 0x10090
+
+/* 64 registers for Output Queue Packet Count - SLI_PKT0_CNTS */
+#define CN23XX_VF_SLI_OQ_PKT_SENT_START 0x100B0
+
+/* 64 registers for Output Queue INT Levels - SLI_PKT0_INT_LEVELS */
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 0x100A0
+
+/* Each Output Queue register is at a 16-byte Offset in BAR0 */
+#define CN23XX_VF_OQ_OFFSET 0x20000
+
+/*------- Output Queue Macros ---------*/
+
+#define CN23XX_VF_SLI_OQ_PKT_CONTROL(oq) \
+ (CN23XX_VF_SLI_OQ_PKT_CONTROL_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_BASE_ADDR64(oq) \
+ (CN23XX_VF_SLI_OQ_BASE_ADDR_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_SIZE(oq) \
+ (CN23XX_VF_SLI_OQ_SIZE_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_BUFF_INFO_SIZE(oq) \
+ (CN23XX_VF_SLI_OQ0_BUFF_INFO_SIZE + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKTS_SENT(oq) \
+ (CN23XX_VF_SLI_OQ_PKT_SENT_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKTS_CREDIT(oq) \
+ (CN23XX_VF_SLI_OQ_PKT_CREDITS_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(oq) \
+ (CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+/* Macro's for accessing CNT and TIME separately from INT_LEVELS */
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_CNT(oq) \
+ (CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_TIME(oq) \
+ (CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 + \
+ ((oq) * CN23XX_VF_OQ_OFFSET) + 4)
+
+/*------------------ Masks ----------------*/
+#define CN23XX_PKT_OUTPUT_CTL_TENB BIT(13)
+#define CN23XX_PKT_OUTPUT_CTL_CENB BIT(12)
+#define CN23XX_PKT_OUTPUT_CTL_IPTR BIT(11)
+#define CN23XX_PKT_OUTPUT_CTL_ES BIT(9)
+#define CN23XX_PKT_OUTPUT_CTL_NSR BIT(8)
+#define CN23XX_PKT_OUTPUT_CTL_ROR BIT(7)
+#define CN23XX_PKT_OUTPUT_CTL_DPTR BIT(6)
+#define CN23XX_PKT_OUTPUT_CTL_BMODE BIT(5)
+#define CN23XX_PKT_OUTPUT_CTL_ES_P BIT(3)
+#define CN23XX_PKT_OUTPUT_CTL_NSR_P BIT(2)
+#define CN23XX_PKT_OUTPUT_CTL_ROR_P BIT(1)
+#define CN23XX_PKT_OUTPUT_CTL_RING_ENB BIT(0)
+
+/*######################### Mailbox Reg Macros ########################*/
+#define CN23XX_VF_SLI_PKT_MBOX_INT_START 0x10210
+#define CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START 0x10200
+
+#define CN23XX_SLI_MBOX_OFFSET 0x20000
+#define CN23XX_SLI_MBOX_SIG_IDX_OFFSET 0x8
+
+#define CN23XX_VF_SLI_PKT_MBOX_INT(q) \
+ (CN23XX_VF_SLI_PKT_MBOX_INT_START + ((q) * CN23XX_SLI_MBOX_OFFSET))
+
+#define CN23XX_SLI_PKT_PF_VF_MBOX_SIG(q, idx) \
+ (CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START + \
+ ((q) * CN23XX_SLI_MBOX_OFFSET + \
+ (idx) * CN23XX_SLI_MBOX_SIG_IDX_OFFSET))
+
+/*######################## INTERRUPTS #########################*/
+
+#define CN23XX_VF_SLI_INT_SUM_START 0x100D0
+
+#define CN23XX_VF_SLI_INT_SUM(q) \
+ (CN23XX_VF_SLI_INT_SUM_START + ((q) * CN23XX_VF_IQ_OFFSET))
+
+/*------------------ Interrupt Masks ----------------*/
+
+#define CN23XX_INTR_PO_INT BIT_ULL(63)
+#define CN23XX_INTR_PI_INT BIT_ULL(62)
+#define CN23XX_INTR_MBOX_INT BIT_ULL(61)
+#define CN23XX_INTR_RESEND BIT_ULL(60)
+
+#define CN23XX_INTR_CINT_ENB BIT_ULL(48)
+#define CN23XX_INTR_MBOX_ENB BIT(0)
+
+/*############################ MIO #########################*/
+#define CN23XX_MIO_PTP_CLOCK_CFG 0x0001070000000f00ULL
+#define CN23XX_MIO_PTP_CLOCK_LO 0x0001070000000f08ULL
+#define CN23XX_MIO_PTP_CLOCK_HI 0x0001070000000f10ULL
+#define CN23XX_MIO_PTP_CLOCK_COMP 0x0001070000000f18ULL
+#define CN23XX_MIO_PTP_TIMESTAMP 0x0001070000000f20ULL
+#define CN23XX_MIO_PTP_EVT_CNT 0x0001070000000f28ULL
+#define CN23XX_MIO_PTP_CKOUT_THRESH_LO 0x0001070000000f30ULL
+#define CN23XX_MIO_PTP_CKOUT_THRESH_HI 0x0001070000000f38ULL
+#define CN23XX_MIO_PTP_CKOUT_HI_INCR 0x0001070000000f40ULL
+#define CN23XX_MIO_PTP_CKOUT_LO_INCR 0x0001070000000f48ULL
+#define CN23XX_MIO_PTP_PPS_THRESH_LO 0x0001070000000f50ULL
+#define CN23XX_MIO_PTP_PPS_THRESH_HI 0x0001070000000f58ULL
+#define CN23XX_MIO_PTP_PPS_HI_INCR 0x0001070000000f60ULL
+#define CN23XX_MIO_PTP_PPS_LO_INCR 0x0001070000000f68ULL
+
+/*############################ RST #########################*/
+#define CN23XX_RST_BOOT 0x0001180006001600ULL
+
+/*######################## MSIX TABLE #########################*/
+
+#define CN23XX_MSIX_TABLE_ADDR_START 0x0
+#define CN23XX_MSIX_TABLE_DATA_START 0x8
+
+#define CN23XX_MSIX_TABLE_SIZE 0x10
+#define CN23XX_MSIX_TABLE_ENTRIES 0x41
+
+#define CN23XX_MSIX_ENTRY_VECTOR_CTL BIT_ULL(32)
+
+#define CN23XX_MSIX_TABLE_ADDR(idx) \
+ (CN23XX_MSIX_TABLE_ADDR_START + ((idx) * CN23XX_MSIX_TABLE_SIZE))
+
+#define CN23XX_MSIX_TABLE_DATA(idx) \
+ (CN23XX_MSIX_TABLE_DATA_START + ((idx) * CN23XX_MSIX_TABLE_SIZE))
+
+#endif
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 2/9] liquidio CN23XX: VF registration
From: Raghu Vatsavayi @ 2016-11-28 4:51 UTC (permalink / raw)
To: davem
Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
Satanand Burla, Felix Manlunas
In-Reply-To: <1480308702-6261-1-git-send-email-rvatsavayi@caviumnetworks.com>
Adds support for cn23xx VF probe and registration.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
drivers/net/ethernet/cavium/Kconfig | 12 +++
drivers/net/ethernet/cavium/liquidio/Makefile | 21 ++++
.../ethernet/cavium/liquidio/cn23xx_vf_device.h | 34 ++++++
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 120 +++++++++++++++++++++
.../net/ethernet/cavium/liquidio/octeon_device.c | 4 +
5 files changed, 191 insertions(+)
create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
index 92f411c..c0679c2 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -74,4 +74,16 @@ config OCTEON_MGMT_ETHERNET
port on Cavium Networks' Octeon CN57XX, CN56XX, CN55XX,
CN54XX, CN52XX, and CN6XXX chips.
+config LIQUIDIO_VF
+ tristate "Cavium LiquidIO VF support"
+ depends on 64BIT && PCI_MSI
+ select PTP_1588_CLOCK
+ ---help---
+ This driver supports Cavium LiquidIO Intelligent Server Adapter
+ based on CN23XX chips.
+
+ To compile this driver as a module, choose M here: The module
+ will be called liquidio_vf. MSI-X interrupt support is required
+ for this driver to work correctly
+
endif # NET_VENDOR_CAVIUM
diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile
index 14958de..69d23fc 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -17,3 +17,24 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \
octeon_nic.o
liquidio-objs := lio_main.o octeon_console.o $(liquidio-y)
+
+obj-$(CONFIG_LIQUIDIO_VF) += liquidio_vf.o
+
+ifeq ($(CONFIG_LIQUIDIO)$(CONFIG_LIQUIDIO_VF), yy)
+ liquidio_vf-objs := lio_vf_main.o
+else
+liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \
+ lio_core.o \
+ request_manager.o \
+ response_manager.o \
+ octeon_device.o \
+ cn66xx_device.o \
+ cn68xx_device.o \
+ cn23xx_pf_device.o \
+ octeon_mailbox.o \
+ octeon_mem_ops.o \
+ octeon_droq.o \
+ octeon_nic.o
+
+liquidio_vf-objs := lio_vf_main.o $(liquidio_vf-y)
+endif
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
new file mode 100644
index 0000000..015b6d4
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -0,0 +1,34 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ * Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT. See the GNU General Public License for more details.
+ ***********************************************************************/
+/*! \file cn23xx_device.h
+ * \brief Host Driver: Routines that perform CN23XX specific operations.
+ */
+
+#ifndef __CN23XX_VF_DEVICE_H__
+#define __CN23XX_VF_DEVICE_H__
+
+#include "cn23xx_vf_regs.h"
+
+/* Register address and configuration for a CN23XX devices.
+ * If device specific changes need to be made then add a struct to include
+ * device specific fields as shown in the commented section
+ */
+struct octeon_cn23xx_vf {
+ struct octeon_config *conf;
+};
+#endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
new file mode 100644
index 0000000..d1b1a24
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -0,0 +1,120 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ * Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT. See the GNU General Public License for more details.
+ ***********************************************************************/
+#include <linux/pci.h>
+#include <net/vxlan.h>
+#include "liquidio_common.h"
+#include "octeon_droq.h"
+#include "octeon_iq.h"
+#include "response_manager.h"
+#include "octeon_device.h"
+
+MODULE_AUTHOR("Cavium Networks, <support@cavium.com>");
+MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Virtual Function Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(LIQUIDIO_VERSION);
+
+struct octeon_device_priv {
+ /* Tasklet structures for this device. */
+ struct tasklet_struct droq_tasklet;
+ unsigned long napi_mask;
+};
+
+static int
+liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
+static void liquidio_vf_remove(struct pci_dev *pdev);
+
+static const struct pci_device_id liquidio_vf_pci_tbl[] = {
+ {
+ PCI_VENDOR_ID_CAVIUM, OCTEON_CN23XX_VF_VID,
+ PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0
+ },
+ {
+ 0, 0, 0, 0, 0, 0, 0
+ }
+};
+MODULE_DEVICE_TABLE(pci, liquidio_vf_pci_tbl);
+
+static struct pci_driver liquidio_vf_pci_driver = {
+ .name = "LiquidIO_VF",
+ .id_table = liquidio_vf_pci_tbl,
+ .probe = liquidio_vf_probe,
+ .remove = liquidio_vf_remove,
+};
+
+/**
+ * \brief PCI probe handler
+ * @param pdev PCI device structure
+ * @param ent unused
+ */
+static int
+liquidio_vf_probe(struct pci_dev *pdev,
+ const struct pci_device_id *ent __attribute__((unused)))
+{
+ struct octeon_device *oct_dev = NULL;
+
+ oct_dev = octeon_allocate_device(pdev->device,
+ sizeof(struct octeon_device_priv));
+
+ if (!oct_dev) {
+ dev_err(&pdev->dev, "Unable to allocate device\n");
+ return -ENOMEM;
+ }
+
+ dev_info(&pdev->dev, "Initializing device %x:%x.\n",
+ (u32)pdev->vendor, (u32)pdev->device);
+
+ /* Assign octeon_device for this device to the private data area. */
+ pci_set_drvdata(pdev, oct_dev);
+
+ /* set linux specific device pointer */
+ oct_dev->pci_dev = (void *)pdev;
+
+ return 0;
+}
+
+/**
+ * \brief Cleans up resources at unload time
+ * @param pdev PCI device structure
+ */
+static void liquidio_vf_remove(struct pci_dev *pdev)
+{
+ struct octeon_device *oct_dev = pci_get_drvdata(pdev);
+
+ dev_dbg(&oct_dev->pci_dev->dev, "Stopping device\n");
+
+ /* This octeon device has been removed. Update the global
+ * data structure to reflect this. Free the device structure.
+ */
+ octeon_free_device_mem(oct_dev);
+}
+
+static int __init liquidio_vf_init(void)
+{
+ octeon_init_device_list(0);
+ return pci_register_driver(&liquidio_vf_pci_driver);
+}
+
+static void __exit liquidio_vf_exit(void)
+{
+ pci_unregister_driver(&liquidio_vf_pci_driver);
+
+ pr_info("LiquidIO_VF network module is now unloaded\n");
+}
+
+module_init(liquidio_vf_init);
+module_exit(liquidio_vf_exit);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 79c8875..05bb0fd 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -28,6 +28,7 @@
#include "cn66xx_regs.h"
#include "cn66xx_device.h"
#include "cn23xx_pf_device.h"
+#include "cn23xx_vf_device.h"
/** Default configuration
* for CN66XX OCTEON Models.
@@ -672,6 +673,9 @@ static struct octeon_device *octeon_allocate_device_mem(u32 pci_id,
case OCTEON_CN23XX_PF_VID:
configsize = sizeof(struct octeon_cn23xx_pf);
break;
+ case OCTEON_CN23XX_VF_VID:
+ configsize = sizeof(struct octeon_cn23xx_vf);
+ break;
default:
pr_err("%s: Unknown PCI Device: 0x%x\n",
__func__,
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 3/9] liquidio CN23XX: VF config setup
From: Raghu Vatsavayi @ 2016-11-28 4:51 UTC (permalink / raw)
To: davem
Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
Satanand Burla, Felix Manlunas
In-Reply-To: <1480308702-6261-1-git-send-email-rvatsavayi@caviumnetworks.com>
Adds support for setting up VF configuration.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
drivers/net/ethernet/cavium/liquidio/Makefile | 1 +
.../ethernet/cavium/liquidio/cn23xx_vf_device.c | 44 +++++++
.../ethernet/cavium/liquidio/cn23xx_vf_device.h | 2 +
drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 136 +++++++++++++++++++++
.../net/ethernet/cavium/liquidio/octeon_device.c | 3 +
5 files changed, 186 insertions(+)
create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile
index 69d23fc..cca903a 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -31,6 +31,7 @@ liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \
cn66xx_device.o \
cn68xx_device.o \
cn23xx_pf_device.o \
+ cn23xx_vf_device.o \
octeon_mailbox.o \
octeon_mem_ops.o \
octeon_droq.o \
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
new file mode 100644
index 0000000..d683bda
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -0,0 +1,44 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ * Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT. See the GNU General Public License for more details.
+ ***********************************************************************/
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+#include "liquidio_common.h"
+#include "octeon_droq.h"
+#include "octeon_iq.h"
+#include "response_manager.h"
+#include "octeon_device.h"
+#include "cn23xx_vf_device.h"
+#include "octeon_main.h"
+
+int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
+{
+ struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
+
+ if (octeon_map_pci_barx(oct, 0, 0))
+ return 1;
+
+ cn23xx->conf = oct_get_config_info(oct, LIO_23XX);
+ if (!cn23xx->conf) {
+ dev_err(&oct->pci_dev->dev, "%s No Config found for CN23XX\n",
+ __func__);
+ octeon_unmap_pci_barx(oct, 0);
+ return 1;
+ }
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 015b6d4..9e4fb50 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -31,4 +31,6 @@
struct octeon_cn23xx_vf {
struct octeon_config *conf;
};
+
+int cn23xx_setup_octeon_vf_device(struct octeon_device *oct);
#endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index d1b1a24..721ee66 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -22,6 +22,8 @@
#include "octeon_iq.h"
#include "response_manager.h"
#include "octeon_device.h"
+#include "octeon_main.h"
+#include "cn23xx_vf_device.h"
MODULE_AUTHOR("Cavium Networks, <support@cavium.com>");
MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Virtual Function Driver");
@@ -37,6 +39,7 @@ struct octeon_device_priv {
static int
liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
static void liquidio_vf_remove(struct pci_dev *pdev);
+static int octeon_device_init(struct octeon_device *oct);
static const struct pci_device_id liquidio_vf_pci_tbl[] = {
{
@@ -84,10 +87,78 @@ struct octeon_device_priv {
/* set linux specific device pointer */
oct_dev->pci_dev = (void *)pdev;
+ if (octeon_device_init(oct_dev)) {
+ liquidio_vf_remove(pdev);
+ return -ENOMEM;
+ }
+
+ dev_dbg(&oct_dev->pci_dev->dev, "Device is ready\n");
+
return 0;
}
/**
+ * \brief PCI FLR for each Octeon device.
+ * @param oct octeon device
+ */
+static void octeon_pci_flr(struct octeon_device *oct)
+{
+ u16 status;
+
+ pci_save_state(oct->pci_dev);
+
+ pci_cfg_access_lock(oct->pci_dev);
+
+ /* Quiesce the device completely */
+ pci_write_config_word(oct->pci_dev, PCI_COMMAND,
+ PCI_COMMAND_INTX_DISABLE);
+
+ /* Wait for Transaction Pending bit clean */
+ msleep(100);
+ pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA, &status);
+ if (status & PCI_EXP_DEVSTA_TRPND) {
+ dev_info(&oct->pci_dev->dev, "Function reset incomplete after 100ms, sleeping for 5 seconds\n");
+ ssleep(5);
+ pcie_capability_read_word(oct->pci_dev, PCI_EXP_DEVSTA,
+ &status);
+ if (status & PCI_EXP_DEVSTA_TRPND)
+ dev_info(&oct->pci_dev->dev, "Function reset still incomplete after 5s, reset anyway\n");
+ }
+ pcie_capability_set_word(oct->pci_dev, PCI_EXP_DEVCTL,
+ PCI_EXP_DEVCTL_BCR_FLR);
+ mdelay(100);
+
+ pci_cfg_access_unlock(oct->pci_dev);
+
+ pci_restore_state(oct->pci_dev);
+}
+
+/**
+ *\brief Destroy resources associated with octeon device
+ * @param pdev PCI device structure
+ * @param ent unused
+ */
+static void octeon_destroy_resources(struct octeon_device *oct)
+{
+ switch (atomic_read(&oct->status)) {
+ case OCT_DEV_PCI_MAP_DONE:
+ octeon_unmap_pci_barx(oct, 0);
+ octeon_unmap_pci_barx(oct, 1);
+
+ /* fallthrough */
+ case OCT_DEV_PCI_ENABLE_DONE:
+ pci_clear_master(oct->pci_dev);
+ /* Disable the device, releasing the PCI INT */
+ pci_disable_device(oct->pci_dev);
+
+ /* fallthrough */
+ case OCT_DEV_BEGIN_STATE:
+ /* Nothing to be done here either */
+ break;
+ }
+}
+
+/**
* \brief Cleans up resources at unload time
* @param pdev PCI device structure
*/
@@ -97,12 +168,77 @@ static void liquidio_vf_remove(struct pci_dev *pdev)
dev_dbg(&oct_dev->pci_dev->dev, "Stopping device\n");
+ /* Reset the octeon device and cleanup all memory allocated for
+ * the octeon device by driver.
+ */
+ octeon_destroy_resources(oct_dev);
+
+ dev_info(&oct_dev->pci_dev->dev, "Device removed\n");
+
/* This octeon device has been removed. Update the global
* data structure to reflect this. Free the device structure.
*/
octeon_free_device_mem(oct_dev);
}
+/**
+ * \brief PCI initialization for each Octeon device.
+ * @param oct octeon device
+ */
+static int octeon_pci_os_setup(struct octeon_device *oct)
+{
+#ifdef CONFIG_PCI_IOV
+ /* setup PCI stuff first */
+ if (!oct->pci_dev->physfn)
+ octeon_pci_flr(oct);
+#endif
+
+ if (pci_enable_device(oct->pci_dev)) {
+ dev_err(&oct->pci_dev->dev, "pci_enable_device failed\n");
+ return 1;
+ }
+
+ if (dma_set_mask_and_coherent(&oct->pci_dev->dev, DMA_BIT_MASK(64))) {
+ dev_err(&oct->pci_dev->dev, "Unexpected DMA device capability\n");
+ pci_disable_device(oct->pci_dev);
+ return 1;
+ }
+
+ /* Enable PCI DMA Master. */
+ pci_set_master(oct->pci_dev);
+
+ return 0;
+}
+
+/**
+ * \brief Device initialization for each Octeon device that is probed
+ * @param octeon_dev octeon device
+ */
+static int octeon_device_init(struct octeon_device *oct)
+{
+ u32 rev_id;
+
+ atomic_set(&oct->status, OCT_DEV_BEGIN_STATE);
+
+ /* Enable access to the octeon device and make its DMA capability
+ * known to the OS.
+ */
+ if (octeon_pci_os_setup(oct))
+ return 1;
+ atomic_set(&oct->status, OCT_DEV_PCI_ENABLE_DONE);
+
+ oct->chip_id = OCTEON_CN23XX_VF_VID;
+ pci_read_config_dword(oct->pci_dev, 8, &rev_id);
+ oct->rev_id = rev_id & 0xff;
+
+ if (cn23xx_setup_octeon_vf_device(oct))
+ return 1;
+
+ atomic_set(&oct->status, OCT_DEV_PCI_MAP_DONE);
+
+ return 0;
+}
+
static int __init liquidio_vf_init(void)
{
octeon_init_device_list(0);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 05bb0fd..b4c8ee4 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -581,6 +581,8 @@ static void *__retrieve_octeon_config_info(struct octeon_device *oct,
ret = (void *)&default_cn68xx_conf;
} else if (oct->chip_id == OCTEON_CN23XX_PF_VID) {
ret = (void *)&default_cn23xx_conf;
+ } else if (oct->chip_id == OCTEON_CN23XX_VF_VID) {
+ ret = (void *)&default_cn23xx_conf;
}
break;
default:
@@ -596,6 +598,7 @@ static int __verify_octeon_config_info(struct octeon_device *oct, void *conf)
case OCTEON_CN68XX:
return lio_validate_cn6xxx_config_info(oct, conf);
case OCTEON_CN23XX_PF_VID:
+ case OCTEON_CN23XX_VF_VID:
return 0;
default:
break;
--
1.8.3.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox