* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Benjamin LaHaise @ 2014-12-15 18:03 UTC (permalink / raw)
To: Roopa Prabhu
Cc: Jamal Hadi Salim, Jiri Pirko, sfeldma, tgraf, john.fastabend,
stephen, linville, vyasevic, netdev, davem, shm, gospo
In-Reply-To: <548F1852.7090507@cumulusnetworks.com>
On Mon, Dec 15, 2014 at 09:20:18AM -0800, Roopa Prabhu wrote:
> On 12/15/14, 7:26 AM, Jamal Hadi Salim wrote:
> >
> >Sorry - i didnt quiet follow the discussion, but i can see the value
> >of propagating things from parent to children netdevs as part of the
> >generic approach. And in that spirit:
> >
> >Ben's patches (and I am sure the cumulus folk do this) expose ports.
> >i.e you boot up the hardware and you see ports. You can then put these
> >ports in a bridge and you can offload fdbs and do other parametrization
> >to the ASIC. IOW, this only becomes a bridge because you created one
> >in the kernel and attached bridge ports to it.
> >
> >Lets say i didnt want a bridge. I want instead to take these exposed
> >ports and create a bond (and maybe play with LACP). How does this
> >propagation from parent->child->child work then? I think the idea
> >of just bonding and not exposing it as a switch is a reasonable use
> >case.
>
> We have not come to pure bonding and lacp yet (but i have mentioned it
> in many contexts before).
> The use case you mention is offloading bond attributes. This will be
> addressed as part of ongoing switchdev work
> for all other offloads (bonds, vxlans etc).
> Right now we are only talking bridge port attribute offload
> (learn/flood/port state etc). This could still be a bridge port
> attribute on a bond
> when the bond is a bridge port.
This raises the question: do we track which attributes are configured
onto a port in the switchdev code (in an attempt to maintain the state
on behalf of the underlying device), or do we simply pass them in as
attributes of the config request? With stacking, I can see the need
for different layers to add different attributes to the config for a
given switch port. Things like bonding would need to make note of the
bond interface with a common identifier so that the switch can figure
out to put the different ports into the same group.
The rtl8366s has support for one 802.3ad group, so it looks like I will
need to tackle this.
-ben
> >Also how does it work when i start doing L3 and the bond's port doesnt
> >support L3? Is it time to revive the thing we called TheThing in Du?
> yes, exactly. This is what i had indicated in my initial emails on this
> thread when the stacked devices topic came up.
> Since there was reluctance in introducing a switch device (theThing), My
> current patch tries to do that without a switch device.
> Since this is still l2, and we are dealing with bridge port attributes,
> my current patch traverses the stacked netdevs to call the
> ndo_bridge_setlink on the switch port.
>
> When it comes to l3, we can follow the same.., but as discussed in Du,
> there are other reasons where a switch device becomes necessary.
> I can submit an alternate series to cover the switch device approach for
> l2 as well.
>
> Thanks,
> Roopa
>
--
"Thought is the essence of where you are now."
^ permalink raw reply
* [PATCH net 3/3] net: dsa: bcm_sf2: always select FIXED_PHY
From: Florian Fainelli @ 2014-12-15 17:57 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
In-Reply-To: <1418666235-3109-1-git-send-email-f.fainelli@gmail.com>
There is no need to do the following:
select FIXED_PHY if NET_DSA_BCM_SF2=y, as this implies that we will not be
able to build and/or run the driver correctly when built as a module,
which is no longer an issue since commit 37e9a6904520 ("net: phy: export
fixed_phy_register()").
Fixes: 246d7f773c13ca ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/dsa/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 7cf8f4ac281f..48e62a34f7f2 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -59,7 +59,7 @@ config NET_DSA_BCM_SF2
depends on HAS_IOMEM
select NET_DSA
select NET_DSA_TAG_BRCM
- select FIXED_PHY if NET_DSA_BCM_SF2=y
+ select FIXED_PHY
select BCM7XXX_PHY
select MDIO_BCM_UNIMAC
---help---
--
2.1.0
^ permalink raw reply related
* [PATCH net 2/3] net: systemport: always select FIXED_PHY
From: Florian Fainelli @ 2014-12-15 17:57 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
In-Reply-To: <1418666235-3109-1-git-send-email-f.fainelli@gmail.com>
There is no need to do the following:
select FIXED_PHY if SYSTEMPORT=y, as this implies that we will not be able
to build and/or run the driver correctly when built as a module, which
is no longer an issue since commit 37e9a6904520 ("net: phy: export
fixed_phy_register()")
Fixes: a3862db2d3c4 ("net: systemport: hook SYSTEMPORT driver in the build")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/broadcom/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index f4789a059d54..41a3c9804427 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -155,7 +155,7 @@ config SYSTEMPORT
depends on OF
select MII
select PHYLIB
- select FIXED_PHY if SYSTEMPORT=y
+ select FIXED_PHY
help
This driver supports the built-in Ethernet MACs found in the
Broadcom BCM7xxx Set Top Box family chipset using an internal
--
2.1.0
^ permalink raw reply related
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: John Fastabend @ 2014-12-15 17:57 UTC (permalink / raw)
To: Arad, Ronen
Cc: Jamal Hadi Salim, Roopa Prabhu, Jiri Pirko,
netdev@vger.kernel.org, sfeldma@gmail.com, bcrl@kvack.org,
tgraf@suug.ch, stephen@networkplumber.org, linville@tuxdriver.com,
vyasevic@redhat.com, davem@davemloft.net, shm@cumulusnetworks.com,
gospo@cumulusnetworks.com
In-Reply-To: <E4CD12F19ABA0C4D8729E087A761DC3505D999EA@ORSMSX106.amr.corp.intel.com>
On 12/15/2014 09:25 AM, Arad, Ronen wrote:
>
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of Jamal Hadi Salim
>> Sent: Monday, December 15, 2014 5:26 PM
>> To: Roopa Prabhu; Jiri Pirko
>> Cc: sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
>> john.fastabend@gmail.com; stephen@networkplumber.org;
>> linville@tuxdriver.com; vyasevic@redhat.com; netdev@vger.kernel.org;
>> davem@davemloft.net; shm@cumulusnetworks.com;
>> gospo@cumulusnetworks.com
>> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
>> bridge port attributes
>>
>>
>> Sorry - i didnt quiet follow the discussion, but i can see the value of
>> propagating things from parent to children netdevs as part of the generic
>> approach. And in that spirit:
>>
>> Ben's patches (and I am sure the cumulus folk do this) expose ports.
>> i.e you boot up the hardware and you see ports. You can then put these
>> ports in a bridge and you can offload fdbs and do other parametrization to
>> the ASIC. IOW, this only becomes a bridge because you created one in the
>> kernel and attached bridge ports to it.
>>
>> Lets say i didnt want a bridge. I want instead to take these exposed ports and
>> create a bond (and maybe play with LACP). How does this propagation from
>> parent->child->child work then? I think the idea of just bonding and not
>> exposing it as a switch is a reasonable use case.
>
> Are you saying that the software should reflect the same
> functionality the HW provides?
> In other words is creating a bridge device mandatory for supporting
> standard VLAN-bridging (as in IEEE 802.1Q) in the HW?
No it shouldn't be mandatory. And I don't think it is at the moment.
Users are free to manage the FDB tables directly via netlink or
configure the software bridge to sync them. This seems like a good
model to follow to me and we should try to get as many features as
it makes sense to follow this model.
> VLAN-bridging including port VLAN membership, VLAN filtering, PVID,
> Egress un-tagging could be supported without an explicit bridge device
> when port devices implement bridge ndos (ndo_bridge_{set,del,get}link).
> What is lost is the ability to have common handling of VLAN-aware FDB in
> the bridge module.
not sure what is lost here? Its seems the SW bridge does (or at least
could) support the same vlan capabilities. And the bridge could push
these into hardware when Roopa's offload bit is set. Or if users want
to manage everything outside bridge calling the ndo_bridge_ ops directly
works as well. By the way I believe the software linux bridge supports
most of those features you listed today. If we missed something we can
add it.
> Do we expect switch port devices to support L2 functionality both
> ways (with and without an explicit bridge device)?
My opinion yes. But in fact the driver shouldn't care what is driving
it. The paths should be the same for direct user manipulation via
netlink and SELF|MASTER bit, bridge module, or any other in-kernel
sub-system.
> Will the decision about using a bridge device or avoiding it be left
> to the end-user?
Its a user policy decision. Again the offload bit gets us this in
a reasonably configurable way IMO.
> (This requires switch port drivers to be able to work and provide
> similar functionality in both setups).
Right, but if the drivers "care" who is calling their ndo ops something
is seriously broken. For the driver it should not need to know anything
about the callers so it doesn't matter to the driver if its a netlink
call from user space or an internal call fro bridge.ko
> I think that we need to outline the handling of L3 as it could
> determine or at least impact some of the answers to my above questions.
L3 should follow the same model. Admittedly I've not worked through the
L3 cases closely but I don't see why we can't apply the same model.
And maybe this is where we need to introduce a container to hold some
state as Jamal says. The easiest way to see this will be to look at
some proposed code.
> cheers,
> ronen
>
>> Also how does it work when i start doing L3 and the bond's port doesnt
>> support L3? Is it time to revive the thing we called TheThing in Du?
>>
>> cheers,
>> jamal
>>
>> On 12/14/14 14:41, Roopa Prabhu wrote:
>>> On 12/14/14, 7:35 AM, Jiri Pirko wrote:
>>
>> [..chopped off for brevity and saving electrons..]
>>
>> cheers,
>> jamal
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
--
John Fastabend Intel Corporation
^ permalink raw reply
* [PATCH net 1/3] net: bcmgenet: always select FIXED_PHY
From: Florian Fainelli @ 2014-12-15 17:57 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
In-Reply-To: <1418666235-3109-1-git-send-email-f.fainelli@gmail.com>
There is no need to do the following:
select FIXED_PHY if BCMGENET=y, as this implies that we will not be able
to build and/or run the driver correctly when built as a module, which
is no longer an issue since commit 37e9a6904520 ("net: phy: export
fixed_phy_register()")
Fixes: b0ba512e225d ("net: bcmgenet: enable driver to work without device tree"
Fixes: bdaa53bde55f ("net: bcmgenet: hook into the build system")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/ethernet/broadcom/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 888247ad9068..f4789a059d54 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -64,7 +64,7 @@ config BCMGENET
tristate "Broadcom GENET internal MAC support"
select MII
select PHYLIB
- select FIXED_PHY if BCMGENET=y
+ select FIXED_PHY
select BCM7XXX_PHY
help
This driver supports the built-in Ethernet MACs found in the
--
2.1.0
^ permalink raw reply related
* [PATCH net 0/3] net: broadcom: fix FIXED_PHY dependencies
From: Florian Fainelli @ 2014-12-15 17:57 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
Hi David,
This patch series removes the bogus "select FIXED_PHY if FOO=y" that I have
been using in GENET, SYSTEMPORT and the SF2 DSA switch driver.
Thanks!
Florian Fainelli (3):
net: bcmgenet: always select FIXED_PHY
net: systemport: always select FIXED_PHY
net: dsa: bcm_sf2: always select FIXED_PHY
drivers/net/dsa/Kconfig | 2 +-
drivers/net/ethernet/broadcom/Kconfig | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
--
2.1.0
^ permalink raw reply
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Benjamin LaHaise @ 2014-12-15 17:57 UTC (permalink / raw)
To: Arad, Ronen
Cc: Jamal Hadi Salim, Roopa Prabhu, Jiri Pirko,
netdev@vger.kernel.org, sfeldma@gmail.com, tgraf@suug.ch,
john.fastabend@gmail.com, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
In-Reply-To: <E4CD12F19ABA0C4D8729E087A761DC3505D999EA@ORSMSX106.amr.corp.intel.com>
On Mon, Dec 15, 2014 at 05:25:00PM +0000, Arad, Ronen wrote:
> > Ben's patches (and I am sure the cumulus folk do this) expose ports.
> > i.e you boot up the hardware and you see ports. You can then put these
> > ports in a bridge and you can offload fdbs and do other parametrization to
> > the ASIC. IOW, this only becomes a bridge because you created one in the
> > kernel and attached bridge ports to it.
> >
> > Lets say i didnt want a bridge. I want instead to take these exposed ports and
> > create a bond (and maybe play with LACP). How does this propagation from
> > parent->child->child work then? I think the idea of just bonding and not
> > exposing it as a switch is a reasonable use case.
>
> Are you saying that the software should reflect the same functionality the HW provides?
> In other words is creating a bridge device mandatory for supporting standard VLAN-bridging (as in IEEE 802.1Q) in the HW?
Re-using the existing Linux APIs for bridging is the approach I've taken
with the rtl8366s driver I've been working on. It makes no sense to create
entirely new APIs for these operations.
> VLAN-bridging including port VLAN membership, VLAN filtering, PVID, Egress un-tagging could be supported without an explicit bridge device when port devices implement bridge ndos (ndo_bridge_{set,del,get}link). What is lost is the ability to have common handling of VLAN-aware FDB in the bridge module.
> Do we expect switch port devices to support L2 functionality both ways (with and without an explicit bridge device)?
I've already got egress untagging working if the VLANs are configured using
vconfig and added to a bridge. The lack of this functionality in the "new"
API for bridging VLANs is a huge complaint I have about the new bridge
configuration API. I feel that API change was a step backwards in terms
of functionality. It also conflates the different FDBs for different VLANs
into the same hash table, again, something i find rather messy compared to
the 1 FDB to 1 bridge state the bridge device was in originally.
-ben
> Will the decision about using a bridge device or avoiding it be left to the end-user?
> (This requires switch port drivers to be able to work and provide similar functionality in both setups).
> I think that we need to outline the handling of L3 as it could determine or at least impact some of the answers to my above questions.
>
> cheers,
> ronen
>
> > Also how does it work when i start doing L3 and the bond's port doesnt
> > support L3? Is it time to revive the thing we called TheThing in Du?
> >
> > cheers,
> > jamal
> >
> > On 12/14/14 14:41, Roopa Prabhu wrote:
> > > On 12/14/14, 7:35 AM, Jiri Pirko wrote:
> >
> > [..chopped off for brevity and saving electrons..]
> >
> > cheers,
> > jamal
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> > of a message to majordomo@vger.kernel.org More majordomo info at
> > http://vger.kernel.org/majordomo-info.html
--
"Thought is the essence of where you are now."
^ permalink raw reply
* Re: [PATCH net-next v9 0/3] add hisilicon hip04 ethernet driver
From: Arnd Bergmann @ 2014-12-15 13:29 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Ding Tianhong, zhangfei.gao, davem, linux, f.fainelli,
sergei.shtylyov, mark.rutland, David.Laight, eric.dumazet, xuwei5,
netdev, devicetree
In-Reply-To: <1418298150-4944-1-git-send-email-dingtianhong@huawei.com>
On Thursday 11 December 2014 19:42:27 Ding Tianhong wrote:
> v9:
> - There is no tx completion interrupts to free DMAd Tx packets, it means taht
> we rely on new tx packets arriving to run the destructors of completed packets,
> which open up space in their sockets's send queues. Sometimes we don't get such
> new packets causing Tx to stall, a single UDP transmitter is a good example of
> this situation, so we need a clean up workqueue to reclaims completed packets,
> the workqueue will only free the last packets which is already stay for several jiffies.
> Also fix some format cleanups.
You must have missed the reply in which David Miller explained why
you can't call skb_orphan and rely on an occasional cleanup of the
queue. Please use something like the patch below:
- drop the broken skb_orphan call
- drop the workqueue
- batch cleanup based on tx_coalesce_frames/usecs for better throughput
- use a reasonable default tx timeout (200us, could be shorted
based on measurements) with a range timer
- fix napi poll function return value
- use a lockless queue for cleanup
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Feel free to fold this into your patch rather than keeping it separate.
Completely untested, probably contains some bugs. Please ask if you
have questions about this.
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 9d37b670db6a..d85c5287654e 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -12,6 +12,7 @@
#include <linux/etherdevice.h>
#include <linux/platform_device.h>
#include <linux/interrupt.h>
+#include <linux/ktime.h>
#include <linux/of_address.h>
#include <linux/phy.h>
#include <linux/of_mdio.h>
@@ -157,9 +158,11 @@ struct hip04_priv {
struct sk_buff *tx_skb[TX_DESC_NUM];
dma_addr_t tx_phys[TX_DESC_NUM];
unsigned int tx_head;
- unsigned int tx_tail;
- int tx_count;
- unsigned long last_tx;
+
+ /* FIXME: make these adjustable through ethtool */
+ int tx_coalesce_frames;
+ int tx_coalesce_usecs;
+ struct hrtimer tx_coalesce_timer;
unsigned char *rx_buf[RX_DESC_NUM];
dma_addr_t rx_phys[RX_DESC_NUM];
@@ -171,10 +174,15 @@ struct hip04_priv {
struct regmap *map;
struct work_struct tx_timeout_task;
- struct workqueue_struct *wq;
- struct delayed_work tx_clean_task;
+ /* written only by tx cleanup */
+ unsigned int tx_tail ____cacheline_aligned_in_smp;
};
+static inline unsigned int tx_count(unsigned int head, unsigned int tail)
+{
+ return (head - tail) % (TX_DESC_NUM - 1);
+}
+
static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
{
struct hip04_priv *priv = netdev_priv(ndev);
@@ -351,18 +359,21 @@ static int hip04_set_mac_address(struct net_device *ndev, void *addr)
return 0;
}
-static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+static int hip04_tx_reclaim(struct net_device *ndev, bool force)
{
struct hip04_priv *priv = netdev_priv(ndev);
unsigned tx_tail = priv->tx_tail;
struct tx_desc *desc;
unsigned int bytes_compl = 0, pkts_compl = 0;
+ unsigned int count;
- if (priv->tx_count == 0)
+ smp_rmb();
+ count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
+ if (count == 0)
goto out;
- while ((tx_tail != priv->tx_head) || (priv->tx_count == TX_DESC_NUM)) {
- desc = &priv->tx_desc[priv->tx_tail];
+ while (count) {
+ desc = &priv->tx_desc[tx_tail];
if (desc->send_addr != 0) {
if (force)
desc->send_addr = 0;
@@ -381,57 +392,37 @@ static void hip04_tx_reclaim(struct net_device *ndev, bool force)
dev_kfree_skb(priv->tx_skb[tx_tail]);
priv->tx_skb[tx_tail] = NULL;
tx_tail = TX_NEXT(tx_tail);
- priv->tx_count--;
-
- if (priv->tx_count <= 0)
- break;
+ count--;
}
priv->tx_tail = tx_tail;
+ smp_wmb(); /* Ensure tx_tail visible to xmit */
- /* Ensure tx_tail & tx_count visible to xmit */
- smp_mb();
out:
-
if (pkts_compl || bytes_compl)
netdev_completed_queue(ndev, pkts_compl, bytes_compl);
- if (unlikely(netif_queue_stopped(ndev)) &&
- (priv->tx_count < TX_DESC_NUM))
+ if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
netif_wake_queue(ndev);
-}
-static void hip04_tx_clean_monitor(struct work_struct *work)
-{
- struct hip04_priv *priv = container_of(work, struct hip04_priv,
- tx_clean_task.work);
- struct net_device *ndev = priv->ndev;
- int delta_in_ticks = msecs_to_jiffies(1000);
-
- if (!time_in_range(jiffies, priv->last_tx,
- priv->last_tx + delta_in_ticks)) {
- netif_tx_lock(ndev);
- hip04_tx_reclaim(ndev, false);
- netif_tx_unlock(ndev);
- }
- queue_delayed_work(priv->wq, &priv->tx_clean_task, delta_in_ticks);
+ return count;
}
static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
struct hip04_priv *priv = netdev_priv(ndev);
struct net_device_stats *stats = &ndev->stats;
- unsigned int tx_head = priv->tx_head;
+ unsigned int tx_head = priv->tx_head, count;
struct tx_desc *desc = &priv->tx_desc[tx_head];
dma_addr_t phys;
- if (priv->tx_count >= TX_DESC_NUM) {
+ smp_rmb();
+ count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
+ if (count == (TX_DESC_NUM - 1)) {
netif_stop_queue(ndev);
return NETDEV_TX_BUSY;
}
- hip04_tx_reclaim(ndev, false);
-
phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
if (dma_mapping_error(&ndev->dev, phys)) {
dev_kfree_skb(skb);
@@ -447,20 +438,33 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
desc->wb_addr = cpu_to_be32(phys);
skb_tx_timestamp(skb);
- /* Don't wait up for transmitted skbs to be freed. */
- skb_orphan(skb);
-
+ /* FIXME: eliminate this mmio access if xmit_more is set */
hip04_set_xmit_desc(priv, phys);
priv->tx_head = TX_NEXT(tx_head);
+ count++;
netdev_sent_queue(ndev, skb->len);
stats->tx_bytes += skb->len;
stats->tx_packets++;
- priv->tx_count++;
- priv->last_tx = jiffies;
- /* Ensure tx_head & tx_count update visible to tx reclaim */
- smp_mb();
+ /* Ensure tx_head update visible to tx reclaim */
+ smp_wmb();
+
+ /* queue is getting full, better start cleaning up now */
+ if (count >= priv->tx_coalesce_frames) {
+ if (napi_schedule_prep(&priv->napi)) {
+ /* disable rx interrupt and timer */
+ priv->reg_inten &= ~(RCV_INT);
+ writel_relaxed(DEF_INT_MASK & ~RCV_INT,
+ priv->base + PPE_INTEN);
+ hrtimer_cancel(&priv->tx_coalesce_timer);
+ __napi_schedule(&priv->napi);
+ }
+ } else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
+ /* cleanup not pending yet, start a new timer */
+ hrtimer_start_expires(&priv->tx_coalesce_timer,
+ HRTIMER_MODE_REL);
+ }
return NETDEV_TX_OK;
}
@@ -477,6 +481,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
bool last = false;
dma_addr_t phys;
int rx = 0;
+ int tx_remaining;
u16 len;
u32 err;
@@ -513,11 +518,11 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
buf = netdev_alloc_frag(priv->rx_buf_size);
if (!buf)
- return -ENOMEM;
+ goto done;
phys = dma_map_single(&ndev->dev, buf,
RX_BUF_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(&ndev->dev, phys))
- return -EIO;
+ goto done;
priv->rx_buf[priv->rx_head] = buf;
priv->rx_phys[priv->rx_head] = phys;
hip04_set_recv_desc(priv, phys);
@@ -537,6 +542,11 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
}
napi_complete(napi);
done:
+ /* clean up tx descriptors and start a new timer if necessary */
+ tx_remaining = hip04_tx_reclaim(ndev, false);
+ if (rx < budget && tx_remaining)
+ hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
+
return rx;
}
@@ -547,6 +557,9 @@ static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
struct net_device_stats *stats = &ndev->stats;
u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+ if (!ists)
+ return IRQ_NONE;
+
writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
if (unlikely(ists & DEF_INT_ERR)) {
@@ -560,16 +573,32 @@ static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
}
}
- if (ists & RCV_INT) {
+ if (ists & RCV_INT && napi_schedule_prep(&priv->napi)) {
/* disable rx interrupt */
priv->reg_inten &= ~(RCV_INT);
- writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
- napi_schedule(&priv->napi);
+ writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+ hrtimer_cancel(&priv->tx_coalesce_timer);
+ __napi_schedule(&priv->napi);
}
return IRQ_HANDLED;
}
+enum hrtimer_restart tx_done(struct hrtimer *hrtimer)
+{
+ struct hip04_priv *priv;
+ priv = container_of(hrtimer, struct hip04_priv, tx_coalesce_timer);
+
+ if (napi_schedule_prep(&priv->napi)) {
+ /* disable rx interrupt */
+ priv->reg_inten &= ~(RCV_INT);
+ writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+ __napi_schedule(&priv->napi);
+ }
+
+ return HRTIMER_NORESTART;
+}
+
static void hip04_adjust_link(struct net_device *ndev)
{
struct hip04_priv *priv = netdev_priv(ndev);
@@ -589,7 +618,6 @@ static int hip04_mac_open(struct net_device *ndev)
priv->rx_head = 0;
priv->tx_head = 0;
priv->tx_tail = 0;
- priv->tx_count = 0;
hip04_reset_ppe(priv);
for (i = 0; i < RX_DESC_NUM; i++) {
@@ -612,9 +640,6 @@ static int hip04_mac_open(struct net_device *ndev)
hip04_mac_enable(ndev);
napi_enable(&priv->napi);
- INIT_DELAYED_WORK(&priv->tx_clean_task, hip04_tx_clean_monitor);
- queue_delayed_work(priv->wq, &priv->tx_clean_task, 0);
-
return 0;
}
@@ -623,8 +648,6 @@ static int hip04_mac_stop(struct net_device *ndev)
struct hip04_priv *priv = netdev_priv(ndev);
int i;
- cancel_delayed_work_sync(&priv->tx_clean_task);
-
napi_disable(&priv->napi);
netif_stop_queue(ndev);
hip04_mac_disable(ndev);
@@ -725,6 +748,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
struct hip04_priv *priv;
struct resource *res;
unsigned int irq;
+ ktime_t txtime;
int ret;
ndev = alloc_etherdev(sizeof(struct hip04_priv));
@@ -751,6 +775,21 @@ static int hip04_mac_probe(struct platform_device *pdev)
priv->port = arg.args[0];
priv->chan = arg.args[1] * RX_DESC_NUM;
+ hrtimer_init(&priv->tx_coalesce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+ /*
+ * BQL will try to keep the TX queue as short as possible, but it can't
+ * be faster than tx_coalesce_usecs, so we need a fast timeout here,
+ * but also long enough to gather up enough frames to ensure we don't
+ * get more interrupts than necessary.
+ * 200us is enough for 16 frames of 1500 bytes at gigabit ethernet rate
+ */
+ priv->tx_coalesce_frames = TX_DESC_NUM * 3 / 4;
+ priv->tx_coalesce_usecs = 200;
+ /* allow timer to fire after half the time at the earliest */
+ txtime = ktime_set(0, priv->tx_coalesce_usecs * NSEC_PER_USEC / 2);
+ hrtimer_set_expires_range(&priv->tx_coalesce_timer, txtime, txtime);
+
priv->map = syscon_node_to_regmap(arg.np);
if (IS_ERR(priv->map)) {
dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
@@ -788,12 +827,6 @@ static int hip04_mac_probe(struct platform_device *pdev)
}
}
- priv->wq = create_singlethread_workqueue(ndev->name);
- if (!priv->wq) {
- ret = -ENOMEM;
- goto init_fail;
- }
-
INIT_WORK(&priv->tx_timeout_task, hip04_tx_timeout_task);
ether_setup(ndev);
@@ -848,8 +881,6 @@ static int hip04_remove(struct platform_device *pdev)
free_irq(ndev->irq, ndev);
of_node_put(priv->phy_node);
cancel_work_sync(&priv->tx_timeout_task);
- if (priv->wq)
- destroy_workqueue(priv->wq);
free_netdev(ndev);
return 0;
^ permalink raw reply related
* RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Arad, Ronen @ 2014-12-15 17:25 UTC (permalink / raw)
To: Jamal Hadi Salim, Roopa Prabhu, Jiri Pirko,
netdev@vger.kernel.org
Cc: sfeldma@gmail.com, bcrl@kvack.org, tgraf@suug.ch,
john.fastabend@gmail.com, stephen@networkplumber.org,
linville@tuxdriver.com, vyasevic@redhat.com, davem@davemloft.net,
shm@cumulusnetworks.com, gospo@cumulusnetworks.com
In-Reply-To: <548EFD94.8070905@mojatatu.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Jamal Hadi Salim
> Sent: Monday, December 15, 2014 5:26 PM
> To: Roopa Prabhu; Jiri Pirko
> Cc: sfeldma@gmail.com; bcrl@kvack.org; tgraf@suug.ch;
> john.fastabend@gmail.com; stephen@networkplumber.org;
> linville@tuxdriver.com; vyasevic@redhat.com; netdev@vger.kernel.org;
> davem@davemloft.net; shm@cumulusnetworks.com;
> gospo@cumulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
>
> Sorry - i didnt quiet follow the discussion, but i can see the value of
> propagating things from parent to children netdevs as part of the generic
> approach. And in that spirit:
>
> Ben's patches (and I am sure the cumulus folk do this) expose ports.
> i.e you boot up the hardware and you see ports. You can then put these
> ports in a bridge and you can offload fdbs and do other parametrization to
> the ASIC. IOW, this only becomes a bridge because you created one in the
> kernel and attached bridge ports to it.
>
> Lets say i didnt want a bridge. I want instead to take these exposed ports and
> create a bond (and maybe play with LACP). How does this propagation from
> parent->child->child work then? I think the idea of just bonding and not
> exposing it as a switch is a reasonable use case.
Are you saying that the software should reflect the same functionality the HW provides?
In other words is creating a bridge device mandatory for supporting standard VLAN-bridging (as in IEEE 802.1Q) in the HW?
VLAN-bridging including port VLAN membership, VLAN filtering, PVID, Egress un-tagging could be supported without an explicit bridge device when port devices implement bridge ndos (ndo_bridge_{set,del,get}link). What is lost is the ability to have common handling of VLAN-aware FDB in the bridge module.
Do we expect switch port devices to support L2 functionality both ways (with and without an explicit bridge device)?
Will the decision about using a bridge device or avoiding it be left to the end-user?
(This requires switch port drivers to be able to work and provide similar functionality in both setups).
I think that we need to outline the handling of L3 as it could determine or at least impact some of the answers to my above questions.
cheers,
ronen
> Also how does it work when i start doing L3 and the bond's port doesnt
> support L3? Is it time to revive the thing we called TheThing in Du?
>
> cheers,
> jamal
>
> On 12/14/14 14:41, Roopa Prabhu wrote:
> > On 12/14/14, 7:35 AM, Jiri Pirko wrote:
>
> [..chopped off for brevity and saving electrons..]
>
> cheers,
> jamal
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH iproute2 v4] ip: Simplify executing ip cmd within network ns
From: Nicolas Dichtel @ 2014-12-15 17:30 UTC (permalink / raw)
To: Jiri Pirko, vadim4j; +Cc: netdev
In-Reply-To: <20141213152024.GB1849@nanopsycho.orion>
Le 13/12/2014 16:20, Jiri Pirko a écrit :
> Sat, Dec 13, 2014 at 02:32:10PM CET, vadim4j@gmail.com wrote:
>> On Sat, Dec 13, 2014 at 10:58:03AM +0200, vadim4j@gmail.com wrote:
>>> On Sat, Dec 13, 2014 at 10:42:43AM +0200, vadim4j@gmail.com wrote:
>>>> On Sat, Dec 13, 2014 at 09:29:36AM +0100, Jiri Pirko wrote:
>>>>> Fri, Dec 12, 2014 at 11:15:07PM CET, vadim4j@gmail.com wrote:
>>>>>> From: Vadim Kochan <vadim4j@gmail.com>
>>>>>>
>>>>>> Added new '-netns' option to simplify executing following cmd:
>>>>>>
>>>>>> ip netns exec NETNS ip OPTIONS COMMAND OBJECT
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ip -n[etns] NETNS OPTIONS COMMAND OBJECT
>>>>>>
>>>>>> e.g.:
>>>>>>
>>>>>> ip -net vnet0 link add br0 type bridge
>>>>>> ip -n vnet0 link
>>>>>>
>>>>>> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
>>>>>
>>>>>
>>>>> This looks good. I'm still missing support in tc, bridge, etc. I think
>>>>> it would be great to do this in the same patch/patchset.
>>>>>
>>>> I planned to do this in the future patches after this main
>>>> changes will be accepted. Actually adding this option to other
>>>> tools is trivial.
>>>>
>>>> Anyway may be I will re-send v5 with supporting of these tools if I will have time.
>>>>
>>>> Regards,
>>>
>>> BTW, some tools already have '-n' option, so I think only '-net' can be
>>> used in such cases.
>
>
> Yep, that is my point. I would like to have the same option for all.
Agreed. The real option name is '-netns'. The fact that '-n' will work comes
from how 'ip' is implemented. This kind of shortcut will depend on each tool
implementation. But again, the *real* option name is '-netns' ;-)
Regards,
Nicolas
^ permalink raw reply
* Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del bridge port attributes
From: Roopa Prabhu @ 2014-12-15 17:20 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jiri Pirko, sfeldma, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic, netdev, davem, shm, gospo
In-Reply-To: <548EFD94.8070905@mojatatu.com>
On 12/15/14, 7:26 AM, Jamal Hadi Salim wrote:
>
> Sorry - i didnt quiet follow the discussion, but i can see the value
> of propagating things from parent to children netdevs as part of the
> generic approach. And in that spirit:
>
> Ben's patches (and I am sure the cumulus folk do this) expose ports.
> i.e you boot up the hardware and you see ports. You can then put these
> ports in a bridge and you can offload fdbs and do other parametrization
> to the ASIC. IOW, this only becomes a bridge because you created one
> in the kernel and attached bridge ports to it.
>
> Lets say i didnt want a bridge. I want instead to take these exposed
> ports and create a bond (and maybe play with LACP). How does this
> propagation from parent->child->child work then? I think the idea
> of just bonding and not exposing it as a switch is a reasonable use
> case.
We have not come to pure bonding and lacp yet (but i have mentioned it
in many contexts before).
The use case you mention is offloading bond attributes. This will be
addressed as part of ongoing switchdev work
for all other offloads (bonds, vxlans etc).
Right now we are only talking bridge port attribute offload
(learn/flood/port state etc). This could still be a bridge port
attribute on a bond
when the bond is a bridge port.
> Also how does it work when i start doing L3 and the bond's port doesnt
> support L3? Is it time to revive the thing we called TheThing in Du?
yes, exactly. This is what i had indicated in my initial emails on this
thread when the stacked devices topic came up.
Since there was reluctance in introducing a switch device (theThing), My
current patch tries to do that without a switch device.
Since this is still l2, and we are dealing with bridge port attributes,
my current patch traverses the stacked netdevs to call the
ndo_bridge_setlink on the switch port.
When it comes to l3, we can follow the same.., but as discussed in Du,
there are other reasons where a switch device becomes necessary.
I can submit an alternate series to cover the switch device approach for
l2 as well.
Thanks,
Roopa
^ permalink raw reply
* Re: [PATCH 0/2] net/macb: fix multiqueue support patch up
From: David Miller @ 2014-12-15 16:51 UTC (permalink / raw)
To: cyrille.pitchen
Cc: nicolas.ferre, linux-arm-kernel, netdev, soren.brinkmann,
linux-kernel
In-Reply-To: <cover.1418651689.git.cyrille.pitchen@atmel.com>
From: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Date: Mon, 15 Dec 2014 15:13:30 +0100
> This series of patches is a fixup for the multiqueue support patch.
>
> The first patch fixes a bug introduced by the multiqueue support patch.
> The second one doesn't fix a bug but simplify the source code by removing
> useless calls to devm_free_irq() since we use managed device resources for
> IRQs.
>
> They were applied on the net-next tree and tested with a sama5d36ek board.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH] rds: Fix min() warning in rds_message_inc_copy_to_user()
From: David Miller @ 2014-12-15 16:49 UTC (permalink / raw)
To: geert; +Cc: viro, netdev, linux-kernel
In-Reply-To: <1418646102-32415-1-git-send-email-geert@linux-m68k.org>
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Mon, 15 Dec 2014 13:21:42 +0100
> net/rds/message.c: In function ‘rds_message_inc_copy_to_user’:
> net/rds/message.c:328: warning: comparison of distinct pointer types lacks a cast
>
> Use min_t(unsigned long, ...) like is done in
> rds_message_copy_from_user().
>
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Also applied, thanks Geert.
^ permalink raw reply
* Re: [PATCH] net: stmmac: sti: Fix uninitialized pointer dereference if !OF
From: David Miller @ 2014-12-15 16:48 UTC (permalink / raw)
To: geert; +Cc: peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <1418642751-23308-1-git-send-email-geert@linux-m68k.org>
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Mon, 15 Dec 2014 12:25:51 +0100
> If CONFIG_OF is not set:
>
> drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c: In function ‘sti_dwmac_parse_data’:
> drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c:318: warning: ‘rs’ is used uninitialized in this function
>
> of_property_read_string() will return -ENOSYS in this case, and rs will
> be an uninitialized pointer.
>
> While the fallback clock selection is already selected correctly in this
> case, the string comparisons should be skipped too, else the system will
> crash while dereferencing the uninitialized pointer.
>
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Applied.
^ permalink raw reply
* Re: [PATCH net v3] net: smc91x: Fix build without gpiolib
From: David Miller @ 2014-12-15 16:47 UTC (permalink / raw)
To: tklauser; +Cc: nico, tony, netdev
In-Reply-To: <1418634147-13442-1-git-send-email-tklauser@distanz.ch>
From: Tobias Klauser <tklauser@distanz.ch>
Date: Mon, 15 Dec 2014 10:02:27 +0100
> If GPIOLIB=n the following build errors occur:
>
> drivers/net/ethernet/smsc/smc91x.c: In function 'try_toggle_control_gpio':
> drivers/net/ethernet/smsc/smc91x.c:2204:2: error: implicit declaration of function 'devm_gpiod_get_index' [-Werror=implicit-function-declaration]
> drivers/net/ethernet/smsc/smc91x.c:2204:7: warning: assignment makes pointer from integer without a cast [enabled by default]
> drivers/net/ethernet/smsc/smc91x.c:2213:2: error: implicit declaration of function 'gpiod_direction_output' [-Werror=implicit-function-declaration]
> drivers/net/ethernet/smsc/smc91x.c:2216:3: error: implicit declaration of function 'devm_gpiod_put' [-Werror=implicit-function-declaration]
> drivers/net/ethernet/smsc/smc91x.c:2222:2: error: implicit declaration of function 'gpiod_set_value_cansleep' [-Werror=implicit-function-declaration]
>
> Fix this by letting the driver depend on GPIOLIB if OF is selected.
>
> Fixes: 7d2911c4381 ("net: smc91x: Fix gpios for device tree based booting")
> Cc: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Applied.
^ permalink raw reply
* Re: [PATCH net, v2] gre: fix the inner mac header in nbma tunnel xmit path
From: David Miller @ 2014-12-15 16:46 UTC (permalink / raw)
To: timo.teras; +Cc: netdev, therbert, alexander.h.duyck
In-Reply-To: <1418628253-4950-1-git-send-email-timo.teras@iki.fi>
From: Timo Teräs <timo.teras@iki.fi>
Date: Mon, 15 Dec 2014 09:24:13 +0200
> The NBMA GRE tunnels temporarily push GRE header that contain the
> per-packet NBMA destination on the skb via header ops early in xmit
> path. It is the later pulled before the real GRE header is constructed.
>
> The inner mac was thus set differently in nbma case: the GRE header
> has been pushed by neighbor layer, and mac header points to beginning
> of the temporary gre header (set by dev_queue_xmit).
>
> Now that the offloads expect mac header to point to the gre payload,
> fix the xmit patch to:
> - pull first the temporary gre header away
> - and reset mac header to point to gre payload
>
> This fixes tso to work again with nbma tunnels.
>
> Fixes: 14051f0452a2 ("gre: Use inner mac length when computing tunnel length")
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH net-next] fib_trie.txt: fix typo
From: David Miller @ 2014-12-15 16:45 UTC (permalink / raw)
To: duanj.fnst; +Cc: netdev
In-Reply-To: <548E86AD.3090702@cn.fujitsu.com>
From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Mon, 15 Dec 2014 14:58:53 +0800
> Fix the typo, there should be "It".
> On the other hand, fix whitespace errors detected by checkpatch.pl
>
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Applied.
^ permalink raw reply
* Re: [PATCH] cirrus: cs89x0: fix time comparison
From: David Miller @ 2014-12-15 16:44 UTC (permalink / raw)
To: asaf.vertz; +Cc: ebiederm, julia.lawall, himangi774, netdev, linux-kernel
In-Reply-To: <20141214083418.GA25910@ubuntu>
From: Asaf Vertz <asaf.vertz@tandemg.com>
Date: Sun, 14 Dec 2014 10:34:18 +0200
> To be future-proof and for better readability the time comparisons are
> modified to use time_before, time_after, and time_after_eq instead of
> plain, error-prone math.
>
> Signed-off-by: Asaf Vertz <asaf.vertz@tandemg.com>
Applied.
^ permalink raw reply
* Re: [RFC PATCH net-next 1/1] net: Support for switch port configuration
From: Roopa Prabhu @ 2014-12-15 16:44 UTC (permalink / raw)
To: Thomas Graf
Cc: Varlese, Marco, John Fastabend, Jiri Pirko,
netdev@vger.kernel.org, stephen@networkplumber.org,
Fastabend, John R, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <20141215144027.GA21262@casper.infradead.org>
On 12/15/14, 6:40 AM, Thomas Graf wrote:
> On 12/15/14 at 02:29pm, Varlese, Marco wrote:
>>> All of these are highly generic and should *not* be passed through from user
>>> space to the driver directly but rather be properly abstracted as Roopa
>>> proposed. The value of this API is abstraction.
>> How would you let the user enable/disable features then? For instance, how would the user enable/disable flooding for broadcast packets (BFLOODING) on a given port? What I was proposing is to have a list of attributes (to be added in if_link.h) which can be tuned by the user using a tool like iproute2. What do you propose?
> Excellent, I agree with what you are saying. What set me off is that
> the patch does not reflect that yet. Instead, the patch introduces
> a pure Netlink pass-through API to the driver.
>
> I would expect the patch to:
> 1. Parse the Netlink messages and be aware of individual attributes
> 2. Validate them
> 3. Pass the configuration to the driver using an API that can also
> be consumed from in-kernel users.
yes, exactly.
>
>> I think I have seen Roopa posting his updated ndo patch and getting some feedback by few people already and as long as I will be able to accomplish the use case described here I am happy with his way.
> I think Roopa's patches are supplementary. Not all switchdev users
> will be backed with a Linux Bridge. I therefore welcome your patches
> very much.
>
> The overlap is in the ndo. I think both the API you propose and
> Roopa's bridge code should use the same NDO.
>> I do not have an example right now of a vendor specific attribute but I was just saying that might happen (i.e. someone will have a feature not implemented by others?).
> That's fine. Once we have them we can consider adding vendor specific
> extensions.
^ permalink raw reply
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Josef Bacik @ 2014-12-15 16:42 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Yuchung Cheng
Cc: Martin KaFai Lau, netdev@vger.kernel.org, David S. Miller,
Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
Kernel Team
In-Reply-To: <1418659395.9773.13.camel@edumazet-glaptop2.roam.corp.google.com>
On 12/15/2014 11:03 AM, Eric Dumazet wrote:
> On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>
>> I think patches 1 and 3 are good additions, since they establish
>> few permanent points of instrumentation in tcp stack.
>> Patches 4-5 look more like use cases of tracepoints established
>> before. They may feel like simple additions and, no doubt,
>> they are useful, but since they expose things via tracing
>> infra they become part of api and cannot be changed later,
>> when more stats would be needed.
>> I think systemtap like scripting on top of patches 1 and 3
>> should solve your use case ?
>> Also, have you looked at recent eBPF work?
>> Though it's not completely ready yet, soon it should
>> be able to do the same stats collection as you have
>> in 4/5 without adding permanent pieces to the kernel.
>
> So it looks like web10g like interfaces are very often requested by
> various teams.
>
> And we have many different views on how to hack this. I am astonished by
> number of hacks I saw about this stuff going on.
>
> What about a clean way, extending current TCP_INFO, which is both
> available as a getsockopt() for socket owners and ss/iproute2
> information for 'external entities'
>
> If we consider web10g info needed, then adding a ftrace/eBPF like
> interface is simply yet another piece of code we need to maintain,
> and the argument of 'this should cost nothing if not activated' is
> nonsense since major players need to constantly monitor TCP metrics and
> behavior.
>
> It seems both FaceBook and Google are working on a subset of web10g.
>
> I suggest we meet together and establish a common ground, preferably
> after Christmas holidays.
>
We've set up something for exactly this case at the end of January but
have yet to get a response from Google. If any of the Google people
cc'ed (or really anybody, its not a strictly FB/Google thing) is
interested please email me directly and I'll send you the details, we
will be meeting face to face in the bay area at the end of January. Thanks,
Josef
^ permalink raw reply
* Re: [PATCH net 0/2] mlx4 driver fixes for 3.19-rc1
From: David Miller @ 2014-12-15 16:36 UTC (permalink / raw)
To: ogerlitz; +Cc: netdev, matanb, amirv, talal
In-Reply-To: <1418566685-9855-1-git-send-email-ogerlitz@mellanox.com>
From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun, 14 Dec 2014 16:18:03 +0200
> Just fixes for two small issues introduced in the 3.19 merge window
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH 0/9 net-next] rhashtable: Per bucket locks & deferred table resizing
From: Thomas Graf @ 2014-12-15 16:27 UTC (permalink / raw)
To: David Miller; +Cc: netdev, herbert, paulmck, edumazet, john.r.fastabend, josh
In-Reply-To: <20141215.111838.992055066540772910.davem@davemloft.net>
On 12/15/14 at 11:18am, David Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Mon, 15 Dec 2014 13:00:50 +0000
>
> > Meant for "net-next". Apologies for the missing prefix.
>
> Which is not open for submission right now, please resubmit when it is
> re-openned.
Will do. I wanted to expose it as soon as it's done as Herbert is
working on rehashing and I wanted him and everyone else to be aware
of this.
^ permalink raw reply
* Re: [RFC PATCH net-next 1/1] net: Support for switch port configuration
From: Roopa Prabhu @ 2014-12-15 16:18 UTC (permalink / raw)
To: Varlese, Marco
Cc: Jiri Pirko, John Fastabend, netdev@vger.kernel.org,
stephen@networkplumber.org, Fastabend, John R, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <C4896FB061E7DE4AAC93031BDCA044B104AC533A@IRSMSX108.ger.corp.intel.com>
On 12/15/14, 1:39 AM, Varlese, Marco wrote:
>> -----Original Message-----
>> From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
>> Sent: Saturday, December 13, 2014 7:06 AM
>> To: Varlese, Marco
>> Cc: Jiri Pirko; John Fastabend; netdev@vger.kernel.org;
>> stephen@networkplumber.org; Fastabend, John R; sfeldma@gmail.com;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [RFC PATCH net-next 1/1] net: Support for switch port
>> configuration
>>
>> On 12/12/14, 1:19 AM, Varlese, Marco wrote:
>>>> -----Original Message-----
>>>> From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
>>>> Sent: Thursday, December 11, 2014 5:41 PM
>>>> To: Jiri Pirko
>>>> Cc: Varlese, Marco; John Fastabend; netdev@vger.kernel.org;
>>>> stephen@networkplumber.org; Fastabend, John R; sfeldma@gmail.com;
>>>> linux-kernel@vger.kernel.org
>>>> Subject: Re: [RFC PATCH net-next 1/1] net: Support for switch port
>>>> configuration
>>>>
>>>> On 12/11/14, 8:56 AM, Jiri Pirko wrote:
>>>>> Thu, Dec 11, 2014 at 05:37:46PM CET, roopa@cumulusnetworks.com
>> wrote:
>>>>>> On 12/11/14, 3:01 AM, Jiri Pirko wrote:
>>>>>>> Thu, Dec 11, 2014 at 10:59:42AM CET, marco.varlese@intel.com wrote:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: John Fastabend [mailto:john.fastabend@gmail.com]
>>>>>>>>> Sent: Wednesday, December 10, 2014 5:04 PM
>>>>>>>>> To: Jiri Pirko
>>>>>>>>> Cc: Varlese, Marco; netdev@vger.kernel.org;
>>>>>>>>> stephen@networkplumber.org; Fastabend, John R;
>>>>>>>>> roopa@cumulusnetworks.com; sfeldma@gmail.com; linux-
>>>>>>>>> kernel@vger.kernel.org
>>>>>>>>> Subject: Re: [RFC PATCH net-next 1/1] net: Support for switch
>>>>>>>>> port configuration
>>>>>>>>>
>>>>>>>>> On 12/10/2014 08:50 AM, Jiri Pirko wrote:
>>>>>>>>>> Wed, Dec 10, 2014 at 05:23:40PM CET, marco.varlese@intel.com
>>>> wrote:
>>>>>>>>>>> From: Marco Varlese <marco.varlese@intel.com>
>>>>>>>>>>>
>>>>>>>>>>> Switch hardware offers a list of attributes that are
>>>>>>>>>>> configurable on a per port basis.
>>>>>>>>>>> This patch provides a mechanism to configure switch ports by
>>>>>>>>>>> adding an NDO for setting specific values to specific attributes.
>>>>>>>>>>> There will be a separate patch that extends iproute2 to call
>>>>>>>>>>> the new NDO.
>>>>>>>>>> What are these attributes? Can you give some examples. I'm
>>>>>>>>>> asking because there is a plan to pass generic attributes to
>>>>>>>>>> switch ports replacing current specific
>>>>>>>>>> ndo_switch_port_stp_update. In this case, bridge is setting that
>> attribute.
>>>>>>>>>> Is there need to set something directly from userspace or does
>>>>>>>>>> it make rather sense to use involved bridge/ovs/bond ? I think
>>>>>>>>>> that both will be needed.
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> I think for many attributes it would be best to have both. The
>>>>>>>>> in kernel callers and netlink userspace can use the same driver
>> ndo_ops.
>>>>>>>>> But then we don't _require_ any specific bridge/ovs/etc module.
>>>>>>>>> And we may have some attributes that are not specific to any
>>>>>>>>> existing software module. I'm guessing Marco has some examples
>>>>>>>>> of
>>>> these.
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> John Fastabend Intel Corporation
>>>>>>>> We do have a need to configure the attributes directly from
>>>>>>>> user-space
>>>> and I have identified the tool to do that in iproute2.
>>>>>>>> An example of attributes are:
>>>>>>>> * enabling/disabling of learning of source addresses on a given
>>>>>>>> port (you can imagine the attribute called LEARNING for example);
>>>>>>>> * internal loopback control (i.e. LOOPBACK) which will control
>>>>>>>> how the flow of traffic behaves from the switch fabric towards an
>>>>>>>> egress port;
>>>>>>>> * flooding for broadcast/multicast/unicast type of packets (i.e.
>>>>>>>> BFLOODING, MFLOODING, UFLOODING);
>>>>>>>>
>>>>>>>> Some attributes would be of the type enabled/disabled while other
>>>>>>>> will
>>>> allow specific values to allow the user to configure different
>>>> behaviours of that feature on that particular port on that platform.
>>>>>>>> One thing to mention - as John stated as well - there might be
>>>>>>>> some
>>>> attributes that are not specific to any software module but rather
>>>> have to do with the actual hardware/platform to configure.
>>>>>>>> I hope this clarifies some points.
>>>>>>> It does. Makes sense. We need to expose this attr set/get for both
>>>>>>> in-kernel and userspace use cases.
>>>>>>>
>>>>>>> Please adjust you patch for this. Also, as a second patch, it
>>>>>>> would be great if you can convert ndo_switch_port_stp_update to
>>>>>>> this new
>>>> ndo.
>>>>>> Why are we exposing generic switch attribute get/set from userspace
>>>>>> ?. We already have specific attributes for learning/flooding which
>>>>>> can be extended further.
>>>>> Yes, but that is for PF_BRIDGE and bridge specific attributes. There
>>>>> might be another generic attrs, no?
>>>> I cant think of any. And plus, the whole point of switchdev l2
>>>> offloads was to map these to existing bridge attributes. And we
>>>> already have a match for some of the attributes that marco wants.
>>>>
>>>> If there is a need for such attributes, i don't see why it is needed
>>>> for switch devices only.
>>>> It is needed for any hw (nics etc). And, a precedence to this is to
>>>> do it via ethtool.
>>>>
>>>> Having said that, am sure we will find a need for this in the future.
>>>> And having a netlink attribute always helps.
>>>>
>>>> Today, it seems like these can be mapped to existing attributes that
>>>> are settable via ndo_bridge_setlink/getlink.
>>>>
>>>>>> And for in kernel api....i had a sample patch in my RFC series
>>>>>> (Which i was going to resubmit, until it was decided that we will
>>>>>> use existing api around
>>>>>> ndo_bridge_setlink/ndo_bridge_getlink):
>>>>>> http://www.spinics.net/lists/netdev/msg305473.html
>>>>> Yes, this might become handy for other generic non-bridge attrs.
>>>>>
>>>>>> Thanks,
>>>>>> Roopa
>>>>>>
>>>>>>
>>>>>>
>>> The list I provided is only a subset of the attributes we will need to be
>> exposed. I do have more and I'm sure that more will come in the future. As I
>> mentioned in few posts earlier, some attributes are generic and some are
>> not.
>>> I did not consider ethtool for few reasons but the main one is that I was
>> under the impression that netlink was preferred in many circumstances over
>> the ethotool_ops.
>> That is correct. I don't think anybody hinted that you should extend ethtool.
>>> Plus, all the cases I have identified so far are going to nicely fit into the
>> setlink set of operations.
>> Would be better if you submitted your iproute2 patch with this patch.
>>
>> I do plan to resubmit my generic ndo patch soon.
>>
>> Thanks,
>> Roopa
> I honestly do not understand what extra "help" the iproute2 would have brought to this RFC: that patch simply adds a new section for the iproute2 help and a new args parser for the input. From an infrastructure perspective is leveraging what netlink messages are using RTM_SETLINK hence hooking up eventually in the do_setlink(). Sure, obviously contains all the attributes I have in mind but from an infrastructure patch perspective I don't think that you would have gained much in seeing it.
correct. But you mentioned iproute2 changes in your patch comment. And
since i was not getting a clear understanding of what these attributes
were...from your current patch..., i thought your iproute2 patch might
shed some light on how you plan to handle these attributes.
>
> Anyway, good to know you're reworking you generic patch. I'll keep an eye out for your new NDO.
>
>
> Thanks,
> Marco
>
^ permalink raw reply
* Re: [PATCH 0/9 net-next] rhashtable: Per bucket locks & deferred table resizing
From: David Miller @ 2014-12-15 16:18 UTC (permalink / raw)
To: tgraf; +Cc: netdev, kernel, herbert, paulmck, edumazet, john.r.fastabend,
josh
In-Reply-To: <20141215130050.GA9628@casper.infradead.org>
From: Thomas Graf <tgraf@suug.ch>
Date: Mon, 15 Dec 2014 13:00:50 +0000
> Meant for "net-next". Apologies for the missing prefix.
Which is not open for submission right now, please resubmit when it is
re-openned.
^ permalink raw reply
* Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
From: Blake Matheny @ 2014-12-15 16:08 UTC (permalink / raw)
To: Eric Dumazet, Alexei Starovoitov, Laurent Chavey, Yuchung Cheng
Cc: Martin Lau, netdev@vger.kernel.org, David S. Miller,
Hannes Frederic Sowa, Steven Rostedt, Lawrence Brakmo,
Josef Bacik, Kernel Team
In-Reply-To: <1418659395.9773.13.camel@edumazet-glaptop2.roam.corp.google.com>
We have an additional set of patches for web10g that builds on these
tracepoints. It can be made to work either way, but I agree the idea of
something like a sockopt would be really nice.
-Blake
On 12/15/14, 8:03 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>On Sun, 2014-12-14 at 22:55 -0800, Alexei Starovoitov wrote:
>
>> I think patches 1 and 3 are good additions, since they establish
>> few permanent points of instrumentation in tcp stack.
>> Patches 4-5 look more like use cases of tracepoints established
>> before. They may feel like simple additions and, no doubt,
>> they are useful, but since they expose things via tracing
>> infra they become part of api and cannot be changed later,
>> when more stats would be needed.
>> I think systemtap like scripting on top of patches 1 and 3
>> should solve your use case ?
>> Also, have you looked at recent eBPF work?
>> Though it's not completely ready yet, soon it should
>> be able to do the same stats collection as you have
>> in 4/5 without adding permanent pieces to the kernel.
>
>So it looks like web10g like interfaces are very often requested by
>various teams.
>
>And we have many different views on how to hack this. I am astonished by
>number of hacks I saw about this stuff going on.
>
>What about a clean way, extending current TCP_INFO, which is both
>available as a getsockopt() for socket owners and ss/iproute2
>information for 'external entities'
>
>If we consider web10g info needed, then adding a ftrace/eBPF like
>interface is simply yet another piece of code we need to maintain,
>and the argument of 'this should cost nothing if not activated' is
>nonsense since major players need to constantly monitor TCP metrics and
>behavior.
>
>It seems both FaceBook and Google are working on a subset of web10g.
>
>I suggest we meet together and establish a common ground, preferably
>after Christmas holidays.
>
>Thanks
>
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox