* [PATCH net-next v2 1/2] bridge: remove mode 'swdev'
From: roopa @ 2014-12-06 1:16 UTC (permalink / raw)
To: jiri, sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic
Cc: netdev, davem, shm, gospo, Roopa Prabhu
From: Roopa Prabhu <roopa@cumulusnetworks.com>
swdev mode was introduced to indicate switchdev offloads
for bridging from user space. But the user can already
use BRIDGE_FLAGS_SELF to call directly into the
hw switch port driver, so swdev mode is no longer required.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
include/uapi/linux/if_bridge.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 296a556..da17e45 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -105,7 +105,6 @@ struct __fdb_entry {
#define BRIDGE_MODE_VEB 0 /* Default loopback mode */
#define BRIDGE_MODE_VEPA 1 /* 802.1Qbg defined VEPA mode */
-#define BRIDGE_MODE_SWDEV 2 /* Full switch device offload */
/* Bridge management nested attributes
* [IFLA_AF_SPEC] = {
--
1.7.10.4
^ permalink raw reply related
* [PATCH net-next v2 0/2] remove bridge BRIDGE_MODE_SWDEV
From: roopa @ 2014-12-06 1:16 UTC (permalink / raw)
To: jiri, sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic
Cc: netdev, davem, shm, gospo, Roopa Prabhu
From: Roopa Prabhu <roopa@cumulusnetworks.com>
Roopa Prabhu (2):
bridge: remove mode 'swdev'
rocker: remove swdev mode
drivers/net/ethernet/rocker/rocker.c | 18 +-----------------
include/uapi/linux/if_bridge.h | 1 -
2 files changed, 1 insertion(+), 18 deletions(-)
--
1.7.10.4
^ permalink raw reply
* Re: iproute2/nstat: Bug in displaying icmp stats
From: Eric Dumazet @ 2014-12-06 1:13 UTC (permalink / raw)
To: Vijay Subramanian; +Cc: netdev
In-Reply-To: <1417828247.15618.19.camel@edumazet-glaptop2.roam.corp.google.com>
On Fri, 2014-12-05 at 17:10 -0800, Eric Dumazet wrote:
> On Fri, 2014-12-05 at 15:35 -0800, Vijay Subramanian wrote:
> > Hi,
> >
> > I noticed nstat is displaying icmp stats incorrectly.
> >
> > $ cat /proc/net/snmp | grep Icmp | head -2 | awk '{print $1 " " $2 " "
> > $3 " " $4}'
> > Icmp: InMsgs InErrors InCsumErrors
> > Icmp: 215 0 0
> >
> > $ nstat -az | grep IcmpIn | head -3
> > IcmpInMsgs 0 0.0
> > IcmpInErrors 215 0.0
> > IcmpInCsumErrors 0 0.0
> >
> > For example, as seen in /proc/net/snmp, IcmpInMsgs should be 215 but
> > that value is assigned to IcmpInErrors.
> >
> > The issue seems to be the way the values are populated.
> >
> > $vim +209 misc/nstat.c
> > -----x----
> >
> > /* Trick to skip "dummy" trailing ICMP MIB in 2.4 */
> >
> > if (strcmp(idbuf, "IcmpOutAddrMaskReps") == 0)
> >
> > idbuf[5] = 0;
> >
> > else
> >
> > n = n->next;
> >
> > -----x------
> >
> > It seems "IcmpOutAddrMaskReps" is processed twice and values assigned
> > are off by one.
> >
> > Any idea what the code is doing for 2.4 kernel and how to fix this?
> >
> > vijay
>
> According to
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>
> This was fixed in 2002 :
>
> commit b838c6d5c189d03d3db56c2774de451cf041a39f
> Author: Erik Schoenfelder <schoenfr@gaaertner.de>
> Date: Sun Sep 29 03:56:52 2002 -0700
>
> net/ipv4/proc.c: Dont print dummy member of icmp_mib.
>
> diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
> index 649620a..3f2dbcb 100644
> --- a/net/ipv4/proc.c
> +++ b/net/ipv4/proc.c
> @@ -128,7 +128,7 @@ int snmp_get_info(char *buffer, char **start, off_t offset, int length)
> len += sprintf (buffer + len,
> "\nIcmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps\n"
> "Icmp:");
> - for (i=0; i<offsetof(struct icmp_mib, __pad)/sizeof(unsigned long); i++)
> + for (i=0; i<offsetof(struct icmp_mib, dummy)/sizeof(unsigned long); i++)
> len += sprintf(buffer+len, " %lu", fold_field((unsigned long*)icmp_statistics, sizeof(struct icmp_mib), i));
>
> len += sprintf (buffer + len,
>
I guess we could count number of spaces/fields in both lines,
and disable the iproute2 trick if counts match.
^ permalink raw reply
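The field-count check suggested above could be sketched roughly as
follows. This is a minimal illustration, not nstat's actual code:
count_fields() and snmp_counts_match() are hypothetical helper names,
and the sketch assumes the header line and the value line of one
/proc/net/snmp pair are already available as NUL-terminated strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count whitespace-separated fields in one line of /proc/net/snmp. */
size_t count_fields(const char *line)
{
	size_t n = 0;
	int in_field = 0;

	assert(line != NULL);

	for (; *line; line++) {
		if (isspace((unsigned char)*line)) {
			in_field = 0;
		} else if (!in_field) {
			/* First character of a new field. */
			in_field = 1;
			n++;
		}
	}
	return n;
}

/*
 * Hypothetical helper: returns 1 when the header line and the value line
 * carry the same number of fields. In that case the kernel did not emit
 * an extra trailing "dummy" value, so the 2.4 workaround could be skipped.
 */
int snmp_counts_match(const char *hdr, const char *vals)
{
	return count_fields(hdr) == count_fields(vals);
}
```

A caller would presumably apply the "IcmpOutAddrMaskReps" trick only
when snmp_counts_match() returns 0 for the Icmp: pair.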
* Re: iproute2/nstat: Bug in displaying icmp stats
From: Eric Dumazet @ 2014-12-06 1:10 UTC (permalink / raw)
To: Vijay Subramanian; +Cc: netdev
In-Reply-To: <CAGK4HS_ty2=f0PNU-w8QPP9BoP67E+4MaDxLTLV4G22dx_7A-Q@mail.gmail.com>
On Fri, 2014-12-05 at 15:35 -0800, Vijay Subramanian wrote:
> Hi,
>
> I noticed nstat is displaying icmp stats incorrectly.
>
> $ cat /proc/net/snmp | grep Icmp | head -2 | awk '{print $1 " " $2 " "
> $3 " " $4}'
> Icmp: InMsgs InErrors InCsumErrors
> Icmp: 215 0 0
>
> $ nstat -az | grep IcmpIn | head -3
> IcmpInMsgs 0 0.0
> IcmpInErrors 215 0.0
> IcmpInCsumErrors 0 0.0
>
> For example, as seen in /proc/net/snmp, IcmpInMsgs should be 215 but
> that value is assigned to IcmpInErrors.
>
> The issue seems to be the way the values are populated.
>
> $vim +209 misc/nstat.c
> -----x----
>
> /* Trick to skip "dummy" trailing ICMP MIB in 2.4 */
>
> if (strcmp(idbuf, "IcmpOutAddrMaskReps") == 0)
>
> idbuf[5] = 0;
>
> else
>
> n = n->next;
>
> -----x------
>
> It seems "IcmpOutAddrMaskReps" is processed twice and values assigned
> are off by one.
>
> Any idea what the code is doing for 2.4 kernel and how to fix this?
>
> vijay
According to
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
This was fixed in 2002 :
commit b838c6d5c189d03d3db56c2774de451cf041a39f
Author: Erik Schoenfelder <schoenfr@gaaertner.de>
Date: Sun Sep 29 03:56:52 2002 -0700
net/ipv4/proc.c: Dont print dummy member of icmp_mib.
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 649620a..3f2dbcb 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -128,7 +128,7 @@ int snmp_get_info(char *buffer, char **start, off_t offset, int length)
len += sprintf (buffer + len,
"\nIcmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps\n"
"Icmp:");
- for (i=0; i<offsetof(struct icmp_mib, __pad)/sizeof(unsigned long); i++)
+ for (i=0; i<offsetof(struct icmp_mib, dummy)/sizeof(unsigned long); i++)
len += sprintf(buffer+len, " %lu", fold_field((unsigned long*)icmp_statistics, sizeof(struct icmp_mib), i));
len += sprintf (buffer + len,
^ permalink raw reply related
* RE: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: Arad, Ronen @ 2014-12-06 1:04 UTC (permalink / raw)
To: Arad, Ronen, Roopa Prabhu, Scott Feldman, Netdev
Cc: Jiří Pírko, Jamal Hadi Salim, Benjamin LaHaise,
Thomas Graf, john fastabend, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, David S. Miller, shm@cumulusnetworks.com,
Andy Gospodarek
In-Reply-To: <E4CD12F19ABA0C4D8729E087A761DC3505D842BA@ORSMSX101.amr.corp.intel.com>
I have another case of propagation which is not covered by the proposed patch.
A recent patch introduced a default_pvid attribute for a bridge (so far supported only via sysfs and not via netlink).
When a port joins a bridge, it inherits its PVID from the default_pvid of the bridge.
The bridge driver propagates that to the newly created net_bridge_port. This is done in br_vlan.c:
int nbp_vlan_init(struct net_bridge_port *p)
{
	int rc = 0;
	if (p->br->default_pvid) {
		rc = nbp_vlan_add(p, p->br->default_pvid,
				  BRIDGE_VLAN_INFO_PVID |
				  BRIDGE_VLAN_INFO_UNTAGGED);
	}
	return rc;
}
When L2 switching is offloaded to the HW, this PVID setting needs to be propagated as well. However, it does not arrive via ndo_bridge_setlink, so neither the proposed propagation in br_setlink nor an up-leveled one in rtnetlink can handle this case.
One possible way of handling it is to replace the call to nbp_vlan_add with a call to a new function, let's say
int br_propagate_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags)
This function would compose a netlink message with the VLAN filtering information (i.e. AF_SPEC with VLAN_INFO) and call br_setlink, leveraging the offload support proposed by Roopa.
If this is an acceptable course of action, I could work on such a patch.
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Arad, Ronen
> Sent: Friday, December 05, 2014 3:21 PM
> To: Roopa Prabhu; Scott Feldman; Netdev
> Cc: Jiří Pírko; Jamal Hadi Salim; Benjamin LaHaise; Thomas Graf; john
> fastabend; stephen@networkplumber.org; John Linville;
> nhorman@tuxdriver.com; Nicolas Dichtel; vyasevic@redhat.com; Florian
> Fainelli; buytenh@wantstofly.org; Aviad Raveh; David S. Miller;
> shm@cumulusnetworks.com; Andy Gospodarek
> Subject: RE: [PATCH 2/3] bridge: offload bridge port attributes to switch asic
> if feature flag set
>
>
>
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> > owner@vger.kernel.org] On Behalf Of Roopa Prabhu
> > Sent: Thursday, December 04, 2014 11:02 PM
> > To: Scott Feldman
> > Cc: Jiří Pírko; Jamal Hadi Salim; Benjamin LaHaise; Thomas Graf; john
> > fastabend; stephen@networkplumber.org; John Linville;
> > nhorman@tuxdriver.com; Nicolas Dichtel; vyasevic@redhat.com; Florian
> > Fainelli; buytenh@wantstofly.org; Aviad Raveh; Netdev; David S.
> > Miller; shm@cumulusnetworks.com; Andy Gospodarek
> > Subject: Re: [PATCH 2/3] bridge: offload bridge port attributes to
> > switch asic if feature flag set
> >
> > On 12/4/14, 10:41 PM, Scott Feldman wrote:
> > > On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
> > >> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> > >>
> > >> This allows offloading to switch asic without having the user to
> > >> set any flag. And this is done in the bridge driver to rollback
> > >> kernel settings on hw offload failure if required in the future.
> > >>
> > >> With this, it also makes sure a notification goes out only after
> > >> the attributes are set both in the kernel and hw.
> > > I like this approach as it streamlines the steps for the user in
> > > setting port flags. There is one case for FLOODING where you'll
> > > have to turn off flooding for both, and then turn on flooding in hw.
> > > You don't want flooding turned on on kernel and hw.
> > ok, maybe using the higher bits as in
> > https://patchwork.ozlabs.org/patch/413211/
> >
> > might help with that. Let me think some more.
> > >
> > >> ---
> > >> net/bridge/br_netlink.c | 27 ++++++++++++++++++++++++++-
> > >> 1 file changed, 26 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> > >> index
> > >> 9f5eb55..ce173f0 100644
> > >> --- a/net/bridge/br_netlink.c
> > >> +++ b/net/bridge/br_netlink.c
> > >> @@ -407,9 +407,21 @@ int br_setlink(struct net_device *dev, struct
> > nlmsghdr *nlh)
> > >> afspec, RTM_SETLINK);
> > >> }
> > >>
> > >> + if ((dev->features & NETIF_F_HW_SWITCH_OFFLOAD) &&
> > >> + dev->netdev_ops->ndo_bridge_setlink) {
> > >> + int ret = dev->netdev_ops->ndo_bridge_setlink(dev,
> > >> + nlh);
> > > I think you want to up-level this to net/core/rtnetlink.c because
> > > you're only enabling the feature for one instance of a driver that
> > > implements ndo_bridge_setlink: the bridge driver. If another driver
> > > was MASTER and implemented ndo_bridge_setlink, you'd want same
> check
> > > to push setting down to SELF port driver.
> >
> > yeah, i thought about that. But i moved it here so that rollback would
> > be easier.
>
> There is a need for propagating setlink/dellink requests down multiple levels.
> The use-case I have in mind is a bridge at the top, team/bond in the middle,
> and port devices at the bottom.
> A setlink for VLAN filtering attributes would come with MASTER flag set, and
> either port or bond/team netdev.
> How would this be handled?
>
> The propagation rules between bridge and enslaved port device could be
> different from those between bond/team and enslaved devices.
> The current bridge driver does not propagate VLAN filtering from bridge to its
> ports as each port could have different configuration. In the case of a
> bond/team, all members need to have the same configuration so that a
> bond/team would be indistinguishable from a simple port.
>
> Therefore rtnetlink.c might not have the knowledge for propagation across
> multiple levels.
> It seems that each device which implements
> ndo_bridge_setlink/ndo_bridge_dellink and could have a master role needs to
> take care of propagation to its slaves.
>
> > >
> > >> + if (ret && ret != -EOPNOTSUPP) {
> > >> + /* XXX Fix this in the future to rollback
> > >> + * kernel settings and return error
> > >> + */
> > > The future is now. Let's fix this now for the rollback case (again
> > > up in rtnetlink.c). So then a general question comes to mind: for
> > > these dual target sets, is it best to try HW first and then SW, or
> > > the other way around? Either way, on failure on second you need to
> > > rollback first. And, on failure, you need to know rollback value
> > > for first, so you have to do a getlink on first before attempting set.
> > yep, exactly, I went through the same thought process yesterday when i
> > was trying to implement rollback.
> > >
> > >> + br_warn(p->br, "error offloading bridge attributes "
> > >> + "on port %u(%s)\n", (unsigned int) p->port_no,
> > >> + p->dev->name);
> > >> + }
> > >> + }
> > >> +
> > >> if (err == 0)
> > >> br_ifinfo_notify(RTM_NEWLINK, p);
> > >> -
> > >> out:
> > >> return err;
> > >> }
> > >> @@ -433,6 +445,19 @@ int br_dellink(struct net_device *dev, struct
> > nlmsghdr *nlh)
> > >> err = br_afspec((struct net_bridge *)netdev_priv(dev), p,
> > >> afspec, RTM_DELLINK);
> > >>
> > >> + if (dev->features & NETIF_F_HW_SWITCH_OFFLOAD
> > >> + && dev->netdev_ops->ndo_bridge_setlink) {
> > >> + int ret = dev->netdev_ops->ndo_bridge_dellink(dev, nlh);
> > >> + if (ret && ret != -EOPNOTSUPP) {
> > >> + /* XXX Fix this in the future to rollback
> > >> + * kernel settings and return error
> > >> + */
> > >> + br_warn(p->br, "error offloading bridge attributes "
> > >> + "on port %u(%s)\n", (unsigned int) p->port_no,
> > >> + p->dev->name);
> > >> + }
> > >> + }
> > >> +
> > > Same comments as setlink above.
> > >
> > >> return err;
> > >> }
> > >> static int br_validate(struct nlattr *tb[], struct nlattr
> > >> *data[])
> > >> --
> > >> 1.7.10.4
> > >>
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org More majordomo
> info
> > at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH iproute2] ss: Use nl_proto_a2n for filtering by netlink proto
From: Vadim Kochan @ 2014-12-06 0:52 UTC (permalink / raw)
To: netdev; +Cc: Vadim Kochan
Now it is possible to filter by existing Netlink protos:
ss -A netlink src uevent
ss -A netlink src nft
ss -A netlink src genl
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
misc/ss.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/misc/ss.c b/misc/ss.c
index a99294d..b9dbfd6 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1231,16 +1231,8 @@ void *parse_hostcond(char *addr)
}
if (addr[0] && strcmp(addr, "*")) {
a.addr.bitlen = 32;
- if (get_u32(a.addr.data, addr, 0)) {
- if (strcmp(addr, "rtnl") == 0)
- a.addr.data[0] = 0;
- else if (strcmp(addr, "fw") == 0)
- a.addr.data[0] = 3;
- else if (strcmp(addr, "tcpdiag") == 0)
- a.addr.data[0] = 4;
- else
- return NULL;
- }
+ if (nl_proto_a2n(&a.addr.data[0], addr) == -1)
+ return NULL;
}
goto out;
}
--
2.1.3
^ permalink raw reply related
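For readers unfamiliar with the name-to-number step, the lookup that
the patch delegates to nl_proto_a2n() can be pictured with the sketch
below. It is not the iproute2 implementation: proto_name_to_num() and
the table are hypothetical, and the table is an assumption built from
the three names removed by the patch plus the NETLINK_* protocol
numbers in linux/netlink.h for the names in the commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct nl_proto_name {
	const char *name;
	unsigned int id;
};

/* Assumed name table; values follow the NETLINK_* protocol numbers. */
static const struct nl_proto_name nl_proto_names[] = {
	{ "rtnl",    0 },	/* NETLINK_ROUTE */
	{ "fw",      3 },	/* NETLINK_FIREWALL */
	{ "tcpdiag", 4 },	/* NETLINK_SOCK_DIAG */
	{ "nft",     12 },	/* NETLINK_NETFILTER */
	{ "uevent",  15 },	/* NETLINK_KOBJECT_UEVENT */
	{ "genl",    16 },	/* NETLINK_GENERIC */
};

/* Returns 0 and stores the protocol number in *id, or -1 on failure. */
int proto_name_to_num(unsigned int *id, const char *arg)
{
	unsigned long v;
	char *end;
	size_t i;

	assert(id != NULL && arg != NULL);

	/* Try the symbolic names first. */
	for (i = 0; i < sizeof(nl_proto_names) / sizeof(nl_proto_names[0]); i++) {
		if (strcmp(arg, nl_proto_names[i].name) == 0) {
			*id = nl_proto_names[i].id;
			return 0;
		}
	}

	/* Fall back to a plain non-negative number. */
	if (!isdigit((unsigned char)arg[0]))
		return -1;
	errno = 0;
	v = strtoul(arg, &end, 0);
	if (errno || *end != '\0' || v > 0xffffffffUL)
		return -1;
	*id = (unsigned int)v;
	return 0;
}
```

With such a helper, a filter like "src uevent" would resolve to 15 and
an unknown name would be rejected, which matches the "return NULL"
path in the patch.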
* [PATCH v2 iproute] bridge link: add option 'self'
From: roopa @ 2014-12-06 0:59 UTC (permalink / raw)
To: jiri, sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic
Cc: netdev, davem, shm, gospo, Roopa Prabhu
From: Roopa Prabhu <roopa@cumulusnetworks.com>
Currently self is set internally only if hwmode is set.
This makes it necessary for the hw to have a mode.
There is no hwmode really required to go to hardware. So, introduce
self for anybody who wants to target hardware.
v1 -> v2
- fix a few bugs. Initialize flags to zero: this was required to
keep the current behaviour unchanged.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
bridge/link.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/bridge/link.c b/bridge/link.c
index 90d9e7f..b8b8675 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -261,7 +261,7 @@ static int brlink_modify(int argc, char **argv)
__s16 priority = -1;
__s8 state = -1;
__s16 mode = -1;
- __u16 flags = BRIDGE_FLAGS_MASTER;
+ __u16 flags = 0;
struct rtattr *nest;
memset(&req, 0, sizeof(req));
@@ -321,6 +321,8 @@ static int brlink_modify(int argc, char **argv)
"\"veb\".\n");
exit(-1);
}
+ } else if (strcmp(*argv, "self") == 0) {
+ flags = BRIDGE_FLAGS_SELF;
} else {
usage();
}
@@ -375,10 +377,11 @@ static int brlink_modify(int argc, char **argv)
* devices so far. Thus we only need to include the flags attribute
* if we are setting the hw mode.
*/
- if (mode >= 0) {
+ if (mode >= 0 || flags > 0) {
nest = addattr_nest(&req.n, sizeof(req), IFLA_AF_SPEC);
- addattr16(&req.n, sizeof(req), IFLA_BRIDGE_FLAGS, flags);
+ if (flags > 0)
+ addattr16(&req.n, sizeof(req), IFLA_BRIDGE_FLAGS, flags);
if (mode >= 0)
addattr16(&req.n, sizeof(req), IFLA_BRIDGE_MODE, mode);
--
1.7.10.4
^ permalink raw reply related
* [PATCH] dummy: add support for ethtool get_drvinfo
From: Flavio Leitner @ 2014-12-06 0:13 UTC (permalink / raw)
To: netdev; +Cc: Flavio Leitner
The command 'ethtool -i' is useful for finding details
about an interface, such as the device driver in use.
This was missing for the dummy driver.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
---
drivers/net/dummy.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index ff435fb..413ca4f 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -38,6 +38,9 @@
#include <net/rtnetlink.h>
#include <linux/u64_stats_sync.h>
+#define DRV_NAME "dummy"
+#define DRV_VERSION "1.0"
+
static int numdummies = 1;
/* fake multicast ability */
@@ -120,12 +123,24 @@ static const struct net_device_ops dummy_netdev_ops = {
.ndo_change_carrier = dummy_change_carrier,
};
+static void dummy_get_drvinfo(struct net_device *dev,
+ struct ethtool_drvinfo *info)
+{
+ strlcpy(info->driver, DRV_NAME, sizeof(info->driver));
+ strlcpy(info->version, DRV_VERSION, sizeof(info->version));
+}
+
+static const struct ethtool_ops dummy_ethtool_ops = {
+ .get_drvinfo = dummy_get_drvinfo,
+};
+
static void dummy_setup(struct net_device *dev)
{
ether_setup(dev);
/* Initialize the device structure. */
dev->netdev_ops = &dummy_netdev_ops;
+ dev->ethtool_ops = &dummy_ethtool_ops;
dev->destructor = free_netdev;
/* Fill in device structure with ethernet-generic values. */
@@ -150,7 +165,7 @@ static int dummy_validate(struct nlattr *tb[], struct nlattr *data[])
}
static struct rtnl_link_ops dummy_link_ops __read_mostly = {
- .kind = "dummy",
+ .kind = DRV_NAME,
.setup = dummy_setup,
.validate = dummy_validate,
};
@@ -209,4 +224,4 @@ static void __exit dummy_cleanup_module(void)
module_init(dummy_init_module);
module_exit(dummy_cleanup_module);
MODULE_LICENSE("GPL");
-MODULE_ALIAS_RTNL_LINK("dummy");
+MODULE_ALIAS_RTNL_LINK(DRV_NAME);
--
1.9.3
^ permalink raw reply related
* Re: KVM vs Xen-PV netperf numbers
From: Nick H @ 2014-12-05 23:36 UTC (permalink / raw)
To: Madhu Challa; +Cc: netdev
In-Reply-To: <CAN_zqfP-=8su7hdnd3HCjU++bWn_Qmom=5Zyf9-=iy_DJkAd2Q@mail.gmail.com>
Please do not top post..
Comments inline:
On Fri, Dec 5, 2014 at 12:20 PM, Madhu Challa <challa@noironetworks.com> wrote:
> Could you please attach your kvm command line. If you are running ubuntu you
> might also want to verify you have
>
> # To load the vhost_net module, which in some cases can speed up
> # network performance, set VHOST_NET_ENABLED to 1.
> VHOST_NET_ENABLED=1
>
> in /etc/default/qemu-kvm
sudo qemu-system-x86_64 -hda vdisk.img -smp 4 -netdev
type=tap,vhost=on,script=/usr/bin/qemu-ifup,id=net0 -device
virtio-net-pci,netdev=net0,mac=00:AD:44:44:CB:02 -m 8192
N
>
> Thanks.
>
> On Fri, Dec 5, 2014 at 12:02 PM, Nick H <nickkvm@gmail.com> wrote:
>>
>> Hello
>>
>> Not sure I have the right audience, I have two VM's on similar hosts.
>> One VM is KVM + virtio + vhost based while other is Xen PV with xen
>> netfront driver. Running a simple netperf test for 1400 byte packets
>> on both VMs, I see a wide difference in throughput, as follows.
>>
>> Xen-pv throughput comes to :
>>
>> ./netperf -H 10.xx.xx.49 -l 20 -t UDP_STREAM -- -m 1400
>> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>> 10.xx.xx.49 (10.xx.xx.49) port 0 AF_INET : demo
>> Socket Message Elapsed Messages
>> Size Size Time Okay Errors Throughput
>> bytes bytes secs # # 10^6bits/sec
>>
>> 212992 1400 20.00 1910549 0 1069.89
>> 262144 20.00 1704789 954.66
>>
>> whereas KVM virtio number comes to :
>>
>> ./netperf -t UDP_STREAM -l 10 -H 10.xx.xx.49 -- -m 1400
>> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>> 10.xx.xx.49 (10.xx.xx.49) port 0 AF_INET : demo
>> Socket Message Elapsed Messages
>> Size Size Time Okay Errors Throughput
>> bytes bytes secs # # 10^6bits/sec
>>
>> 212992 1400 10.00 155060 0 173.65
>> 262144 10.00 155060 173.65
>>
>> I built a custom kernel where I simply free up the (UDP only) skb in
>> virtio: xmit_skb() routine and I count how many skb's I have received.
>> Surprisingly it was not too high either:
>>
>> ./netperf -t UDP_STREAM -l 10 -H 10.xx.xx.49 -- -m 1400
>> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>> 10.xx.xx.49 (10.xx.xx.49) port 0 AF_INET : demo
>> Socket Message Elapsed Messages
>> Size Size Time Okay Errors Throughput
>> bytes bytes secs # # 10^6bits/sec
>>
>> 212992 1400 10.00 224792 0 251.74
>> 262144 10.00 0 0.00
>>
>>
>> 1910549 packets pumped in xen pv driver versus 212992 packets pumped
>> in case of virtio driver. Assuming the data path inside the kernel is
>> same for both drivers , and I have eliminated virtio's
>> virtqueue_kick() call by freeing the packet ahead in my experiment,
>> can all this overhead be attributed to system call overhead in case of
>> KVM+virtio combination ? Anything I am missing ?
>>
>> The KVM setup is based off:
>> Linux ubn-nested 3.17.0+ #16 SMP Thu Dec 4 12:00:09 PST 2014 x86_64
>> x86_64 x86_64 GNU/Linux
>>
>> Regards
>> N
>
>
^ permalink raw reply
* iproute2/nstat: Bug in displaying icmp stats
From: Vijay Subramanian @ 2014-12-05 23:35 UTC (permalink / raw)
To: netdev
Hi,
I noticed nstat is displaying icmp stats incorrectly.
$ cat /proc/net/snmp | grep Icmp | head -2 | awk '{print $1 " " $2 " "
$3 " " $4}'
Icmp: InMsgs InErrors InCsumErrors
Icmp: 215 0 0
$ nstat -az | grep IcmpIn | head -3
IcmpInMsgs 0 0.0
IcmpInErrors 215 0.0
IcmpInCsumErrors 0 0.0
For example, as seen in /proc/net/snmp, IcmpInMsgs should be 215 but
that value is assigned to IcmpInErrors.
The issue seems to be the way the values are populated.
$vim +209 misc/nstat.c
-----x----
/* Trick to skip "dummy" trailing ICMP MIB in 2.4 */
if (strcmp(idbuf, "IcmpOutAddrMaskReps") == 0)
idbuf[5] = 0;
else
n = n->next;
-----x------
It seems "IcmpOutAddrMaskReps" is processed twice and values assigned
are off by one.
Any idea what the code is doing for 2.4 kernel and how to fix this?
vijay
^ permalink raw reply
* RE: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: Arad, Ronen @ 2014-12-05 23:21 UTC (permalink / raw)
To: Roopa Prabhu, Scott Feldman, Netdev
Cc: Jiří Pírko, Jamal Hadi Salim, Benjamin LaHaise,
Thomas Graf, john fastabend, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, David S. Miller, shm@cumulusnetworks.com,
Andy Gospodarek
In-Reply-To: <54815883.80909@cumulusnetworks.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Roopa Prabhu
> Sent: Thursday, December 04, 2014 11:02 PM
> To: Scott Feldman
> Cc: Jiří Pírko; Jamal Hadi Salim; Benjamin LaHaise; Thomas Graf; john
> fastabend; stephen@networkplumber.org; John Linville;
> nhorman@tuxdriver.com; Nicolas Dichtel; vyasevic@redhat.com; Florian
> Fainelli; buytenh@wantstofly.org; Aviad Raveh; Netdev; David S. Miller;
> shm@cumulusnetworks.com; Andy Gospodarek
> Subject: Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic
> if feature flag set
>
> On 12/4/14, 10:41 PM, Scott Feldman wrote:
> > On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
> >> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> >>
> >> This allows offloading to switch asic without having the user to set
> >> any flag. And this is done in the bridge driver to rollback kernel
> >> settings on hw offload failure if required in the future.
> >>
> >> With this, it also makes sure a notification goes out only after the
> >> attributes are set both in the kernel and hw.
> > I like this approach as it streamlines the steps for the user in
> > setting port flags. There is one case for FLOODING where you'll have
> > to turn off flooding for both, and then turn on flooding in hw. You
> > don't want flooding turned on on kernel and hw.
> ok, maybe using the higher bits as in
> https://patchwork.ozlabs.org/patch/413211/
>
> might help with that. Let me think some more.
> >
> >> ---
> >> net/bridge/br_netlink.c | 27 ++++++++++++++++++++++++++-
> >> 1 file changed, 26 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index
> >> 9f5eb55..ce173f0 100644
> >> --- a/net/bridge/br_netlink.c
> >> +++ b/net/bridge/br_netlink.c
> >> @@ -407,9 +407,21 @@ int br_setlink(struct net_device *dev, struct
> nlmsghdr *nlh)
> >> afspec, RTM_SETLINK);
> >> }
> >>
> >> + if ((dev->features & NETIF_F_HW_SWITCH_OFFLOAD) &&
> >> + dev->netdev_ops->ndo_bridge_setlink) {
> >> + int ret = dev->netdev_ops->ndo_bridge_setlink(dev,
> >> + nlh);
> > I think you want to up-level this to net/core/rtnetlink.c because
> > you're only enabling the feature for one instance of a driver that
> > implements ndo_bridge_setlink: the bridge driver. If another driver
> > was MASTER and implemented ndo_bridge_setlink, you'd want same check
> > to push setting down to SELF port driver.
>
> yeah, i thought about that. But i moved it here so that rollback would be
> easier.
There is a need to propagate setlink/dellink requests down multiple levels.
The use case I have in mind is a bridge at the top, a team/bond in the middle, and port devices at the bottom.
A setlink for VLAN filtering attributes would come with the MASTER flag set and with either a port or a bond/team netdev.
How would this be handled?
The propagation rules between a bridge and an enslaved port device could be different from those between a bond/team and its enslaved devices.
The current bridge driver does not propagate VLAN filtering from the bridge to its ports, as each port could have a different configuration. In the case of a bond/team, all members need to have the same configuration so that the bond/team is indistinguishable from a simple port.
Therefore rtnetlink.c might not have the knowledge needed for propagation across multiple levels.
It seems that each device which implements ndo_bridge_setlink/ndo_bridge_dellink and could have a master role needs to take care of propagation to its slaves.
> >
> >> + if (ret && ret != -EOPNOTSUPP) {
> >> + /* XXX Fix this in the future to rollback
> >> + * kernel settings and return error
> >> + */
> > The future is now. Let's fix this now for the rollback case (again up
> > in rtnetlink.c). So then a general question comes to mind: for these
> > dual target sets, is it best to try HW first and then SW, or the other
> > way around? Either way, on failure on second you need to rollback
> > first. And, on failure, you need to know rollback value for first, so
> > you have to do a getlink on first before attempting set.
> yep, exactly, I went through the same thought process yesterday when i was
> trying to implement rollback.
> >
> >> + br_warn(p->br, "error offloading bridge attributes "
> >> + "on port %u(%s)\n", (unsigned int) p->port_no,
> >> + p->dev->name);
> >> + }
> >> + }
> >> +
> >> if (err == 0)
> >> br_ifinfo_notify(RTM_NEWLINK, p);
> >> -
> >> out:
> >> return err;
> >> }
> >> @@ -433,6 +445,19 @@ int br_dellink(struct net_device *dev, struct
> nlmsghdr *nlh)
> >> err = br_afspec((struct net_bridge *)netdev_priv(dev), p,
> >> afspec, RTM_DELLINK);
> >>
> >> + if (dev->features & NETIF_F_HW_SWITCH_OFFLOAD
> >> + && dev->netdev_ops->ndo_bridge_setlink) {
> >> + int ret = dev->netdev_ops->ndo_bridge_dellink(dev, nlh);
> >> + if (ret && ret != -EOPNOTSUPP) {
> >> + /* XXX Fix this in the future to rollback
> >> + * kernel settings and return error
> >> + */
> >> + br_warn(p->br, "error offloading bridge attributes "
> >> + "on port %u(%s)\n", (unsigned int) p->port_no,
> >> + p->dev->name);
> >> + }
> >> + }
> >> +
> > Same comments as setlink above.
> >
> >> return err;
> >> }
> >> static int br_validate(struct nlattr *tb[], struct nlattr *data[])
> >> --
> >> 1.7.10.4
> >>
>
^ permalink raw reply
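The rollback requirement discussed in the quoted review (read the old
value first, then undo the first target if the second one fails) can
be illustrated with a small user-space sketch. Everything here is
hypothetical: struct attr_target, set_sw_then_hw() and the in-memory
helpers are not kernel APIs, and the choice of software first, hardware
second is only one of the two orderings raised in the thread. As in the
patch, -EOPNOTSUPP from the hardware side is not treated as a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical target: something that can report and change one setting. */
struct attr_target {
	int (*get)(void *ctx, int *val);
	int (*set)(void *ctx, int val);
	void *ctx;
};

/* Apply val to sw, then hw; restore sw if hw fails. */
int set_sw_then_hw(const struct attr_target *sw,
		   const struct attr_target *hw, int val)
{
	int old, err;

	assert(sw != NULL && hw != NULL);

	/* Remember the current value so it can be restored. */
	err = sw->get(sw->ctx, &old);
	if (err)
		return err;

	err = sw->set(sw->ctx, val);
	if (err)
		return err;

	err = hw->set(hw->ctx, val);
	if (err && err != -EOPNOTSUPP) {
		/* Best-effort rollback of the first target. */
		sw->set(sw->ctx, old);
		return err;
	}
	return 0;
}

/* In-memory stand-ins used only to exercise the sketch. */
int mem_get(void *ctx, int *val) { *val = *(int *)ctx; return 0; }
int mem_set(void *ctx, int val) { *(int *)ctx = val; return 0; }
int fail_set(void *ctx, int val) { (void)ctx; (void)val; return -EIO; }
```

A notification would then be sent only when the function returns 0,
which corresponds to the "notify after both are set" goal in the patch
description.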
* [PATCH 3.14 64/73] net/ping: handle protocol mismatching scenario
From: Greg Kroah-Hartman @ 2014-12-05 22:45 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, David S. Miller, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
Jane Zhou, Yiwei Zhao
In-Reply-To: <20141205224433.921659956@linuxfoundation.org>
3.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jane Zhou <a17711@motorola.com>
commit 91a0b603469069cdcce4d572b7525ffc9fd352a6 upstream.
ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
don't match. For example, if sk_buff's protocol is ETH_P_IPV6 but sock's
sk_family is AF_INET, and sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
The fix is to "continue" the search in that case; if nothing matches, return NULL.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jane Zhou <a17711@motorola.com>
Signed-off-by: Yiwei Zhao <gbjc64@motorola.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/ping.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -217,6 +217,8 @@ static struct sock *ping_lookup(struct n
&ipv6_hdr(skb)->daddr))
continue;
#endif
+ } else {
+ continue;
}
if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif)
^ permalink raw reply
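The effect of the added "else continue" can be seen in a simplified
user-space model of the lookup loop. This is only a sketch under
stated assumptions: struct fake_sock, ping_lookup_sketch() and the
SKETCH_ETH_P_* constants are hypothetical stand-ins, and the address
comparisons of the real ping_lookup() are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* EtherType values, named locally to avoid clashing with system headers. */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

/* Hypothetical, reduced stand-in for struct sock. */
struct fake_sock {
	int family;		/* AF_INET or AF_INET6 */
	int bound_dev_if;	/* 0 means not bound to a device */
};

const struct fake_sock *ping_lookup_sketch(const struct fake_sock *socks,
					   size_t n, unsigned int proto,
					   int dif)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct fake_sock *sk = &socks[i];
		int family_ok =
			(proto == SKETCH_ETH_P_IP && sk->family == AF_INET) ||
			(proto == SKETCH_ETH_P_IPV6 && sk->family == AF_INET6);

		/* Counterpart of the new "else continue": skip mismatches. */
		if (!family_ok)
			continue;

		if (sk->bound_dev_if && sk->bound_dev_if != dif)
			continue;

		return sk;
	}
	return NULL;
}
```

Without the family check, the first unbound socket would be returned
for an IPv6 packet even if it were an AF_INET socket, which is the
mismatch described in the commit message.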
* [PATCH 3.17 100/122] net/ping: handle protocol mismatching scenario
From: Greg Kroah-Hartman @ 2014-12-05 22:44 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, David S. Miller, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev,
Jane Zhou, Yiwei Zhao
In-Reply-To: <20141205223305.514276242@linuxfoundation.org>
3.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jane Zhou <a17711@motorola.com>
commit 91a0b603469069cdcce4d572b7525ffc9fd352a6 upstream.
ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
don't match. For example, if sk_buff's protocol is ETH_P_IPV6 but sock's
sk_family is AF_INET and sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
The fix is to "continue" the search; if nothing matches, return NULL.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Jane Zhou <a17711@motorola.com>
Signed-off-by: Yiwei Zhao <gbjc64@motorola.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/ping.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -217,6 +217,8 @@ static struct sock *ping_lookup(struct n
&ipv6_hdr(skb)->daddr))
continue;
#endif
+ } else {
+ continue;
}
if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif)
^ permalink raw reply
* Re: [PATCH 1/3] netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature flag for switch device offloads
From: Thomas Graf @ 2014-12-05 22:43 UTC (permalink / raw)
To: roopa
Cc: jiri, sfeldma, jhs, bcrl, john.fastabend, stephen, linville,
nhorman, nicolas.dichtel, vyasevic, f.fainelli, buytenh, aviadr,
netdev, davem, shm, gospo
In-Reply-To: <1417746401-8140-2-git-send-email-roopa@cumulusnetworks.com>
On 12/04/14 at 06:26pm, roopa@cumulusnetworks.com wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This is a generic high level feature flag for all switch asic features today.
>
> switch drivers set this flag on switch ports. Logical devices like
> bridge, bonds, vxlans can inherit this flag from their slaves/ports.
>
> I had to use SWITCH in the name to avoid ambiguity with other feature
> flags. But, since i have been harping about not calling it 'switch',
> I am welcome to any suggestions :)
>
> An alternative to using a feature flag is to use a IFF_HW_OFFLOAD
> in net_device_flags.
What does this flag indicate specifically? What driver would
implement ndo_bridge_setlink() but not set this flag?
I think it should be clearly documented when this flag is to be set.
^ permalink raw reply
* Re: [PATCH v2 1/6] net-PPP: Replacement of a printk() call by pr_warn() in mppe_rekey()
From: terry white @ 2014-12-05 22:35 UTC (permalink / raw)
To: joe; +Cc: linux-ppp, netdev, linux-kernel, kernel-janitors
In-Reply-To: <1417766255.2721.43.camel@perches.com>
... ciao:
: on "12-4-2014" "Joe Perches" writ:
: > Does it make sense to express such implementation details in the Linux
: > coding style documentation more explicitly (besides the fact that this
: > update suggestion was also triggered by a warning from the script
: > "checkpatch.pl".
:
: Probably not.
:
: Overly formalized coding style rules are perhaps
: more of a barrier to entry than most want.
funny you should mention that. as nothing more than a casual observer,
i'm noticing a "TIRED" sensation reading this thread. i have "0"
confidence a "SERIOUS" participant's enthusiasm would remain untested.
however, the "checkpatch.pl" warning suggests an assumed 'custom'. i
can't tell if this is a 'serious' issue, or "pickin' fly shit out of pepper".
but from my reading of it, the "CODE" , and the "logic" driving it, is
not the problem.
season's best ...
--
... it's not what you see ,
but in stead , notice ...
^ permalink raw reply
* Re: [PATCHv2 net] i40e: Implement ndo_gso_check()
From: Jesse Gross @ 2014-12-05 21:52 UTC (permalink / raw)
To: Tom Herbert
Cc: Joe Stringer, netdev, Shannon Nelson, Brandeburg, Jesse,
Jeff Kirsher, linux.nics, Linux Kernel Mailing List
In-Reply-To: <CAEP_g=9Yg-pZf9-Wb4qrZhAMSB=edqDxBXSRskWCturt-nnxTg@mail.gmail.com>
On Tue, Dec 2, 2014 at 10:26 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Dec 1, 2014 at 4:09 PM, Tom Herbert <therbert@google.com> wrote:
>> On Mon, Dec 1, 2014 at 3:53 PM, Jesse Gross <jesse@nicira.com> wrote:
>>> On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert <therbert@google.com> wrote:
>>>> On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer <joestringer@nicira.com> wrote:
>>>>> On 21 November 2014 at 09:59, Joe Stringer <joestringer@nicira.com> wrote:
>>>>>> On 20 November 2014 16:19, Jesse Gross <jesse@nicira.com> wrote:
>>>>>>> I don't know if we need to have the check at all for IPIP though -
>>>>>>> after all the driver doesn't expose support for it all (actually it
>>>>>>> doesn't expose GRE either). This raises kind of an interesting
>>>>>>> question about the checks though - it's pretty easy to add support to
>>>>>>> the driver for a new GSO type (and I imagine that people will be
>>>>>>> adding GRE soon) and forget to update the check.
>>>>>>
>>>>>> If the check is more conservative, then testing would show that it's
>>>>>> not working and lead people to figure out why (and update the check).
>>>>>
>>>>> More concretely, one suggestion would be something like following at
>>>>> the start of each gso_check():
>>>>>
>>>>> + const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | SKB_GSO_FCOE |
>>>>> + SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
>>>>> +
>>>>> + if (skb_shinfo(skb)->gso_type & ~supported)
>>>>> + return false;
>>>>
>>>> This should already be handled by net_gso_ok.
>>>
>>> My original point wasn't so much that this isn't handled at the moment
>>> but that it's easy to add a supported GSO type but then forget to
>>> update this check - i.e. if a driver already supports UDP_TUNNEL and
>>> adds support for GRE with the same constraints. It seems not entirely
>>> ideal that this function is acting as a blacklist rather than a
>>> whitelist.
>>
>> Agreed, it would be nice to have all the checking logic in one place.
>> If all the drivers end up implementing ndo_gso_check then we could
>> potentially get rid of the GSO types as features. This probably
>> wouldn't be a bad thing since we already know that the features
>> mechanism doesn't scale (for instance there's no way to indicate that
>> certain combinations of GSO types are supported by a device).
>
> This crossed my mind and I agree that it's pretty clear that the
> features mechanism isn't scaling very well. Presumably, the logical
> extension of this is that each driver would have a function that looks
> at a packet and returns a set of offload operations that it can
> support rather than exposing a set of protocols. However, it seems
> like it would probably result in a bunch of duplicate code in each
> driver.
I think a possible middleground here is to convert ndo_gso_check() to
ndo_features_check(). This would behave similarly to
netif_skb_features() and give drivers an opportunity to knock out
features for a given packet. This would allow us to avoid duplicate
code in the immediate case of tunnels where we need to handle both GSO
and checksums and potentially enable wider usage in the future if it
makes sense.
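To make the idea a bit more concrete, here is a rough sketch of how such a per-packet features check could behave, written as a small Python model rather than kernel code. The flag names and values, the Packet class, the 80-byte limit, and features_check() are all assumptions made for illustration; none of them is an existing kernel interface.

```python
from dataclasses import dataclass

# Hypothetical feature bits for this sketch only.
CSUM = 0x1
GSO_TCP = 0x2
GSO_UDP_TUNNEL = 0x4
SG = 0x8

MAX_TUNNEL_HDR_LEN = 80  # assumed device limit, for illustration only


@dataclass
class Packet:
    encapsulated: bool
    tunnel_hdr_len: int = 0


def features_check(pkt: Packet, features: int) -> int:
    """Return the subset of `features` the device can apply to `pkt`."""
    if pkt.encapsulated and pkt.tunnel_hdr_len > MAX_TUNNEL_HDR_LEN:
        # Knock out checksum and GSO offloads together for this packet,
        # leaving unrelated features (such as SG) untouched.
        features &= ~(CSUM | GSO_TCP | GSO_UDP_TUNNEL)
    return features
```

The point of returning a feature mask instead of a boolean is that one driver callback can cover both GSO and checksum decisions for tunnels, rather than duplicating the same test in two places.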
^ permalink raw reply
* Re: 3.12.33 - BUG xfrm_selector_match+0x25/0x2f6
From: Julian Anastasov @ 2014-12-05 21:32 UTC (permalink / raw)
To: Smart Weblications GmbH - Florian Wiessner
Cc: Steffen Klassert, netdev, LKML, stable, Simon Horman, lvs-devel
In-Reply-To: <5481B944.2000002@smart-weblications.de>
Hello,
On Fri, 5 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
> thank you for the fast responses! I would like to test any patch for 3.12.
I hope I'll have time this weekend...
> If i understand correctly, i set:
>
> echo 0 > /proc/sys/net/ipv4/vs/snat_reroute
The flag works per-packet, no need to reload any modules.
But it does not help for the case with local client where
the problem with sockets occurs, that is why you can keep
ip_vs_route_me_harder() empty (return 0) until patch is
created.
> modprobe ip_vs_ftp
>
> and reenable ftp ipvs?
>
> It does not crash, but ftp is not working with neither PASV nor PORT:
>
>
> [14:47:42] [R] Connecting to 192.168.10.62 -> IP=192.168.10.62 PORT=21
> [14:47:42] [R] Connected to 192.168.10.62
> [14:47:43] [R] 220 (vsFTPd 3.0.2)
> [14:47:43] [R] USER (hidden)
> [14:47:43] [R] 331 Please specify the password.
> [14:47:43] [R] PASS (hidden)
> [14:47:43] [R] 230 Login successful.
> [14:47:43] [R] SYST
> [14:47:43] [R] 215 UNIX Type: L8
> [14:47:43] [R] FEAT
> [14:47:43] [R] 211-Features:
> [14:47:43] [R] EPRT
> [14:47:43] [R] EPSV
> [14:47:43] [R] MDTM
> [14:47:43] [R] PASV
> [14:47:43] [R] REST STREAM
> [14:47:43] [R] SIZE
> [14:47:43] [R] TVFS
> [14:47:43] [R] UTF8
> [14:47:43] [R] 211 End
> [14:47:43] [R] PWD
> [14:47:43] [R] 257 "/"
> [14:47:43] [R] CWD /
> [14:47:43] [R] 250 Directory successfully changed.
> [14:47:43] [R] PWD
> [14:47:43] [R] 257 "/"
> [14:47:43] [R] TYPE A
> [14:47:43] [R] 200 Switching to ASCII mode.
> [14:47:43] [R] PASV
> [14:47:43] [R] 227 Entering Passive Mode (10,10,1,23,251,6).
> [14:47:43] [R] Opening data channel IP: 192.168.10.62 PORT: 64262
> [14:47:44] [R] Data socket error: connection refused
> [14:47:44] [R] List error
> [14:47:44] [R] PASV
> [14:47:44] [R] 227 Entering Passive Mode (10,10,1,23,250,144).
> [14:47:44] [R] Opening data channel IP: 192.168.10.62 PORT: 64144
> [14:47:45] [R] Data socket error: connection refused
> [14:47:45] [R] List error
> [14:47:45] [R] PASV mode failed, trying PORT mode...
> [14:47:45] [R] Listening on PORT: 62505, waiting for connection.
> [14:47:45] [R] PORT 192,168,200,13,244,41
> [14:47:45] [R] 500 Illegal PORT command.
Who is 192.168.200.13? From vsftpd-3.0.2/postlogin.c,
handle_port():
/* SECURITY:
* 1) Reject requests not connecting to the control socket IP
* 2) Reject connects to privileged ports
*/
It looks like the PORT command provides a different IP.
IIRC, IPVS does not mangle the PORT command, and vsftpd expects to
connect to the same client IP. There is a config option you can
try to set (port_promiscuous), only while testing.
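For reference, the six-number argument in PORT and in the 227 PASV reply decodes as four address bytes followed by the port split into a high and a low byte. The sketch below decodes the values from the log above and applies a simplified version of the two rules from the quoted vsftpd comment. The function names are illustrative, and the control-connection peer address used in the usage note is a made-up placeholder, since the log does not show it.

```python
from typing import Tuple


def parse_host_port(arg: str) -> Tuple[str, int]:
    """Decode 'h1,h2,h3,h4,p1,p2' as used by PORT and the 227 PASV reply."""
    fields = [int(x) for x in arg.split(",")]
    if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError("expected six comma-separated byte values")
    host = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]
    return host, port


def port_allowed(arg: str, control_peer_ip: str) -> bool:
    """Simplified model of the two checks in the quoted comment:
    the target must be the control socket peer, and the port must not
    be privileged (below 1024)."""
    host, port = parse_host_port(arg)
    return host == control_peer_ip and port >= 1024
```

parse_host_port("192,168,200,13,244,41") yields 192.168.200.13 and port 62505, the port the client says it is listening on, and parse_host_port("10,10,1,23,251,6") yields 10.10.1.23 and port 64262. If the server sees the control connection coming from any address other than 192.168.200.13 (for example a placeholder such as 192.0.2.1), port_allowed() returns False, which corresponds to the "500 Illegal PORT command" reply.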
> [14:47:45] [R] List error
> [14:48:14] [R] QUIT
> [14:48:14] [R] 221 Goodbye.
> [14:48:14] [R] Logged out: 192.168.10.62
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* Re: [PATCH net-next 0/4] net: allow setting congctl via routing table
From: Hannes Frederic Sowa @ 2014-12-05 21:03 UTC (permalink / raw)
To: Dave Taht
Cc: Daniel Borkmann, davem@davemloft.net, Florian Westphal,
netdev@vger.kernel.org
In-Reply-To: <CAA93jw5SCx32T1Z-cSeAyHFWiQsW9gp20Qoq3v=w_RO7sq=VoA@mail.gmail.com>
Hi Dave,
On Fr, 2014-12-05 at 11:05 -0800, Dave Taht wrote:
> On Fri, Dec 5, 2014 at 10:35 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Fr, 2014-12-05 at 08:35 -0800, Dave Taht wrote:
> >> On Fri, Dec 5, 2014 at 7:24 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> >> > This is the second part of our work and allows for setting the congestion
> >> > control algorithm via routing table. For details, please see individual
> >> > patches.
> >> >
> >> > Joint work with Florian Westphal, suggested by Hannes Frederic Sowa.
> >> >
> >> > Thanks!
> >> >
> >> > Daniel Borkmann (4):
> >> > net: tcp: refactor reinitialization of congestion control
> >> > net: tcp: add key management to congestion control
> >> > net: tcp: add RTAX_CC_ALGO fib handling
> >> > net: tcp: add per route congestion control
> >>
> >>
> >> Very interesting. Have you tried something other than dctcp here
> >> (e.g. westwood or lp?)
> >>
> >> Have you considered the case where the route changes underneath
> >> you from one device to another?
> >
> > Notice, there is no way the state of a tcp congestion control algorithm
> > can be converted to be used by a different one, so this would only
> > affect new tcp connections via this interface.
>
> You are missing the point. If the route changes from a path that
> is DCTCP capable to one that is not, (say you fail over to a backup link)
I don't think that today's datacenters are designed so that the backup path
has less performance than the primary link (different AQM settings). It
is much more important to allow, e.g., the connections to a database
server to select dctcp as CC while all connections going to the
internet use some "ordinary" tcp congestion algorithm.
> and flows persist, bad things will happen. DCTCP, in particular, depends
> upon a very specific AQM configuration on all the hops in the path, without that
> it can be very aggressive.
That's for sure.
> I do think it is feasible to convert from at least some of the
> core state from one tcp congestion control algorithm to another.
Hmm, I haven't looked if that is possible. It might be.
> >> Example, here I am routing everything through eth0, where I
> >> would want cubic, probably...
> >>
> >> root@ganesha:~/git/tinc# ip route
> >> default via 172.26.16.1 dev eth0 proto babel onlink
> >> 69.181.216.0/22 via 172.26.16.1 dev eth0 proto babel onlink
> >> 169.254.0.0/16 dev eth0 scope link metric 1000
> >> 172.26.16.0/24 dev eth0 proto kernel scope link src 172.26.16.177
> >> 172.26.16.1 via 172.26.16.1 dev eth0 proto babel onlink
> >> 172.26.16.112 via 172.26.16.112 dev eth0 proto babel onlink
> >> 172.26.17.0/24 via 172.26.16.1 dev eth0 proto babel onlink
> >> 172.26.17.3 via 172.26.16.1 dev eth0 proto babel onlink
> >> 172.26.17.227 via 172.26.16.1 dev eth0 proto babel onlink
> >> 192.168.7.0/30 dev eth1 proto kernel scope link src 192.168.7.1 metric 1
> >> 192.168.7.2 via 172.26.16.112 dev eth0 proto babel onlink
> >>
> >> And I pull the plug, and everything flips over to wlan0,
> >> where I might want westwood (or something saner than
> >> that. It might be nice to have a per-device cc default
> >> algorithm...)
> >
> > Something like that might be possible with metrics and "via ... dev if0
> > metric xxx" routes, which will be cleaned up as soon as the interface
> > goes down and the fallback will be to a route with a different
> > congestion algorithm.
>
> mmm... I do dynamic routing via various routing protocols, which
> generally don't bother with inserting more than one metric.
I totally understand, they might even remove the routes and re-add them,
thus losing the tcp cc property.
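The fallback behaviour discussed here can be modelled roughly as below. This is only a toy routing table in Python, assuming longest-prefix match with the lowest metric as tie-breaker; the Route and RouteTable names, the congctl field, and the addresses are illustrative and do not correspond to the kernel's FIB structures.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    dev: str
    metric: int
    congctl: Optional[str] = None  # per-route congestion control, if any


class RouteTable:
    def __init__(self) -> None:
        self.routes: List[Route] = []

    def add(self, route: Route) -> None:
        self.routes.append(route)

    def link_down(self, dev: str) -> None:
        """Drop every route that uses the given device."""
        self.routes = [r for r in self.routes if r.dev != dev]

    def lookup(self, dst: str) -> Optional[Route]:
        """Longest prefix wins; among equal prefixes, lowest metric wins."""
        addr = ipaddress.ip_address(dst)
        matches = [r for r in self.routes
                   if addr in ipaddress.ip_network(r.prefix)]
        if not matches:
            return None
        return min(
            matches,
            key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.metric),
        )
```

With two default routes (eth0 at metric 100 carrying dctcp, wlan0 at metric 200 carrying westwood), a lookup returns the eth0 route until link_down("eth0") is called, after which new lookups see westwood. If a routing daemon removes a route and re-adds it without the congctl attribute, lookups return a route whose congctl is None, which is the "losing the tcp cc property" case.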
> While we are thinking through this, what happens with tunnels?
Tunnels should behave just like ordinary interfaces, but depending on how
they get routed it might cause problems regarding DCTCP.
> This route in my network switches between interfaces and routes
> depending on which is best.
>
> fde5:dfb9:df90:fff0::/64 dev vpn6 proto kernel metric 256
> fde5:dfb9:df90:fff0::/60 via fde5:dfb9:df90:fff0::1 dev vpn6 metric 1024
>
>
> >> root@ganesha:~/git/tinc# ip route
> >> default via 172.26.17.224 dev wlan0 proto babel onlink
> >> 69.181.216.0/22 via 172.26.17.224 dev wlan0 proto babel onlink
> >> 169.254.0.0/16 dev eth0 scope link metric 1000
> >> 172.26.16.0/24 dev eth0 proto kernel scope link src 172.26.16.177
> >> 172.26.16.1 via 172.26.17.227 dev wlan0 proto babel onlink
> >> 172.26.16.112 via 172.26.17.227 dev wlan0 proto babel onlink
> >> 172.26.17.0/24 via 172.26.17.224 dev wlan0 proto babel onlink
> >> 172.26.17.3 via 172.26.17.227 dev wlan0 proto babel onlink
> >> 172.26.17.227 via 172.26.17.227 dev wlan0 proto babel onlink
> >> 192.168.7.0/30 dev eth1 proto kernel scope link src 192.168.7.1 metric 1
> >> 192.168.7.2 via 172.26.17.227 dev wlan0 proto babel onlink
Please note that this is an end-node only feature. Normally, routers
don't do heavy tcp processing, thus using this feature on a router
wasn't considered by us. That's the same problem as with e.g.
tcp_quick_ack.
As soon as you have control over the application and it allows you to
bind to an interface via SO_BINDTODEVICE, you are able to select the
congestion control algorithm by using ip rule oif matching. But the
application could also choose the CC by itself by using the
'TCP_CONGESTION' setsockopt on a per-socket basis if you have source
access.
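A minimal sketch of the per-socket variant, assuming a Linux host: it reads and sets the congestion control algorithm through the TCP_CONGESTION socket option. The helper names are made up for this example, and which algorithms can actually be selected depends on what the running kernel has loaded and allows.

```python
import socket
from typing import Optional

# socket.TCP_CONGESTION exists on Linux builds of Python 3.6+;
# 13 is the Linux value of the option, used here as a fallback.
TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)


def get_congestion_control(sock: socket.socket) -> Optional[str]:
    """Return the algorithm name for this socket, or None if unsupported."""
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name.
    return raw.split(b"\0", 1)[0].decode("ascii")


def set_congestion_control(sock: socket.socket, name: str) -> bool:
    """Try to select `name` for this socket; return False on failure."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION,
                        name.encode("ascii"))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("current:", get_congestion_control(s))
```

This only affects the one socket it is called on, so it needs source access to the application, as noted above.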
Bye,
Hannes
^ permalink raw reply
* Re: [PATCH v2 5/6] net-PPP: Delete an unnecessary assignment in mppe_alloc()
From: SF Markus Elfring @ 2014-12-05 21:00 UTC (permalink / raw)
To: Dan Carpenter
Cc: Sergei Shtylyov, Paul Mackerras, linux-ppp, netdev, Eric Dumazet,
LKML, kernel-janitors, Julia Lawall
In-Reply-To: <20141205135723.GE4912@mwanda>
>> That is true.
>
> In that case, I misunderstood what you wrote.
I find it a bit interesting how this misunderstanding could happen here somehow.
> Looking at it now, this patch is actually ok.
Does that mean that you would like to add any tag like "Acked-by" or "Reviewed-by"
to any of the proposed six update steps?
Regards,
Markus
^ permalink raw reply
* Possible Bug Found with Build Warning?
From: nick @ 2014-12-05 20:55 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: linux.nics, e1000-devel, bruce.w.allan, jesse.brandeburg,
linux-kernel, john.ronciak, netdev
Greetings Jeff and the other developers at Intel,
I seem to be getting this error message,
drivers/net/ethernet/intel/i40e/i40e_debugfs.c: In function ‘i40e_dbg_dump_desc’:
drivers/net/ethernet/intel/i40e/i40e_debugfs.c:855:1: warning: the frame size of 8192 bytes is larger than 2048 bytes [-Wframe-larger-than=]
}
^
when building the kernel for the last few weeks for my other patches. As you can tell, it reports the frame size being larger than 2048 bytes. I am wondering whether this is a real issue, as the struct is allocated only 2048 bytes of system memory per frame, or whether I am just misreading the warning.
Regards Nick
^ permalink raw reply
* KVM vs Xen-PV netperf numbers
From: Nick H @ 2014-12-05 20:02 UTC (permalink / raw)
To: netdev
Hello
Not sure I have the right audience. I have two VMs on similar hosts.
One VM is KVM + virtio + vhost based while the other is Xen PV with the xen
netfront driver. Running a simple netperf test with 1400 byte packets
on both VMs, I see a wide difference in throughput, as follows.
Xen-pv throughput comes to :
./netperf -H 10.xx.xx.49 -l 20 -t UDP_STREAM -- -m 1400
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.xx.xx.49 (10.xx.xx.49) port 0 AF_INET : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 1400 20.00 1910549 0 1069.89
262144 20.00 1704789 954.66
whereas KVM virtio number comes to :
./netperf -t UDP_STREAM -l 10 -H 10.xx.xx.49 -- -m 1400
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.xx.xx.49 (10.xx.xx.49) port 0 AF_INET : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 1400 10.00 155060 0 173.65
262144 10.00 155060 173.65
I built a custom kernel where I simply free up the (UDP only) skb in
virtio: xmit_skb() routine and I count how many skb's I have received.
Surprisingly it was not too high either:
./netperf -t UDP_STREAM -l 10 -H 10.xx.xx.49 -- -m 1400
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.xx.xx.49 (10.xx.xx.49) port 0 AF_INET : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 1400 10.00 224792 0 251.74
262144 10.00 0 0.00
1910549 packets were sent in 20 seconds with the xen pv driver versus 155060
packets in 10 seconds with the virtio driver. Assuming the data path inside
the kernel is the same for both drivers, and given that I have eliminated
virtio's virtqueue_kick() call by freeing the packet early in my experiment,
can all this overhead be attributed to system call overhead in the case of the
KVM+virtio combination? Anything I am missing?
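Because the two runs used different durations (20 s and 10 s), the message counts are easier to compare as rates. The short calculation below uses only the numbers from the netperf output above; the function names are illustrative. The recomputed throughput differs slightly from netperf's figure because the elapsed time is printed rounded.

```python
def packets_per_second(messages: int, seconds: float) -> float:
    """Sender-side message rate."""
    return messages / seconds


def throughput_mbps(messages: int, msg_bytes: int, seconds: float) -> float:
    """Throughput in 10^6 bits/sec, the unit netperf reports."""
    return messages * msg_bytes * 8 / seconds / 1e6


xen_pps = packets_per_second(1910549, 20.0)      # Xen PV, 20 s run
virtio_pps = packets_per_second(155060, 10.0)    # KVM virtio, 10 s run
patched_pps = packets_per_second(224792, 10.0)   # virtio with skb freed early

print(f"xen-pv:  {xen_pps:.0f} pps, {throughput_mbps(1910549, 1400, 20.0):.2f} Mbit/s")
print(f"virtio:  {virtio_pps:.0f} pps, {throughput_mbps(155060, 1400, 10.0):.2f} Mbit/s")
print(f"patched: {patched_pps:.0f} pps, {throughput_mbps(224792, 1400, 10.0):.2f} Mbit/s")
print(f"xen-pv / virtio ratio: {xen_pps / virtio_pps:.2f}")
```

On the sending side that is roughly 95.5k packets per second for Xen PV against roughly 15.5k for virtio, and about 22.5k with the transmit path short-circuited.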
The KVM setup is based off:
Linux ubn-nested 3.17.0+ #16 SMP Thu Dec 4 12:00:09 PST 2014 x86_64
x86_64 x86_64 GNU/Linux
Regards
N
^ permalink raw reply
* [PATCH net] bnx2x: Implement ndo_gso_check()
From: Joe Stringer @ 2014-12-05 19:35 UTC (permalink / raw)
To: netdev; +Cc: ariel.elior, jesse, therbert, linux-kernel
Use vxlan_gso_check() to advertise offload support for this NIC.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fbf9e..893cdb6 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -45,6 +45,7 @@
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/tcp.h>
+#include <net/vxlan.h>
#include <net/checksum.h>
#include <net/ip6_checksum.h>
#include <linux/workqueue.h>
@@ -12550,6 +12551,11 @@ static int bnx2x_get_phys_port_id(struct net_device *netdev,
return 0;
}
+static bool bnx2x_gso_check(struct sk_buff *skb, struct net_device *dev)
+{
+ return vxlan_gso_check(skb);
+}
+
static const struct net_device_ops bnx2x_netdev_ops = {
.ndo_open = bnx2x_open,
.ndo_stop = bnx2x_close,
@@ -12581,6 +12587,7 @@ static const struct net_device_ops bnx2x_netdev_ops = {
#endif
.ndo_get_phys_port_id = bnx2x_get_phys_port_id,
.ndo_set_vf_link_state = bnx2x_set_vf_link_state,
+ .ndo_gso_check = bnx2x_gso_check,
};
static int bnx2x_set_coherency_mask(struct bnx2x *bp)
--
1.7.10.4
^ permalink raw reply related
* Re: [linux-nics] [PATCHv4 net] i40e: Implement ndo_gso_check()
From: Jeff Kirsher @ 2014-12-05 19:12 UTC (permalink / raw)
To: Joe Stringer; +Cc: netdev, linux.nics, jesse, linux-kernel, therbert
In-Reply-To: <1417804872-58635-1-git-send-email-joestringer@nicira.com>
On Fri, 2014-12-05 at 10:41 -0800, Joe Stringer wrote:
> ndo_gso_check() was recently introduced to allow NICs to report the
> offloading support that they have on a per-skb basis. Add an
> implementation for this driver which checks for IPIP, GRE, UDP
> tunnels.
>
> Signed-off-by: Joe Stringer <joestringer@nicira.com>
> ---
> v4: Simplify the check to just do tunnel header length.
> Fix #define style issue.
> v3: Drop IPIP and GRE (no driver support even though hw supports it).
> Check for UDP outer protocol for UDP tunnels.
> v2: Expand to include IP in IP and IPv4/IPv6 inside GRE/UDP tunnels.
> Add MAX_INNER_LENGTH (as 80).
> ---
> drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
Thanks Joe, I will update the patch in my queue with your latest
version.
^ permalink raw reply
* Re: [PATCH 1/3] netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature flag for switch device offloads
From: Roopa Prabhu @ 2014-12-05 19:07 UTC (permalink / raw)
To: Scott Feldman
Cc: Jiri Pirko, Jamal Hadi Salim, Benjamin LaHaise, Thomas Graf,
john fastabend, stephen@networkplumber.org, John Linville,
nhorman@tuxdriver.com, Nicolas Dichtel, vyasevic@redhat.com,
Florian Fainelli, buytenh@wantstofly.org, Aviad Raveh, Netdev,
David S. Miller, shm, Andy Gospodarek
In-Reply-To: <CAE4R7bBzQQeQAHof=gObugmOGdGE-e5GRh=6V2+KJbctity9pw@mail.gmail.com>
On 12/5/14, 10:53 AM, Scott Feldman wrote:
> On Fri, Dec 5, 2014 at 6:16 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>> On 12/4/14, 11:41 PM, Jiri Pirko wrote:
>>> Fri, Dec 05, 2014 at 03:26:39AM CET, roopa@cumulusnetworks.com wrote:
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> This is a generic high level feature flag for all switch asic features
>>>> today.
>>>>
>>>> switch drivers set this flag on switch ports. Logical devices like
>>>> bridge, bonds, vxlans can inherit this flag from their slaves/ports.
>>>
>>> Can you please elaborate on how exactly would this inheritance look
>>> like?
>>
>> My thought there was, when a port with the hw offload flag is added to the
>> bridge, the same flag gets set on the bridge. And, for any bridge attributes
>> (not port attributes), this flag on the bridge can be used to offload those
>> bridge attributes.
>> bridge attribute examples: IFLA_BR_FORWARD_DELAY, IFLA_BR_HELLO_TIME,
>> IFLA_BR_MAX_AGE.
> Ah, wait, why do those need to be pushed down to driver/HW? Letting
> the bridge (or external process like mstpd) own the ctrl-plane means
> HW isn't running STP machine or aging out FDB entries. Let Linux take
> care of that.
>
> Same goes for bonding, since that was mentioned earlier. Keep LACP
> ctrl processing in the kernel/bonding driver, and there is no need to
> push bonding settings down to port driver/hw. driver/hw just need to
> know port membership and LACP status.
Scott, agreed, these are stp attributes. They won't go into hardware. I picked
the wrong example.
But I was just trying to point out that there may be bridge attributes
that need to be passed to hw,
like the bridge ageing timer etc. We don't use it today, but others may
in the future.
I was just trying to say that this may be useful in the future. Neither the
current in-kernel implementation
nor my patches do anything for this. We can add it in the future if need
be.
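For what it is worth, the inheritance described in the quoted text could look roughly like the model below. It is a sketch in Python, not kernel code: the HW_SWITCH_OFFLOAD bit value, the Port and Bridge classes, and the rule that the bridge carries the flag while at least one port has it are assumptions made for illustration, since the thread does not settle the exact semantics.

```python
from typing import List

HW_SWITCH_OFFLOAD = 0x1  # hypothetical feature bit for this sketch


class Port:
    def __init__(self, name: str, features: int = 0) -> None:
        self.name = name
        self.features = features


class Bridge:
    def __init__(self, name: str) -> None:
        self.name = name
        self.features = 0
        self.ports: List[Port] = []

    def _recompute_features(self) -> None:
        # Assumed rule: the bridge has the flag if any port has it.
        if any(p.features & HW_SWITCH_OFFLOAD for p in self.ports):
            self.features |= HW_SWITCH_OFFLOAD
        else:
            self.features &= ~HW_SWITCH_OFFLOAD

    def add_port(self, port: Port) -> None:
        self.ports.append(port)
        self._recompute_features()

    def del_port(self, port: Port) -> None:
        self.ports.remove(port)
        self._recompute_features()

    def offloads_bridge_attrs(self) -> bool:
        """Whether bridge-level attributes would be pushed to hardware."""
        return bool(self.features & HW_SWITCH_OFFLOAD)
```

Adding a switch port with the flag to an empty bridge makes offloads_bridge_attrs() return True; removing the last such port clears it again.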
^ permalink raw reply
* Re: [PATCH net-next 0/4] net: allow setting congctl via routing table
From: Dave Taht @ 2014-12-05 19:05 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Daniel Borkmann, davem@davemloft.net, Florian Westphal,
netdev@vger.kernel.org
In-Reply-To: <1417804540.2462.4.camel@localhost>
On Fri, Dec 5, 2014 at 10:35 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Fr, 2014-12-05 at 08:35 -0800, Dave Taht wrote:
>> On Fri, Dec 5, 2014 at 7:24 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> > This is the second part of our work and allows for setting the congestion
>> > control algorithm via routing table. For details, please see individual
>> > patches.
>> >
>> > Joint work with Florian Westphal, suggested by Hannes Frederic Sowa.
>> >
>> > Thanks!
>> >
>> > Daniel Borkmann (4):
>> > net: tcp: refactor reinitialization of congestion control
>> > net: tcp: add key management to congestion control
>> > net: tcp: add RTAX_CC_ALGO fib handling
>> > net: tcp: add per route congestion control
>>
>>
>> Very interesting. Have you tried something other than dctcp here
>> (e.g. westwood or lp?)
>>
>> Have you considered the case where the route changes underneath
>> you from one device to another?
>
> Notice, there is no way the state of a tcp congestion control algorithm
> can be converted to be used by a different one, so this would only
> affect new tcp connections via this interface.
You are missing the point. If the route changes from a path that
is DCTCP capable to one that is not, (say you fail over to a backup link)
and flows persist, bad things will happen. DCTCP, in particular, depends
upon a very specific AQM configuration on all the hops in the path, without that
it can be very aggressive.
I do think it is feasible to convert from at least some of the
core state from one tcp congestion control algorithm to another.
>> Example, here I am routing everything through eth0, where I
>> would want cubic, probably...
>>
>> root@ganesha:~/git/tinc# ip route
>> default via 172.26.16.1 dev eth0 proto babel onlink
>> 69.181.216.0/22 via 172.26.16.1 dev eth0 proto babel onlink
>> 169.254.0.0/16 dev eth0 scope link metric 1000
>> 172.26.16.0/24 dev eth0 proto kernel scope link src 172.26.16.177
>> 172.26.16.1 via 172.26.16.1 dev eth0 proto babel onlink
>> 172.26.16.112 via 172.26.16.112 dev eth0 proto babel onlink
>> 172.26.17.0/24 via 172.26.16.1 dev eth0 proto babel onlink
>> 172.26.17.3 via 172.26.16.1 dev eth0 proto babel onlink
>> 172.26.17.227 via 172.26.16.1 dev eth0 proto babel onlink
>> 192.168.7.0/30 dev eth1 proto kernel scope link src 192.168.7.1 metric 1
>> 192.168.7.2 via 172.26.16.112 dev eth0 proto babel onlink
>>
>> And I pull the plug, and everything flips over to wlan0,
>> where I might want westwood (or something saner than
>> that. It might be nice to have a per-device cc default
>> algorithm...)
>
> Something like that might be possible with metrics and "via ... dev if0
> metric xxx" routes, which will be cleaned up as soon as the interface
> goes down and the fallback will be to a route with a different
> congestion algorithm.
mmm... I do dynamic routing via various routing protocols, which
generally don't bother with inserting more than one metric.
While we are thinking through this, what happens with tunnels?
This route in my network switches between interfaces and routes
depending on which is best.
fde5:dfb9:df90:fff0::/64 dev vpn6 proto kernel metric 256
fde5:dfb9:df90:fff0::/60 via fde5:dfb9:df90:fff0::1 dev vpn6 metric 1024
>> root@ganesha:~/git/tinc# ip route
>> default via 172.26.17.224 dev wlan0 proto babel onlink
>> 69.181.216.0/22 via 172.26.17.224 dev wlan0 proto babel onlink
>> 169.254.0.0/16 dev eth0 scope link metric 1000
>> 172.26.16.0/24 dev eth0 proto kernel scope link src 172.26.16.177
>> 172.26.16.1 via 172.26.17.227 dev wlan0 proto babel onlink
>> 172.26.16.112 via 172.26.17.227 dev wlan0 proto babel onlink
>> 172.26.17.0/24 via 172.26.17.224 dev wlan0 proto babel onlink
>> 172.26.17.3 via 172.26.17.227 dev wlan0 proto babel onlink
>> 172.26.17.227 via 172.26.17.227 dev wlan0 proto babel onlink
>> 192.168.7.0/30 dev eth1 proto kernel scope link src 192.168.7.1 metric 1
>> 192.168.7.2 via 172.26.17.227 dev wlan0 proto babel onlink
>>
>
> Bye,
> Hannes
>
>
--
Dave Täht
http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
^ permalink raw reply