* [PATCH net-next] cxgb4: fix the wrong conversion of Mbps to Kbps
From: Ganesh Goudar @ 2018-05-10 10:37 UTC
To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar
Fix the wrong conversion where 1 Mbps was converted to
1024 Kbps instead of 1000 Kbps.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 32cad0a..5e33780 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2886,13 +2886,13 @@ static int cxgb_set_tx_maxrate(struct net_device *dev, int index, u32 rate)
}
/* Convert from Mbps to Kbps */
- req_rate = rate << 10;
+ req_rate = rate * 1000;
/* Max rate is 100 Gbps */
- if (req_rate >= SCHED_MAX_RATE_KBPS) {
+ if (req_rate > SCHED_MAX_RATE_KBPS) {
dev_err(adap->pdev_dev,
"Invalid rate %u Mbps, Max rate is %u Mbps\n",
- rate, SCHED_MAX_RATE_KBPS >> 10);
+ rate, SCHED_MAX_RATE_KBPS / 1000);
return -ERANGE;
}
--
2.1.0
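For illustration, the effect of the two changed lines can be checked in userspace. This is a minimal sketch, not driver code: the helper names are made up, and the value of the limit (100 Gbps expressed in Kbps) is an assumption based on the "Max rate is 100 Gbps" comment in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: 100 Gbps expressed in Kbps (hypothetical stand-in
 * for SCHED_MAX_RATE_KBPS).
 */
#define TOY_SCHED_MAX_RATE_KBPS 100000000u

/* Pre-patch behaviour: binary multiplier (x1024) and a ">=" bound. */
static bool toy_old_rate_ok(uint32_t rate_mbps, uint32_t *req_kbps)
{
	assert(req_kbps != NULL);
	*req_kbps = rate_mbps << 10;
	return !(*req_kbps >= TOY_SCHED_MAX_RATE_KBPS);
}

/* Post-patch behaviour: decimal multiplier (x1000) and a ">" bound. */
static bool toy_new_rate_ok(uint32_t rate_mbps, uint32_t *req_kbps)
{
	assert(req_kbps != NULL);
	*req_kbps = rate_mbps * 1000;
	return !(*req_kbps > TOY_SCHED_MAX_RATE_KBPS);
}
```

Under these assumptions, a request for exactly 100000 Mbps is rejected by the old check (102400000 Kbps is >= the limit) and accepted by the new one (100000000 Kbps is not > the limit).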
* Re: net: hang in unregister_netdevice: waiting for lo to become free
From: Dan Streetman @ 2018-05-10 10:23 UTC
To: Dmitry Vyukov
Cc: Tommi Rantala, Neil Horman, Xin Long, David Ahern,
Daniel Borkmann, Cong Wang, David Miller, Eric Dumazet,
Willem de Bruijn, Jakub Kicinski, Rasmus Villemoes, netdev, LKML,
Alexey Kuznetsov, Hideaki YOSHIFUJI, syzkaller, Dan Streetman,
Eric W. Biederman, Alexey Kodanev
In-Reply-To: <CACT4Y+avGQk4X=GmngAs5GE8DUWFwZ5nWQD+JrurZbL181AF6w@mail.gmail.com>
On Thu, May 10, 2018 at 2:46 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Apr 16, 2018 at 9:42 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>>>> On Wed, Feb 21, 2018 at 3:53 PM, Tommi Rantala
>>>>>> <tommi.t.rantala@nokia.com> wrote:
>>>>>>> On 20.02.2018 18:26, Neil Horman wrote:
>>>>>>>>
>>>>>>>> On Tue, Feb 20, 2018 at 09:14:41AM +0100, Dmitry Vyukov wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Feb 20, 2018 at 8:56 AM, Tommi Rantala
>>>>>>>>> <tommi.t.rantala@nokia.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 19.02.2018 20:59, Dmitry Vyukov wrote:
>>>>>>>>>>>
>>>>>>>>>>> Is this meant to be fixed already? I am still seeing this on the
>>>>>>>>>>> latest upstream tree.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> These two commits are in v4.16-rc1:
>>>>>>>>>>
>>>>>>>>>> commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8
>>>>>>>>>> Author: Tommi Rantala <tommi.t.rantala@nokia.com>
>>>>>>>>>> Date: Mon Feb 5 21:48:14 2018 +0200
>>>>>>>>>>
>>>>>>>>>> sctp: fix dst refcnt leak in sctp_v4_get_dst
>>>>>>>>>> ...
>>>>>>>>>> Fixes: 410f03831 ("sctp: add routing output fallback")
>>>>>>>>>> Fixes: 0ca50d12f ("sctp: fix src address selection if using
>>>>>>>>>> secondary
>>>>>>>>>> addresses")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2
>>>>>>>>>> Author: Alexey Kodanev <alexey.kodanev@oracle.com>
>>>>>>>>>> Date: Mon Feb 5 15:10:35 2018 +0300
>>>>>>>>>>
>>>>>>>>>> sctp: fix dst refcnt leak in sctp_v6_get_dst()
>>>>>>>>>> ...
>>>>>>>>>> Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using
>>>>>>>>>> secondary
>>>>>>>>>> addresses for ipv6")
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I guess we missed something if it's still reproducible.
>>>>>>>>>>
>>>>>>>>>> I can check it later this week, unless someone else beat me to it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Tommi,
>>>>>>>>>
>>>>>>>>> Hmmm, I can't claim that it's exactly the same bug. Perhaps it's
>>>>>>>>> another one then. But I am still seeing these:
>>>>>>>>>
>>>>>>>>> [ 58.799130] unregister_netdevice: waiting for lo to become free.
>>>>>>>>> Usage count = 4
>>>>>>>>> [ 60.847138] unregister_netdevice: waiting for lo to become free.
>>>>>>>>> Usage count = 4
>>>>>>>>> [ 62.895093] unregister_netdevice: waiting for lo to become free.
>>>>>>>>> Usage count = 4
>>>>>>>>> [ 64.943103] unregister_netdevice: waiting for lo to become free.
>>>>>>>>> Usage count = 4
>>>>>>>>>
>>>>>>>>> on upstream tree pulled ~12 hours ago.
>>>>>>>>>
>>>>>>>> Can you write a systemtap script to probe dev_hold, and dev_put, printing
>>>>>>>> out a
>>>>>>>> backtrace if the device name matches "lo". That should tell us
>>>>>>>> definitively if
>>>>>>>> the problem is in the same location or not
>>>>>>>
>>>>>>>
>>>>>>> Hi Dmitry, I tested with the reproducer and the kernel .config file that you
>>>>>>> sent in the first email in this thread:
>>>>>>>
>>>>>>> With 4.16-rc2 unable to reproduce.
>>>>>>>
>>>>>>> With 4.15-rc9 bug reproducible, and I get "unregister_netdevice: waiting for
>>>>>>> lo to become free. Usage count = 3"
>>>>>>>
>>>>>>> With 4.15-rc9 and Alexey's "sctp: fix dst refcnt leak in sctp_v6_get_dst()"
>>>>>>> cherry-picked on top, unable to reproduce.
>>>>>>>
>>>>>>>
>>>>>>> Is syzkaller doing something else now to trigger the bug...?
>>>>>>> Can you still trigger the bug with the same reproducer?
>>>>>>
>>>>>> Hi Neil, Tommi,
>>>>>>
>>>>>> Reviving this old thread about "unregister_netdevice: waiting for lo
>>>>>> to become free. Usage count = 3" hangs.
>>>>>> I still did not have time to deep dive into what happens there (too
>>>>>> many bugs coming from syzbot). But this still actively happens and I
>>>>>> suspect accounts to a significant portion of various hang reports,
>>>>>> which are quite unpleasant.
>>>>>>
>>>>>> One idea that could make it all simpler:
>>>>>>
>>>>>> Is this wait loop in netdev_wait_allrefs() supposed to wait for any
>>>>>> prolonged periods of time under any non-buggy conditions? E.g. more
>>>>>> than 1-2 minutes?
>>>>>> If it only supposed to wait briefly for things that already supposed
>>>>>> to be shutting down, and we add a WARNING there after some timeout,
>>>>>> then syzbot will report all info how/when it happens, hopefully
>>>>>> extracting reproducers, and all the nice things.
>>>>>> But this WARNING should not have any false positives under any
>>>>>> realistic conditions (e.g. waiting for arrival of remote packets with
>>>>>> large timeouts).
>>>>>>
>>>>>> Looking at some task hung reports, it seems that this code holds some
>>>>>> mutexes, takes workqueue thread and prevents any progress with
>>>>>> destruction of other devices (and net namespace creation/destruction),
>>>>>> so I guess it should not wait for any indefinite periods of time?
>>>>>
>>>>> I'm working on this currently:
>>>>> https://bugs.launchpad.net/ubuntu/zesty/+source/linux/+bug/1711407
>>>>>
>>>>> I added a summary of what I've found to be the cause (or at least, one
>>>>> possible cause) of this:
>>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/comments/72
>>>>>
>>>>> I'm working on a patch to work around the main side-effect of this,
>>>>> which is hanging while holding the global net mutex. Hangs will still
>>>>> happen (e.g. if a dst leaks) but should not affect anything else,
>>>>> other than a leak of the dst and its net namespace.
>>>>>
>>>>> Fixing the dst leaks is important too, of course, but a dst leak (or
>>>>> other cause) shouldn't break the entire system.
>>>>
>>>> Leaking some memory is definitely better than hanging the system.
>>>>
>>>> So I've made syzkaller to recognize "unregister_netdevice: waiting for
>>>> (.*) to become free" as a kernel bug:
>>>> https://github.com/google/syzkaller/commit/7a67784ca8bdc3b26cce2f0ec9a40d2dd9ec9396
>>>> Unfortunately it does not make it catch these bugs because creating a
>>>> net namespace per test is too damn slow, so namespaces are reused for
>>>> lots of tests and when/if it's eventually destroyed it's already too
>>>> late to find root cause.
>>>>
>>>> But I've run a one-off experiment with prompt net namespace
>>>> destruction and syzkaller was able to easily extract a C reproducer:
>>>> https://gist.githubusercontent.com/dvyukov/d571e8fff24e127ca48a8c4790d42bfa/raw/52050e93ba9afbb5126b9d7bb39b7e71a82af016/gistfile1.txt
>>>>
>>>> On upstream 16e205cf42da1f497b10a4a24f563e6c0d574eec with this config:
>>>> https://gist.githubusercontent.com/dvyukov/9663c57443adb21f2795b92ef0829d62/raw/bbea0652e23746096dd56855a28f6c681aebcdee/gistfile1.txt
>>>>
>>>> this gives me:
>>>>
>>>> [ 83.183198] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 9
>>>> [ 85.231202] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 9
>>>> ...
>>>> [ 523.511205] unregister_netdevice: waiting for lo to become free.
>>>> Usage count = 9
>>>> ...
>>>>
>>>> This is generated from this syzkaller program:
>>>>
>>>> r0 = socket$inet6(0xa, 0x1, 0x84)
>>>> setsockopt$inet6_IPV6_XFRM_POLICY(r0, 0x29, 0x23,
>>>> &(0x7f0000000380)={{{@in6=@remote={0xfe, 0x80, [], 0xbb},
>>>> @in=@dev={0xac, 0x14, 0x14}, 0x0, 0x0, 0x0, 0x0, 0xa}, {}, {}, 0x0,
>>>> 0x0, 0x1}, {{@in=@local={0xac, 0x14, 0x14, 0xaa}, 0x0, 0x32}, 0x0,
>>>> @in=@local={0xac, 0x14, 0x14, 0xaa}, 0x3504}}, 0xe8)
>>>> bind$inet6(r0, &(0x7f0000000000)={0xa, 0x4e20}, 0x1c)
>>>> connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @dev={0xac, 0x14,
>>>> 0x14, 0xd}}, 0x10)
>>>> syz_emit_ethernet(0x3e, &(0x7f00000001c0)={@local={[0xaa, 0xaa, 0xaa,
>>>> 0xaa, 0xaa], 0xaa}, @dev={[0xaa, 0xaa, 0xaa, 0xaa, 0xaa]}, [],
>>>> {@ipv6={0x86dd, {0x0, 0x6, "50a09c", 0x8, 0xffffff11, 0x0,
>>>> @remote={0xfe, 0x80, [], 0xbb}, @local={0xfe, 0x80, [], 0xaa}, {[],
>>>> @udp={0x0, 0x4e20, 0x8}}}}}}, &(0x7f0000000040))
>>>>
>>>> So this seems to be related to IPv6 and/or xfrm and is potentially
>>>> caused by external packets (that syz_emit_ethernet call).
>>>
>>>
>>>
>>> Here is another repro which seems to be a different bug (note that it
>>> requires fault injection):
>>>
>>> https://gist.githubusercontent.com/dvyukov/1c56623016cc4c24a69d433c5114ad5b/raw/530478f571b195193101b912aa646948528baa8e/gistfile1.txt
>>>
>>> Dan, do you mind taking a look at them? Fixing these should eliminate
>>> root causes of these hangs/leaks.
>>
>> Yep I will look at them, thanks for the reproducers.
>
> Hi Dan,
>
> Any updates on this? syzbot is hitting this all the time.
Sorry, the recent changes from net_mutex -> net_rwsem/pernet_ops_rwsem
have complicated what I had done to work around this, but I'm still
working on it. Apologies for the delay.
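The WARNING-after-timeout idea from the quoted discussion can be modelled in userspace. This is only a sketch under stated assumptions: `toy_dev`, `toy_wait_allrefs()` and the tick-based timing are all hypothetical, and the give-up limit exists only so the model terminates. It is not the kernel's netdev_wait_allrefs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a device whose references must drain. */
struct toy_dev {
	const char *name;
	int refcnt;        /* outstanding references */
	int release_tick;  /* tick at which all refs are dropped; -1 models a leak */
};

struct toy_wait_result {
	int ticks_waited;
	bool warned;       /* the one-shot warning fired */
	bool freed;        /* refcount reached zero */
};

static struct toy_wait_result
toy_wait_allrefs(struct toy_dev *dev, int warn_after_ticks, int give_up_ticks)
{
	struct toy_wait_result res = { 0, false, false };
	int tick;

	assert(dev != NULL);

	for (tick = 0; tick < give_up_ticks; tick++) {
		/* Simulate the remaining holders dropping their references. */
		if (dev->release_tick >= 0 && tick >= dev->release_tick)
			dev->refcnt = 0;

		if (dev->refcnt == 0) {
			res.freed = true;
			break;
		}

		/* One-shot warning once the wait exceeds the threshold. */
		if (!res.warned && tick >= warn_after_ticks) {
			fprintf(stderr,
				"WARNING: %s still has %d refs after %d ticks\n",
				dev->name, dev->refcnt, tick);
			res.warned = true;
		}
	}

	res.ticks_waited = tick;
	return res;
}
```

A short wait never trips the warning, while a leaked reference does so exactly once, which is the property that would let a fuzzer report the hang without false positives on brief waits.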
* [PATCH net-next 4/4] mlxsw: spectrum_span: Support VLAN under mirror-to-gretap
From: Ido Schimmel @ 2018-05-10 10:13 UTC
To: netdev, bridge; +Cc: davem, jiri, petrm, stephen, nikolay, mlxsw, Ido Schimmel
In-Reply-To: <20180510101306.4891-1-idosch@mellanox.com>
From: Petr Machata <petrm@mellanox.com>
When mirroring to a gretap or ip6gretap device, allow the underlay
packet path to include VLAN devices. The following configurations are
supported in underlay:
- vlan over phys
- vlan-unaware bridge where the egress device is vlan over phys
- vlan over vlan-aware bridge where the egress device is phys
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum_span.c | 27 +++++++++++++++-------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index d90582ee478f..3b77990df599 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -176,21 +176,23 @@ mlxsw_sp_span_entry_bridge_8021q(const struct net_device *br_dev,
{
struct bridge_vlan_info vinfo;
struct net_device *edev;
- u16 pvid;
+ u16 vid = *p_vid;
- if (WARN_ON(br_vlan_get_pvid(br_dev, &pvid)))
+ if (!vid && WARN_ON(br_vlan_get_pvid(br_dev, &vid)))
return NULL;
- if (!pvid)
+ if (!vid ||
+ br_vlan_get_info(br_dev, vid, &vinfo) ||
+ !(vinfo.flags & BRIDGE_VLAN_INFO_BRENTRY))
return NULL;
- edev = br_fdb_find_port(br_dev, dmac, pvid);
+ edev = br_fdb_find_port(br_dev, dmac, vid);
if (!edev)
return NULL;
- if (br_vlan_get_info(edev, pvid, &vinfo))
+ if (br_vlan_get_info(edev, vid, &vinfo))
return NULL;
if (!(vinfo.flags & BRIDGE_VLAN_INFO_UNTAGGED))
- *p_vid = pvid;
+ *p_vid = vid;
return edev;
}
@@ -208,13 +210,13 @@ mlxsw_sp_span_entry_bridge(const struct net_device *br_dev,
{
struct mlxsw_sp_bridge_port *bridge_port;
enum mlxsw_reg_spms_state spms_state;
+ struct net_device *dev = NULL;
struct mlxsw_sp_port *port;
- struct net_device *dev;
u8 stp_state;
if (br_vlan_enabled(br_dev))
dev = mlxsw_sp_span_entry_bridge_8021q(br_dev, dmac, p_vid);
- else
+ else if (!*p_vid)
dev = mlxsw_sp_span_entry_bridge_8021d(br_dev, dmac);
if (!dev)
return NULL;
@@ -261,12 +263,21 @@ mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *l3edev,
if (!l3edev || mlxsw_sp_span_dmac(tbl, &gw, l3edev, dmac))
goto unoffloadable;
+ if (is_vlan_dev(l3edev))
+ l3edev = mlxsw_sp_span_entry_vlan(l3edev, &vid);
+
if (netif_is_bridge_master(l3edev)) {
l3edev = mlxsw_sp_span_entry_bridge(l3edev, dmac, &vid);
if (!l3edev)
goto unoffloadable;
}
+ if (is_vlan_dev(l3edev)) {
+ if (vid || !(l3edev->flags & IFF_UP))
+ goto unoffloadable;
+ l3edev = mlxsw_sp_span_entry_vlan(l3edev, &vid);
+ }
+
if (!mlxsw_sp_port_dev_check(l3edev))
goto unoffloadable;
--
2.14.3
* [PATCH net-next 3/4] mlxsw: spectrum_span: Support mirror-to-VLAN
From: Ido Schimmel @ 2018-05-10 10:13 UTC
To: netdev, bridge; +Cc: davem, jiri, petrm, stephen, nikolay, mlxsw, Ido Schimmel
In-Reply-To: <20180510101306.4891-1-idosch@mellanox.com>
From: Petr Machata <petrm@mellanox.com>
Offload "tc action mirred mirror" to a device that is a vlan device on
top of a front-panel port device. The hardware encapsulates the mirrored
packets in a VLAN tag. That includes the case that the mirrored traffic
is already VLAN-tagged--in that case the monitor traffic will be
double-tagged, just like in the software path.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum_span.c | 64 ++++++++++++++++++++++
1 file changed, 64 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index cd9071ee19ad..d90582ee478f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -235,6 +235,14 @@ mlxsw_sp_span_entry_bridge(const struct net_device *br_dev,
return dev;
}
+static struct net_device *
+mlxsw_sp_span_entry_vlan(const struct net_device *vlan_dev,
+ u16 *p_vid)
+{
+ *p_vid = vlan_dev_vlan_id(vlan_dev);
+ return vlan_dev_real_dev(vlan_dev);
+}
+
static __maybe_unused int
mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *l3edev,
union mlxsw_sp_l3addr saddr,
@@ -477,6 +485,61 @@ struct mlxsw_sp_span_entry_ops mlxsw_sp_span_entry_ops_gretap6 = {
};
#endif
+static bool
+mlxsw_sp_span_vlan_can_handle(const struct net_device *dev)
+{
+ return is_vlan_dev(dev) &&
+ mlxsw_sp_port_dev_check(vlan_dev_real_dev(dev));
+}
+
+static int
+mlxsw_sp_span_entry_vlan_parms(const struct net_device *to_dev,
+ struct mlxsw_sp_span_parms *sparmsp)
+{
+ struct net_device *real_dev;
+ u16 vid;
+
+ if (!(to_dev->flags & IFF_UP))
+ return mlxsw_sp_span_entry_unoffloadable(sparmsp);
+
+ real_dev = mlxsw_sp_span_entry_vlan(to_dev, &vid);
+ sparmsp->dest_port = netdev_priv(real_dev);
+ sparmsp->vid = vid;
+ return 0;
+}
+
+static int
+mlxsw_sp_span_entry_vlan_configure(struct mlxsw_sp_span_entry *span_entry,
+ struct mlxsw_sp_span_parms sparms)
+{
+ struct mlxsw_sp_port *dest_port = sparms.dest_port;
+ struct mlxsw_sp *mlxsw_sp = dest_port->mlxsw_sp;
+ u8 local_port = dest_port->local_port;
+ char mpat_pl[MLXSW_REG_MPAT_LEN];
+ int pa_id = span_entry->id;
+
+ mlxsw_reg_mpat_pack(mpat_pl, pa_id, local_port, true,
+ MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH);
+ mlxsw_reg_mpat_eth_rspan_pack(mpat_pl, sparms.vid);
+
+ return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mpat), mpat_pl);
+}
+
+static void
+mlxsw_sp_span_entry_vlan_deconfigure(struct mlxsw_sp_span_entry *span_entry)
+{
+ mlxsw_sp_span_entry_deconfigure_common(span_entry,
+ MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH);
+}
+
+static const
+struct mlxsw_sp_span_entry_ops mlxsw_sp_span_entry_ops_vlan = {
+ .can_handle = mlxsw_sp_span_vlan_can_handle,
+ .parms = mlxsw_sp_span_entry_vlan_parms,
+ .configure = mlxsw_sp_span_entry_vlan_configure,
+ .deconfigure = mlxsw_sp_span_entry_vlan_deconfigure,
+};
+
static const
struct mlxsw_sp_span_entry_ops *const mlxsw_sp_span_entry_types[] = {
&mlxsw_sp_span_entry_ops_phys,
@@ -486,6 +549,7 @@ struct mlxsw_sp_span_entry_ops *const mlxsw_sp_span_entry_types[] = {
#if IS_ENABLED(CONFIG_IPV6_GRE)
&mlxsw_sp_span_entry_ops_gretap6,
#endif
+ &mlxsw_sp_span_entry_ops_vlan,
};
static int
--
2.14.3
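To make the double-tagging remark in the commit message concrete, here is a minimal userspace sketch of pushing an 802.1Q tag onto an Ethernet frame. It only illustrates the frame layout; `toy_push_vlan_tag()` is a hypothetical helper and says nothing about how the Spectrum hardware performs the encapsulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ADDRS_LEN 12      /* destination MAC + source MAC */
#define TOY_VLAN_TAG_LEN  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TOY_TPID_8021Q    0x8100

/* Insert an 802.1Q tag right after the MAC addresses.
 * Returns the new frame length, or 0 if the frame is too short,
 * the buffer is too small, or the VLAN ID does not fit in 12 bits.
 */
static size_t toy_push_vlan_tag(uint8_t *frame, size_t len, size_t cap,
				uint16_t vid)
{
	assert(frame != NULL);

	if (len < TOY_ETH_ADDRS_LEN + 2 || len + TOY_VLAN_TAG_LEN > cap ||
	    vid > 0x0fff)
		return 0;

	/* Shift EtherType (or an existing tag) and payload by four bytes. */
	memmove(frame + TOY_ETH_ADDRS_LEN + TOY_VLAN_TAG_LEN,
		frame + TOY_ETH_ADDRS_LEN, len - TOY_ETH_ADDRS_LEN);

	frame[12] = (uint8_t)(TOY_TPID_8021Q >> 8);
	frame[13] = (uint8_t)(TOY_TPID_8021Q & 0xff);
	frame[14] = (uint8_t)(vid >> 8);   /* PCP and DEI left at zero */
	frame[15] = (uint8_t)(vid & 0xff);

	return len + TOY_VLAN_TAG_LEN;
}
```

Calling the helper twice, first with the VLAN of the original traffic and then with the mirror VLAN, yields a frame whose outer tag is the mirror VLAN and whose inner tag is the original one.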
* [PATCH net-next 2/4] mlxsw: reg: Add MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH
From: Ido Schimmel @ 2018-05-10 10:13 UTC
To: netdev, bridge; +Cc: davem, jiri, petrm, stephen, nikolay, mlxsw, Ido Schimmel
In-Reply-To: <20180510101306.4891-1-idosch@mellanox.com>
From: Petr Machata <petrm@mellanox.com>
Add MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH to support VLAN-encapsulated
port mirroring.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/reg.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 6218231e379e..3f4d7e22cece 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -6833,6 +6833,12 @@ enum mlxsw_reg_mpat_span_type {
*/
MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH = 0x0,
+ /* Remote SPAN Ethernet VLAN.
+ * The packet is forwarded to the monitoring port on the monitoring
+ * VLAN.
+ */
+ MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH = 0x1,
+
/* Encapsulated Remote SPAN Ethernet L3 GRE.
* The packet is encapsulated with GRE header.
*/
--
2.14.3
* [PATCH net-next 1/4] net: bridge: Allow bridge master in br_vlan_get_info()
From: Ido Schimmel @ 2018-05-10 10:13 UTC
To: netdev, bridge; +Cc: Ido Schimmel, mlxsw, nikolay, jiri, petrm, davem
In-Reply-To: <20180510101306.4891-1-idosch@mellanox.com>
From: Petr Machata <petrm@mellanox.com>
Mirroring offload in mlxsw needs to check that a given VLAN is allowed
to ingress the bridge device. br_vlan_get_info() is the function that is
used for this, however currently it only supports bridge port devices.
Extend it to support bridge masters as well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
net/bridge/br_vlan.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index df37a5137c25..dc832c0934c6 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1176,6 +1176,8 @@ int br_vlan_get_info(const struct net_device *dev, u16 vid,
p = br_port_get_check_rtnl(dev);
if (p)
vg = nbp_vlan_group(p);
+ else if (netif_is_bridge_master(dev))
+ vg = br_vlan_group(netdev_priv(dev));
else
return -EINVAL;
--
2.14.3
* [PATCH net-next 0/4] mlxsw: Support VLAN devices in mirroring offloads
From: Ido Schimmel @ 2018-05-10 10:13 UTC
To: netdev, bridge; +Cc: Ido Schimmel, mlxsw, nikolay, jiri, petrm, davem
Petr says:
When offloading "tc action mirred mirror", there are several scenarios
where VLAN devices can show up, that mlxsw can offload on Spectrum
machines.
I) A direct mirror to a VLAN device on top of a front-panel port device
(commonly referred to as "RSPAN")
II) VLAN device in egress path of a packet when resolving a mirror to
gretap or ip6gretap netdevice.
Specifically in the latter case, the following are the cases that can be
offloaded:
IIa) VLAN device directly above a physical device.
IIb) A VLAN-unaware bridge where the egress device is as in IIa.
IIc) VLAN device on top of a VLAN-aware bridge where the egress device
is a physical device.
This patch set implements all the above cases.
First, in patch #1, br_vlan_get_info() is extended to allow bridge
master argument.
Case I is then implemented in patches #2 and #3, case II in patch #4.
Note that handling of VLAN protocol is not included. In case I, mirrored
packets may end up being double-tagged, and it might be reasonable for
the outer tag to be an 802.1ad. However, the protocol type configuration
would have to be put on the same VLAN netdevice that represents normal
VLAN traffic, and mlxsw currently ignores this setting in that case. Thus
this support was left out and the encapsulation always uses 802.1q
protocol.
Petr Machata (4):
net: bridge: Allow bridge master in br_vlan_get_info()
mlxsw: reg: Add MLXSW_REG_MPAT_SPAN_TYPE_REMOTE_ETH
mlxsw: spectrum_span: Support mirror-to-VLAN
mlxsw: spectrum_span: Support VLAN under mirror-to-gretap
drivers/net/ethernet/mellanox/mlxsw/reg.h | 6 ++
.../net/ethernet/mellanox/mlxsw/spectrum_span.c | 91 ++++++++++++++++++++--
net/bridge/br_vlan.c | 2 +
3 files changed, 91 insertions(+), 8 deletions(-)
--
2.14.3
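The underlay cases IIa, IIb and IIc listed above can be pictured with a toy model of the device stack. This is a hedged sketch only: `toy_netdev`, `toy_resolve_underlay()` and the single `lower` pointer (which stands in for the FDB lookup on a bridge) are assumptions made for illustration, and the tagged/untagged handling of VLAN-aware bridge ports is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_kind { TOY_PHYS, TOY_VLAN, TOY_BRIDGE_UNAWARE, TOY_BRIDGE_AWARE };

/* Hypothetical, heavily simplified netdevice. */
struct toy_netdev {
	enum toy_kind kind;
	bool up;
	uint16_t vid;                    /* TOY_VLAN: VLAN ID; TOY_BRIDGE_AWARE: PVID */
	const struct toy_netdev *lower;  /* VLAN: real device; bridge: egress port */
};

/* Returns the physical egress device and stores the VLAN ID to use in
 * *p_vid, or returns NULL when the configuration is treated as
 * unoffloadable by this model.
 */
static const struct toy_netdev *
toy_resolve_underlay(const struct toy_netdev *dev, uint16_t *p_vid)
{
	uint16_t vid = 0;

	assert(p_vid != NULL);
	if (!dev)
		return NULL;

	/* VLAN device on top: directly over a port (IIa) or a bridge (IIc). */
	if (dev->kind == TOY_VLAN) {
		vid = dev->vid;
		dev = dev->lower;
	}
	if (!dev)
		return NULL;

	if (dev->kind == TOY_BRIDGE_AWARE) {
		if (!vid)
			vid = dev->vid;   /* fall back to the PVID */
		if (!vid)
			return NULL;
		dev = dev->lower;
	} else if (dev->kind == TOY_BRIDGE_UNAWARE) {
		if (vid)
			return NULL;      /* VLAN upper of a VLAN-unaware bridge */
		dev = dev->lower;
	}
	if (!dev)
		return NULL;

	/* Bridge egress port that is itself a VLAN device (IIb). */
	if (dev->kind == TOY_VLAN) {
		if (vid || !dev->up)
			return NULL;
		vid = dev->vid;
		dev = dev->lower;
	}

	if (!dev || dev->kind != TOY_PHYS)
		return NULL;

	*p_vid = vid;
	return dev;
}
```

In this model a VLAN device stacked on a VLAN-unaware bridge resolves to NULL, matching the fact that it is not among the supported cases.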
* Re: [PATCH net-next] net/core: delete the no need variable description in devlink_resource_register() comment
From: Sun Lianwen @ 2018-05-10 10:10 UTC
To: sergei.shtylyov, davem; +Cc: netdev
In-Reply-To: <858a4f1d-4c15-7495-e7f1-2cbf5f4b110a@cogentembedded.com>
Hi
On 05/10/2018 05:52 PM, Sergei Shtylyov wrote:
> Hello!
>
> On 5/10/2018 6:28 AM, Sun Lianwen wrote:
>
>> The variable "top_hierarchy" and "reload_required" is not existence in
>
> Don't exist, you mean?
My statement may not have been right. What I mean is that the parameters of
devlink_resource_register() do not include "top_hierarchy" or "reload_required":
int devlink_resource_register(struct devlink *devlink,
			      const char *resource_name,
			      u64 resource_size,
			      u64 resource_id,
			      u64 parent_resource_id,
			      const struct devlink_resource_size_params *size_params)
>
>> devlink_resource_register()
>>
>> Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
> [...]
>
> MBR, Sergei
>
>
>
Thanks,
Sun Lianwen
* Re: [PATCH bpf v3] x86/cpufeature: bpf hack for clang not supporting asm goto
From: Peter Zijlstra @ 2018-05-10 10:06 UTC
To: Yonghong Song
Cc: mingo, torvalds, ast, daniel, linux-kernel, x86, netdev,
kernel-team, Thomas Gleixner
In-Reply-To: <20180504033119.2130788-1-yhs@fb.com>
On Thu, May 03, 2018 at 08:31:19PM -0700, Yonghong Song wrote:
> This approach is preferred since the already deployed bcc scripts, or
> any other bpf applicaitons utilizing LLVM JIT compilation functionality,
> will continue work with the new kernel without re-compilation and
> re-deployment.
So I really hate this and would much rather see the BPF build
environment changed. It not consistently having __BPF__ defined really
smells like a bug on your end.
Sometimes you just need to update tools... Is it really too hard to do
-D__BPF__ in the bpf build process that we need to molest the kernel
for it?
> Note that this is a hack in the kernel to workaround bpf compilation issue.
> The hack will be removed once clang starts to support asm goto.
Note that that ^^ already mandates people re-deploy their bpf tools, so
why is llvm supporting asm-goto a better point to re-deploy than fixing
a consistent __BPF__ define for the bpf build environment?
> diff --git a/Makefile b/Makefile
> index 83b6c54..cfd8759 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -504,6 +504,7 @@ export RETPOLINE_CFLAGS
> ifeq ($(call shell-cached,$(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC) $(KBUILD_CFLAGS)), y)
> CC_HAVE_ASM_GOTO := 1
> KBUILD_CFLAGS += -DCC_HAVE_ASM_GOTO
> + KBUILD_CFLAGS += -D__NO_CLANG_BPF_HACK
> KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO
> endif
I really think this is the wrong thing to do; but if the x86 maintainers
are willing to take this, I'll grudgingly shut up.
Ingo, Thomas?
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index b27da96..42edd5d 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -140,6 +140,8 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
>
> #define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit)
>
> +/* this macro is a temporary hack for bpf until clang gains asm-goto support */
> +#ifdef __NO_CLANG_BPF_HACK
> /*
> * Static testing of CPU features. Used the same as boot_cpu_has().
> * These will statically patch the target code for additional
> @@ -195,6 +197,9 @@ static __always_inline __pure bool _static_cpu_has(u16 bit)
> boot_cpu_has(bit) : \
> _static_cpu_has(bit) \
> )
> +#else
> +#define static_cpu_has(bit) boot_cpu_has(bit)
> +#endif
>
> #define cpu_has_bug(c, bit) cpu_has(c, (bit))
> #define set_cpu_bug(c, bit) set_cpu_cap(c, (bit))
> --
> 2.9.5
>
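The guard pattern in the quoted patch can be mimicked in plain C to see what a build without the define ends up with. This is a sketch only: every `toy_` name is hypothetical, `TOY_CC_HAVE_ASM_GOTO` merely plays the role of the define set by the kernel Makefile, and the "fast path" here is an ordinary function rather than asm-goto-based code patching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pretend feature bit 3 is set on this CPU. */
static uint64_t toy_cpu_caps = UINT64_C(1) << 3;

/* Plain runtime test; stands in for boot_cpu_has(). */
static bool toy_boot_cpu_has(unsigned int bit)
{
	assert(bit < 64);
	return (toy_cpu_caps >> bit) & 1;
}

#ifdef TOY_CC_HAVE_ASM_GOTO
/* Stand-in for the patched fast path; the real one relies on asm goto. */
static inline bool toy_static_cpu_has_impl(unsigned int bit)
{
	return toy_boot_cpu_has(bit);
}
#define toy_static_cpu_has(bit) toy_static_cpu_has_impl(bit)
#else
/* Fallback when the define is absent: same answer, no code patching. */
#define toy_static_cpu_has(bit) toy_boot_cpu_has(bit)
#endif
```

Either branch gives the same result for a given bit; the difference is only in how the test is implemented, which is why a compiler that lacks asm goto can still build code that uses the macro.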
* Re: [PATCH net-next] net/core: delete the no need variable description in devlink_resource_register() comment
From: Sergei Shtylyov @ 2018-05-10 9:52 UTC
To: Sun Lianwen, davem; +Cc: netdev
In-Reply-To: <20180510032800.5450-1-sunlw.fnst@cn.fujitsu.com>
Hello!
On 5/10/2018 6:28 AM, Sun Lianwen wrote:
> The variable "top_hierarchy" and "reload_required" is not existence in
Don't exist, you mean?
> devlink_resource_register()
>
> Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
[...]
MBR, Sergei
* Re: kernel BUG at net/core/skbuff.c:LINE! (3)
From: Dmitry Vyukov @ 2018-05-10 9:51 UTC
To: Xin Long, William Tu, mvohra
Cc: syzbot, davem, LKML, linux-sctp, network dev, Neil Horman,
syzkaller-bugs, Vlad Yasevich
In-Reply-To: <CADvbK_d9Z3OsHnLc2mg-3e=0t6sVFnEVXoDah3nWJrFSyaq+5w@mail.gmail.com>
On Sat, Feb 10, 2018 at 12:17 PM, Xin Long <lucien.xin@gmail.com> wrote:
> On Fri, Feb 2, 2018 at 3:21 AM, syzbot
> <syzbot+b2bf2652983d23734c5c@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot hit the following crash on net-next commit
>> b2fe5fa68642860e7de76167c3111623aa0d5de1 (Wed Jan 31 22:31:10 2018 +0000)
>> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+b2bf2652983d23734c5c@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> skbuff: skb_over_panic: text:000000004b89f3be len:66136 put:66124
>> head:00000000f255561a data:00000000ccb55e52 tail:0x10310 end:0x6c0
>> dev:<NULL>
> From the raw log, it should be a data chunk.
> But I couldn't see how len:66136 happened?
> considering that frag_point is always smaller than SCTP_MAX_CHUNK_LEN.
William, Meenakshi,
This crash was bisected to:
commit 84e54fe0a5eaed696dee4019c396f8396f5a908b
Author: William Tu <u9012063@gmail.com>
Date: Tue Aug 22 09:40:28 2017 -0700
gre: introduce native tunnel support for ERSPAN
bisection log:
https://gist.githubusercontent.com/dvyukov/a9661d43b2b519b91540f7466dbc32c1/raw/8df343224177933c8c398be126bb82be99aa0b4b/gistfile1.txt
>> ------------[ cut here ]------------
>> kernel BUG at net/core/skbuff.c:104!
>> invalid opcode: 0000 [#1] SMP KASAN
>> Dumping ftrace buffer:
>> (ftrace buffer empty)
>> Modules linked in:
>> CPU: 1 PID: 19738 Comm: syz-executor3 Not tainted 4.15.0+ #219
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
>> RSP: 0018:ffff8801c1a6e4e8 EFLAGS: 00010286
>> RAX: 000000000000008f RBX: ffff8801d0090000 RCX: 0000000000000000
>> RDX: 000000000000008f RSI: ffffc90003d53000 RDI: ffffed003834dc91
>> RBP: ffff8801c1a6e550 R08: 1ffff1003834dc1f R09: 0000000000000000
>> R10: 0000000000000004 R11: 0000000000000000 R12: ffffffff863fe4e0
>> R13: ffffffff85276640 R14: 000000000001024c R15: ffffffff863fdd20
>> FS: 00007f69cd01b700(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000718008 CR3: 00000001c71c7006 CR4: 00000000001606e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> skb_over_panic net/core/skbuff.c:109 [inline]
>> skb_put+0x18d/0x1d0 net/core/skbuff.c:1695
>> skb_put_data include/linux/skbuff.h:2049 [inline]
>> sctp_packet_pack net/sctp/output.c:473 [inline]
>> sctp_packet_transmit+0x1180/0x3750 net/sctp/output.c:606
>> sctp_outq_flush+0x121b/0x4060 net/sctp/outqueue.c:1197
>> sctp_outq_uncork+0x5a/0x70 net/sctp/outqueue.c:776
>> sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1807 [inline]
>> sctp_side_effects net/sctp/sm_sideeffect.c:1210 [inline]
>> sctp_do_sm+0x4e0/0x6ed0 net/sctp/sm_sideeffect.c:1181
>> sctp_primitive_SEND+0xa0/0xd0 net/sctp/primitive.c:178
>> sctp_sendmsg+0x1894/0x35e0 net/sctp/socket.c:2029
>> inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
>> sock_sendmsg_nosec net/socket.c:630 [inline]
>> sock_sendmsg+0xca/0x110 net/socket.c:640
>> sock_write_iter+0x31a/0x5d0 net/socket.c:909
>> call_write_iter include/linux/fs.h:1781 [inline]
>> do_iter_readv_writev+0x55c/0x830 fs/read_write.c:653
>> do_iter_write+0x154/0x540 fs/read_write.c:932
>> vfs_writev+0x18a/0x340 fs/read_write.c:977
>> do_writev+0xfc/0x2a0 fs/read_write.c:1012
>> SYSC_writev fs/read_write.c:1085 [inline]
>> SyS_writev+0x27/0x30 fs/read_write.c:1082
>> entry_SYSCALL_64_fastpath+0x29/0xa0
>> RIP: 0033:0x453299
>> RSP: 002b:00007f69cd01ac58 EFLAGS: 00000212 ORIG_RAX: 0000000000000014
>> RAX: ffffffffffffffda RBX: 000000000071bea0 RCX: 0000000000453299
>> RDX: 0000000000000001 RSI: 0000000020f7ffe0 RDI: 0000000000000013
>> RBP: 00000000000005c5 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000212 R12: 00000000006f7b18
>> R13: 00000000ffffffff R14: 00007f69cd01b6d4 R15: 0000000000000000
>> Code: 04 01 84 c0 74 04 3c 03 7e 23 8b 8b 80 00 00 00 41 57 48 c7 c7 60 dd
>> 3f 86 52 56 4c 89 ea 41 50 4c 89 e6 45 89 f0 e8 c6 d7 25 fd <0f> 0b 4c 89 4d
>> b8 4c 89 45 c0 48 89 75 c8 48 89 55 d0 e8 47 53
>> RIP: skb_panic+0x162/0x1f0 net/core/skbuff.c:100 RSP: ffff8801c1a6e4e8
>> ---[ end trace c7cd29819a9b12ab ]---
>>
>>
>> ---
>> This bug is generated by a dumb bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for details.
>> Direct all questions to syzkaller@googlegroups.com.
>>
>> syzbot will keep track of this bug report.
>> If you forgot to add the Reported-by tag, once the fix for this bug is
>> merged
>> into any tree, please reply to this email with:
>> #syz fix: exact-commit-title
>> To mark this as a duplicate of another syzbot report, please reply with:
>> #syz dup: exact-subject-of-another-report
>> If it's a one-off invalid bug report, please reply with:
>> #syz invalid
>> Note: if the crash happens again, it will cause creation of a new bug
>> report.
>> Note: all commands must start from beginning of the line in the email body.
* Re: [RFC v3 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-05-10 9:49 UTC
To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu
In-Reply-To: <20180510085601.6mpxf3yvwxnqnk5q@debian>
On 2018年05月10日 16:56, Tiwei Bie wrote:
> On Thu, May 10, 2018 at 03:34:50PM +0800, Jason Wang wrote:
>> On 2018年05月10日 15:32, Jason Wang wrote:
>>> On 2018年04月25日 13:15, Tiwei Bie wrote:
>>>> + /* We're using some buffers from the free list. */
>>>> + vq->vq.num_free -= descs_used;
>>>> +
>>>> + /* Update free pointer */
>>>> + if (indirect) {
>>>> + n = head + 1;
>>>> + if (n >= vq->vring_packed.num) {
>>>> + n = 0;
>>>> + vq->wrap_counter ^= 1;
>>>> + }
>>>> + vq->next_avail_idx = n;
>>>> + } else
>>>> + vq->next_avail_idx = i;
>>> During testing zerocopy (out of order completion), I found driver may
>>> submit two identical buffer id to vhost. So the above code may not work
>>> well.
>>>
>>> Consider the case that driver adds 3 buffer and virtqueue size is 8.
>>>
>>> a) id = 0,count = 2,next_avail = 2
>>>
>>> b) id = 2,count = 4,next_avail = 2
>> next_avail should be 6 here.
>>
>>> c) id = 4,count = 2,next_avail = 0
>>>
>> id should be 6 here.
>>
>> Thanks
>>
>>> if packet b is done before packet a, driver may think buffer id 0 is
>>> available and try to use it even if the real buffer 0 was not done.
>>>
>>> Thanks
> Nice catch! Thanks a lot!
> I'll implement an ID allocator.
>
> Best regards,
> Tiwei Bie
Sounds good.
Another similar issue is in detach_buf_packed(). It does:
for (j = 0; j < vq->desc_state[head].num; j++) {
desc = &vq->vring_packed.desc[i];
vring_unmap_one_packed(vq, desc);
i++;
if (i >= vq->vring_packed.num)
i = 0;
}
This probably won't work for out-of-order completion either, and according to the spec:
"""
Driver needs to keep track of the size of the list corresponding to each
buffer ID, to be able to skip to where the next used descriptor is
written by the device.
"""
Looks like we should not depend on the descriptor ring.
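To make the spec sentence above concrete, here is a minimal userspace sketch, not the driver code. It assumes the driver records, per buffer ID, how many descriptors that buffer occupied, and uses the recorded count (rather than walking the descriptor ring) to find where the next used entry will appear. All names (ring_sketch, sketch_add, sketch_consume_used) and the ring size of 8 are made up for illustration.

```c
#include <stdint.h>

#define RING_SIZE 8u

/* Hypothetical per-buffer-ID bookkeeping kept by the driver. */
struct desc_state {
	uint16_t num;	/* descriptors used by this buffer; 0 if the ID is idle */
};

struct ring_sketch {
	struct desc_state state[RING_SIZE];
	uint16_t last_used_idx;	/* slot where the next used entry is expected */
};

/* Remember how many descriptors buffer `id` took when it was added. */
static void sketch_add(struct ring_sketch *r, uint16_t id, uint16_t num)
{
	r->state[id].num = num;
}

/*
 * Consume a used entry that carries buffer ID `id`. The position of the
 * next used entry is derived from the count stored for that ID, so the
 * result does not depend on the order in which buffers complete.
 */
static uint16_t sketch_consume_used(struct ring_sketch *r, uint16_t id)
{
	uint16_t num = r->state[id].num;

	r->state[id].num = 0;
	r->last_used_idx = (uint16_t)((r->last_used_idx + num) % RING_SIZE);
	return num;
}
```

With the three buffers from the example earlier in the thread (IDs 0, 2 and 6 with counts 2, 4 and 2), completing ID 2 first advances the used position by 4, then ID 0 by 2, then ID 6 by 2, which wraps back to slot 0.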
Thanks
^ permalink raw reply
* [PATCH net] sctp: remove sctp_chunk_put from fail_mark err path in sctp_ulpevent_make_rcvmsg
From: Xin Long @ 2018-05-10 9:34 UTC (permalink / raw)
To: network dev, linux-sctp
Cc: davem, Marcelo Ricardo Leitner, Neil Horman, syzkaller
In Commit 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too"),
it held the chunk in sctp_ulpevent_make_rcvmsg to access it safely later
in recvmsg. However, it also added sctp_chunk_put in fail_mark err path,
which is only triggered before holding the chunk.
syzbot reported a use-after-free crash on this err path, where
sctp_chunk_put should not be called.
This patch simply removes this call.
Fixes: 1f45f78f8e51 ("sctp: allow GSO frags to access the chunk too")
Reported-by: syzbot+141d898c5f24489db4aa@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/sctp/ulpevent.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 84207ad..8cb7d98 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -715,7 +715,6 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
return event;
fail_mark:
- sctp_chunk_put(chunk);
kfree_skb(skb);
fail:
return NULL;
--
2.1.0
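The imbalance described in the commit message can be modelled with a toy reference count. This is only a sketch under stated assumptions: chunk_sketch, chunk_hold, chunk_put and make_rcvmsg_sketch are hypothetical stand-ins, not the SCTP code, and the put_on_fail flag exists only to compare the old and new error paths side by side.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a refcounted chunk. */
struct chunk_sketch {
	int refcnt;
	bool freed;
};

static void chunk_hold(struct chunk_sketch *c)
{
	c->refcnt++;
}

static void chunk_put(struct chunk_sketch *c)
{
	if (--c->refcnt == 0)
		c->freed = true;
}

/*
 * Models a function that can fail before it takes its own reference.
 * put_on_fail = true mimics the old error path, false the patched one.
 * Returns true on success.
 */
static bool make_rcvmsg_sketch(struct chunk_sketch *c, bool fail_early,
			       bool put_on_fail)
{
	if (fail_early) {
		if (put_on_fail)
			chunk_put(c);	/* unbalanced: no hold was taken yet */
		return false;
	}
	chunk_hold(c);	/* reference used later by the receive path */
	return true;
}
```

In this model the old error path drops the caller's only reference, so any later access by the caller would be a use-after-free; the patched path leaves the count untouched.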
^ permalink raw reply related
* Re: [PATCH] selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
From: Alexey Kodanev @ 2018-05-10 9:28 UTC (permalink / raw)
To: Paul Moore, Stephen Smalley, Richard Haines
Cc: selinux, Eric Paris, linux-security-module, netdev
In-Reply-To: <CAHC9VhRmX+Z+ignXXBrSGyUkpxudO76Fv-6o_My=skh-U8Ccgw@mail.gmail.com>
On 10.05.2018 01:02, Paul Moore wrote:
...
> I just had a better look at this and I believe that Alexey and Stephen
> are right: this is the best option. My apologies for the noise
> earlier. However, while looking at the code I think there are some
> additional necessary changes:
>
> * In the case of an SCTP socket, we should return -EINVAL, just as we
> do with other address families.
Right.
> * While not strictly related to AF_UNSPEC, we really should be passing
> the address family of the sockaddr, and not the socket, to functions
> that need to interpret the bind address/port.
That looks like a correct solution. I guess we need the same fix for
sctp_connectx(), in selinux_socket_connect_helper().
>
> I'm waiting for my kernel to compile so I haven't given this any
> sanity testing, but the patch below is what I think we need ...
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 4cafe6a19167..5f30045b2053 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4576,6 +4576,7 @@ static int selinux_socket_post_create(struct socket *sock, int family,
> static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen)
> {
> struct sock *sk = sock->sk;
> + struct sk_security_struct *sksec = sk->sk_security;
> u16 family;
> int err;
>
> @@ -4587,13 +4588,13 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
> family = sk->sk_family;
> if (family == PF_INET || family == PF_INET6) {
> char *addrp;
> - struct sk_security_struct *sksec = sk->sk_security;
> struct common_audit_data ad;
> struct lsm_network_audit net = {0,};
> struct sockaddr_in *addr4 = NULL;
> struct sockaddr_in6 *addr6 = NULL;
> unsigned short snum;
> u32 sid, node_perm;
> + u16 family_sa = address->sa_family;
>
> /*
> * sctp_bindx(3) calls via selinux_sctp_bind_connect()
> @@ -4601,11 +4602,19 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
> * need to check address->sa_family as it is possible to have
> * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
> */
> - switch (address->sa_family) {
> + switch (family_sa) {
> + case AF_UNSPEC:
> case AF_INET:
> if (addrlen < sizeof(struct sockaddr_in))
> return -EINVAL;
> addr4 = (struct sockaddr_in *)address;
> + if (family_sa == AF_UNSPEC) {
> + /* see "__inet_bind()", we only want to allow
> + * AF_UNSPEC if the address is INADDR_ANY */
> + if (addr4->sin_addr.s_addr != htonl(INADDR_ANY))
> + goto err_af;
> + family_sa = AF_INET;
> + }
> snum = ntohs(addr4->sin_port);
> addrp = (char *)&addr4->sin_addr.s_addr;
> break;
> @@ -4617,15 +4626,14 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
> addrp = (char *)&addr6->sin6_addr.s6_addr;
> break;
> default:
> - /* Note that SCTP services expect -EINVAL, whereas
> - * others expect -EAFNOSUPPORT.
> - */
> - if (sksec->sclass == SECCLASS_SCTP_SOCKET)
> - return -EINVAL;
> - else
> - return -EAFNOSUPPORT;
> + goto err_af;
> }
>
> + ad.type = LSM_AUDIT_DATA_NET;
> + ad.u.net = &net;
> + ad.u.net->sport = htons(snum);
> + ad.u.net->family = family_sa;
> +
Maybe we could move setting ad.u.net->v{4|6}info.saddr here as well?
I will send a v2 of this patch so that an SCTP socket returns EINVAL with
AF_UNSPEC. Should I also prepare a patch correcting 'ad.u.net->family'
and sel_netnode_sid()?
Thanks,
Alexey
> if (snum) {
> int low, high;
>
> @@ -4637,10 +4645,6 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
> snum, &sid);
> if (err)
> goto out;
> - ad.type = LSM_AUDIT_DATA_NET;
> - ad.u.net = &net;
> - ad.u.net->sport = htons(snum);
> - ad.u.net->family = family;
> err = avc_has_perm(&selinux_state,
> sksec->sid, sid,
> sksec->sclass,
> @@ -4672,16 +4676,11 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
> break;
> }
>
> - err = sel_netnode_sid(addrp, family, &sid);
> + err = sel_netnode_sid(addrp, family_sa, &sid);
> if (err)
> goto out;
>
> - ad.type = LSM_AUDIT_DATA_NET;
> - ad.u.net = &net;
> - ad.u.net->sport = htons(snum);
> - ad.u.net->family = family;
> -
> - if (address->sa_family == AF_INET)
> + if (family_sa == AF_INET)
> ad.u.net->v4info.saddr = addr4->sin_addr.s_addr;
> else
> ad.u.net->v6info.saddr = addr6->sin6_addr;
> @@ -4694,6 +4693,12 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
> }
> out:
> return err;
> +err_af:
> + /* Note that SCTP services expect -EINVAL, others -EAFNOSUPPORT. */
> + if (sksec->sclass == SECCLASS_SCTP_SOCKET)
> + return -EINVAL;
> + else
> + return -EAFNOSUPPORT;
> }
>
> /* This supports connect(2) and SCTP connect services such as sctp_connectx(3)
>
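The AF_UNSPEC handling in the quoted patch can be tried out in userspace with a small sketch like the one below. It is an illustration only: bind_family_sketch and its is_sctp flag are hypothetical, AF_INET6 is left out for brevity, and the error selection simply mirrors the err_af label in the patch (-EINVAL for SCTP sockets, -EAFNOSUPPORT otherwise).

```c
#include <errno.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return the address family to use for the bind
 * checks, or a negative errno. AF_UNSPEC is accepted only when the
 * address is INADDR_ANY, in which case it is treated as AF_INET.
 */
static int bind_family_sketch(const struct sockaddr_in *addr4, bool is_sctp)
{
	switch (addr4->sin_family) {
	case AF_UNSPEC:
		if (addr4->sin_addr.s_addr != htonl(INADDR_ANY))
			return is_sctp ? -EINVAL : -EAFNOSUPPORT;
		return AF_INET;
	case AF_INET:
		return AF_INET;
	default:
		/* AF_INET6 omitted in this sketch. */
		return is_sctp ? -EINVAL : -EAFNOSUPPORT;
	}
}
```

Returning the normalised family from one place is a design choice of the sketch; it corresponds to the patch assigning family_sa = AF_INET before the audit data and sel_netnode_sid() use it.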
^ permalink raw reply
* Re: [bpf-next PATCH 1/4] bpf: devmap introduce dev_map_enqueue
From: kbuild test robot @ 2018-05-10 9:10 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: kbuild-all, netdev, Daniel Borkmann, Alexei Starovoitov,
Jesper Dangaard Brouer, Christoph Hellwig, Björn Töpel,
Magnus Karlsson
In-Reply-To: <152587157974.20423.10791157575158535841.stgit@firesoul>
[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]
Hi Jesper,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/xdp-introduce-bulking-for-ndo_xdp_xmit-API/20180510-134105
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-x014-201818 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
In file included from kernel/sysctl.c:67:0:
>> include/linux/bpf.h:577:28: warning: 'struct bpf_dtab_netdev' declared inside parameter list will not be visible outside of this definition or declaration
int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
^~~~~~~~~~~~~~~
vim +577 include/linux/bpf.h
574
575 struct xdp_buff;
576 static inline
> 577 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
578 {
579 return 0;
580 }
581
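GCC prints this warning when a struct tag is first seen inside a prototype's parameter list: the tag is then scoped to that prototype only. A standalone sketch of one common remedy, a file-scope forward declaration, is shown below. It reuses the names from the report for readability but is not the kernel header, and demo_caller is a hypothetical stand-in for a real call site.

```c
#include <stddef.h>

/*
 * Forward declarations at file scope. Both tags are now visible to every
 * later declaration, so the prototype below and its callers refer to the
 * same types. Dropping the first line reproduces the "declared inside
 * parameter list" warning.
 */
struct bpf_dtab_netdev;
struct xdp_buff;

/* Stub in the style of the one quoted above. */
static inline int dev_map_enqueue(struct bpf_dtab_netdev *dst,
				  struct xdp_buff *xdp)
{
	(void)dst;
	(void)xdp;
	return 0;
}

/* Hypothetical caller converting an opaque pointer, as a map user might. */
static int demo_caller(void *fwd, struct xdp_buff *xdp)
{
	struct bpf_dtab_netdev *dst = fwd;

	return dev_map_enqueue(dst, xdp);
}
```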
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26956 bytes --]
^ permalink raw reply
* Re: [bpf-next PATCH 1/4] bpf: devmap introduce dev_map_enqueue
From: kbuild test robot @ 2018-05-10 9:10 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: kbuild-all, netdev, Daniel Borkmann, Alexei Starovoitov,
Jesper Dangaard Brouer, Christoph Hellwig, Björn Töpel,
Magnus Karlsson
In-Reply-To: <152587157974.20423.10791157575158535841.stgit@firesoul>
[-- Attachment #1: Type: text/plain, Size: 2637 bytes --]
Hi Jesper,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/xdp-introduce-bulking-for-ndo_xdp_xmit-API/20180510-134105
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-acpi-redef (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
In file included from net/core/filter.c:52:0:
include/linux/bpf.h:577:28: warning: 'struct bpf_dtab_netdev' declared inside parameter list will not be visible outside of this definition or declaration
int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
^~~~~~~~~~~~~~~
net/core/filter.c: In function '__bpf_tx_xdp_map':
>> net/core/filter.c:3025:25: error: passing argument 1 of 'dev_map_enqueue' from incompatible pointer type [-Werror=incompatible-pointer-types]
err = dev_map_enqueue(dst, xdp);
^~~
In file included from net/core/filter.c:52:0:
include/linux/bpf.h:577:5: note: expected 'struct bpf_dtab_netdev *' but argument is of type 'struct bpf_dtab_netdev *'
int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
^~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/dev_map_enqueue +3025 net/core/filter.c
3013
3014 static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
3015 struct bpf_map *map,
3016 struct xdp_buff *xdp,
3017 u32 index)
3018 {
3019 int err;
3020
3021 switch (map->map_type) {
3022 case BPF_MAP_TYPE_DEVMAP: {
3023 struct bpf_dtab_netdev *dst = fwd;
3024
> 3025 err = dev_map_enqueue(dst, xdp);
3026 if (err)
3027 return err;
3028 __dev_map_insert_ctx(map, index);
3029 break;
3030 }
3031 case BPF_MAP_TYPE_CPUMAP: {
3032 struct bpf_cpu_map_entry *rcpu = fwd;
3033
3034 err = cpu_map_enqueue(rcpu, xdp, dev_rx);
3035 if (err)
3036 return err;
3037 __cpu_map_insert_ctx(map, index);
3038 break;
3039 }
3040 case BPF_MAP_TYPE_XSKMAP: {
3041 struct xdp_sock *xs = fwd;
3042
3043 err = __xsk_map_redirect(map, xdp, xs);
3044 return err;
3045 }
3046 default:
3047 break;
3048 }
3049 return 0;
3050 }
3051
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30404 bytes --]
^ permalink raw reply
* Re: [bpf-next v3 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: Toke Høiland-Jørgensen @ 2018-05-10 9:09 UTC (permalink / raw)
To: Jesper Dangaard Brouer, David Ahern
Cc: netdev, borkmann, ast, davem, shm, roopa, john.fastabend, brouer
In-Reply-To: <20180510093158.08a7ed4b@redhat.com>
Jesper Dangaard Brouer <brouer@redhat.com> writes:
> On Wed, 9 May 2018 20:34:26 -0700
> David Ahern <dsahern@gmail.com> wrote:
>
>> Provide a helper for doing a FIB and neighbor lookup in the kernel
>> tables from an XDP program. The helper provides a fastpath for forwarding
>> packets. If the packet is a local delivery or for any reason is not a
>> simple lookup and forward, the packet continues up the stack.
>>
>> If it is to be forwarded, the forwarding can be done directly if the
>> neighbor is already known. If the neighbor does not exist, the first
>> few packets go up the stack for neighbor resolution. Once resolved, the
>> xdp program provides the fast path.
>>
>> On successful lookup the nexthop dmac, current device smac and egress
>> device index are returned.
>>
>> The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
>> are implemented in this patch. The API includes layer 4 parameters if
>> the XDP program chooses to do deep packet inspection to allow compare
>> against ACLs implemented as FIB rules.
>>
>> Header rewrite is left to the XDP program.
>>
>> The lookup takes 2 flags:
>> - BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
>> straight to the table associated with the device (expert setting for
>> those looking to maximize throughput)
>>
>> - BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
>> Default is an ingress lookup.
>>
>> Initial performance numbers collected by Jesper, forwarded packets/sec:
>>
>>          Full stack    XDP FIB lookup    XDP Direct lookup
>> IPv4     1,947,969     7,074,156         7,415,333
>> IPv6     1,728,000     6,165,504         7,262,720
>>
>
> The "Full stack" tests were with netfilter modules unloaded. Default
> setting with netfilter conntrack loaded and default Fedora firewall
> rules, show around 700Kpps.
>
>> These numbers are single CPU core forwarding on a Broadwell
>> E5-1650 v4 @ 3.60GHz.
>>
>> Signed-off-by: David Ahern <dsahern@gmail.com>
>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> This helper is awesome, as it really shows how XDP is meant to work in
> concert and cooperate with the existing network stack.
+1!
-Toke
^ permalink raw reply
* Re: [bpf-next PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
From: kbuild test robot @ 2018-05-10 9:03 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: kbuild-all, netdev, Daniel Borkmann, Alexei Starovoitov,
Jesper Dangaard Brouer, Christoph Hellwig, Björn Töpel,
Magnus Karlsson
In-Reply-To: <152587159495.20423.14022994969026458789.stgit@firesoul>
[-- Attachment #1: Type: text/plain, Size: 977 bytes --]
Hi Jesper,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/xdp-introduce-bulking-for-ndo_xdp_xmit-API/20180510-134105
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m68k
All errors (new ones prefixed by >>):
net/core/xdp.o: In function `__xdp_return':
>> xdp.c:(.text+0x356): undefined reference to `__page_pool_put_page'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 11635 bytes --]
^ permalink raw reply
* Re: [RFC v3 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-05-10 8:56 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu
In-Reply-To: <5885acac-e9e3-3abf-b6a2-7347f4d55be2@redhat.com>
On Thu, May 10, 2018 at 03:34:50PM +0800, Jason Wang wrote:
> On 2018年05月10日 15:32, Jason Wang wrote:
> > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > + /* We're using some buffers from the free list. */
> > > + vq->vq.num_free -= descs_used;
> > > +
> > > + /* Update free pointer */
> > > + if (indirect) {
> > > + n = head + 1;
> > > + if (n >= vq->vring_packed.num) {
> > > + n = 0;
> > > + vq->wrap_counter ^= 1;
> > > + }
> > > + vq->next_avail_idx = n;
> > > + } else
> > > + vq->next_avail_idx = i;
> >
> > During testing zerocopy (out of order completion), I found driver may
> > submit two identical buffer id to vhost. So the above code may not work
> > well.
> >
> > Consider the case that driver adds 3 buffer and virtqueue size is 8.
> >
> > a) id = 0,count = 2,next_avail = 2
> >
> > b) id = 2,count = 4,next_avail = 2
>
> next_avail should be 6 here.
>
> >
> > c) id = 4,count = 2,next_avail = 0
> >
>
> id should be 6 here.
>
> Thanks
>
> > if packet b is done before packet a, driver may think buffer id 0 is
> > available and try to use it even if the real buffer 0 was not done.
> >
> > Thanks
Nice catch! Thanks a lot!
I'll implement an ID allocator.
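For reference, one simple shape such an ID allocator could take is a free list threaded through an array. This is a hedged sketch, not the implementation that was later posted: id_alloc, id_alloc_get, id_alloc_put, NUM_IDS and ID_NONE are all hypothetical names.

```c
#include <stdint.h>

#define NUM_IDS 8u
#define ID_NONE 0xffffu

/* Free buffer IDs are chained through next[]; free_head is the first one. */
struct id_alloc {
	uint16_t next[NUM_IDS];
	uint16_t free_head;
};

static void id_alloc_init(struct id_alloc *a)
{
	uint16_t i;

	for (i = 0; i < NUM_IDS; i++)
		a->next[i] = (i + 1 < NUM_IDS) ? (uint16_t)(i + 1) : ID_NONE;
	a->free_head = 0;
}

/* Take a free ID, or ID_NONE if every ID is in flight. */
static uint16_t id_alloc_get(struct id_alloc *a)
{
	uint16_t id = a->free_head;

	if (id != ID_NONE)
		a->free_head = a->next[id];
	return id;
}

/* Return an ID once the device has reported its buffer as used. */
static void id_alloc_put(struct id_alloc *a, uint16_t id)
{
	a->next[id] = a->free_head;
	a->free_head = id;
}
```

Because an ID only returns to the free list through id_alloc_put(), a buffer that completes out of order cannot cause an ID that is still in flight to be handed out again, which is the failure described in the quoted example.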
Best regards,
Tiwei Bie
^ permalink raw reply
* Re: STMMAC driver with TSO enabled issue
From: Jose Abreu @ 2018-05-10 8:55 UTC (permalink / raw)
To: Bhadram Varka, Jose Abreu, netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <89c0a735-9e34-89c6-7692-579e48dadaa6@nvidia.com>
++net-dev
Hi Bhadram,
On 09-05-2018 12:03, Bhadram Varka wrote:
> Hi,
>
> Thanks for responding.
>
> Tried below suggested way. Still observing the issue -
It seems stmmac has a bug in the RX side when using TSO which is
causing all the RX descriptors to be consumed. The stmmac_rx()
function will need to be refactored. I will send a fix ASAP.
Thanks and Best Regards,
Jose Miguel Abreu
>
> [root@alarm ~]# iperf3 -c 10.19.65.141
> Connecting to host 10.19.65.141, port 5201
> [ 5] local 10.19.65.210 port 57630 connected to 10.19.65.141 port 5201
> [ 65.408268] stmmac_tso_xmit(): line = 2842
> [ 65.412362] stmmac_tso_xmit: tcphdrlen 32, hdr_len 66, pay_len 0, mss 1448
> [ 65.419224] skb->len 8754, skb->data_len 8688
> [ 65.423672] stmmac_tso_xmit: curr=20 dirty=17 f=18, e=20, f_p=00000000178e52e1, nfrags 1
> [ 65.431747] TX descriptor ring:
> [ 65.434881] 000 [0x82005000]: 0x0 0x0 0x0 0x0
> [ 65.439230] 001 [0x82005010]: 0x0 0x0 0x0 0x0
> [ 65.443578] 002 [0x82005020]: 0x0 0x0 0x0 0x0
> [ 65.447927] 003 [0x82005030]: 0x0 0x0 0x0 0x0
> [ 65.452275] 004 [0x82005040]: 0x0 0x0 0x0 0x0
> [ 65.456622] 005 [0x82005050]: 0x0 0x0 0x0 0x0
> [ 65.460970] 006 [0x82005060]: 0x0 0x0 0x0 0x0
> [ 65.465316] 007 [0x82005070]: 0x0 0x0 0x0 0x0
> [ 65.469664] 008 [0x82005080]: 0x0 0x0 0x0 0x0
> [ 65.474010] 009 [0x82005090]: 0x0 0x0 0x0 0x0
> [ 65.478357] 010 [0x820050a0]: 0x0 0x0 0x0 0x0
> [ 65.482706] 011 [0x820050b0]: 0x0 0x0 0x0 0x0
> [ 65.487053] 012 [0x820050c0]: 0x0 0x0 0x0 0x0
> [ 65.491400] 013 [0x820050d0]: 0x0 0x0 0x0 0x0
> [ 65.495746] 014 [0x820050e0]: 0x0 0x0 0x0 0x0
> [ 65.500092] 015 [0x820050f0]: 0x0 0x0 0x0 0x0
> [ 65.504438] 016 [0x82005100]: 0x0 0x0 0x0 0x0
> [ 65.508784] 017 [0x82005110]: 0x0 0x0 0x5a8 0xc4000000
> [ 65.513910] 018 [0x82005120]: 0xfb297000 0x0 0x42 0xa04421f0
> [ 65.519557] 019 [0x82005130]: 0xfb298000 0x0 0x21f0 0x90000000
> [ 65.525376] 020 [0x82005140]: 0x0 0x0 0x0 0x0
> [ 65.529722] 021 [0x82005150]: 0x0 0x0 0x0 0x0
> [ 65.534069] 022 [0x82005160]: 0x0 0x0 0x0 0x0
> [ 65.538414] 023 [0x82005170]: 0x0 0x0 0x0 0x0
> [ 65.542761] 024 [0x82005180]: 0x0 0x0 0x0 0x0
> [ 65.547107] 025 [0x82005190]: 0x0 0x0 0x0 0x0
> [ 65.551454] 026 [0x820051a0]: 0x0 0x0 0x0 0x0
> [ 65.555802] 027 [0x820051b0]: 0x0 0x0 0x0 0x0
> [ 65.560147] 028 [0x820051c0]: 0x0 0x0 0x0 0x0
> [ 65.564493] 029 [0x820051d0]: 0x0 0x0 0x0 0x0
> [ 65.568840] 030 [0x820051e0]: 0x0 0x0 0x0 0x0
> [ 65.573187] 031 [0x820051f0]: 0x0 0x0 0x0 0x0
> [ 65.577533] 032 [0x82005200]: 0x0 0x0 0x0 0x0
> [ 65.581879] 033 [0x82005210]: 0x0 0x0 0x0 0x0
> [ 65.586225] 034 [0x82005220]: 0x0 0x0 0x0 0x0
> [ 65.590571] 035 [0x82005230]: 0x0 0x0 0x0 0x0
> [ 65.594917] 036 [0x82005240]: 0x0 0x0 0x0 0x0
> [ 65.599262] 037 [0x82005250]: 0x0 0x0 0x0 0x0
> [ 65.603607] 038 [0x82005260]: 0x0 0x0 0x0 0x0
> [ 65.607952] 039 [0x82005270]: 0x0 0x0 0x0 0x0
> [ 65.612297] 040 [0x82005280]: 0x0 0x0 0x0 0x0
> [ 65.616643] 041 [0x82005290]: 0x0 0x0 0x0 0x0
> [ 65.620989] 042 [0x820052a0]: 0x0 0x0 0x0 0x0
> [ 65.625336] 043 [0x820052b0]: 0x0 0x0 0x0 0x0
> [ 65.629681] 044 [0x820052c0]: 0x0 0x0 0x0 0x0
> [ 65.634027] 045 [0x820052d0]: 0x0 0x0 0x0 0x0
> [ 65.638372] 046 [0x820052e0]: 0x0 0x0 0x0 0x0
> [ 65.642718] 047 [0x820052f0]: 0x0 0x0 0x0 0x0
> [ 65.647063] 048 [0x82005300]: 0x0 0x0 0x0 0x0
> [ 65.651408] 049 [0x82005310]: 0x0 0x0 0x0 0x0
> [ 65.655754] 050 [0x82005320]: 0x0 0x0 0x0 0x0
> [ 65.660099] 051 [0x82005330]: 0x0 0x0 0x0 0x0
> [ 65.664444] 052 [0x82005340]: 0x0 0x0 0x0 0x0
> [ 65.668790] 053 [0x82005350]: 0x0 0x0 0x0 0x0
> [ 65.673134] 054 [0x82005360]: 0x0 0x0 0x0 0x0
> [ 65.677480] 055 [0x82005370]: 0x0 0x0 0x0 0x0
> [ 65.681825] 056 [0x82005380]: 0x0 0x0 0x0 0x0
> [ 65.686170] 057 [0x82005390]: 0x0 0x0 0x0 0x0
> [ 65.690515] 058 [0x820053a0]: 0x0 0x0 0x0 0x0
> [ 65.694861] 059 [0x820053b0]: 0x0 0x0 0x0 0x0
> [ 65.699206] 060 [0x820053c0]: 0x0 0x0 0x0 0x0
> [ 65.703552] 061 [0x820053d0]: 0x0 0x0 0x0 0x0
> [ 65.707898] 062 [0x820053e0]: 0x0 0x0 0x0 0x0
> [ 65.712243] 063 [0x820053f0]: 0x0 0x0 0x0 0x0
> [ 65.716706] stmmac_tso_xmit(): line = 2842
> [ 65.720802] stmmac_tso_xmit: tcphdrlen 32, hdr_len 66, pay_len 0, mss 1448
> [ 65.727669] skb->len 4410, skb->data_len 4344
> [ 65.732114] stmmac_tso_xmit: curr=22 dirty=19 f=20, e=22, f_p=00000000b1247b41, nfrags 1
> [ 65.740190] TX descriptor ring:
> [ 65.743327] 000 [0x82005000]: 0x0 0x0 0x0 0x0
> [ 65.747678] 001 [0x82005010]: 0x0 0x0 0x0 0x0
> [ 65.752029] 002 [0x82005020]: 0x0 0x0 0x0 0x0
> [ 65.756378] 003 [0x82005030]: 0x0 0x0 0x0 0x0
> [ 65.760727] 004 [0x82005040]: 0x0 0x0 0x0 0x0
> [ 65.765077] 005 [0x82005050]: 0x0 0x0 0x0 0x0
> [ 65.769427] 006 [0x82005060]: 0x0 0x0 0x0 0x0
> [ 65.773776] 007 [0x82005070]: 0x0 0x0 0x0 0x0
> [ 65.778126] 008 [0x82005080]: 0x0 0x0 0x0 0x0
> [ 65.782476] 009 [0x82005090]: 0x0 0x0 0x0 0x0
> [ 65.786826] 010 [0x820050a0]: 0x0 0x0 0x0 0x0
> [ 65.791176] 011 [0x820050b0]: 0x0 0x0 0x0 0x0
> [ 65.795526] 012 [0x820050c0]: 0x0 0x0 0x0 0x0
> [ 65.799875] 013 [0x820050d0]: 0x0 0x0 0x0 0x0
> [ 65.804225] 014 [0x820050e0]: 0x0 0x0 0x0 0x0
> [ 65.808575] 015 [0x820050f0]: 0x0 0x0 0x0 0x0
> [ 65.812925] 016 [0x82005100]: 0x0 0x0 0x0 0x0
> [ 65.817274] 017 [0x82005110]: 0x0 0x0 0x0 0x0
> [ 65.821625] 018 [0x82005120]: 0x0 0x0 0x0 0x0
> [ 65.825976] 019 [0x82005130]: 0xfb298000 0x0 0x21f0 0x90000000
> [ 65.831800] 020 [0x82005140]: 0xfb2a1000 0x0 0x42 0xa04410f8
> [ 65.837450] 021 [0x82005150]: 0xfb2a2000 0x0 0x10f8 0x90000000
> [ 65.843273] 022 [0x82005160]: 0x0 0x0 0x0 0x0
> [ 65.847622] 023 [0x82005170]: 0x0 0x0 0x0 0x0
> [ 65.851971] 024 [0x82005180]: 0x0 0x0 0x0 0x0
> [ 65.856319] 025 [0x82005190]: 0x0 0x0 0x0 0x0
> [ 65.860670] 026 [0x820051a0]: 0x0 0x0 0x0 0x0
> [ 65.865020] 027 [0x820051b0]: 0x0 0x0 0x0 0x0
> [ 65.869369] 028 [0x820051c0]: 0x0 0x0 0x0 0x0
> [ 65.873719] 029 [0x820051d0]: 0x0 0x0 0x0 0x0
> [ 65.878068] 030 [0x820051e0]: 0x0 0x0 0x0 0x0
> [ 65.882418] 031 [0x820051f0]: 0x0 0x0 0x0 0x0
> [ 65.886767] 032 [0x82005200]: 0x0 0x0 0x0 0x0
> [ 65.891118] 033 [0x82005210]: 0x0 0x0 0x0 0x0
> [ 65.895467] 034 [0x82005220]: 0x0 0x0 0x0 0x0
> [ 65.899816] 035 [0x82005230]: 0x0 0x0 0x0 0x0
> [ 65.904165] 036 [0x82005240]: 0x0 0x0 0x0 0x0
> [ 65.908515] 037 [0x82005250]: 0x0 0x0 0x0 0x0
> [ 65.912865] 038 [0x82005260]: 0x0 0x0 0x0 0x0
> [ 65.917215] 039 [0x82005270]: 0x0 0x0 0x0 0x0
> [ 65.921564] 040 [0x82005280]: 0x0 0x0 0x0 0x0
> [ 65.925915] 041 [0x82005290]: 0x0 0x0 0x0 0x0
> [ 65.930264] 042 [0x820052a0]: 0x0 0x0 0x0 0x0
> [ 65.934615] 043 [0x820052b0]: 0x0 0x0 0x0 0x0
> [ 65.938964] 044 [0x820052c0]: 0x0 0x0 0x0 0x0
> [ 65.943313] 045 [0x820052d0]: 0x0 0x0 0x0 0x0
> [ 65.947664] 046 [0x820052e0]: 0x0 0x0 0x0 0x0
> [ 65.952012] 047 [0x820052f0]: 0x0 0x0 0x0 0x0
> [ 65.956363] 048 [0x82005300]: 0x0 0x0 0x0 0x0
> [ 65.960712] 049 [0x82005310]: 0x0 0x0 0x0 0x0
> [ 65.965061] 050 [0x82005320]: 0x0 0x0 0x0 0x0
> [ 65.969410] 051 [0x82005330]: 0x0 0x0 0x0 0x0
> [ 65.973760] 052 [0x82005340]: 0x0 0x0 0x0 0x0
> [ 65.978110] 053 [0x82005350]: 0x0 0x0 0x0 0x0
> [ 65.982460] 054 [0x82005360]: 0x0 0x0 0x0 0x0
> [ 65.986812] 055 [0x82005370]: 0x0 0x0 0x0 0x0
> [ 65.991161] 056 [0x82005380]: 0x0 0x0 0x0 0x0
> [ 65.995510] 057 [0x82005390]: 0x0 0x0 0x0 0x0
> [ 65.999860] 058 [0x820053a0]: 0x0 0x0 0x0 0x0
> [ 66.004210] 059 [0x820053b0]: 0x0 0x0 0x0 0x0
> [ 66.008559] 060 [0x820053c0]: 0x0 0x0 0x0 0x0
> [ 66.012908] 061 [0x820053d0]: 0x0 0x0 0x0 0x0
> [ 66.017257] 062 [0x820053e0]: 0x0 0x0 0x0 0x0
> [ 66.021607] 063 [0x820053f0]: 0x0 0x0 0x0 0x0
> [ ID] Interval       Transfer     Bitrate         Retr  Cwnd
> [ 5] 0.00-1.00 sec   184 KBytes   1.50 Mbits/sec  0     1.41 KBytes
> [ 5] 1.00-2.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
> [ 5] 2.00-3.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
> [ 5] 3.00-4.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
> [ 5] 4.00-5.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
> [ 5] 5.00-6.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
> [ 5] 6.00-7.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
> [ 5] 7.00-8.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
> [ 5] 8.00-9.00 sec   0.00 Bytes   0.00 bits/sec   0     1.41 KBytes
>
>
> On 5/9/2018 3:35 PM, Jose Abreu wrote:
>> Hi Bhadram,
>>
>> On 09-05-2018 08:18, Bhadram Varka wrote:
>>>
>>> + queue0 {
>>> + snps,weight = <0x10>;
>>
>>> + queue1 {
>>> + snps,weight = <0x10>;
>>>
>>
>>> + queue2 {
>>> + snps,weight = <0x10>;
>>>
>>
>>> + queue3 {
>>> + snps,weight = <0x10>;
>>>
>>
>> This is wrong. You can't use the same weight for all queues.
>> Please try with different weights (for example: 0x10, 0x11, 0x12,
>> 0x13).
>>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>>
>>
>
^ permalink raw reply
* Significant capacity drop on loopback interface
From: Naruto Nguyen @ 2018-05-10 8:35 UTC (permalink / raw)
To: netdev
Hello everyone,
Recently, I used netperf to test TCP performance on the loopback
interface of my 2 nodes: one runs kernel 4.4.103 and the other
runs 3.12.61.
netperf -l 100 -t TCP_RR
netperf -l 100 -t TCP_RR -- -D
In both cases, the throughput on the 4.4.103 node is only about half
of that on the 3.12.61 node:
# netperf -l 100 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0
AF_INET to localhost () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       100.00   37714.68
16384  87380
netperf -l 100 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0
AF_INET to localhost () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       100.00   64038.41
16384  87380
When running tcpdump to capture all packets on the loopback interface, I
see that during a 200s capture the number of packets on the loopback of
the 4.4.103 node is double that on the 3.12.61 node. Could this be the
cause of the low throughput shown above? Is there any TCP tuning for
loopback that would improve the performance (the low throughput also
happens with UDP), or is there a known loopback performance issue in
the 4.4 kernel?
Thanks a lot,
Brs,
Naruto
^ permalink raw reply
* [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
From: gfree.wind @ 2018-05-10 8:28 UTC (permalink / raw)
To: davem, daniel, jakub.kicinski, dsahern, netdev; +Cc: Gao Feng
From: Gao Feng <gfree.wind@vip.163.com>
The skb flow limit is implemented for each CPU independently. In the
current code, the function skb_flow_limit gets the softnet_data via
this_cpu_ptr. But the target CPU of enqueue_to_backlog may not be
the current CPU when RPS is enabled. As a result, skb_flow_limit checks
the stats of the current CPU, while the skb is going to be appended to the
queue of another CPU. This isn't the expected behavior.
Now pass the softnet_data as a param to skb_flow_limit to make it consistent.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
---
net/core/dev.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index af0558b..0f98eff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3883,18 +3883,15 @@ static int rps_ipi_queued(struct softnet_data *sd)
int netdev_flow_limit_table_len __read_mostly = (1 << 12);
#endif
-static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
+static bool skb_flow_limit(struct softnet_data *sd, struct sk_buff *skb, unsigned int qlen)
{
#ifdef CONFIG_NET_FLOW_LIMIT
struct sd_flow_limit *fl;
- struct softnet_data *sd;
unsigned int old_flow, new_flow;
if (qlen < (netdev_max_backlog >> 1))
return false;
- sd = this_cpu_ptr(&softnet_data);
-
rcu_read_lock();
fl = rcu_dereference(sd->flow_limit);
if (fl) {
@@ -3938,7 +3935,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
if (!netif_running(skb->dev))
goto drop;
qlen = skb_queue_len(&sd->input_pkt_queue);
- if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
+ if (qlen <= netdev_max_backlog && !skb_flow_limit(sd, skb, qlen)) {
if (qlen) {
enqueue:
__skb_queue_tail(&sd->input_pkt_queue, skb);
--
1.9.1
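The effect of the change can be illustrated with a toy model of per-CPU state. This is a sketch under loud assumptions: softnet_sketch, per_cpu_sd and both functions are hypothetical, and the real skb_flow_limit keeps per-flow history rather than the single counter used here.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 2

/* Toy stand-in for struct softnet_data: one instance per CPU. */
struct softnet_sketch {
	unsigned int flow_count;	/* packets recently queued for a flow */
	unsigned int flow_limit;	/* threshold above which the flow is limited */
};

static struct softnet_sketch per_cpu_sd[NR_CPUS_SKETCH];

/* Old shape: consults the data of the CPU that is running the code. */
static bool flow_limit_this_cpu(int current_cpu)
{
	const struct softnet_sketch *sd = &per_cpu_sd[current_cpu];

	return sd->flow_count > sd->flow_limit;
}

/* New shape: the caller passes the data of the queue being appended to. */
static bool flow_limit_target(const struct softnet_sketch *sd)
{
	return sd->flow_count > sd->flow_limit;
}
```

If CPU 0 steers a packet to a busy CPU 1, the old shape inspects CPU 0's idle counters and lets the packet through, while the new shape sees CPU 1's state and limits the flow.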
^ permalink raw reply related
* Re: [bpf-next PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
From: kbuild test robot @ 2018-05-10 8:10 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: kbuild-all, netdev, Daniel Borkmann, Alexei Starovoitov,
Jesper Dangaard Brouer, Christoph Hellwig, Björn Töpel,
Magnus Karlsson
In-Reply-To: <152587159495.20423.14022994969026458789.stgit@firesoul>
[-- Attachment #1: Type: text/plain, Size: 1820 bytes --]
Hi Jesper,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/xdp-introduce-bulking-for-ndo_xdp_xmit-API/20180510-134105
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-randconfig-a1-05100951 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
net/core/xdp.o: In function `__xdp_return':
>> net/core/xdp.c:323: undefined reference to `__page_pool_put_page'
vim +323 net/core/xdp.c
310
311 static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct)
312 {
313 struct xdp_mem_allocator *xa;
314 struct page *page;
315
316 switch (mem->type) {
317 case MEM_TYPE_PAGE_POOL:
318 rcu_read_lock();
319 /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
320 xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
321 page = virt_to_head_page(data);
322 if (xa)
> 323 __page_pool_put_page(xa->page_pool, page, napi_direct);
324 else
325 put_page(page);
326 rcu_read_unlock();
327 break;
328 case MEM_TYPE_PAGE_SHARED:
329 page_frag_free(data);
330 break;
331 case MEM_TYPE_PAGE_ORDER0:
332 page = virt_to_page(data); /* Assumes order0 page*/
333 put_page(page);
334 break;
335 default:
336 /* Not possible, checked in xdp_rxq_info_reg_mem_model() */
337 break;
338 }
339 }
340
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29808 bytes --]
^ permalink raw reply
* Re: [RFC v3 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-05-10 7:34 UTC (permalink / raw)
To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <927f4478-5a81-31d4-ac69-f9ec26248591@redhat.com>
On 2018-05-10 15:32, Jason Wang wrote:
>
>
> On 2018-04-25 13:15, Tiwei Bie wrote:
>> +	/* We're using some buffers from the free list. */
>> +	vq->vq.num_free -= descs_used;
>> +
>> +	/* Update free pointer */
>> +	if (indirect) {
>> +		n = head + 1;
>> +		if (n >= vq->vring_packed.num) {
>> +			n = 0;
>> +			vq->wrap_counter ^= 1;
>> +		}
>> +		vq->next_avail_idx = n;
>> +	} else
>> +		vq->next_avail_idx = i;
>
> While testing zerocopy (out-of-order completion), I found that the driver
> may submit two identical buffer ids to vhost, so the above code may not
> work well.
>
> Consider the case where the driver adds 3 buffers and the virtqueue size is 8.
>
> a) id = 0,count = 2,next_avail = 2
>
> b) id = 2,count = 4,next_avail = 2
next_avail should be 6 here.
>
> c) id = 4,count = 2,next_avail = 0
>
id should be 6 here.
Thanks
> If packet b is done before packet a, the driver may think buffer id 0 is
> available and try to use it even if the real buffer 0 was not done.
>
> Thanks
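
The corrected numbers above can be reproduced with a small model of the
bookkeeping. This is only a sketch under stated assumptions: struct
ring_model, add_buffer() and replay_example() are hypothetical names, not
part of the patch, and the sketch assumes the buffer id is simply the ring
slot where the buffer starts and that next_avail advances by the number of
descriptors used, wrapping at the ring size.

```c
#include <assert.h>

/* Hypothetical model of the ring state; not the driver's real structure. */
struct ring_model {
	unsigned int num;            /* ring size (8 in the example) */
	unsigned int next_avail_idx; /* next free slot in the ring */
	unsigned int wrap_counter;   /* flipped each time the index wraps */
};

/*
 * Place a buffer that uses `count` descriptors.
 * Returns the buffer id, assumed here to be the slot where it starts.
 */
static unsigned int add_buffer(struct ring_model *r, unsigned int count)
{
	unsigned int head = r->next_avail_idx;
	unsigned int next = head + count;

	if (next >= r->num) {
		next -= r->num;
		r->wrap_counter ^= 1;
	}
	r->next_avail_idx = next;
	return head;
}

/* Replays buffers a), b), c) from the example with a ring of size 8. */
static void replay_example(void)
{
	struct ring_model r = { .num = 8, .next_avail_idx = 0, .wrap_counter = 1 };

	assert(add_buffer(&r, 2) == 0 && r.next_avail_idx == 2); /* a) */
	assert(add_buffer(&r, 4) == 2 && r.next_avail_idx == 6); /* b) */
	assert(add_buffer(&r, 2) == 6 && r.next_avail_idx == 0); /* c) */
}
```

In this model the third buffer wraps next_avail back to 0 while buffer a)
may still be outstanding, which is the situation where slot 0 could be
reused too early if completions arrive out of order.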
^ permalink raw reply
* Re: [RFC v3 3/5] virtio_ring: add packed ring support
From: Jason Wang @ 2018-05-10 7:32 UTC (permalink / raw)
To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann
In-Reply-To: <20180425051550.24342-4-tiwei.bie@intel.com>
On 2018-04-25 13:15, Tiwei Bie wrote:
> +	/* We're using some buffers from the free list. */
> +	vq->vq.num_free -= descs_used;
> +
> +	/* Update free pointer */
> +	if (indirect) {
> +		n = head + 1;
> +		if (n >= vq->vring_packed.num) {
> +			n = 0;
> +			vq->wrap_counter ^= 1;
> +		}
> +		vq->next_avail_idx = n;
> +	} else
> +		vq->next_avail_idx = i;
While testing zerocopy (out-of-order completion), I found that the driver
may submit two identical buffer ids to vhost, so the above code may not work
well.
Consider the case where the driver adds 3 buffers and the virtqueue size is 8.
a) id = 0,count = 2,next_avail = 2
b) id = 2,count = 4,next_avail = 2
c) id = 4,count = 2,next_avail = 0
If packet b is done before packet a, the driver may think buffer id 0 is
available and try to use it even if the real buffer 0 was not done.
Thanks
^ permalink raw reply