* [PATCH v1 net-next] stmmac: fix DMA channel hang in half-duplex mode
From: Bhadram Varka @ 2018-06-13 4:30 UTC (permalink / raw)
To: peppe.cavallaro, alexandre.torgue, joabreu; +Cc: netdev, narayanr
HW does not support half-duplex mode in a multi-queue
scenario. Fix it by not advertising the half-duplex
modes if multi-queue is enabled.
Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 11fb7c7..07e748c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -928,6 +928,7 @@ static void stmmac_check_pcs_mode(struct stmmac_priv *priv)
static int stmmac_init_phy(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
+ u32 tx_cnt = priv->plat->tx_queues_to_use;
struct phy_device *phydev;
char phy_id_fmt[MII_BUS_ID_SIZE + 3];
char bus_id[MII_BUS_ID_SIZE];
@@ -969,6 +970,15 @@ static int stmmac_init_phy(struct net_device *dev)
SUPPORTED_1000baseT_Full);
/*
+ * Half-duplex mode is not supported with multiqueue;
+ * half-duplex only works with a single queue.
+ */
+ if (tx_cnt > 1)
+ phydev->supported &= ~(SUPPORTED_1000baseT_Half |
+ SUPPORTED_100baseT_Half |
+ SUPPORTED_10baseT_Half);
+
+ /*
* Broken HW is sometimes missing the pull-up resistor on the
* MDIO line, which results in reads to non-existent devices returning
* 0 rather than 0xffff. Catch this here and treat 0 as a non-existent
--
2.7.4
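
The masking step in the patch above can be tried outside the kernel. The following is a minimal sketch, not the driver code: the DEMO_* bit definitions and demo_filter_supported() are hypothetical names made up for illustration, and only the idea (clear every half-duplex mode once more than one TX queue is in use) comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative link-mode bits. The names echo the SUPPORTED_* flags used in
 * the patch, but these definitions are local to this sketch (assumption). */
#define DEMO_10baseT_Half   (1u << 0)
#define DEMO_10baseT_Full   (1u << 1)
#define DEMO_100baseT_Half  (1u << 2)
#define DEMO_100baseT_Full  (1u << 3)
#define DEMO_1000baseT_Half (1u << 4)
#define DEMO_1000baseT_Full (1u << 5)

#define DEMO_ALL_HALF \
	(DEMO_10baseT_Half | DEMO_100baseT_Half | DEMO_1000baseT_Half)

/* Return the supported mask with all half-duplex modes cleared when more
 * than one TX queue is in use; with a single queue the mask is unchanged. */
uint32_t demo_filter_supported(uint32_t supported, uint32_t tx_queues)
{
	assert(tx_queues > 0); /* a device is assumed to have at least one queue */

	if (tx_queues > 1)
		supported &= ~DEMO_ALL_HALF;

	return supported;
}
```

With every mode set in the input mask, a single-queue call returns the mask untouched, while a multi-queue call leaves only the three full-duplex bits.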
* [PATCH net] neighbour: skip NTF_EXT_LEARNED entries during forced gc
From: Roopa Prabhu @ 2018-06-13 4:26 UTC (permalink / raw)
To: davem; +Cc: netdev
From: Roopa Prabhu <roopa@cumulusnetworks.com>
Commit 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
added support for NTF_EXT_LEARNED for neighbour entries.
NTF_EXT_LEARNED entries are neigh entries managed by control
plane (eg: Ethernet VPN implementation in FRR routing suite).
Periodic gc already excludes these entries. This patch extends
it to forced gc which the earlier patch missed.
Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
net/core/neighbour.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index a7a9c3d..8e3fda9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -119,13 +119,14 @@ unsigned long neigh_rand_reach_time(unsigned long base)
EXPORT_SYMBOL(neigh_rand_reach_time);
-static bool neigh_del(struct neighbour *n, __u8 state,
+static bool neigh_del(struct neighbour *n, __u8 state, __u8 flags,
struct neighbour __rcu **np, struct neigh_table *tbl)
{
bool retval = false;
write_lock(&n->lock);
- if (refcount_read(&n->refcnt) == 1 && !(n->nud_state & state)) {
+ if (refcount_read(&n->refcnt) == 1 && !(n->nud_state & state) &&
+ !(n->flags & flags)) {
struct neighbour *neigh;
neigh = rcu_dereference_protected(n->next,
@@ -157,7 +158,7 @@ bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl)
while ((n = rcu_dereference_protected(*np,
lockdep_is_held(&tbl->lock)))) {
if (n == ndel)
- return neigh_del(n, 0, np, tbl);
+ return neigh_del(n, 0, 0, np, tbl);
np = &n->next;
}
return false;
@@ -185,7 +186,8 @@ static int neigh_forced_gc(struct neigh_table *tbl)
* - nobody refers to it.
* - it is not permanent
*/
- if (neigh_del(n, NUD_PERMANENT, np, tbl)) {
+ if (neigh_del(n, NUD_PERMANENT, NTF_EXT_LEARNED, np,
+ tbl)) {
shrunk = 1;
continue;
}
--
2.1.4
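
The eligibility test that the patch adds to neigh_del() can be sketched on its own. This is an illustration under stated assumptions rather than kernel code: struct demo_neigh, the DEMO_* constants and demo_can_delete() are hypothetical stand-ins, and locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for NUD_PERMANENT and NTF_EXT_LEARNED (assumption:
 * the values are only chosen to be distinct bits for this sketch). */
#define DEMO_NUD_PERMANENT   0x80
#define DEMO_NTF_EXT_LEARNED 0x10

/* Hypothetical, trimmed-down view of a neighbour entry. */
struct demo_neigh {
	unsigned int refcnt;
	uint8_t nud_state;
	uint8_t flags;
};

/* An entry may be deleted only if nobody else holds a reference, its state
 * is not in state_mask, and none of its flags are in flags_mask. */
bool demo_can_delete(const struct demo_neigh *n, uint8_t state_mask,
		     uint8_t flags_mask)
{
	assert(n != NULL);

	return n->refcnt == 1 &&
	       !(n->nud_state & state_mask) &&
	       !(n->flags & flags_mask);
}
```

Passing 0 for both masks models the neigh_remove_one() call site, where an unreferenced entry is always deletable; passing both masks models forced gc, which now skips externally learned entries as well as permanent ones.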
* Re: [PATCH iproute2-next v2] ip-xfrm: Add support for OUTPUT_MARK
From: Stephen Hemminger @ 2018-06-13 4:24 UTC (permalink / raw)
To: Lorenzo Colitti
Cc: Subash Abhinov Kasiviswanathan, netdev, David Ahern,
Steffen Klassert
In-Reply-To: <CAKD1Yr119qtuabrPL=MbVGeUgRykTeqCmjvCb1buAQU2yZCKjw@mail.gmail.com>
On Wed, 13 Jun 2018 12:14:53 +0900
Lorenzo Colitti <lorenzo@google.com> wrote:
> On Wed, Jun 13, 2018 at 3:48 AM Subash Abhinov Kasiviswanathan
> <subashab@codeaurora.org> wrote:
> >
> > src 192.168.1.1 dst 192.168.1.2
> > proto esp spi 0x00004321 reqid 0 mode tunnel
> > replay-window 0 flag af-unspec
> > mark 0x10000/0x3ffff
> > output-mark 0x20000
>
> Nit: I don't know what guarantees we provide (if any) that the output
> format of "ip xfrm state" does not change except to add new lines at
> the end. Personally, I feel that an app or script that depends on
> "auth-trunc" (or anything else, really) being on the line immediately
> after "mark" is brittle and should be fixed. This is particularly true
> since in general between the mark and the encryption there might be an
> auth-trunc line, or an auth line, or neither. As such, adding this
> line here seems OK to me.
Scripts should use json mode. If it ever gets added to xfrm output (hint).
* [iproute2 1/1] rdma: sync some IP headers with glibc
From: Hoang Le @ 2018-06-13 4:09 UTC (permalink / raw)
To: jon.maloy, maloy, ying.xue, netdev, tipc-discussion
In commit 9a362cc71a45, a new userspace header
(i.e. rdma/rdma_user_cm.h -> linux/in6.h)
is included before the kernel space header
(i.e. utils.h -> resolv.h -> netinet/in.h).
This leaves some IP headers out of sync, and compilation fails with
redefinition errors for some IP structs.
This commit reorders the includes to keep them in sync.
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
---
rdma/rdma.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/rdma/rdma.h b/rdma/rdma.h
index fcaf9e69e07c..d4b7ba1918b1 100644
--- a/rdma/rdma.h
+++ b/rdma/rdma.h
@@ -15,6 +15,7 @@
#include <string.h>
#include <errno.h>
#include <getopt.h>
+#include <netinet/in.h>
#include <libmnl/libmnl.h>
#include <rdma/rdma_netlink.h>
#include <rdma/rdma_user_cm.h>
--
2.7.4
* Re: [PATCH iproute2-next v2] ip-xfrm: Add support for OUTPUT_MARK
From: Lorenzo Colitti @ 2018-06-13 3:14 UTC (permalink / raw)
To: Subash Abhinov Kasiviswanathan
Cc: netdev, Stephen Hemminger, David Ahern, Steffen Klassert
In-Reply-To: <1528829293-23222-1-git-send-email-subashab@codeaurora.org>
On Wed, Jun 13, 2018 at 3:48 AM Subash Abhinov Kasiviswanathan
<subashab@codeaurora.org> wrote:
>
> src 192.168.1.1 dst 192.168.1.2
> proto esp spi 0x00004321 reqid 0 mode tunnel
> replay-window 0 flag af-unspec
> mark 0x10000/0x3ffff
> output-mark 0x20000
Nit: I don't know what guarantees we provide (if any) that the output
format of "ip xfrm state" does not change except to add new lines at
the end. Personally, I feel that an app or script that depends on
"auth-trunc" (or anything else, really) being on the line immediately
after "mark" is brittle and should be fixed. This is particularly true
since in general between the mark and the encryption there might be an
auth-trunc line, or an auth line, or neither. As such, adding this
line here seems OK to me.
> @@ -61,6 +61,7 @@ static void usage(void)
> fprintf(stderr, " [ flag FLAG-LIST ] [ sel SELECTOR ] [ LIMIT-LIST ] [ encap ENCAP ]\n");
> fprintf(stderr, " [ coa ADDR[/PLEN] ] [ ctx CTX ] [ extra-flag EXTRA-FLAG-LIST ]\n");
> fprintf(stderr, " [ offload [dev DEV] dir DIR ]\n");
> + fprintf(stderr, " [ output-mark OUTPUT-MARK]\n");
Nit: I think you want a space between OUTPUT-MARK and ].
Other than that,
Acked-by: Lorenzo Colitti <lorenzo@google.com>
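
To illustrate the point about brittle scripts: a consumer that looks fields up by keyword does not care where "output-mark" lands relative to "mark" or "auth-trunc". The sketch below is hypothetical and not part of iproute2; demo_find_field() is a made-up helper that assumes each field starts a line and is followed by a space and its value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scan multi-line text for a line whose first token is exactly `key` and
 * copy the token that follows it into `out` (truncated to fit).
 * Returns 1 if the key was found, 0 otherwise. */
int demo_find_field(const char *text, const char *key, char *out, size_t outlen)
{
	size_t klen;

	assert(text && key && out && outlen > 0);
	klen = strlen(key);

	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);
		const char *line = text;
		size_t rem = len;

		/* Skip indentation so line position does not matter. */
		while (rem > 0 && (*line == ' ' || *line == '\t')) {
			line++;
			rem--;
		}

		if (rem > klen && strncmp(line, key, klen) == 0 &&
		    line[klen] == ' ') {
			const char *val = line + klen + 1;
			size_t avail = rem - klen - 1;
			size_t vlen = 0;

			while (vlen < avail && val[vlen] != ' ')
				vlen++;
			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, val, vlen);
			out[vlen] = '\0';
			return 1;
		}

		/* Advance to the start of the next line, if any. */
		text += len;
		if (*text == '\n')
			text++;
	}

	return 0;
}
```

Because the match is on the whole first token, looking up "mark" does not accidentally hit the "output-mark" line, and adding or reordering lines in the output does not change the result.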
* Re: [PATCH 2/2] r8169: Reinstate ASPM Support
From: Kai Heng Feng @ 2018-06-13 2:59 UTC (permalink / raw)
To: Heiner Kallweit
Cc: davem, ryankao, hayeswang, hau, romieu, bhelgaas, netdev,
linux-pci, linux-kernel
In-Reply-To: <30b033ec-2ac3-6470-be43-06044f87b81d@gmail.com>
Hi Heiner,
> On Jun 13, 2018, at 3:35 AM, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 12.06.2018 11:57, Kai-Heng Feng wrote:
>> On newer Intel platforms, ASPM support in r8169 is the last missing
>> puzzle piece to let the Package C-State reach PC8. Without ASPM support, the
>> deepest Package C-State it can hit is PC3.
>> PC8 can save additional ~3W in comparison with PC3 on my testing
>> platform.
> Maybe we should replace PC8 with "beyond PC3". My system
> (Haswell 2961Y) reaches 50% PC7 + 5% PC9 + 45% PC10 now.
> It never seems to use PC8.
My original wording is quite a mouthful. I'll update it in the next version.
The platform in question is Coffee Lake. This patch should let systems
newer than Skylake reach package C-states deeper than PC3. Older systems
may not see a significant change.
I'll also state this information in the next version.
>
>> The original patch is from Realtek.
> Please add a link to this original patch.
Realtek sent me the patch privately. Is it okay to upload the patch to
pastebin or gist?
Kai-Heng
>
>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>> ---
>> v2:
>> - Remove module parameter.
>> - Remove pci_disable_link_state().
>>
>> drivers/net/ethernet/realtek/r8169.c | 41 +++++++++++++++++++---------
>> 1 file changed, 28 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c
>> b/drivers/net/ethernet/realtek/r8169.c
>> index 9b55ce513a36..85f4e746b040 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -5289,6 +5289,18 @@ static void rtl_pcie_state_l2l3_enable(struct
>> rtl8169_private *tp, bool enable)
>> RTL_W8(tp, Config3, data);
>> }
>>
>> +static void rtl_hw_internal_aspm_clkreq_enable(struct rtl8169_private
>> *tp,
>> + bool enable)
>
> Do we need this hw_internal in the function name?
>
>> +{
>> + if (enable) {
>> + RTL_W8(tp, Config2, RTL_R8(tp, Config2) | ClkReqEn);
>> + RTL_W8(tp, Config5, RTL_R8(tp, Config5) | ASPM_en);
>> + } else {
>> + RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
>> + RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
>> + }
>> +}
>> +
>> static void rtl_hw_start_8168bb(struct rtl8169_private *tp)
>> {
>> RTL_W8(tp, Config3, RTL_R8(tp, Config3) & ~Beacon_en);
>> @@ -5645,9 +5657,9 @@ static void rtl_hw_start_8168g_1(struct
>> rtl8169_private *tp)
>> rtl_hw_start_8168g(tp);
>>
>> /* disable aspm and clock request before access ephy */
>> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
>> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
>> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
>> rtl_ephy_init(tp, e_info_8168g_1, ARRAY_SIZE(e_info_8168g_1));
>> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
>> }
>>
>> static void rtl_hw_start_8168g_2(struct rtl8169_private *tp)
>> @@ -5680,9 +5692,9 @@ static void rtl_hw_start_8411_2(struct
>> rtl8169_private *tp)
>> rtl_hw_start_8168g(tp);
>>
>> /* disable aspm and clock request before access ephy */
>> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
>> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
>> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
>> rtl_ephy_init(tp, e_info_8411_2, ARRAY_SIZE(e_info_8411_2));
>> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
>> }
>>
>> static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
>> @@ -5699,8 +5711,7 @@ static void rtl_hw_start_8168h_1(struct
>> rtl8169_private *tp)
>> };
>>
>> /* disable aspm and clock request before access ephy */
>> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
>> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
>> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
>> rtl_ephy_init(tp, e_info_8168h_1, ARRAY_SIZE(e_info_8168h_1));
>>
>> RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
>> @@ -5779,6 +5790,8 @@ static void rtl_hw_start_8168h_1(struct
>> rtl8169_private *tp)
>> r8168_mac_ocp_write(tp, 0xe63e, 0x0000);
>> r8168_mac_ocp_write(tp, 0xc094, 0x0000);
>> r8168_mac_ocp_write(tp, 0xc09e, 0x0000);
>> +
>> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
>> }
>>
>> static void rtl_hw_start_8168ep(struct rtl8169_private *tp)
>> @@ -5830,11 +5843,12 @@ static void rtl_hw_start_8168ep_1(struct
>> rtl8169_private *tp)
>> };
>>
>> /* disable aspm and clock request before access ephy */
>> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
>> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
>> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
>> rtl_ephy_init(tp, e_info_8168ep_1, ARRAY_SIZE(e_info_8168ep_1));
>>
>> rtl_hw_start_8168ep(tp);
>> +
>> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
>> }
>>
>> static void rtl_hw_start_8168ep_2(struct rtl8169_private *tp)
>> @@ -5846,14 +5860,15 @@ static void rtl_hw_start_8168ep_2(struct
>> rtl8169_private *tp)
>> };
>>
>> /* disable aspm and clock request before access ephy */
>> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
>> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
>> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
>> rtl_ephy_init(tp, e_info_8168ep_2, ARRAY_SIZE(e_info_8168ep_2));
>>
>> rtl_hw_start_8168ep(tp);
>>
>> RTL_W8(tp, DLLPR, RTL_R8(tp, DLLPR) & ~PFM_EN);
>> RTL_W8(tp, MISC_1, RTL_R8(tp, MISC_1) & ~PFM_D3COLD_EN);
>> +
>> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
>> }
>>
>> static void rtl_hw_start_8168ep_3(struct rtl8169_private *tp)
>> @@ -5867,8 +5882,7 @@ static void rtl_hw_start_8168ep_3(struct
>> rtl8169_private *tp)
>> };
>>
>> /* disable aspm and clock request before access ephy */
>> - RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~ClkReqEn);
>> - RTL_W8(tp, Config5, RTL_R8(tp, Config5) & ~ASPM_en);
>> + rtl_hw_internal_aspm_clkreq_enable(tp, false);
>> rtl_ephy_init(tp, e_info_8168ep_3, ARRAY_SIZE(e_info_8168ep_3));
>>
>> rtl_hw_start_8168ep(tp);
>> @@ -5888,6 +5902,8 @@ static void rtl_hw_start_8168ep_3(struct
>> rtl8169_private *tp)
>> data = r8168_mac_ocp_read(tp, 0xe860);
>> data |= 0x0080;
>> r8168_mac_ocp_write(tp, 0xe860, data);
>> +
>> + rtl_hw_internal_aspm_clkreq_enable(tp, true);
>> }
>>
>> static void rtl_hw_start_8168(struct rtl8169_private *tp)
>> @@ -7646,7 +7662,6 @@ static int rtl_init_one(struct pci_dev *pdev,
>> const struct pci_device_id *ent)
>> mii->reg_num_mask = 0x1f;
>> mii->supports_gmii = cfg->has_gmii;
>>
>> -
>> /* enable device (incl. PCI PM wakeup and hotplug setup) */
>> rc = pcim_enable_device(pdev);
>> if (rc < 0) {
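
The pattern in the quoted patch (turn ASPM and clock request off, touch the ephy, turn them back on) can be modelled with plain variables. This is only a sketch under assumptions: struct demo_regs stands in for the Config2/Config5 registers, the DEMO_* bit positions are illustrative, and demo_ephy_init() is a hypothetical placeholder for rtl_ephy_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions for this sketch (assumption). */
#define DEMO_CLKREQ_EN 0x80 /* lives in config2 */
#define DEMO_ASPM_EN   0x01 /* lives in config5 */

/* Hypothetical register file standing in for the chip's Config2/Config5. */
struct demo_regs {
	uint8_t config2;
	uint8_t config5;
	bool ephy_saw_aspm; /* records whether ASPM was on during ephy access */
};

/* Read-modify-write both enable bits, leaving all other bits untouched. */
void demo_aspm_clkreq_enable(struct demo_regs *r, bool enable)
{
	assert(r != NULL);

	if (enable) {
		r->config2 |= DEMO_CLKREQ_EN;
		r->config5 |= DEMO_ASPM_EN;
	} else {
		r->config2 &= (uint8_t)~DEMO_CLKREQ_EN;
		r->config5 &= (uint8_t)~DEMO_ASPM_EN;
	}
}

/* Placeholder for the ephy programming step; it only notes the ASPM state. */
void demo_ephy_init(struct demo_regs *r)
{
	assert(r != NULL);
	r->ephy_saw_aspm = (r->config5 & DEMO_ASPM_EN) != 0;
}

/* Bracket the ephy access: disable first, re-enable once it is done. */
void demo_hw_start(struct demo_regs *r)
{
	demo_aspm_clkreq_enable(r, false);
	demo_ephy_init(r);
	demo_aspm_clkreq_enable(r, true);
}
```

Having one helper for both directions keeps the disable and enable paths symmetric, which is what lets each rtl_hw_start_*() variant in the patch re-enable ASPM with a single call at the end.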
* Re: [PATCH 1/2] r8169: Don't disable ASPM in the driver
From: Kai Heng Feng @ 2018-06-13 2:52 UTC (permalink / raw)
To: Heiner Kallweit
Cc: David Miller, Ryankao, Hayes Wang, Hau, romieu, bhelgaas,
Linux Netdev List, linux-pci, linux-kernel
In-Reply-To: <caa66d98-e5cc-8bcd-1052-cef4ff00c32d@gmail.com>
Hi Heiner,
> On Jun 13, 2018, at 3:30 AM, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 12.06.2018 11:57, Kai-Heng Feng wrote:
>> Enable or disable ASPM should be done in PCI core instead of in the
>> device driver.
>>
>> Commit ba04c7c93bbc ("r8169: disable ASPM") uses
>> pci_disable_link_state() to disable ASPM. This is incorrect, if the
>> device really needs to disable ASPM, we should use a quirk in PCI core
>> to prevent the PCI core from setting ASPM altogether.
> I wouldn't call using pci_disable_link_state() in a driver incorrect
> (as it works), there is just a better way which is more in line with
> the PCI subsystem architecture.
Ok, I'll amend the commit log in next version.
>
>> Let's remove pci_disable_link_state() for now. Use PCI core quirks if
>> any regression happens.
> The vendor driver disables ASPM unconditionally for chip version 25
> (there it's METHOD_9), so I think ASPM support is broken in this chip
> version. I'll cook a PCI quirk.
I actually asked Ryankao about this. He said that variant is more than a
decade old and he can't find out why it doesn't support ASPM.
Since METHOD_9 might be a platform issue instead, my intention was to
enable ASPM for all variants. If users hit any issue, then we can introduce
new PCI quirks.
>
>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>
> Please note that netdev is closed currently. Once 4.18-RC1 is out it
> will be re-opened. Then please re-submit properly annotating PATCH
> with "net-next" (I've forgotten this often enough myself).
Will do for next version. Thanks!
Kai-Heng
>
>> ---
>> v2:
>> - Remove module parameter.
>> - Remove pci_disable_link_state().
>>
>> drivers/net/ethernet/realtek/r8169.c | 5 -----
>> 1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c
>> b/drivers/net/ethernet/realtek/r8169.c
>> index 75dfac0248f4..9b55ce513a36 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -25,7 +25,6 @@
>> #include <linux/dma-mapping.h>
>> #include <linux/pm_runtime.h>
>> #include <linux/firmware.h>
>> -#include <linux/pci-aspm.h>
>> #include <linux/prefetch.h>
>> #include <linux/ipv6.h>
>> #include <net/ip6_checksum.h>
>> @@ -7647,10 +7646,6 @@ static int rtl_init_one(struct pci_dev *pdev,
>> const struct pci_device_id *ent)
>> mii->reg_num_mask = 0x1f;
>> mii->supports_gmii = cfg->has_gmii;
>>
>> - /* disable ASPM completely as that cause random device stop working
>> - * problems as well as full system hangs for some PCIe devices users */
>> - pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
>> - PCIE_LINK_STATE_CLKPM);
>>
>> /* enable device (incl. PCI PM wakeup and hotplug setup) */
>> rc = pcim_enable_device(pdev);
* Re: KASAN: use-after-free Read in rds_cong_queue_updates
From: syzbot @ 2018-06-13 2:51 UTC (permalink / raw)
To: davem, linux-kernel, linux-rdma, netdev, rds-devel,
santosh.shilimkar, syzkaller-bugs
In-Reply-To: <089e08e548431cd0f90565c9f4e5@google.com>
syzbot has found a reproducer for the following crash on:
HEAD commit: f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1461f03f800000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
dashboard link: https://syzkaller.appspot.com/bug?extid=4c20b3866171ce8441d2
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=16cbfeaf800000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=165227f7800000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
==================================================================
BUG: KASAN: use-after-free in atomic_read
include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: use-after-free in refcount_read include/linux/refcount.h:42
[inline]
BUG: KASAN: use-after-free in check_net include/net/net_namespace.h:236
[inline]
BUG: KASAN: use-after-free in rds_destroy_pending net/rds/rds.h:897 [inline]
BUG: KASAN: use-after-free in rds_cong_queue_updates+0x255/0x590
net/rds/cong.c:226
Read of size 4 at addr ffff8801ab180044 by task syz-executor199/4800
CPU: 1 PID: 4800 Comm: syz-executor199 Not tainted 4.17.0+ #84
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
refcount_read include/linux/refcount.h:42 [inline]
check_net include/net/net_namespace.h:236 [inline]
rds_destroy_pending net/rds/rds.h:897 [inline]
rds_cong_queue_updates+0x255/0x590 net/rds/cong.c:226
rds_recv_rcvbuf_delta.part.3+0x211/0x350 net/rds/recv.c:126
rds_recv_rcvbuf_delta net/rds/recv.c:735 [inline]
rds_clear_recv_queue+0x2f0/0x4c0 net/rds/recv.c:735
rds_release+0x15c/0x550 net/rds/af_rds.c:72
__sock_release+0xd7/0x260 net/socket.c:603
sock_close+0x19/0x20 net/socket.c:1186
__fput+0x353/0x890 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:243
task_work_run+0x1e4/0x290 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x1aee/0x2730 kernel/exit.c:865
do_group_exit+0x16f/0x430 kernel/exit.c:968
get_signal+0x886/0x1960 kernel/signal.c:2468
do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
exit_to_usermode_loop+0x2cf/0x360 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44f439
Code: e8 ac be 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 5b ff fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc65567dcf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00000000006edadc RCX: 000000000044f439
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000006edadc
RBP: 00000000006edad8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff3df31b1f R14: 00007fc65567e9c0 R15: 0000000000000061
Allocated by task 4800:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
kmem_cache_zalloc include/linux/slab.h:696 [inline]
net_alloc net/core/net_namespace.c:383 [inline]
copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
ksys_unshare+0x708/0xf90 kernel/fork.c:2411
__do_sys_unshare kernel/fork.c:2479 [inline]
__se_sys_unshare kernel/fork.c:2477 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2477
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 746:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
net_free net/core/net_namespace.c:399 [inline]
net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
net_drop_ns net/core/net_namespace.c:405 [inline]
cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
The buggy address belongs to the object at ffff8801ab180040
which belongs to the cache net_namespace(17:syz0) of size 8896
The buggy address is located 4 bytes inside of
8896-byte region [ffff8801ab180040, ffff8801ab182300)
The buggy address belongs to the page:
page:ffffea0006ac6000 count:1 mapcount:0 mapping:ffff8801aeaa0080 index:0x0
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff8801d3827048 ffff8801d3827048 ffff8801aeaa0080
raw: 0000000000000000 ffff8801ab180040 0000000100000001 ffff8801ab7cae40
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff8801ab7cae40
Memory state around the buggy address:
ffff8801ab17ff00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff8801ab17ff80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff8801ab180000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^
ffff8801ab180080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801ab180100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
* Re: [PATCH net] VSOCK: check sk state before receive
From: Hangbin Liu @ 2018-06-13 1:44 UTC (permalink / raw)
To: Jorgen S. Hansen; +Cc: Stefan Hajnoczi, netdev@vger.kernel.org, David S. Miller
In-Reply-To: <E9BA11C2-0F15-4FFF-8E29-74640E82D046@vmware.com>
On Mon, Jun 04, 2018 at 04:02:39PM +0000, Jorgen S. Hansen wrote:
>
> > On May 30, 2018, at 11:17 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Sun, May 27, 2018 at 11:29:45PM +0800, Hangbin Liu wrote:
> >> Hmm... Although I can't reproduce this bug with my reproducer after
> >> applying my patch, I could still get a similar issue with the syzkaller sock vnet test.
> >>
> >> It looks like this patch is not complete. Here is the KASAN call trace with my patch.
> >> I can also reproduce it without my patch.
> >
> > Seems like a race between vmci_datagram_destroy_handle() and the
> > delayed callback, vmci_transport_recv_dgram_cb().
> >
> > I don't know the VMCI transport well so I'll leave this to Jorgen.
>
> Yes, it looks like we are calling the delayed callback after we return from vmci_datagram_destroy_handle(). I’ll take a closer look at the VMCI side here - the refcounting of VMCI datagram endpoints should guard against this, since the delayed callback does a get on the datagram resource, so this could be a VMCI driver issue, and not a problem in the VMCI transport for AF_VSOCK.
Hi Jorgen,
Thanks for helping look at this. I'm happy to run tests for your patch.
Thanks
Hangbin
* Re: Problems in tc-matchall.8, tc-sample.8
From: Eric S. Raymond @ 2018-06-13 1:12 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20180612172439.6416b4e7@xeon-e3>
[-- Attachment #1: Type: text/plain, Size: 866 bytes --]
Stephen Hemminger <stephen@networkplumber.org>:
> The upstream repositories for master and net-next branch are now
> split. Master branch is at:
> git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
>
> and patches for next release are in (master branch):
> git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
>
>
> Github is an out of date clone (like all the kernels on there).
OK. Patch fixing markup in 7 files enclosed, with signoff.
No content changes in these patches. The intent is just to fix syntax bugs
so doclifter can do a clean lift to DocBook-XML, from which high-quality
HTML can be generated.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.
[-- Attachment #2: 0001-Markup-fixes-for-various-manual-pages.patch --]
[-- Type: text/x-diff, Size: 2885 bytes --]
From c06f4f46b0a09ebb21c11ccc894fd827f5a6250a Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Tue, 12 Jun 2018 21:02:38 -0400
Subject: [PATCH] Markup fixes for various manual pages.
Signed-off-by: Eric S. Raymond <esr@thyrsus.com>
---
man/man8/tc-cbq-details.8 | 4 ++--
man/man8/tc-cbq.8 | 4 ++--
man/man8/tc-htb.8 | 4 ++--
man/man8/tc-matchall.8 | 2 --
man/man8/tc-mqprio.8 | 4 ++--
man/man8/tc-prio.8 | 4 ++--
man/man8/tc-sample.8 | 2 --
7 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/man/man8/tc-cbq-details.8 b/man/man8/tc-cbq-details.8
index 9368103b..42027732 100644
--- a/man/man8/tc-cbq-details.8
+++ b/man/man8/tc-cbq-details.8
@@ -4,9 +4,9 @@ CBQ \- Class Based Queueing
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root} [ handle
major:
.B ] cbq avpkt
bytes
diff --git a/man/man8/tc-cbq.8 b/man/man8/tc-cbq.8
index 301265d8..0d958843 100644
--- a/man/man8/tc-cbq.8
+++ b/man/man8/tc-cbq.8
@@ -4,9 +4,9 @@ CBQ \- Class Based Queueing
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] cbq [ allot
bytes
diff --git a/man/man8/tc-htb.8 b/man/man8/tc-htb.8
index ae310f43..b1a364bd 100644
--- a/man/man8/tc-htb.8
+++ b/man/man8/tc-htb.8
@@ -4,9 +4,9 @@ HTB \- Hierarchy Token Bucket
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] htb [ default
minor-id
diff --git a/man/man8/tc-matchall.8 b/man/man8/tc-matchall.8
index e3cddb1f..28969461 100644
--- a/man/man8/tc-matchall.8
+++ b/man/man8/tc-matchall.8
@@ -81,7 +81,5 @@ tc filter add dev eth0 parent ffff: matchall \\
action sample rate 100 group 12
.EE
.RE
-
-.EE
.SH SEE ALSO
.BR tc (8),
diff --git a/man/man8/tc-mqprio.8 b/man/man8/tc-mqprio.8
index a1bedd35..0936b2be 100644
--- a/man/man8/tc-mqprio.8
+++ b/man/man8/tc-mqprio.8
@@ -4,9 +4,9 @@ MQPRIO \- Multiqueue Priority Qdisc (Offloaded Hardware QOS)
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] mqprio [ numtc
tcs
diff --git a/man/man8/tc-prio.8 b/man/man8/tc-prio.8
index 605f3d39..8c5b21dd 100644
--- a/man/man8/tc-prio.8
+++ b/man/man8/tc-prio.8
@@ -4,9 +4,9 @@ PRIO \- Priority qdisc
.SH SYNOPSIS
.B tc qdisc ... dev
dev
-.B ( parent
+.B { parent
classid
-.B | root) [ handle
+.B | root } [ handle
major:
.B ] prio [ bands
bands
diff --git a/man/man8/tc-sample.8 b/man/man8/tc-sample.8
index 3e03eba2..0facd3c5 100644
--- a/man/man8/tc-sample.8
+++ b/man/man8/tc-sample.8
@@ -116,8 +116,6 @@ tc filter add dev eth1 parent ffff: matchall \\
action sample index 19
.EE
.RE
-
-.EE
.RE
.SH SEE ALSO
.BR tc (8),
--
2.17.1
* Backport 3c75f6ee139d ("net_sched: sch_htb: add per class overlimits counter")
From: Cong Wang @ 2018-06-13 0:50 UTC (permalink / raw)
To: David Miller; +Cc: Eric Dumazet, Linux Kernel Network Developers
Hi, Dave
Please backport 3c75f6ee139d ("net_sched: sch_htb: add per class
overlimits counter") to the stable branches you take care of.
Technically it doesn't fix any bug, but it is useful for diagnostic
purposes. And of course, it is easy to backport too.
Please let me know if you need my help to backport it.
Thanks!
* Re: [RFC nf-next 0/5] netfilter: add ebpf translation infrastructure
From: Alexei Starovoitov @ 2018-06-13 0:43 UTC (permalink / raw)
To: Florian Westphal
Cc: netfilter-devel, ast, daniel, netdev, David S. Miller, ecree
In-Reply-To: <20180612092812.vptmhuekmpb4pn5z@breakpoint.cc>
On Tue, Jun 12, 2018 at 11:28:12AM +0200, Florian Westphal wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > On Fri, Jun 01, 2018 at 05:32:11PM +0200, Florian Westphal wrote:
> > > The userspace helper translates the rules, and, if successful, installs the
> > > generated program(s) via bpf syscall.
> > >
> > > For each rule a small response containing the corresponding ebpf file
> > > descriptor (can be -1 on failure) and an attribute count (how many
> > > expressions were jitted) gets sent back to kernel via pipe.
> > >
> > > If translation fails, the rule will be processed by nf_tables
> > > interpreter (as before this patch).
> > >
> > > If translation succeeded, nf_tables fetches the bpf program using the file
> > > descriptor identifier, allocates a new rule blob containing the new 'ebpf'
> > > expression (and possible trailing un-translated expressions).
> > >
> > > It then replaces the original rule in the transaction log with the new
> > > 'ebpf-rule'. The original rule is retained in a private area inside the ebpf
> > > expression to be able to present the original expressions back to userspace
> > > on 'nft list ruleset'.
> > >
> > > For easier review, this contains the kernel-side only.
> > > nf_tables_jit_work() will not do anything, yet.
> > >
> > > Unresolved issues:
> > > - maps and sets.
> > > It might be possible to add a new ebpf map type that just wraps
> > > the nft set infrastructure for lookups.
> > > This would allow nft userspace to continue to work as-is while
> > > not requiring new ebpf helper.
> > > Anonymous set should be a lot easier as they're immutable
> > > and could probably be handled already by existing infra.
> > >
> > > - BPF_PROG_RUN() is bolted into nft main loop via a middleman expression.
> > > I'm also abusing skb->cb[] to pass network and transport header offsets.
> > > Its not 'public' api so this can be changed later.
> > >
> > > - always uses BPF_PROG_TYPE_SCHED_CLS.
> > > This is because it "works" for current RFC purposes.
> > >
> > > - we should eventually support translating multiple (adjacent) rules
> > > into single program.
> > >
> > > If we do this kernel will need to track mapping of rules to
> > > program (to re-jit when a rule is changed. This isn't implemented
> > > so far, but can be added later. Alternatively, one could also add a
> > > 'readonly' table switch to just prevent further updates.
> > >
> > > We will also need to dump the 'next' generation of the
> > > to-be-translated table. The kernel has this information, so its only
> > > a matter of serializing it back to userspace from the commit phase.
> > >
> > > The jitter is still limited. So far it supports:
> > >
> > > * payload expression for network and transport header
> > > * meta mark, nfproto, l4proto
> > > * 32 bit immediates
> > > * 32 bit bitmask ops
> > > * accept/drop verdicts
> > >
> > > As this uses netlink, there is also no technical requirement for
> > > libnftnl, its simply used here for convienience.
> > >
> > > It doesn't need any userspace changes. Patches for libnftnl and nftables
> > > make debug info available (e.g. to map rule to its bpf prog id).
> > >
> > > Comments welcome.
> >
> > The implementation of patch 5 looks good to me, but I'm concerned with
> > patch 2 that adds 'ebpf expression' to nft. I see no reason to do so.
>
> I think its important user(space) can see which rules are jitted, and
> which ebpf prog corresponds to which rule(s), using an expression as
> container allows to re-use existing nft config plane code to serialze
> this via netlink attributes.
In my mind it would be all or nothing. I don't think it helps
to convert some rules and not all.
> > It seems existing support for infinite number of nft expressions is
> > used as a way to execute infinite number of bpf programs sequentially.
>
> In this RFC, yes.
>
> > I don't think it was a scalable approach before and won't scale in the future.
> > I think the algorithm should consider all nft rules at once and generate
> > a program or two that will execute fast even when number of rules is large.
>
> Yes, but existence of the epbf expression doesn't prevent doing this in
> the future. Doing it now complicates things and given unresolved issues
> (see above cover letter) I'm reluctant to implement this already. The
> UMH in this RFC can translate only a very small subset of
> expressions. To make full-table realistic I think issues outlined above
> need to be addressed first.
>
> It can be done, in such case the epbf expression would replace not just
> rule but possibly all of them.
I think 'all of them' is mandatory. Same for bpfilter.
Existing iptables/nft work as fallback already.
Only when converting all rules do we get a performance benefit.
Partial conversion only makes things harder to debug and confuses users.
> Netlink dump of such a fully-translated table would have the epbf
> expression at the beginning of the first rule, exposing epbf program id/tag,
> and a list of the nft rule IDs that it replaced. In the extreme (ideal)
> case, it would thus list all rule handle IDs of the chain (including
> those reachable via jump-to-user-defined-chains).
>
> Rest of dump would be as if ebpf did not exist, but these rules would
> all be "dead" from packet-path point of view. They are linked from via
> the nft epbf pseudo-expression, but no different from an arbitrary
> cookie/comment.
>
> As explained above, this also needs kernel to track mapping of
> n nft rules to m ebpf progs, rather than the simple 1:1 mapping done
> in this RFC.
>
> The 1:1 mapping is not being set stone here, its just the inital
> step to get the needed plumbing in, also see "Unresolved issues"
> in cover letter above.
>
> So:
>
> Step 1: 1:1 mapping, an nft rule has at most one ebpf prog.
> Step 2: figure out how to handle maps, sets, and how to cope with
> not-yet-translateable expressions
> Step 3: m:n mapping: kernel provides adjacent rules to the UMH for
> jitting. Example: user appends rules a, b, c. UMH creates
> single ebpf prog from a/b/c.
> nft-pseudo-expression replaces a/b/c in the
> packet path, original rules a/b/c are linked from the pseudo
> expression for tracking. If user deletes rule b, we provide
> a/c to UMH to create new epbf prog that replaces new
> sequence a/c.
> Step 4: always provide entire future base chain and all reachable chains
> to the umh. Ideally all of it is replaced by single program.
Right. I think the first implementation of converter should
be translating all rules at once. Not necessarily all features,
but all rules. Even if 60% of rules can be translated as bpf+trie
there is not much benefit to do that and somehow mix and match
the other 40% of old style iterative rule evaluation.
Algorithms are too different. Iterative will be a drag on trie.
>
> Eventually, entire eval loop could be replaced by ebpf prog.
> But it will need some time to get there -- at this point existing
> nft expressions would no longer provide an ->eval() function.
>
> Does that make sense to you?
>
> If you see this as flawed, please let me know, but as I have no idea
> how to resolve these issues going from 0 to 4 makes no sense to me.
I think the challenge is how to implement 4 without doing step 1, right?
imo doing such 1:1 (single rule to single bpf prog) translation does not
help to break the hard problem into smaller pieces. Such 1:1 is great
for a prototype, but not to land upstream.
For the same reasons in bpfilter we did single iptables rule to single
bpf prog translation, but such code doesn't belong in the upstream tree,
since it's not a scalable approach.
It's too easy to follow that road, but it goes nowhere.
Hence my proposal to invest time into building a decision-tree-based
algorithm coupled with pre- and post- bpf progs that supply the 'key'
for the decision trie lookup and interpret the result.
This way thousands of basic firewall rules will be translated
in an efficient way, but even a tiny ruleset with complex features (like
nat) won't be translated and that's ok.
We can build on top of an algorithm that considers all rules at once,
but not on top of a translator that does one rule at a time.
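To make the "all rules at once" idea concrete, here is a minimal sketch that is not taken from the thread or from bpfilter. It assumes the simplest possible case: every rule matches only an IPv4 source prefix, all rules are inserted up front, and a packet is classified with one bounded walk instead of a per-rule loop. The names (struct trie_node, trie_insert, trie_lookup, enum verdict) are hypothetical; a real converter would more likely use a multi-field decision tree or an existing BPF map type.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical verdicts; NO_MATCH means "fall back to the default policy". */
enum verdict { NO_MATCH = 0, ACCEPT = 1, DROP = 2 };

struct trie_node {
	struct trie_node *child[2];
	enum verdict verdict;	/* NO_MATCH if no rule ends at this node */
};

struct trie_node *trie_new_node(void)
{
	return calloc(1, sizeof(struct trie_node));
}

/* Insert one rule (addr/prefix_len -> verdict). Returns 0, or -1 on error. */
int trie_insert(struct trie_node *root, uint32_t addr, int prefix_len,
		enum verdict v)
{
	struct trie_node *n = root;

	if (prefix_len < 0 || prefix_len > 32)
		return -1;
	for (int i = 0; i < prefix_len; i++) {
		int bit = (addr >> (31 - i)) & 1;

		if (!n->child[bit]) {
			n->child[bit] = trie_new_node();
			if (!n->child[bit])
				return -1;
		}
		n = n->child[bit];
	}
	n->verdict = v;
	return 0;
}

/* Walk at most 32 nodes; return the verdict of the longest matching prefix. */
enum verdict trie_lookup(const struct trie_node *root, uint32_t addr)
{
	const struct trie_node *n = root;
	enum verdict best = root->verdict;

	for (int i = 0; i < 32 && n; i++) {
		n = n->child[(addr >> (31 - i)) & 1];
		if (n && n->verdict != NO_MATCH)
			best = n->verdict;
	}
	return best;
}
```

The lookup cost is bounded by the key width rather than by the number of rules, which is the property a one-rule-at-a-time translator cannot provide.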
> > There are papers on scalable packet classification algorithms that
> > use decision trees (hicuts, hypercuts, efficuts, etc)
> > Imo that is the direction should we should be looking at.
>
> Okay, but without any idea how to consider existing expressions,
> sets, maps etc. I'm not sure it makes sense to work on that at this
> point.
I think sets and ipset (in case of iptables) fit well into trie model.
> We also have the second problem that the netfilter base hook infra
> (NF_HOOK) already imposes indirect calls on us.
>
> Is there a plan to have a away to replace those indirect calls with
> direct ones? We can't do that easily because most of the functions are
> in modules, but AFAIU ebpf could rewrite that to a sequence of direct
> calls.
Yes. The abundance of indirect calls is a separate, but equally important
problem. We need to address both of them.
>
> [..]
>
> > imo this way majority of iptables/nft rules can be converted and
> > performance will be great even with large rulesets.
>
> Oh, I do not doubt that multiple rules can be compiled into single program,
> sorry if the RFC 1:1 mapping was confusing or gave that impression.
I think bpfilter RFC also made folks believe that translating
iptables rules one by one is what we're going to do as well.
I hope this confusion is now resolved.
The kernel doesn't need another sequential match firewall.
^ permalink raw reply
* Re: [bpf PATCH] bpf: selftest fix for sockmap
From: Daniel Borkmann @ 2018-06-13 0:31 UTC (permalink / raw)
To: John Fastabend, ast; +Cc: netdev
In-Reply-To: <20180611184735.31255.51105.stgit@john-Precision-Tower-5810>
On 06/11/2018 08:47 PM, John Fastabend wrote:
> In selftest test_maps the sockmap test case attempts to add a socket
> in listening state to the sockmap. This is no longer a valid operation
> so it fails as expected. However, the test wrongly reports this as an
> error now. Fix the test to avoid adding sockets in listening state.
>
> Fixes: 945ae430aa44 ("bpf: sockmap only allow ESTABLISHED sock state")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
(fyi, discussed with John that this will be rolled into the set of
fixes he has pending for bpf since the test is related to the one
restricting to ESTABLISHED state.)
^ permalink raw reply
* Re: Problems in tc-matchall.8, tc-sample.8
From: Stephen Hemminger @ 2018-06-13 0:24 UTC (permalink / raw)
To: Eric S. Raymond; +Cc: netdev
In-Reply-To: <20180612234103.GB14546@thyrsus.com>
On Tue, 12 Jun 2018 19:41:03 -0400
"Eric S. Raymond" <esr@thyrsus.com> wrote:
> Stephen Hemminger <stephen@networkplumber.org>:
> > On Tue, 12 Jun 2018 18:00:03 -0400
> > "Eric S. Raymond" <esr@thyrsus.com> wrote:
> >
> > > Stephen Hemminger <stephen@networkplumber.org>:
> > > > Please resubmit as real patch with signed-off-by
> > >
> > > I would like to follow your intructions, but that description leaves me
> > > not quite certain what you want. A git format-patch thing? If so, what
> > > git url should I clone from?
> >
> > iproute patches are handled the same as the Linux kernel.
> > Please submit patches to the netdev@vger.kernel.org with the same kind
> > of diff format (and signed-off-by) as the kernel.
> >
> > Like the kernel, patches which are pure bug fixes go to the master
> > branch, and patches with new functionality are handled with the iproute2-next repository.
>
> Then I should bugfix against this repository?
>
> https://github.com/shemminger/iproute2
The upstream repositories for the master and net-next branches are now
split. Master branch is at:
git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
and patches for next release are in (master branch):
git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
Github is an out of date clone (like all the kernels on there).
^ permalink raw reply
* Re: [PATCH 1/1] ip: add rmnet initial support
From: Stephen Hemminger @ 2018-06-13 0:22 UTC (permalink / raw)
To: Daniele Palmas; +Cc: netdev, Subash Abhinov Kasiviswanathan
In-Reply-To: <1528812777-7512-1-git-send-email-dnlplm@gmail.com>
On Tue, 12 Jun 2018 16:12:57 +0200
Daniele Palmas <dnlplm@gmail.com> wrote:
> This patch adds basic support for Qualcomm rmnet devices.
>
> Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
> ---
> ip/Makefile | 2 +-
> ip/iplink.c | 2 +-
> ip/iplink_rmnet.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 72 insertions(+), 2 deletions(-)
> create mode 100644 ip/iplink_rmnet.c
I am glad to see integrated tool support, but this needs to be targeted at
iproute2-next since it is a new feature.
Some things that I would like to see changed:
1. All of iproute2 is now using SPDX license identifiers, so you should not
include GPL boilerplate.
2. You should provide a dump (print_opt) routine as well as the parse routine.
Output format should use the print_uint (json print) routines.
3. Please update the manual page (man/man8/ip-link.8.in) to include the new
option.
^ permalink raw reply
* Re: [PATCH] selftests: bpf: config: add config fragments
From: Daniel Borkmann @ 2018-06-13 0:08 UTC (permalink / raw)
To: Anders Roxell, ast, shuah
Cc: netdev, linux-kernel, linux-kselftest, William Tu
In-Reply-To: <20180612110510.11731-1-anders.roxell@linaro.org>
On 06/12/2018 01:05 PM, Anders Roxell wrote:
> Tests test_tunnel.sh fails due to config fragments ins't enabled.
>
> Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
>
> All tests passes except ip6gretap that still fails. I'm unsure why.
> Ideas?
William (Cc) might be able to help you out.
Applied the one below in the meantime to bpf, thanks!
> Cheers,
> Anders
>
> tools/testing/selftests/bpf/config | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index 1eefe211a4a8..7eb613ffef55 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -7,3 +7,13 @@ CONFIG_CGROUP_BPF=y
> CONFIG_NETDEVSIM=m
> CONFIG_NET_CLS_ACT=y
> CONFIG_NET_SCH_INGRESS=y
> +CONFIG_NET_IPIP=y
> +CONFIG_IPV6=y
> +CONFIG_NET_IPGRE_DEMUX=y
> +CONFIG_NET_IPGRE=y
> +CONFIG_IPV6_GRE=y
> +CONFIG_CRYPTO_USER_API_HASH=m
> +CONFIG_CRYPTO_HMAC=m
> +CONFIG_CRYPTO_SHA256=m
> +CONFIG_VXLAN=y
> +CONFIG_GENEVE=y
>
^ permalink raw reply
* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-13 0:08 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, aaron.f.brown, jiri, kubakici,
netdev, qemu-devel, loseweigh, virtualization
In-Reply-To: <20180612142557-mutt-send-email-mst@kernel.org>
On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote:
> On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote:
>> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:
>>>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote:
>>>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:
>>>>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>>>>> act as a standby for another device with the same MAC address.
>>>>>>
>>>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>>>>>> by default as i am using libvirt to start the VMs.
>>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>>>>>> XML file?
>>>>>>
>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>> So I do not think we can commit to this interface: we
>>>>> really need to control visibility of the primary device.
>>>> The problem is legacy guest won't use primary device at all if we do this.
>>> And that's by design - I think it's the only way to ensure the
>>> legacy guest isn't confused.
>> Yes. I think so. But i am not sure if Qemu is the right place to control the visibility
>> of the primary device. The primary device may not be specified as an argument to Qemu. It
>> may be plugged in later.
>> The cloud service provider is providing a feature that enables low latency datapath and live
>> migration capability.
>> A tenant can use this feature only if he is running a VM that has virtio-net with failover support.
> Well live migration is there already. The new feature is low latency
> data path.
We get live migration with just virtio. But I meant live migration with a VF as
the primary device.
>
> And it's the guest that needs failover support not the VM.
Aren't guest and VM synonymous?
>
>
>> I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for
>> an upper layer indicating if the STANDBY feature is successfully negotiated or not.
>> The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links.
>> If VF is successfully hot plugged, virtio-net link should be disabled.
> Did you even talk to upper layer management about it?
> Just list the steps they need to do and you will see
> that's a lot of machinery to manage by the upper layer.
>
> What do we gain in flexibility? As far as I can see the
> only gain is some resources saved for legacy VMs.
>
> That's not a lot as tenant of the upper layer probably already has
> at least a hunch that it's a new guest otherwise
> why bother specifying the feature at all - you
> save even more resources without it.
>
I am not all that familiar with how Qemu manages network devices. If we can do all the
required management of the primary/standby devices within Qemu, that is definitely a better
approach without upper layer involvement.
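The negotiation check being discussed can be sketched in a few lines. This is not QEMU code: the bit number comes from the patch quoted in this message, while negotiated_features() and standby_negotiated() are hypothetical helpers that only show the general virtio rule that a feature is in effect when the device offers it and the guest driver accepts it.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_STANDBY 62	/* bit number from the patch under discussion */

/* Features in effect are those offered by the device and accepted by the driver. */
uint64_t negotiated_features(uint64_t host_features, uint64_t guest_features)
{
	return host_features & guest_features;
}

/* True only if both sides set the STANDBY bit. */
bool standby_negotiated(uint64_t host_features, uint64_t guest_features)
{
	uint64_t mask = 1ULL << VIRTIO_NET_F_STANDBY;

	return (negotiated_features(host_features, guest_features) & mask) != 0;
}
```

A legacy guest that never acknowledges bit 62 makes standby_negotiated() return false, which is the case where hot plugging a VF with the same MAC would have to be skipped.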
>
>
>>>> How about control the visibility of standby device?
>>>>
>>>> Thanks
>>> standy the always there to guarantee no downtime.
>>>
>>>>> However just for testing purposes, we could add a non-stable
>>>>> interface "x-standby" with the understanding that as any
>>>>> x- prefix it's unstable and will be changed down the road,
>>>>> likely in the next release.
>>>>>
>>>>>
>>>>>> ---
>>>>>> hw/net/virtio-net.c | 2 ++
>>>>>> include/standard-headers/linux/virtio_net.h | 3 +++
>>>>>> 2 files changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index 90502fca7c..38b3140670 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>>> true),
>>>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>>>> + false),
>>>>>> DEFINE_PROP_END_OF_LIST(),
>>>>>> };
>>>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>>>>> index e9f255ea3f..01ec09684c 100644
>>>>>> --- a/include/standard-headers/linux/virtio_net.h
>>>>>> +++ b/include/standard-headers/linux/virtio_net.h
>>>>>> @@ -57,6 +57,9 @@
>>>>>> * Steering */
>>>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>>>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
>>>>>> + * with the same MAC.
>>>>>> + */
>>>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>>>>>> #ifndef VIRTIO_NET_NO_LEGACY
>>>>>> --
>>>>>> 2.14.3
^ permalink raw reply
* Re: [PATCH bpf v3] tools/bpftool: fix a bug in bpftool perf
From: Daniel Borkmann @ 2018-06-13 0:04 UTC (permalink / raw)
To: Yonghong Song, ast, netdev; +Cc: kernel-team
In-Reply-To: <20180612053548.901931-1-yhs@fb.com>
On 06/12/2018 07:35 AM, Yonghong Song wrote:
> Commit b04df400c302 ("tools/bpftool: add perf subcommand")
> introduced bpftool subcommand perf to query bpf program
> kuprobe and tracepoint attachments.
>
> The perf subcommand will first test whether bpf subcommand
> BPF_TASK_FD_QUERY is supported in kernel or not. It does it
> by opening a file with argv[0] and feeds the file descriptor
> and current task pid to the kernel for querying.
>
> Such an approach won't work if the argv[0] cannot be opened
> successfully in the current directory. This is especially
> true when bpftool is accessible through PATH env variable.
> The error below reflects the open failure for file argv[0]
> at home directory.
>
> [yhs@localhost ~]$ which bpftool
> /usr/local/sbin/bpftool
> [yhs@localhost ~]$ bpftool perf
> Error: perf_query_support: No such file or directory
>
> To fix the issue, let us open root directory ("/")
> which exists in every linux system. With the fix, the
> error message will correctly reflect the permission issue.
>
> [yhs@localhost ~]$ which bpftool
> /usr/local/sbin/bpftool
> [yhs@localhost ~]$ bpftool perf
> Error: perf_query_support: Operation not permitted
> HINT: non root or kernel doesn't support TASK_FD_QUERY
>
> Fixes: b04df400c302 ("tools/bpftool: add perf subcommand")
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
Applied to bpf, thanks Yonghong!
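The failure mode in the commit message can be reproduced outside bpftool. The sketch below is not the bpftool source; open_probe_fd() is a hypothetical helper that only demonstrates why a bare argv[0] such as "bpftool" cannot be opened from an unrelated working directory, while "/" can.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a path read-only just to obtain some valid
 * file descriptor. Returns the fd on success or -errno on failure.
 */
int open_probe_fd(const char *path)
{
	int fd = open(path, O_RDONLY);

	return fd < 0 ? -errno : fd;
}
```

When the tool is found through PATH, argv[0] is a relative name, so open() resolves it against the current directory and fails with ENOENT before the kernel is ever queried.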
^ permalink raw reply
* Re: Problems in tc-matchall.8, tc-sample.8
From: Eric S. Raymond @ 2018-06-12 23:41 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20180612153350.75e77f01@xeon-e3>
Stephen Hemminger <stephen@networkplumber.org>:
> On Tue, 12 Jun 2018 18:00:03 -0400
> "Eric S. Raymond" <esr@thyrsus.com> wrote:
>
> > Stephen Hemminger <stephen@networkplumber.org>:
> > > Please resubmit as real patch with signed-off-by
> >
> > I would like to follow your intructions, but that description leaves me
> > not quite certain what you want. A git format-patch thing? If so, what
> > git url should I clone from?
>
> iproute patches are handled the same as the Linux kernel.
> Please submit patches to the netdev@vger.kernel.org with the same kind
> of diff format (and signed-off-by) as the kernel.
>
> Like the kernel, patches which are pure bug fixes go to the master
> branch, and patches with new functionality are handled with the iproute2-next repository.
Then I should bugfix against this repository?
https://github.com/shemminger/iproute2
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.
^ permalink raw reply
* Re: [PATCH v2] tcp: verify the checksum of the first data segment in a new connection
From: van der Linden, Frank @ 2018-06-12 23:12 UTC (permalink / raw)
To: Eric Dumazet, edumazet@google.com, netdev@vger.kernel.org
In-Reply-To: <212193c0-2fee-7f88-5473-9f5f4c548cb8@gmail.com>
Ok, patch v3 sent.
It was rightly pointed out to me that I shouldn't commit the mortal sin of top posting - but bear with me guys, I'll dig up my 25-year old .muttrc :-)
Frank
On 6/12/18, 3:03 PM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
On 06/12/2018 02:53 PM, van der Linden, Frank wrote:
> The convention seems to be to call tcp_checksum_complete after tcp_filter has a chance to deal with the packet. I wanted to preserve that.
>
> If that is not a concern, then I agree that this is a far better way to go.
>
> Frank
Given that we can drop the packet earlier from :
if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
goto csum_error;
I am quite sure we really do not care of tcp_filter() being
hit or not by packets with bad checksum.
Thanks
^ permalink raw reply
* [PATCH v3] tcp: verify the checksum of the first data segment in a new connection
From: Frank van der Linden @ 2018-06-12 23:09 UTC (permalink / raw)
To: edumazet, netdev; +Cc: fllinden
commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.
But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly. These lower-level processing functions do not do any checksum
verification.
Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
net/ipv4/tcp_ipv4.c | 4 ++++
net/ipv6/tcp_ipv6.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f70586b..ef8cd0f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1689,6 +1689,10 @@ int tcp_v4_rcv(struct sk_buff *skb)
reqsk_put(req);
goto discard_it;
}
+ if (tcp_checksum_complete(skb)) {
+ reqsk_put(req);
+ goto csum_error;
+ }
if (unlikely(sk->sk_state != TCP_LISTEN)) {
inet_csk_reqsk_queue_drop_and_put(sk, req);
goto lookup;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6d664d8..5d4eb9d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1475,6 +1475,10 @@ static int tcp_v6_rcv(struct sk_buff *skb)
reqsk_put(req);
goto discard_it;
}
+ if (tcp_checksum_complete(skb)) {
+ reqsk_put(req);
+ goto csum_error;
+ }
if (unlikely(sk->sk_state != TCP_LISTEN)) {
inet_csk_reqsk_queue_drop_and_put(sk, req);
goto lookup;
--
1.8.3.1
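For readers unfamiliar with what a full checksum verification involves, the following is a standalone sketch of the one's-complement Internet checksum. It is not the kernel's tcp_checksum_complete(); the function names are hypothetical and the TCP pseudo-header is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a buffer into a 16-bit one's-complement sum (RFC 1071 style). */
uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)		/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Checksum to store: complement of the sum with the checksum field zeroed. */
uint16_t compute_checksum(const uint8_t *data, size_t len)
{
	return (uint16_t)~ones_complement_sum(data, len);
}

/* A buffer that already carries its checksum is intact if the sum is 0xffff. */
int checksum_ok(const uint8_t *data, size_t len)
{
	return ones_complement_sum(data, len) == 0xffff;
}
```

Data that skips this check is handed up the stack regardless of corruption, which is the gap the patch closes for the TCP_NEW_SYN_RECV path.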
^ permalink raw reply related
* Re: [PATCH 1/1] ip: add rmnet initial support
From: Subash Abhinov Kasiviswanathan @ 2018-06-12 23:06 UTC (permalink / raw)
To: Daniele Palmas; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1528812777-7512-1-git-send-email-dnlplm@gmail.com>
> +
> +static void print_explain(FILE *f)
> +{
> + fprintf(f,
> + "Usage: ... rmnet mux_id MUXID\n"
> + "\n"
> + "MUXID := 1-127\n"
> + );
> +}
Hi Daniele
This range can be from 1-254.
> +
> +static void explain(void)
> +{
> + print_explain(stderr);
> +}
> +
> +static int rmnet_parse_opt(struct link_util *lu, int argc, char **argv,
> + struct nlmsghdr *n)
> +{
> + __u16 mux_id;
> +
> + while (argc > 0) {
> + if (matches(*argv, "mux_id") == 0) {
> + NEXT_ARG();
> + if (get_u16(&mux_id, *argv, 0))
> + invarg("mux_id is invalid", *argv);
> + addattr_l(n, 1024, IFLA_RMNET_MUX_ID, &mux_id, 2);
You could use addattr16() instead since it is __u16.
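The range comment above could be enforced at parse time. The sketch below is standalone and does not use iproute2 helpers; parse_mux_id() and the two bound macros are hypothetical names, and the 1-254 bounds are taken from the review comment rather than from any header.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define RMNET_MUX_ID_MIN 1	/* bounds taken from the review comment */
#define RMNET_MUX_ID_MAX 254

/* Hypothetical helper: parse arg as a mux_id. Returns 0, or -1 if invalid. */
int parse_mux_id(const char *arg, uint16_t *mux_id)
{
	char *end;
	unsigned long val;

	if (!arg || !*arg)
		return -1;
	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno || *end != '\0')	/* overflow or trailing garbage */
		return -1;
	if (val < RMNET_MUX_ID_MIN || val > RMNET_MUX_ID_MAX)
		return -1;
	*mux_id = (uint16_t)val;
	return 0;
}
```

A plain 16-bit parse would accept values up to 65535, so the explicit bounds check is what ties the parser to the documented MUXID range.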
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply
* Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
From: Alexander Duyck @ 2018-06-12 22:47 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Alexander Duyck, intel-wired-lan, Netdev
In-Reply-To: <20180612105029.77b40381@xeon-e3>
On Tue, Jun 12, 2018 at 10:50 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Tue, 12 Jun 2018 11:18:25 -0400
> Alexander Duyck <alexander.h.duyck@intel.com> wrote:
>
>> This patch series is meant to allow support for the L2 forward offload, aka
>> MACVLAN offload without the need for using ndo_select_queue.
>>
>> The existing solution currently requires that we use ndo_select_queue in
>> the transmit path if we want to associate specific Tx queues with a given
>> MACVLAN interface. In order to get away from this we need to repurpose the
>> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
>> a means of accessing the queues on the lower device. As a result we cannot
>> offload a device that is configured as multiqueue, however it doesn't
>> really make sense to configure a macvlan interfaced as being multiqueue
>> anyway since it doesn't really have a qdisc of its own in the first place.
>>
>> I am submitting this as an RFC for the netdev mailing list, and officially
>> submitting it for testing to Jeff Kirsher's next-queue in order to validate
>> the ixgbe specific bits.
>>
>> The big changes in this set are:
>> Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
>> Disable XPS for single queue devices
>> Replace accel_priv with sb_dev in ndo_select_queue
>> Add sb_dev parameter to fallback function for ndo_select_queue
>> Consolidated ndo_select_queue functions that appeared to be duplicates
>>
>> v2: Implement generic "select_queue" functions instead of "fallback" functions.
>> Tweak last two patches to account for changes in dev_pick_tx_xxx functions.
>>
>> ---
>>
>> Alexander Duyck (7):
>> net-sysfs: Drop support for XPS and traffic_class on single queue device
>> net: Add support for subordinate device traffic classes
>> ixgbe: Add code to populate and use macvlan tc to Tx queue map
>> net: Add support for subordinate traffic classes to netdev_pick_tx
>> net: Add generic ndo_select_queue functions
>> net: allow ndo_select_queue to pass netdev
>> net: allow fallback function to pass netdev
>>
>>
>> drivers/infiniband/hw/hfi1/vnic_main.c | 2
>> drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c | 4 -
>> drivers/net/bonding/bond_main.c | 3
>> drivers/net/ethernet/amazon/ena/ena_netdev.c | 5 -
>> drivers/net/ethernet/broadcom/bcmsysport.c | 6 -
>> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 +
>> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3
>> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 5 -
>> drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 -
>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 62 ++++++--
>> drivers/net/ethernet/lantiq_etop.c | 10 -
>> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 7 +
>> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 3
>> drivers/net/ethernet/mellanox/mlx5/core/en.h | 3
>> drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 -
>> drivers/net/ethernet/renesas/ravb_main.c | 3
>> drivers/net/ethernet/sun/ldmvsw.c | 3
>> drivers/net/ethernet/sun/sunvnet.c | 3
>> drivers/net/ethernet/ti/netcp_core.c | 9 -
>> drivers/net/hyperv/netvsc_drv.c | 6 -
>> drivers/net/macvlan.c | 10 -
>> drivers/net/net_failover.c | 7 +
>> drivers/net/team/team.c | 3
>> drivers/net/tun.c | 3
>> drivers/net/wireless/marvell/mwifiex/main.c | 3
>> drivers/net/xen-netback/interface.c | 4 -
>> drivers/net/xen-netfront.c | 3
>> drivers/staging/netlogic/xlr_net.c | 9 -
>> drivers/staging/rtl8188eu/os_dep/os_intfs.c | 3
>> drivers/staging/rtl8723bs/os_dep/os_intfs.c | 7 -
>> include/linux/netdevice.h | 34 ++++-
>> net/core/dev.c | 156 ++++++++++++++++++---
>> net/core/net-sysfs.c | 36 ++++-
>> net/mac80211/iface.c | 4 -
>> net/packet/af_packet.c | 7 +
>> 35 files changed, 312 insertions(+), 130 deletions(-)
>>
>> --
>
> This makes sense. I thought you were hoping to get rid of select queue in future?
That would be nice, however there are still a bunch of corner cases
that are not handled that have been dumped into select queue. For
example in the case of ixgbe the issue is FCoE. There are a number of
other places that are using it as well; as I seem to recall, netvsc and
bonding both use it to store off the original Rx->Tx queue mapping
when passing through the interface.
For now I figure we can take this one hill at a time and I am just
making it so we don't have to use ndo_select_queue in order to make
vmdq work for macvlan offload.
- Alex
^ permalink raw reply
* Re: [PATCH v2] xen/netfront: raise max number of slots in xennet_get_responses()
From: David Miller @ 2018-06-12 22:43 UTC (permalink / raw)
To: jgross; +Cc: linux-kernel, xen-devel, netdev, boris.ostrovsky
In-Reply-To: <20180612065753.10569-1-jgross@suse.com>
From: Juergen Gross <jgross@suse.com>
Date: Tue, 12 Jun 2018 08:57:53 +0200
> The max number of slots used in xennet_get_responses() is set to
> MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD).
>
> In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This
> difference is resulting in frequent messages "too many slots" and a
> reduced network throughput for some workloads (factor 10 below that of
> a kernel-xen based guest).
>
> Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of
> the max number of slots to use solves that problem (tests showed no
> more messages "too many slots" and throughput was as high as with the
> kernel-xen based guest system).
>
> Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in
> netfront_tx_slot_available() for making it clearer what is really being
> tested without actually modifying the tested value.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Applied, thanks.
^ permalink raw reply
* Re: [Patch net] smc: convert to ->poll_mask
From: David Miller @ 2018-06-12 22:37 UTC (permalink / raw)
To: xiyou.wangcong; +Cc: netdev, penguin-kernel, hch, ubraun
In-Reply-To: <20180611210714.3754-1-xiyou.wangcong@gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 11 Jun 2018 14:07:14 -0700
> smc->clcsock is an internal TCP socket, after TCP socket
> converts to ->poll_mask, ->poll doesn't exist any more.
> So just convert smc socket to ->poll_mask too.
>
> Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask")
> Reported-by: syzbot+f5066e369b2d5fff630f@syzkaller.appspotmail.com
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ursula Braun <ubraun@linux.ibm.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Applied, thanks Cong.
^ permalink raw reply