Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] wimax: i2400: fix memory leak
From: Navid Emamdoost @ 2019-09-10 23:01 UTC (permalink / raw)
  Cc: emamd001, smccaman, kjlu, Navid Emamdoost, Inaky Perez-Gonzalez,
	linux-wimax, David S. Miller, netdev, linux-kernel

In i2400m_op_rfkill_sw_toggle cmd buffer should be released along with
skb response.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
---
 drivers/net/wimax/i2400m/op-rfkill.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wimax/i2400m/op-rfkill.c b/drivers/net/wimax/i2400m/op-rfkill.c
index 6642bcb27761..8efb493ceec2 100644
--- a/drivers/net/wimax/i2400m/op-rfkill.c
+++ b/drivers/net/wimax/i2400m/op-rfkill.c
@@ -127,6 +127,7 @@ int i2400m_op_rfkill_sw_toggle(struct wimax_dev *wimax_dev,
 			"%d\n", result);
 	result = 0;
 error_cmd:
+	kfree(cmd);
 	kfree_skb(ack_skb);
 error_msg_to_dev:
 error_alloc:
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] ftgmac100: Disable HW checksum generation on AST2500
From: Vijay Khemka @ 2019-09-10 23:07 UTC (permalink / raw)
  To: Florian Fainelli, David S. Miller, YueHaibing, Andrew Lunn,
	Kate Stewart, Mauro Carvalho Chehab, Luis Chamberlain,
	Thomas Gleixner, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: openbmc @ lists . ozlabs . org, Sai Dasari,
	linux-aspeed@lists.ozlabs.org
In-Reply-To: <0797B1F1-883D-4129-AC16-794957ACCF1B@fb.com>



On 9/10/19, 3:50 PM, "Linux-aspeed on behalf of Vijay Khemka" <linux-aspeed-bounces+vijaykhemka=fb.com@lists.ozlabs.org on behalf of vijaykhemka@fb.com> wrote:

    
    
    On 9/10/19, 3:05 PM, "Florian Fainelli" <f.fainelli@gmail.com> wrote:
    
        On 9/10/19 2:37 PM, Vijay Khemka wrote:
        > HW checksum generation is not working for AST2500, specially with IPV6
        > over NCSI. All TCP packets with IPv6 get dropped. By disabling this
        > it works perfectly fine with IPV6.
        > 
        > Verified with IPV6 enabled and can do ssh.
        
        How about IPv4, do these packets have problem? If not, can you continue
        advertising NETIF_F_IP_CSUM but take out NETIF_F_IPV6_CSUM?
    
    I changed code from (netdev->hw_features &= ~NETIF_F_HW_CSUM) to 
    (netdev->hw_features &= ~NETIF_F_ IPV6_CSUM). And it is not working. 
    Don't know why. IPV4 works without any change but IPv6 needs HW_CSUM
    Disabled.

Now I changed to
netdev->hw_features &= (~NETIF_F_HW_CSUM) | NETIF_F_IP_CSUM;
And it works.
        
        > 
        > Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
        > ---
        >  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++--
        >  1 file changed, 3 insertions(+), 2 deletions(-)
        > 
        > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
        > index 030fed65393e..591c9725002b 100644
        > --- a/drivers/net/ethernet/faraday/ftgmac100.c
        > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
        > @@ -1839,8 +1839,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
        >  	if (priv->use_ncsi)
        >  		netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
        >  
        > -	/* AST2400  doesn't have working HW checksum generation */
        > -	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
        > +	/* AST2400  and AST2500 doesn't have working HW checksum generation */
        > +	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac") ||
        > +		   of_device_is_compatible(np, "aspeed,ast2500-mac")))
        >  		netdev->hw_features &= ~NETIF_F_HW_CSUM;
        >  	if (np && of_get_property(np, "no-hw-checksum", NULL))
        >  		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
        > 
        
        
        -- 
        Florian
        
    
    


^ permalink raw reply

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
From: Al Viro @ 2019-09-10 23:15 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Carlos Antonio Neira Bustos, netdev@vger.kernel.org,
	ebiederm@xmission.com, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <dadf3657-2648-14ef-35ee-e09efb2cdb3e@fb.com>

On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
> 
> Carlos,
> 
> Discussed with Eric today for what is the best way to get
> the device number for a namespace. The following patch seems
> a reasonable start although Eric would like to see
> how the helper is used in order to decide whether the
> interface looks right.
> 
> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
> Author: Yonghong Song <yhs@fb.com>
> Date:   Mon Sep 9 21:50:51 2019 -0700
> 
>      nsfs: add an interface function ns_get_inum_dev()
> 
>      This patch added an interface function
>      ns_get_inum_dev(). Given a ns_common structure,
>      the function returns the inode and device
>      numbers. The function will be used later
>      by a newly added bpf helper.
> 
>      Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index a0431642c6b5..a603c6fc3f54 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>          return ERR_PTR(-EINVAL);
>   }
> 
> +/* Get the device number for the current task pidns.
> + */
> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
> +{
> +       *inum = ns->inum;
> +       *dev = nsfs_mnt->mnt_sb->s_dev;
> +}

Umm...  Where would it get the device number once we get (hell knows
what for) multiple nsfs instances?  I still don't understand what
would that be about, TBH...  Is it really per-userns?  Or something
else entirely?  Eric, could you give some context?

^ permalink raw reply

* Re: KASAN: use-after-free Read in rxrpc_send_keepalive
From: syzbot @ 2019-09-10 23:35 UTC (permalink / raw)
  To: MAILER_DAEMON, davem, dhowells, linux-afs, linux-kernel, netdev,
	syzkaller-bugs
In-Reply-To: <000000000000e695c1058fb26925@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    3120b9a6 Merge tag 'ipc-fixes' of git://git.kernel.org/pub..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=107d1ca5600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ed2b148cd67382ec
dashboard link: https://syzkaller.appspot.com/bug?extid=d850c266e3df14da1d31
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17347095600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143bcca5600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d850c266e3df14da1d31@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in rxrpc_send_keepalive+0xe2/0x3c0  
net/rxrpc/output.c:634
Read of size 8 at addr ffff8880a859a058 by task kworker/0:2/3016

CPU: 0 PID: 3016 Comm: kworker/0:2 Not tainted 5.3.0-rc8+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
  print_address_description+0x75/0x5b0 mm/kasan/report.c:351
  __kasan_report+0x14b/0x1c0 mm/kasan/report.c:482
  kasan_report+0x26/0x50 mm/kasan/common.c:618
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
  rxrpc_send_keepalive+0xe2/0x3c0 net/rxrpc/output.c:634
  rxrpc_peer_keepalive_dispatch net/rxrpc/peer_event.c:369 [inline]
  rxrpc_peer_keepalive_worker+0x76e/0xb40 net/rxrpc/peer_event.c:430
  process_one_work+0x7ef/0x10e0 kernel/workqueue.c:2269
  worker_thread+0xc01/0x1630 kernel/workqueue.c:2415
  kthread+0x332/0x350 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 9378:
  save_stack mm/kasan/common.c:69 [inline]
  set_track mm/kasan/common.c:77 [inline]
  __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:493
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:507
  kmem_cache_alloc_trace+0x221/0x2f0 mm/slab.c:3550
  kmalloc include/linux/slab.h:552 [inline]
  kzalloc include/linux/slab.h:748 [inline]
  rxrpc_alloc_connection+0x79/0x490 net/rxrpc/conn_object.c:41
  rxrpc_alloc_client_connection net/rxrpc/conn_client.c:176 [inline]
  rxrpc_get_client_conn net/rxrpc/conn_client.c:339 [inline]
  rxrpc_connect_call+0xb30/0x2c40 net/rxrpc/conn_client.c:697
  rxrpc_new_client_call+0x6d5/0xb60 net/rxrpc/call_object.c:289
  rxrpc_new_client_call_for_sendmsg net/rxrpc/sendmsg.c:595 [inline]
  rxrpc_do_sendmsg+0xf2b/0x19b0 net/rxrpc/sendmsg.c:652
  rxrpc_sendmsg+0x5eb/0x8b0 net/rxrpc/af_rxrpc.c:585
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg net/socket.c:657 [inline]
  ___sys_sendmsg+0x60d/0x910 net/socket.c:2311
  __sys_sendmmsg+0x239/0x470 net/socket.c:2413
  __do_sys_sendmmsg net/socket.c:2442 [inline]
  __se_sys_sendmmsg net/socket.c:2439 [inline]
  __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2439
  do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 16:
  save_stack mm/kasan/common.c:69 [inline]
  set_track mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x12a/0x1e0 mm/kasan/common.c:455
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:463
  __cache_free mm/slab.c:3425 [inline]
  kfree+0x115/0x200 mm/slab.c:3756
  rxrpc_destroy_connection+0x1ec/0x240 net/rxrpc/conn_object.c:372
  __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
  rcu_do_batch kernel/rcu/tree.c:2114 [inline]
  rcu_core+0x892/0xf10 kernel/rcu/tree.c:2314
  rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
  __do_softirq+0x333/0x7c4 arch/x86/include/asm/paravirt.h:778

The buggy address belongs to the object at ffff8880a859a040
  which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
  1024-byte region [ffff8880a859a040, ffff8880a859a440)
The buggy address belongs to the page:
page:ffffea0002a16680 refcount:1 mapcount:0 mapping:ffff8880aa400c40  
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00024cc688 ffffea0002684d88 ffff8880aa400c40
raw: 0000000000000000 ffff8880a859a040 0000000100000007 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8880a8599f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8880a8599f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8880a859a000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                     ^
  ffff8880a859a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880a859a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


^ permalink raw reply

* Re: [PATCH] net: Remove the source address setting in connect() for UDP
From: Enke Chen (enkechen) @ 2019-09-10 23:55 UTC (permalink / raw)
  To: David Miller
  Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xe-linux-external(mailer list), Enke Chen (enkechen)
In-Reply-To: <1DCD31CA-E94F-4127-876F-8DD355E6CF9A@cisco.com>

Hi, David:

Do you still have concerns about backward compatibility of the fix?

I really do not see how existing, working applications would be negatively impacted
by the fix.

Thanks.   -- Enke

-----Original Message-----
From: "Enke Chen (enkechen)" <enkechen@cisco.com>
Date: Friday, September 6, 2019 at 12:23 AM
To: David Miller <davem@davemloft.net>
Cc: "kuznet@ms2.inr.ac.ru" <kuznet@ms2.inr.ac.ru>, "yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "xe-linux-external(mailer list)" <xe-linux-external@cisco.com>, "Enke Chen (enkechen)" <enkechen@cisco.com>
Subject: Re: [PATCH] net: Remove the source address setting in connect() for UDP

Hi, David:

Yes, I understand the code has been there for a long time.  But the issues are real, and it's really nasty when
You run into them.  As I described in the patch log, there is no backward compatibility Issue for fixing it.

---
There is no backward compatibility issue here as the source address setting
in connect() is not needed anyway.

  - No impact on the source address selection when the source address
    is explicitly specified by "bind()", or by the "IP_PKTINFO" option.

  - In the case that the source address is not explicitly specified,
    the selection of the source address would be more accurate and
    reliable based on the up-to-date routing table.
---

Thanks.  -- Enke

-----Original Message-----
From: <linux-kernel-owner@vger.kernel.org> on behalf of David Miller <davem@davemloft.net>
Date: Friday, September 6, 2019 at 12:14 AM
To: "Enke Chen (enkechen)" <enkechen@cisco.com>
Cc: "kuznet@ms2.inr.ac.ru" <kuznet@ms2.inr.ac.ru>, "yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "xe-linux-external(mailer list)" <xe-linux-external@cisco.com>
Subject: Re: [PATCH] net: Remove the source address setting in connect() for UDP

From: Enke Chen <enkechen@cisco.com>
Date: Thu,  5 Sep 2019 19:54:37 -0700

> The connect() system call for a UDP socket is for setting the destination
> address and port. But the current code mistakenly sets the source address
> for the socket as well. Remove the source address setting in connect() for
> UDP in this patch.

Do you have any idea how many decades of precedence this behavior has and
therefore how much you potentially will break userspace?

This boat has sailed a long time ago I'm afraid.



^ permalink raw reply

* [PATCH] net: qrtr: fix memort leak in qrtr_tun_write_iter
From: Navid Emamdoost @ 2019-09-11  0:37 UTC (permalink / raw)
  Cc: emamd001, smccaman, kjlu, Navid Emamdoost, David S. Miller,
	netdev, linux-kernel

In qrtr_tun_write_iter the allocated kbuf should be release in case of
error happening.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
---
 net/qrtr/tun.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/qrtr/tun.c b/net/qrtr/tun.c
index ccff1e544c21..1dba8b92560e 100644
--- a/net/qrtr/tun.c
+++ b/net/qrtr/tun.c
@@ -84,12 +84,18 @@ static ssize_t qrtr_tun_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (!kbuf)
 		return -ENOMEM;
 
-	if (!copy_from_iter_full(kbuf, len, from))
+	if (!copy_from_iter_full(kbuf, len, from)) {
+		kfree(kbuf);
 		return -EFAULT;
+	}
 
 	ret = qrtr_endpoint_post(&tun->ep, kbuf, len);
+	if (ret < 0) {
+		kfree(kbuf);
+		return ret;
+	}
 
-	return ret < 0 ? ret : len;
+	return len;
 }
 
 static __poll_t qrtr_tun_poll(struct file *filp, poll_table *wait)
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA
From: Gomes, Vinicius @ 2019-09-11  0:45 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: David Miller, f.fainelli@gmail.com, vivien.didelot@gmail.com,
	andrew@lunn.ch, Patel, Vedang, richardcochran@gmail.com,
	Voon, Weifeng, jiri@mellanox.com, m-karicheri2@ti.com,
	Jose.Abreu@synopsys.com, ilias.apalodimas@linaro.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com,
	kurt.kanzenbach@linutronix.de, netdev@vger.kernel.org
In-Reply-To: <CA+h21hoQ-DaFGzALVmGo2mDJancUp5Fndc=o0f4LfD_9yaNi0g@mail.gmail.com>

Hi Vladimir,

[...]

> 
> I'll make sure this subtlety is more clearly formulated in the next version of the
> patch.
> 

Ack.

> Actually let me ask you a few questions as well:
> 
> - I'm trying to understand what is the correct use of the tc-mqprio "queues"
> argument. I've only tested it with "1@0 1@1 1@2 1@3 1@4 1@5
> 1@6 1@7", which I believe is equivalent to not specifying it at all? I believe it
> should be interpreted as: "allocate this many netdev queues for each traffic
> class", where "traffic class" means a group of queues having the same priority
> (equal to the traffic class's number), but engaged in a strict priority scheme with
> other groups of queues (traffic classes). Right?

Specifying the "queues" is mandatory, IIRC. Yeah, your reading of those arguments
for you example matches mine.

So you mean, that you only tested situations when only one queue is "open" at a time?
I think this is another good thing to test.

> 
> - DSA can only formally support multi-queue, because its connection to the Linux
> host is through an Ethernet MAC (FIFO). Even if the DSA master netdevice may
> be multi-queue, allocating and separating those queues for each front-panel
> switch port is a task best left to the user/administrator. This means that DSA
> should reject all other "queues" mappings except the trivial one I pointed to
> above?
> 
> - I'm looking at the "tc_mask_to_queue_mask" function that I'm carrying along
> from your initial offload RFC. Are you sure this is the right approach? I don't feel
> a need to translate from traffic class to netdev queues, considering that in the
> general case, a traffic class is a group of queues, and 802.1Qbv doesn't really
> specify that you can gate individual queues from a traffic class. In the software
> implementation you are only looking at netdev_get_prio_tc_map, which is not
> equivalent as far as my understanding goes, but saner.
> Actually 802.1Q-2018 does not really clarify this either. It looks to me like they
> use the term "queue" and "traffic class" interchangeably.
> See two examples below (emphasis mine):

I spent quite a long time thinking about this, still not sure that I got it right. Let me begin
with the objective for that "translation". Scheduled traffic only makes sense when
the whole network shares the same schedule, so, I wanted a way so I minimize the
amount of information of each schedule that's controller dependent, Linux already 
does most of it with the separation of traffic classes and queues (you are right that 
802.1Q is confusing on this), the idea is that the only thing that needs to change from 
one node to another in the network is the "queues" parameter. Because each node might 
have different number of queues, or assign different priorities to different queues.  

So, that's the idea of doing that intermediate "transformation" step: taprio knows about
traffic classes and HW queues, but the driver only knows about HW queues. And unless I made
a mistake, tc_mask_to_queue_mask() should be equivalent to:  

netdev_get_prio_tc_map() + scanning the gatemask for BIT(tc).

(Thinking more about this, I am having a few ideas about ways to simplify software mode :-)

> 
> Q.2 Using gate operations to create protected windows The enhancements for
> scheduled traffic described in 8.6.8.4 allow transmission to be switched on and
> off on a timed basis for each _traffic class_ that is implemented on a port. This
> switching is achieved by means of individual on/off transmission gates
> associated with each _traffic class_ and a list of gate operations that control the
> gates; an individual SetGateStates operation has a time delay parameter that
> indicates the delay after the gate operation is executed until the next operation
> is to occur, and a GateState parameter that defines a vector of up to eight state
> values (open or
> closed) that is to be applied to each gate when the operation is executed. The
> gate operations allow any combination of open/closed states to be defined, and
> the mechanism makes no assumptions about which _traffic classes_ are being
> “protected” and which are “unprotected”; any such assumptions are left to the
> designer of the sequence of gate operations.
> 
> Table 8-7—Gate operations
> The GateState parameter indicates a value, open or closed, for each of the
> Port’s _queues_.
> 
> - What happens with the "clockid" argument now that hardware offload is
> possible? Do we allow "/dev/ptp0" to be specified as input?
> Actually this question is relevant to your txtime-assist mode as well:
> doesn't it assume that there is an implicit phc2sys instance running to keep the
> system time in sync with the PHC?

That's a very interesting question. I think, for now, allowing specifying /dev/ptp* clocks
won't work "always": if the driver or something needs to add a timer to be able to run 
the schedule, it won't be able to use /dev/ptp* clocks (hrtimers and ptp clocks don’t mix).
But for "full" offloads, it should work.

So, you are right, taprio and txtime-assisted (and ETF) require the system clock and phc 
clock to be synchronized, via something like phc2sys.

Hope I got all your questions.

Cheers,
--
Vinicius


^ permalink raw reply

* Re: [PATCH net-next] tcp: force a PSH flag on TSO packets
From: Neal Cardwell @ 2019-09-11  0:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Eric Dumazet, Soheil Hassas Yeganeh,
	Yuchung Cheng, Daniel Borkmann, Tariq Toukan
In-Reply-To: <20190910214928.220727-1-edumazet@google.com>

On Tue, Sep 10, 2019 at 5:49 PM Eric Dumazet <edumazet@google.com> wrote:
>
> When tcp sends a TSO packet, adding a PSH flag on it
> reduces the sojourn time of GRO packet in GRO receivers.
>
> This is particularly the case under pressure, since RX queues
> receive packets for many concurrent flows.
>
> A sender can give a hint to GRO engines when it is
> appropriate to flush a super-packet, especially when pacing
> is in the picture, since next packet is probably delayed by
> one ms.
>
> Having less packets in GRO engine reduces chance
> of LRU eviction or inflated RTT, and reduces GRO cost.
>
> We found recently that we must not set the PSH flag on
> individual full-size MSS segments [1] :
>
>  Under pressure (CWR state), we better let the packet sit
>  for a small delay (depending on NAPI logic) so that the
>  ACK packet is delayed, and thus next packet we send is
>  also delayed a bit. Eventually the bottleneck queue can
>  be drained. DCTCP flows with CWND=1 have demonstrated
>  the issue.
>
> This patch allows to slowdown the aggregate traffic without
> involving high resolution timers on senders and/or
> receivers.
>
> It has been used at Google for about four years,
> and has been discussed at various networking conferences.
>
> [1] segments smaller than MSS already have PSH flag set
>     by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE
>     has been requested by the user.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>  net/ipv4/tcp_output.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 42abc9bd687a5fea627cbc7cfa750d022f393d84..fec6d67bfd146dc78f0f25173fd71b8b8cc752fe 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1050,11 +1050,22 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
>         tcb = TCP_SKB_CB(skb);
>         memset(&opts, 0, sizeof(opts));
>
> -       if (unlikely(tcb->tcp_flags & TCPHDR_SYN))
> +       if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) {
>                 tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5);
> -       else
> +       } else {
>                 tcp_options_size = tcp_established_options(sk, skb, &opts,
>                                                            &md5);
> +               /* Force a PSH flag on all (GSO) packets to expedite GRO flush
> +                * at receiver : This slightly improve GRO performance.
> +                * Note that we do not force the PSH flag for non GSO packets,
> +                * because they might be sent under high congestion events,
> +                * and in this case it is better to delay the delivery of 1-MSS
> +                * packets and thus the corresponding ACK packet that would
> +                * release the following packet.
> +                */
> +               if (tcp_skb_pcount(skb) > 1)
> +                       tcb->tcp_flags |= TCPHDR_PSH;
> +       }
>         tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
>
>         /* if no packet is in qdisc/device queue, then allow XPS to select

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply

* Re: Strange routing with VRF and 5.2.7+
From: Ben Greear @ 2019-09-11  1:08 UTC (permalink / raw)
  To: netdev
In-Reply-To: <91749b17-7800-44c0-d137-5242b8ceb819@candelatech.com>

On 9/10/19 3:17 PM, Ben Greear wrote:
> Today we were testing creating 200 virtual station vdevs on ath9k, and using
> VRF for the routing.

Looks like the same issue happens w/out VRF, but there I have oodles of routing
rules, so it is an area ripe for failure.

Will upgrade to 5.2.14+ and retest, and try 4.20 as well....

Thanks,
Ben

> 
> This really slows down the machine in question.
> 
> During the minutes that it takes to bring these up and configure them,
> we loose network connectivity on the management port.
> 
> If I do 'ip route show', it just shows the default route out of eth0, and
> the subnet route.  But, if I try to ping the gateway, I get an ICMP error
> coming back from the gateway of one of the virtual stations (which should be
> safely using VRFs and so not in use when I do a plain 'ping' from the shell).
> 
> I tried running tshark on eth0 in the background and running ping, and it captures
> no packets leaving eth0.
> 
> After some time (and during this time, my various scripts will be (re)configuring
> vrfs and stations and related vrf routing tables and such,
> but should *not* be messing with the main routing table, then suddenly
> things start working again.
> 
> I am curious if anyone has seen anything similar or has suggestions for more
> ways to debug this.  It seems reproducible, but it is a pain to
> debug.
> 
> Thanks,
> Ben
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH net-next 03/11] net: aquantia: add basic ptp_clock callbacks
From: kbuild test robot @ 2019-09-09 23:27 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: kbuild-all, netdev@vger.kernel.org, richardcochran@gmail.com,
	davem@davemloft.net, Egor Pomozov, Sergey Samoilenko,
	Dmitry Bezrukov, Igor Russkikh
In-Reply-To: <8868449ec5508f498131ee141399149bf801ea94.1568034880.git.igor.russkikh@aquantia.com>

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Igor-Russkikh/net-aquantia-PTP-support-for-AQC-devices/20190909-233041
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/net/ethernet/aquantia/atlantic/aq_nic.o: in function `aq_nic_update_link_status':
   aq_nic.c:(.text+0xe1): undefined reference to `aq_ptp_clock_init'
   ld: drivers/net/ethernet/aquantia/atlantic/aq_ptp.o: in function `aq_ptp_init':
   aq_ptp.c:(.text+0x367): undefined reference to `aq_ptp_clock_init'
   ld: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.o: in function `hw_atl_b0_adj_clock_freq':
>> hw_atl_b0.c:(.text+0xad): undefined reference to `__udivdi3'
>> ld: hw_atl_b0.c:(.text+0xc2): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x11b): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x12b): undefined reference to `__divdi3'
>> ld: hw_atl_b0.c:(.text+0x165): undefined reference to `__udivdi3'
   ld: hw_atl_b0.c:(.text+0x177): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x1ca): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x1da): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x247): undefined reference to `__divdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 68904 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 03/11] net: aquantia: add basic ptp_clock callbacks
From: kbuild test robot @ 2019-09-10  2:45 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: kbuild-all, netdev@vger.kernel.org, richardcochran@gmail.com,
	davem@davemloft.net, Egor Pomozov, Sergey Samoilenko,
	Dmitry Bezrukov, Igor Russkikh
In-Reply-To: <8868449ec5508f498131ee141399149bf801ea94.1568034880.git.igor.russkikh@aquantia.com>

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Igor-Russkikh/net-aquantia-PTP-support-for-AQC-devices/20190909-233041
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ERROR: "aq_ptp_clock_init" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
>> ERROR: "__udivdi3" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!
>> ERROR: "__divdi3" [drivers/net/ethernet/aquantia/atlantic/atlantic.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69589 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 11/11] net: aquantia: add support for PIN funcs
From: kbuild test robot @ 2019-09-10  2:45 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: kbuild-all, netdev@vger.kernel.org, richardcochran@gmail.com,
	davem@davemloft.net, Egor Pomozov, Sergey Samoilenko,
	Dmitry Bezrukov, Igor Russkikh, Nikita Danilov
In-Reply-To: <b3eeb5dd7d38dab79ded5f4b7cca2a84505c8fac.1568034880.git.igor.russkikh@aquantia.com>

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Igor-Russkikh/net-aquantia-PTP-support-for-AQC-devices/20190909-233041
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/net/ethernet/aquantia/atlantic/aq_ptp.o: in function `aq_ptp_gpio_feature_enable':
>> aq_ptp.c:(.text+0x7fe): undefined reference to `__umoddi3'
   ld: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.o: in function `hw_atl_b0_adj_clock_freq':
   hw_atl_b0.c:(.text+0xdd): undefined reference to `__udivdi3'
   ld: hw_atl_b0.c:(.text+0xf2): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x14b): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x15b): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x195): undefined reference to `__udivdi3'
   ld: hw_atl_b0.c:(.text+0x1a7): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x1fa): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x20a): undefined reference to `__divdi3'
   ld: hw_atl_b0.c:(.text+0x277): undefined reference to `__divdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 68904 bytes --]

^ permalink raw reply

* [PATCH v2 net] net: sonic: replace dev_kfree_skb in sonic_send_packet
From: Mao Wenan @ 2019-09-11  1:36 UTC (permalink / raw)
  To: tsbogend, davem; +Cc: netdev, linux-kernel, kernel-janitors, Mao Wenan
In-Reply-To: <a48a6690-eeb4-91d2-bed8-439d14b63e2f@cogentembedded.com>

sonic_send_packet will be processed in irq or non-irq 
context, so it would better use dev_kfree_skb_any
instead of dev_kfree_skb.

Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
 v2: change 'none irq' to 'non-irq'.
 drivers/net/ethernet/natsemi/sonic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
index 18fd62fbfb64..b339125b2f09 100644
--- a/drivers/net/ethernet/natsemi/sonic.c
+++ b/drivers/net/ethernet/natsemi/sonic.c
@@ -233,7 +233,7 @@ static int sonic_send_packet(struct sk_buff *skb, struct net_device *dev)
 	laddr = dma_map_single(lp->device, skb->data, length, DMA_TO_DEVICE);
 	if (!laddr) {
 		pr_err_ratelimited("%s: failed to map tx DMA buffer.\n", dev->name);
-		dev_kfree_skb(skb);
+		dev_kfree_skb_any(skb);
 		return NETDEV_TX_OK;
 	}
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net 1/2] sctp: remove redundant assignment when call sctp_get_port_local
From: maowenan @ 2019-09-11  1:30 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: vyasevich, nhorman, marcelo.leitner, davem, linux-sctp, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <20190910192207.GE20699@kadam>



On 2019/9/11 3:22, Dan Carpenter wrote:
> On Tue, Sep 10, 2019 at 09:57:10PM +0300, Dan Carpenter wrote:
>> On Tue, Sep 10, 2019 at 03:13:42PM +0800, Mao Wenan wrote:
>>> There are more parentheses in if clause when call sctp_get_port_local
>>> in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
>>> do cleanup.
>>>
>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>> ---
>>>  net/sctp/socket.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index 9d1f83b10c0a..766b68b55ebe 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -399,9 +399,8 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
>>>  	 * detection.
>>>  	 */
>>>  	addr->v4.sin_port = htons(snum);
>>> -	if ((ret = sctp_get_port_local(sk, addr))) {
>>> +	if (sctp_get_port_local(sk, addr))
>>>  		return -EADDRINUSE;
>>
>> sctp_get_port_local() returns a long which is either 0,1 or a pointer
>> casted to long.  It's not documented what it means and neither of the
>> callers use the return since commit 62208f12451f ("net: sctp: simplify
>> sctp_get_port").
> 
> Actually it was commit 4e54064e0a13 ("sctp: Allow only 1 listening
> socket with SO_REUSEADDR") from 11 years ago.  That patch fixed a bug,
> because before the code assumed that a pointer casted to an int was the
> same as a pointer casted to a long.

commit 4e54064e0a13 treated non-zero return value as unexpected, so the current
cleanup is ok?

> 
> regards,
> dan carpenter
> 
> 
> .
> 


^ permalink raw reply

* Re: [PATCH 0/3] rtlwifi: use generic rtl_evm_db_to_percentage
From: Pkshih @ 2019-09-11  1:31 UTC (permalink / raw)
  To: kvalo@codeaurora.org, straube.linux@gmail.com
  Cc: linux-wireless@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190910190422.63378-1-straube.linux@gmail.com>

On Tue, 2019-09-10 at 21:04 +0200, Michael Straube wrote:
> Functions _rtl92{c,d}_evm_db_to_percentage are functionally identical
> to the generic version rtl_evm_db_to percentage. This series converts
> rtl8192ce, rtl8192cu and rtl8192de to use the generic version.
> 
> Michael Straube (3):
>   rtlwifi: rtl8192ce: replace _rtl92c_evm_db_to_percentage with generic
>     version
>   rtlwifi: rtl8192cu: replace _rtl92c_evm_db_to_percentage with generic
>     version
>   rtlwifi: rtl8192de: replace _rtl92d_evm_db_to_percentage with generic
>     version
> 
>  .../wireless/realtek/rtlwifi/rtl8192ce/trx.c  | 23 +------------------
>  .../wireless/realtek/rtlwifi/rtl8192cu/mac.c  | 18 +--------------
>  .../wireless/realtek/rtlwifi/rtl8192de/trx.c  | 18 ++-------------
>  3 files changed, 4 insertions(+), 55 deletions(-)
> 

I checked the generic version and removed functions, and they are indeed
identical. Thanks for your patches.

Acked-by: Ping-Ke Shih <pkshih@realtek.com>



^ permalink raw reply

* Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport
From: Tiwei Bie @ 2019-09-11  1:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, kvm, virtualization, netdev, linux-kernel, kwankhede,
	alex.williamson, cohuck, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, idos, xiao.w.wang, haotian.wang
In-Reply-To: <20190910081935.30516-4-jasowang@redhat.com>

On Tue, Sep 10, 2019 at 04:19:34PM +0800, Jason Wang wrote:
> This path introduces a new mdev transport for virtio. This is used to
> use kernel virtio driver to drive the mediated device that is capable
> of populating virtqueue directly.
> 
> A new virtio-mdev driver will be registered to the mdev bus, when a
> new virtio-mdev device is probed, it will register the device with
> mdev based config ops. This means, unlike the exist hardware
> transport, this is a software transport between mdev driver and mdev
> device. The transport was implemented through:
> 
> - configuration access was implemented through parent_ops->read()/write()
> - vq/config callback was implemented through parent_ops->ioctl()
> 
> This transport is derived from virtio MMIO protocol and was wrote for
> kernel driver. But for the transport itself, but the design goal is to
> be generic enough to support userspace driver (this part will be added
> in the future).
> 
> Note:
> - current mdev assume all the parameter of parent_ops was from
>   userspace. This prevents us from implementing the kernel mdev
>   driver. For a quick POC, this patch just abuse those parameter and
>   assume the mdev device implementation will treat them as kernel
>   pointer. This should be addressed in the formal series by extending
>   mdev_parent_ops.
> - for a quick POC, I just drive the transport from MMIO, I'm pretty
>   there's lot of optimization space for this.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vfio/mdev/Kconfig        |   7 +
>  drivers/vfio/mdev/Makefile       |   1 +
>  drivers/vfio/mdev/virtio_mdev.c  | 500 +++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_mdev.h | 131 ++++++++
>  4 files changed, 639 insertions(+)
>  create mode 100644 drivers/vfio/mdev/virtio_mdev.c
>  create mode 100644 include/uapi/linux/virtio_mdev.h
> 
[...]
> diff --git a/include/uapi/linux/virtio_mdev.h b/include/uapi/linux/virtio_mdev.h
> new file mode 100644
> index 000000000000..8040de6b960a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_mdev.h
> @@ -0,0 +1,131 @@
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + *
> + * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
> + *
> + * This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#ifndef _LINUX_VIRTIO_MDEV_H
> +#define _LINUX_VIRTIO_MDEV_H
> +
> +#include <linux/interrupt.h>
> +#include <linux/vringh.h>
> +#include <uapi/linux/virtio_net.h>
> +
> +/*
> + * Ioctls
> + */
> +
> +struct virtio_mdev_callback {
> +	irqreturn_t (*callback)(void *);
> +	void *private;
> +};
> +
> +#define VIRTIO_MDEV 0xAF
> +#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
> +					 struct virtio_mdev_callback)
> +#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
> +					struct virtio_mdev_callback)
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING		"virtio-mdev"
> +
> +/*
> + * Control registers
> + */
> +
> +/* Magic value ("virt" string) - Read Only */
> +#define VIRTIO_MDEV_MAGIC_VALUE		0x000
> +
> +/* Virtio device version - Read Only */
> +#define VIRTIO_MDEV_VERSION		0x004
> +
> +/* Virtio device ID - Read Only */
> +#define VIRTIO_MDEV_DEVICE_ID		0x008
> +
> +/* Virtio vendor ID - Read Only */
> +#define VIRTIO_MDEV_VENDOR_ID		0x00c
> +
> +/* Bitmask of the features supported by the device (host)
> + * (32 bits per set) - Read Only */
> +#define VIRTIO_MDEV_DEVICE_FEATURES	0x010
> +
> +/* Device (host) features set selector - Write Only */
> +#define VIRTIO_MDEV_DEVICE_FEATURES_SEL	0x014
> +
> +/* Bitmask of features activated by the driver (guest)
> + * (32 bits per set) - Write Only */
> +#define VIRTIO_MDEV_DRIVER_FEATURES	0x020
> +
> +/* Activated features set selector - Write Only */
> +#define VIRTIO_MDEV_DRIVER_FEATURES_SEL	0x024
> +
> +/* Queue selector - Write Only */
> +#define VIRTIO_MDEV_QUEUE_SEL		0x030
> +
> +/* Maximum size of the currently selected queue - Read Only */
> +#define VIRTIO_MDEV_QUEUE_NUM_MAX	0x034
> +
> +/* Queue size for the currently selected queue - Write Only */
> +#define VIRTIO_MDEV_QUEUE_NUM		0x038
> +
> +/* Ready bit for the currently selected queue - Read Write */
> +#define VIRTIO_MDEV_QUEUE_READY		0x044
> +
> +/* Alignment of virtqueue - Read Only */
> +#define VIRTIO_MDEV_QUEUE_ALIGN		0x048
> +
> +/* Queue notifier - Write Only */
> +#define VIRTIO_MDEV_QUEUE_NOTIFY	0x050
> +
> +/* Device status register - Read Write */
> +#define VIRTIO_MDEV_STATUS		0x060
> +
> +/* Selected queue's Descriptor Table address, 64 bits in two halves */
> +#define VIRTIO_MDEV_QUEUE_DESC_LOW	0x080
> +#define VIRTIO_MDEV_QUEUE_DESC_HIGH	0x084
> +
> +/* Selected queue's Available Ring address, 64 bits in two halves */
> +#define VIRTIO_MDEV_QUEUE_AVAIL_LOW	0x090
> +#define VIRTIO_MDEV_QUEUE_AVAIL_HIGH	0x094
> +
> +/* Selected queue's Used Ring address, 64 bits in two halves */
> +#define VIRTIO_MDEV_QUEUE_USED_LOW	0x0a0
> +#define VIRTIO_MDEV_QUEUE_USED_HIGH	0x0a4
> +
> +/* Configuration atomicity value */
> +#define VIRTIO_MDEV_CONFIG_GENERATION	0x0fc
> +
> +/* The config space is defined by each driver as
> + * the per-driver configuration space - Read Write */
> +#define VIRTIO_MDEV_CONFIG		0x100

IIUC, we can use above registers with virtio-mdev parent's
read()/write() to access the mdev device from kernel driver.
As you suggested, it's a choice to build vhost-mdev on top
of this abstraction as well. But virtio is the frontend
device which lacks some vhost backend features, e.g. get
vring base, set vring base, negotiate vhost features, etc.
So I'm wondering, does it make sense to reserve some space
for vhost-mdev in kernel to do vhost backend specific setups?
Or do you have any other thoughts?

Besides, I'm also wondering, what's the purpose of making
above registers part of UAPI? And if we make them part
of UAPI, do we also need to make them part of virtio spec?

Thanks!
Tiwei

> +
> +#endif
> +
> +
> +/* Ready bit for the currently selected queue - Read Write */
> -- 
> 2.19.1
> 

^ permalink raw reply

* Re: [PATCH net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
From: tanhuazhong @ 2019-09-11  2:01 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski, huangguangbin2
In-Reply-To: <20190910.192516.1686418457520996592.davem@davemloft.net>



On 2019/9/11 1:25, David Miller wrote:
> From: Huazhong Tan <tanhuazhong@huawei.com>
> Date: Tue, 10 Sep 2019 16:58:22 +0800
> 
>> +	/* Set to user value, no larger than max_rss_size. */
>> +	if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
>> +	    kinfo->req_rss_size <= max_rss_size) {
>> +		dev_info(&hdev->pdev->dev, "rss changes from %u to %u\n",
>> +			 kinfo->rss_size, kinfo->req_rss_size);
>> +		kinfo->rss_size = kinfo->req_rss_size;
> 
> Please do not emit kernel log messages for normal operations.
> 

Will remove this log in V2.
Thanks.

> .
> 


^ permalink raw reply

* Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport
From: Jason Wang @ 2019-09-11  2:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, kwankhede,
	alex.williamson, cohuck, tiwei.bie, maxime.coquelin,
	cunming.liang, zhihong.wang, rob.miller, idos, xiao.w.wang,
	haotian.wang
In-Reply-To: <20190910094807-mutt-send-email-mst@kernel.org>


On 2019/9/10 下午9:52, Michael S. Tsirkin wrote:
> On Tue, Sep 10, 2019 at 09:13:02PM +0800, Jason Wang wrote:
>> On 2019/9/10 下午6:01, Michael S. Tsirkin wrote:
>>>> +#ifndef _LINUX_VIRTIO_MDEV_H
>>>> +#define _LINUX_VIRTIO_MDEV_H
>>>> +
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/vringh.h>
>>>> +#include <uapi/linux/virtio_net.h>
>>>> +
>>>> +/*
>>>> + * Ioctls
>>>> + */
>>> Pls add a bit more content here. It's redundant to state these
>>> are ioctls. Much better to document what does each one do.
>>
>> Ok.
>>
>>
>>>> +
>>>> +struct virtio_mdev_callback {
>>>> +	irqreturn_t (*callback)(void *);
>>>> +	void *private;
>>>> +};
>>>> +
>>>> +#define VIRTIO_MDEV 0xAF
>>>> +#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
>>>> +					 struct virtio_mdev_callback)
>>>> +#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
>>>> +					struct virtio_mdev_callback)
>>> Function pointer in an ioctl parameter? How does this ever make sense?
>>
>> I admit this is hacky (casting).
>>
>>
>>> And can't we use a couple of registers for this, and avoid ioctls?
>>
>> Yes, how about something like interrupt numbers for each virtqueue and
>> config?
> Should we just reuse VIRTIO_PCI_COMMON_Q_XXX then?


You mean something like VIRTIO_PCI_COMMON_Q_MSIX? Then it becomes a PCI 
transport in fact. And using either MSIX or irq number is actually 
another layer of indirection. So I think we can just write callback 
function and parameter through registers.


>
>
>>>> +
>>>> +#define VIRTIO_MDEV_DEVICE_API_STRING		"virtio-mdev"
>>>> +
>>>> +/*
>>>> + * Control registers
>>>> + */
>>>> +
>>>> +/* Magic value ("virt" string) - Read Only */
>>>> +#define VIRTIO_MDEV_MAGIC_VALUE		0x000
>>>> +
>>>> +/* Virtio device version - Read Only */
>>>> +#define VIRTIO_MDEV_VERSION		0x004
>>>> +
>>>> +/* Virtio device ID - Read Only */
>>>> +#define VIRTIO_MDEV_DEVICE_ID		0x008
>>>> +
>>>> +/* Virtio vendor ID - Read Only */
>>>> +#define VIRTIO_MDEV_VENDOR_ID		0x00c
>>>> +
>>>> +/* Bitmask of the features supported by the device (host)
>>>> + * (32 bits per set) - Read Only */
>>>> +#define VIRTIO_MDEV_DEVICE_FEATURES	0x010
>>>> +
>>>> +/* Device (host) features set selector - Write Only */
>>>> +#define VIRTIO_MDEV_DEVICE_FEATURES_SEL	0x014
>>>> +
>>>> +/* Bitmask of features activated by the driver (guest)
>>>> + * (32 bits per set) - Write Only */
>>>> +#define VIRTIO_MDEV_DRIVER_FEATURES	0x020
>>>> +
>>>> +/* Activated features set selector - Write Only */
>>>> +#define VIRTIO_MDEV_DRIVER_FEATURES_SEL	0x024
>>>> +
>>>> +/* Queue selector - Write Only */
>>>> +#define VIRTIO_MDEV_QUEUE_SEL		0x030
>>>> +
>>>> +/* Maximum size of the currently selected queue - Read Only */
>>>> +#define VIRTIO_MDEV_QUEUE_NUM_MAX	0x034
>>>> +
>>>> +/* Queue size for the currently selected queue - Write Only */
>>>> +#define VIRTIO_MDEV_QUEUE_NUM		0x038
>>>> +
>>>> +/* Ready bit for the currently selected queue - Read Write */
>>>> +#define VIRTIO_MDEV_QUEUE_READY		0x044
>>> Is this same as started?
>>
>> Do you mean "status"?
> I really meant "enabled", didn't remember the correct name.
> As in:  VIRTIO_PCI_COMMON_Q_ENABLE


Yes, it's the same.

Thanks


>
>>>> +
>>>> +/* Alignment of virtqueue - Read Only */
>>>> +#define VIRTIO_MDEV_QUEUE_ALIGN		0x048
>>>> +
>>>> +/* Queue notifier - Write Only */
>>>> +#define VIRTIO_MDEV_QUEUE_NOTIFY	0x050
>>>> +
>>>> +/* Device status register - Read Write */
>>>> +#define VIRTIO_MDEV_STATUS		0x060
>>>> +
>>>> +/* Selected queue's Descriptor Table address, 64 bits in two halves */
>>>> +#define VIRTIO_MDEV_QUEUE_DESC_LOW	0x080
>>>> +#define VIRTIO_MDEV_QUEUE_DESC_HIGH	0x084
>>>> +
>>>> +/* Selected queue's Available Ring address, 64 bits in two halves */
>>>> +#define VIRTIO_MDEV_QUEUE_AVAIL_LOW	0x090
>>>> +#define VIRTIO_MDEV_QUEUE_AVAIL_HIGH	0x094
>>>> +
>>>> +/* Selected queue's Used Ring address, 64 bits in two halves */
>>>> +#define VIRTIO_MDEV_QUEUE_USED_LOW	0x0a0
>>>> +#define VIRTIO_MDEV_QUEUE_USED_HIGH	0x0a4
>>>> +
>>>> +/* Configuration atomicity value */
>>>> +#define VIRTIO_MDEV_CONFIG_GENERATION	0x0fc
>>>> +
>>>> +/* The config space is defined by each driver as
>>>> + * the per-driver configuration space - Read Write */
>>>> +#define VIRTIO_MDEV_CONFIG		0x100
>>> Mixing device and generic config space is what virtio pci did,
>>> caused lots of problems with extensions.
>>> It would be better to reserve much more space.
>>
>> I see, will do this.
>>
>> Thanks
>>
>>
>>>
>>>> +
>>>> +#endif
>>>> +
>>>> +
>>>> +/* Ready bit for the currently selected queue - Read Write */
>>>> -- 
>>>> 2.19.1

^ permalink raw reply

* [PATCH V2 net-next 5/7] net: hns3: modify some logs format
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

The pfc_en and pfc_map need to be displayed in hexadecimal notation,
printing dma address should use %pad, and the end of printed string
needs to be add "\n".

This patch modifies them.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c      | 7 +++++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c  | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index 5cf4c1e..28961a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -166,6 +166,7 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	struct hns3_enet_ring *ring;
 	u32 tx_index, rx_index;
 	u32 q_num, value;
+	dma_addr_t addr;
 	int cnt;
 
 	cnt = sscanf(&cmd_buf[8], "%u %u", &q_num, &tx_index);
@@ -194,8 +195,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	}
 
 	tx_desc = &ring->desc[tx_index];
+	addr = le64_to_cpu(tx_desc->addr);
 	dev_info(dev, "TX Queue Num: %u, BD Index: %u\n", q_num, tx_index);
-	dev_info(dev, "(TX)addr: 0x%llx\n", tx_desc->addr);
+	dev_info(dev, "(TX)addr: %pad\n", &addr);
 	dev_info(dev, "(TX)vlan_tag: %u\n", tx_desc->tx.vlan_tag);
 	dev_info(dev, "(TX)send_size: %u\n", tx_desc->tx.send_size);
 	dev_info(dev, "(TX)vlan_tso: %u\n", tx_desc->tx.type_cs_vlan_tso);
@@ -217,8 +219,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const char *cmd_buf)
 	rx_index = (cnt == 1) ? value : tx_index;
 	rx_desc	 = &ring->desc[rx_index];
 
+	addr = le64_to_cpu(rx_desc->addr);
 	dev_info(dev, "RX Queue Num: %u, BD Index: %u\n", q_num, rx_index);
-	dev_info(dev, "(RX)addr: 0x%llx\n", rx_desc->addr);
+	dev_info(dev, "(RX)addr: %pad\n", &addr);
 	dev_info(dev, "(RX)l234_info: %u\n", rx_desc->rx.l234_info);
 	dev_info(dev, "(RX)pkt_len: %u\n", rx_desc->rx.pkt_len);
 	dev_info(dev, "(RX)size: %u\n", rx_desc->rx.size);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index 816f920..c063301 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -342,7 +342,7 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 	hdev->tm_info.pfc_en = pfc->pfc_en;
 
 	netif_dbg(h, drv, netdev,
-		  "set pfc: pfc_en=%u, pfc_map=%u, num_tc=%u\n",
+		  "set pfc: pfc_en=%x, pfc_map=%x, num_tc=%u\n",
 		  pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
 
 	hclge_tm_pfc_info_update(hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 8d4dc1b..bc5bad3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3751,7 +3751,7 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle)
 	else if (time_after(jiffies, (hdev->last_reset_time + 4 * 5 * HZ)))
 		hdev->reset_level = HNAE3_FUNC_RESET;
 
-	dev_info(&hdev->pdev->dev, "received reset event , reset type is %d",
+	dev_info(&hdev->pdev->dev, "received reset event, reset type is %d\n",
 		 hdev->reset_level);
 
 	/* request reset & schedule reset task */
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 6/7] net: hns3: check NULL pointer before use
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

This patch checks ops->set_default_reset_request whether is NULL
before using it in function hns3_slot_reset.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8dbaf36..616cad0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2006,7 +2006,8 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev *pdev)
 
 	ops = ae_dev->ops;
 	/* request the reset */
-	if (ops->reset_event && ops->get_reset_level) {
+	if (ops->reset_event && ops->get_reset_level &&
+	    ops->set_default_reset_request) {
 		if (ae_dev->hw_err_reset_req) {
 			reset_type = ops->get_reset_level(ae_dev,
 						&ae_dev->hw_err_reset_req);
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 7/7] net: hns3: add some DFX info for reset issue
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

This patch adds more information for reset DFX. Also, adds some
cleanups to reset info, move reset_fail_cnt into struct
hclge_rst_stats, and modifies some print formats.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 ++++++++++++++++------
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 11 ++++----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
index 6dcce48..d0128d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -931,22 +931,36 @@ static void hclge_dbg_fd_tcam(struct hclge_dev *hdev)
 
 static void hclge_dbg_dump_rst_info(struct hclge_dev *hdev)
 {
-	dev_info(&hdev->pdev->dev, "PF reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "PF reset count: %u\n",
 		 hdev->rst_stats.pf_rst_cnt);
-	dev_info(&hdev->pdev->dev, "FLR reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "FLR reset count: %u\n",
 		 hdev->rst_stats.flr_rst_cnt);
-	dev_info(&hdev->pdev->dev, "CORE reset count: %d\n",
-		 hdev->rst_stats.core_rst_cnt);
-	dev_info(&hdev->pdev->dev, "GLOBAL reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "GLOBAL reset count: %u\n",
 		 hdev->rst_stats.global_rst_cnt);
-	dev_info(&hdev->pdev->dev, "IMP reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "IMP reset count: %u\n",
 		 hdev->rst_stats.imp_rst_cnt);
-	dev_info(&hdev->pdev->dev, "reset done count: %d\n",
+	dev_info(&hdev->pdev->dev, "reset done count: %u\n",
 		 hdev->rst_stats.reset_done_cnt);
-	dev_info(&hdev->pdev->dev, "HW reset done count: %d\n",
+	dev_info(&hdev->pdev->dev, "HW reset done count: %u\n",
 		 hdev->rst_stats.hw_reset_done_cnt);
-	dev_info(&hdev->pdev->dev, "reset count: %d\n",
+	dev_info(&hdev->pdev->dev, "reset count: %u\n",
 		 hdev->rst_stats.reset_cnt);
+	dev_info(&hdev->pdev->dev, "reset count: %u\n",
+		 hdev->rst_stats.reset_cnt);
+	dev_info(&hdev->pdev->dev, "reset fail count: %u\n",
+		 hdev->rst_stats.reset_fail_cnt);
+	dev_info(&hdev->pdev->dev, "vector0 interrupt enable status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_VECTOR_REG_BASE));
+	dev_info(&hdev->pdev->dev, "reset interrupt source: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_RESET_STS_REG));
+	dev_info(&hdev->pdev->dev, "reset interrupt status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_MISC_VECTOR_INT_STS));
+	dev_info(&hdev->pdev->dev, "hardware reset status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_GLOBAL_RESET_REG));
+	dev_info(&hdev->pdev->dev, "handshake status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_NIC_CSQ_DEPTH_REG));
+	dev_info(&hdev->pdev->dev, "function reset status: 0x%x\n",
+		 hclge_read_dev(&hdev->hw, HCLGE_FUN_RST_ING));
 }
 
 static void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index bc5bad3..fd7f943 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3547,12 +3547,12 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev)
 			 "reset failed because new reset interrupt\n");
 		hclge_clear_reset_cause(hdev);
 		return false;
-	} else if (hdev->reset_fail_cnt < MAX_RESET_FAIL_CNT) {
-		hdev->reset_fail_cnt++;
+	} else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) {
+		hdev->rst_stats.reset_fail_cnt++;
 		set_bit(hdev->reset_type, &hdev->reset_pending);
 		dev_info(&hdev->pdev->dev,
 			 "re-schedule reset task(%d)\n",
-			 hdev->reset_fail_cnt);
+			 hdev->rst_stats.reset_fail_cnt);
 		return true;
 	}
 
@@ -3679,7 +3679,8 @@ static void hclge_reset(struct hclge_dev *hdev)
 	/* ignore RoCE notify error if it fails HCLGE_RESET_MAX_FAIL_CNT - 1
 	 * times
 	 */
-	if (ret && hdev->reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
+	if (ret &&
+	    hdev->rst_stats.reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
 		goto err_reset;
 
 	rtnl_lock();
@@ -3695,7 +3696,7 @@ static void hclge_reset(struct hclge_dev *hdev)
 		goto err_reset;
 
 	hdev->last_reset_time = jiffies;
-	hdev->reset_fail_cnt = 0;
+	hdev->rst_stats.reset_fail_cnt = 0;
 	hdev->rst_stats.reset_done_cnt++;
 	ae_dev->reset_type = HNAE3_NONE_RESET;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 870550f..3e9574a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -659,6 +659,7 @@ struct hclge_rst_stats {
 	u32 global_rst_cnt;	/* the number of GLOBAL */
 	u32 imp_rst_cnt;	/* the number of IMP reset */
 	u32 reset_cnt;		/* the number of reset */
+	u32 reset_fail_cnt;	/* the number of reset fail */
 };
 
 /* time and register status when mac tunnel interruption occur */
@@ -725,7 +726,6 @@ struct hclge_dev {
 	unsigned long reset_request;	/* reset has been requested */
 	unsigned long reset_pending;	/* client rst is pending to be served */
 	struct hclge_rst_stats rst_stats;
-	u32 reset_fail_cnt;
 	u32 fw_version;
 	u16 num_vmdq_vport;		/* Num vmdq vport this PF has set up */
 	u16 num_tqps;			/* Num task queue pairs of this PF */
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 4/7] net: hns3: fix port setting handle for fibre port
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

For hardware doesn't support use specified speed and duplex
to negotiate, it's unnecessary to check and modify the port
speed and duplex for fibre port when autoneg is on.

Fixes: 22f48e24a23d ("net: hns3: add autoneg and change speed support for fibre port")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index f5a681d..680c350 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -726,6 +726,12 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
 	u8 duplex;
 	int ret;
 
+	/* hw doesn't support use specified speed and duplex to negotiate,
+	 * unnecessary to check them when autoneg on.
+	 */
+	if (cmd->base.autoneg)
+		return 0;
+
 	if (ops->get_ksettings_an_result) {
 		ops->get_ksettings_an_result(handle, &autoneg, &speed, &duplex);
 		if (cmd->base.autoneg == autoneg && cmd->base.speed == speed &&
@@ -787,6 +793,15 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 			return ret;
 	}
 
+	/* hw doesn't support use specified speed and duplex to negotiate,
+	 * ignore them when autoneg on.
+	 */
+	if (cmd->base.autoneg) {
+		netdev_info(netdev,
+			    "autoneg is on, ignore the speed and duplex\n");
+		return 0;
+	}
+
 	if (ops->cfg_mac_speed_dup_h)
 		ret = ops->cfg_mac_speed_dup_h(handle, cmd->base.speed,
 					       cmd->base.duplex);
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 0/7] net: hns3: add a feature & bugfixes & cleanups
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski

This patch-set includes a VF feature, bugfixes and cleanups for the HNS3
ethernet controller driver.

[patch 01/07] adds ethtool_ops.set_channels support for HNS3 VF driver

[patch 02/07] adds a recovery for setting channel fail.

[patch 03/07] fixes an error related to shaper parameter algorithm.

[patch 04/07] fixes an error related to ksetting.

[patch 05/07] adds cleanups for some log pinting.

[patch 06/07] adds a NULL pointer check before function calling.

[patch 07/07] adds some debugging information for reset issue.

Change log:
V1->V2: fixes comment from David Miller.

Guangbin Huang (4):
  net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
  net: hns3: fix port setting handle for fibre port
  net: hns3: modify some logs format
  net: hns3: check NULL pointer before use

Huazhong Tan (1):
  net: hns3: add some DFX info for reset issue

Peng Li (1):
  net: hns3: revert to old channel when setting new channel num fail

Yonglong Liu (1):
  net: hns3: fix shaper parameter algorithm

 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c |  7 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 54 ++++++++++----
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 16 +++++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c |  2 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 ++++++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 13 ++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 11 ++-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 83 ++++++++++++++++++++--
 9 files changed, 174 insertions(+), 46 deletions(-)

-- 
2.7.4


^ permalink raw reply

* [PATCH V2 net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Guangbin Huang <huangguangbin2@huawei.com>

This patch adds ethtool_ops.set_channels support for HNS3 VF driver,
and updates related TQP information and RSS information, to support
modification of VF TQP number, and uses current rss_size instead of
max_rss_size to initialize RSS.

Also, fixes a format error in hclgevf_get_rss().

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  1 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 83 ++++++++++++++++++++--
 2 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index aa692b1..f5a681d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1397,6 +1397,7 @@ static const struct ethtool_ops hns3vf_ethtool_ops = {
 	.set_rxfh = hns3_set_rss,
 	.get_link_ksettings = hns3_get_link_ksettings,
 	.get_channels = hns3_get_channels,
+	.set_channels = hns3_set_channels,
 	.get_coalesce = hns3_get_coalesce,
 	.set_coalesce = hns3_set_coalesce,
 	.get_regs_len = hns3_get_regs_len,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 594cae8..e3090b3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -743,7 +743,7 @@ static int hclgevf_get_rss(struct hnae3_handle *handle, u32 *indir, u8 *key,
 }
 
 static int hclgevf_set_rss(struct hnae3_handle *handle, const u32 *indir,
-			   const  u8 *key, const  u8 hfunc)
+			   const u8 *key, const u8 hfunc)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
 	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
@@ -2060,9 +2060,10 @@ static int hclgevf_config_gro(struct hclgevf_dev *hdev, bool en)
 static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 {
 	struct hclgevf_rss_cfg *rss_cfg = &hdev->rss_cfg;
-	int i, ret;
+	int ret;
+	u32 i;
 
-	rss_cfg->rss_size = hdev->rss_size_max;
+	rss_cfg->rss_size = hdev->nic.kinfo.rss_size;
 
 	if (hdev->pdev->revision >= 0x21) {
 		rss_cfg->hash_algo = HCLGEVF_RSS_HASH_ALGO_SIMPLE;
@@ -2099,13 +2100,13 @@ static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 
 	/* Initialize RSS indirect table */
 	for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
-		rss_cfg->rss_indirection_tbl[i] = i % hdev->rss_size_max;
+		rss_cfg->rss_indirection_tbl[i] = i % rss_cfg->rss_size;
 
 	ret = hclgevf_set_rss_indir_table(hdev);
 	if (ret)
 		return ret;
 
-	return hclgevf_set_rss_tc_mode(hdev, hdev->rss_size_max);
+	return hclgevf_set_rss_tc_mode(hdev, rss_cfg->rss_size);
 }
 
 static int hclgevf_init_vlan_config(struct hclgevf_dev *hdev)
@@ -2835,6 +2836,77 @@ static void hclgevf_get_tqps_and_rss_info(struct hnae3_handle *handle,
 	*max_rss_size = hdev->rss_size_max;
 }
 
+static void hclgevf_update_rss_size(struct hnae3_handle *handle,
+				    u32 new_tqps_num)
+{
+	struct hnae3_knic_private_info *kinfo = &handle->kinfo;
+	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+	u16 max_rss_size;
+
+	kinfo->req_rss_size = new_tqps_num;
+
+	max_rss_size = min_t(u16, hdev->rss_size_max,
+			     hdev->num_tqps / kinfo->num_tc);
+
+	/* Use the user's configuration when it is not larger than
+	 * max_rss_size, otherwise, use the maximum specification value.
+	 */
+	if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
+	    kinfo->req_rss_size <= max_rss_size)
+		kinfo->rss_size = kinfo->req_rss_size;
+	else if (kinfo->rss_size > max_rss_size ||
+		 (!kinfo->req_rss_size && kinfo->rss_size < max_rss_size))
+		kinfo->rss_size = max_rss_size;
+
+	kinfo->num_tqps = kinfo->num_tc * kinfo->rss_size;
+}
+
+static int hclgevf_set_channels(struct hnae3_handle *handle, u32 new_tqps_num,
+				bool rxfh_configured)
+{
+	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+	struct hnae3_knic_private_info *kinfo = &handle->kinfo;
+	u16 cur_rss_size = kinfo->rss_size;
+	u16 cur_tqps = kinfo->num_tqps;
+	u32 *rss_indir;
+	unsigned int i;
+	int ret;
+
+	hclgevf_update_rss_size(handle, new_tqps_num);
+
+	ret = hclgevf_set_rss_tc_mode(hdev, kinfo->rss_size);
+	if (ret)
+		return ret;
+
+	/* RSS indirection table has been configuared by user */
+	if (rxfh_configured)
+		goto out;
+
+	/* Reinitializes the rss indirect table according to the new RSS size */
+	rss_indir = kcalloc(HCLGEVF_RSS_IND_TBL_SIZE, sizeof(u32), GFP_KERNEL);
+	if (!rss_indir)
+		return -ENOMEM;
+
+	for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
+		rss_indir[i] = i % kinfo->rss_size;
+
+	ret = hclgevf_set_rss(handle, rss_indir, NULL, 0);
+	if (ret)
+		dev_err(&hdev->pdev->dev, "set rss indir table fail, ret=%d\n",
+			ret);
+
+	kfree(rss_indir);
+
+out:
+	if (!ret)
+		dev_info(&hdev->pdev->dev,
+			 "Channels changed, rss_size from %u to %u, tqps from %u to %u",
+			 cur_rss_size, kinfo->rss_size,
+			 cur_tqps, kinfo->rss_size * kinfo->num_tc);
+
+	return ret;
+}
+
 static int hclgevf_get_status(struct hnae3_handle *handle)
 {
 	struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
@@ -3042,6 +3114,7 @@ static const struct hnae3_ae_ops hclgevf_ops = {
 	.enable_hw_strip_rxvtag = hclgevf_en_hw_strip_rxvtag,
 	.reset_event = hclgevf_reset_event,
 	.set_default_reset_request = hclgevf_set_def_reset_request,
+	.set_channels = hclgevf_set_channels,
 	.get_channels = hclgevf_get_channels,
 	.get_tqps_and_rss_info = hclgevf_get_tqps_and_rss_info,
 	.get_regs_len = hclgevf_get_regs_len,
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 2/7] net: hns3: revert to old channel when setting new channel num fail
From: Huazhong Tan @ 2019-09-11  2:40 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	jakub.kicinski
In-Reply-To: <1568169639-43658-1-git-send-email-tanhuazhong@huawei.com>

From: Peng Li <lipeng321@huawei.com>

After setting new channel num, it needs free old ring memory and
allocate new ring memory. If there is no enough memory and allocate
new ring memory fail, the ring may initialize fail. To make sure
the network interface can work normally, driver should revert the
channel to the old configuration.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 51 ++++++++++++++++++-------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 9f3f8e3..8dbaf36 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4410,6 +4410,30 @@ static int hns3_reset_notify(struct hnae3_handle *handle,
 	return ret;
 }
 
+static int hns3_change_channels(struct hnae3_handle *handle, u32 new_tqp_num,
+				bool rxfh_configured)
+{
+	int ret;
+
+	ret = handle->ae_algo->ops->set_channels(handle, new_tqp_num,
+						 rxfh_configured);
+	if (ret) {
+		dev_err(&handle->pdev->dev,
+			"Change tqp num(%u) fail.\n", new_tqp_num);
+		return ret;
+	}
+
+	ret = hns3_reset_notify(handle, HNAE3_INIT_CLIENT);
+	if (ret)
+		return ret;
+
+	ret =  hns3_reset_notify(handle, HNAE3_UP_CLIENT);
+	if (ret)
+		hns3_reset_notify(handle, HNAE3_UNINIT_CLIENT);
+
+	return ret;
+}
+
 int hns3_set_channels(struct net_device *netdev,
 		      struct ethtool_channels *ch)
 {
@@ -4450,24 +4474,23 @@ int hns3_set_channels(struct net_device *netdev,
 		return ret;
 
 	org_tqp_num = h->kinfo.num_tqps;
-	ret = h->ae_algo->ops->set_channels(h, new_tqp_num, rxfh_configured);
+	ret = hns3_change_channels(h, new_tqp_num, rxfh_configured);
 	if (ret) {
-		ret = h->ae_algo->ops->set_channels(h, org_tqp_num,
-						    rxfh_configured);
-		if (ret) {
-			/* If revert to old tqp failed, fatal error occurred */
-			dev_err(&netdev->dev,
-				"Revert to old tqp num fail, ret=%d", ret);
-			return ret;
+		int ret1;
+
+		netdev_warn(netdev,
+			    "Change channels fail, revert to old value\n");
+		ret1 = hns3_change_channels(h, org_tqp_num, rxfh_configured);
+		if (ret1) {
+			netdev_err(netdev,
+				   "revert to old channel fail\n");
+			return ret1;
 		}
-		dev_info(&netdev->dev,
-			 "Change tqp num fail, Revert to old tqp num");
-	}
-	ret = hns3_reset_notify(h, HNAE3_INIT_CLIENT);
-	if (ret)
+
 		return ret;
+	}
 
-	return hns3_reset_notify(h, HNAE3_UP_CLIENT);
+	return 0;
 }
 
 static const struct hns3_hw_error_info hns3_hw_err[] = {
-- 
2.7.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox