Netdev List
* [PATCH v3 1/5] VSOCK: export socket tables for sock_diag interface
From: Stefan Hajnoczi @ 2017-10-05 20:46 UTC
  To: netdev; +Cc: David S . Miller, Jorgen Hansen, Dexuan Cui, Stefan Hajnoczi
In-Reply-To: <20171005204654.2737-1-stefanha@redhat.com>

The socket table symbols need to be exported from vsock.ko so that the
vsock_diag.ko module will be able to traverse sockets.
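To illustrate what the exported symbols give a separate module, here is a userspace model of the table layout described by the af_vsock.c comment: bound sockets hash by port modulo VSOCK_HASH_SIZE, and the extra slot vsock_bind_table[VSOCK_HASH_SIZE] holds unbound sockets. It is a sketch under stated assumptions, not kernel code: struct model_sock, the model_* functions, the singly linked list, and the pthread mutex standing in for vsock_table_lock are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define VSOCK_HASH_SIZE 251   /* same value as in the patch */

/* Hypothetical stand-in for a vsock socket; a singly linked 'next'
 * pointer replaces the kernel's struct list_head. */
struct model_sock {
	unsigned int port;
	int bound;                 /* 0: socket lives in the unbound bucket */
	struct model_sock *next;
};

/* One extra bucket, index VSOCK_HASH_SIZE, holds unbound sockets. */
static struct model_sock *model_bind_table[VSOCK_HASH_SIZE + 1];
/* Plays the role of vsock_table_lock. */
static pthread_mutex_t model_table_lock = PTHREAD_MUTEX_INITIALIZER;

static size_t model_bucket(const struct model_sock *s)
{
	return s->bound ? s->port % VSOCK_HASH_SIZE : VSOCK_HASH_SIZE;
}

static void model_insert(struct model_sock *s)
{
	size_t b;

	assert(s != NULL);
	b = model_bucket(s);
	pthread_mutex_lock(&model_table_lock);
	s->next = model_bind_table[b];
	model_bind_table[b] = s;
	pthread_mutex_unlock(&model_table_lock);
}

/* What a separate diag module needs the exported symbols for: walk every
 * bucket, including the unbound one, while holding the table lock. */
static size_t model_count_sockets(void)
{
	size_t n = 0;

	pthread_mutex_lock(&model_table_lock);
	for (size_t b = 0; b <= VSOCK_HASH_SIZE; b++)
		for (const struct model_sock *s = model_bind_table[b]; s; s = s->next)
			n++;
	pthread_mutex_unlock(&model_table_lock);
	return n;
}
```

Without the exports, only af_vsock.c itself could perform a walk like model_count_sockets(), which is why the tables and lock lose their static qualifier.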

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/net/af_vsock.h   |  5 +++++
 net/vmw_vsock/af_vsock.c | 10 ++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f9fb566e75cf..30cba806e344 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -27,6 +27,11 @@
 
 #define LAST_RESERVED_PORT 1023
 
+#define VSOCK_HASH_SIZE         251
+extern struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
+extern struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
+extern spinlock_t vsock_table_lock;
+
 #define vsock_sk(__sk)    ((struct vsock_sock *)__sk)
 #define sk_vsock(__vsk)   (&(__vsk)->sk)
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index dfc8c51e4d74..9afe4da8c67d 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -153,7 +153,6 @@ EXPORT_SYMBOL_GPL(vm_sockets_get_local_cid);
  * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.  The hash function
  * mods with VSOCK_HASH_SIZE to ensure this.
  */
-#define VSOCK_HASH_SIZE         251
 #define MAX_PORT_RETRIES        24
 
 #define VSOCK_HASH(addr)        ((addr)->svm_port % VSOCK_HASH_SIZE)
@@ -168,9 +167,12 @@ EXPORT_SYMBOL_GPL(vm_sockets_get_local_cid);
 #define vsock_connected_sockets_vsk(vsk)				\
 	vsock_connected_sockets(&(vsk)->remote_addr, &(vsk)->local_addr)
 
-static struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
-static struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
-static DEFINE_SPINLOCK(vsock_table_lock);
+struct list_head vsock_bind_table[VSOCK_HASH_SIZE + 1];
+EXPORT_SYMBOL_GPL(vsock_bind_table);
+struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
+EXPORT_SYMBOL_GPL(vsock_connected_table);
+DEFINE_SPINLOCK(vsock_table_lock);
+EXPORT_SYMBOL_GPL(vsock_table_lock);
 
 /* Autobind this socket to the local address if necessary. */
 static int vsock_auto_bind(struct vsock_sock *vsk)
-- 
2.13.6

* [PATCH v3 0/5] VSOCK: add sock_diag interface
From: Stefan Hajnoczi @ 2017-10-05 20:46 UTC
  To: netdev; +Cc: David S . Miller, Jorgen Hansen, Dexuan Cui, Stefan Hajnoczi

v3:
 * Rebased onto net-next/master and resolved Hyper-V transport conflict

v2:
 * Moved tests to tools/testing/vsock/.  I was unable to put them in selftests/
   because they require manual setup of a VMware/KVM guest.
 * Moved __vsock_in_bound/connected_table() to af_vsock.h
 * Fixed local variable ordering in Patch 4

There is currently no way for userspace to query open AF_VSOCK sockets.  This
means ss(8), netstat(8), and other utilities cannot display AF_VSOCK sockets.

This patch series adds the netlink sock_diag interface for AF_VSOCK.  Userspace
programs send a DUMP request including an sk_state bitmap to filter sockets
based on their state (connected, listening, etc.).  The vsock_diag.ko module
replies with information about matching sockets.  This userspace ABI is defined
in <linux/vm_sockets_diag.h>.
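The state filter can be pictured as one bit per socket state. The sketch below is a hypothetical model of that check, assuming sk_state holds the TCP state numbers that patch 3 of the series switches vsock to (TCP_ESTABLISHED = 1, TCP_CLOSE = 7, TCP_LISTEN = 10, as in the kernel's TCP state list). It does not use the real request structure from <linux/vm_sockets_diag.h>, and state_mask()/diag_state_matches() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the kernel's TCP state numbering, reused here for vsock. */
enum {
	TCP_ESTABLISHED = 1,
	TCP_CLOSE = 7,
	TCP_LISTEN = 10,
};

/* Userspace side: one bit per state it wants reported. */
static uint32_t state_mask(int state)
{
	return UINT32_C(1) << state;
}

/* Kernel side (modeled): report a socket only if its state bit is set. */
static int diag_state_matches(uint32_t requested_states, int sk_state)
{
	assert(sk_state >= 0 && sk_state < 32);
	return (requested_states & state_mask(sk_state)) != 0;
}
```

A request for listening and connected sockets would set two bits, so a closed socket is skipped.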

The final patch adds a test suite that exercises the basic cases.

Jorgen and Dexuan: I have only tested the virtio transport but this should also
work for VMCI and Hyper-V.  Please give it a shot if you have time.

Stefan Hajnoczi (5):
  VSOCK: export socket tables for sock_diag interface
  VSOCK: move __vsock_in_bound/connected_table() to af_vsock.h
  VSOCK: use TCP state constants for sk_state
  VSOCK: add sock_diag interface
  VSOCK: add tools/testing/vsock/vsock_diag_test

 MAINTAINERS                                  |   3 +
 net/vmw_vsock/Makefile                       |   3 +
 tools/testing/vsock/Makefile                 |   9 +
 include/net/af_vsock.h                       |  20 +-
 include/uapi/linux/vm_sockets_diag.h         |  33 ++
 tools/testing/vsock/control.h                |  13 +
 tools/testing/vsock/timeout.h                |  14 +
 net/vmw_vsock/af_vsock.c                     |  66 +--
 net/vmw_vsock/diag.c                         | 186 ++++++++
 net/vmw_vsock/hyperv_transport.c             |  12 +-
 net/vmw_vsock/virtio_transport.c             |   2 +-
 net/vmw_vsock/virtio_transport_common.c      |  22 +-
 net/vmw_vsock/vmci_transport.c               |  34 +-
 net/vmw_vsock/vmci_transport_notify.c        |   2 +-
 net/vmw_vsock/vmci_transport_notify_qstate.c |   2 +-
 tools/testing/vsock/control.c                | 219 +++++++++
 tools/testing/vsock/timeout.c                |  64 +++
 tools/testing/vsock/vsock_diag_test.c        | 681 +++++++++++++++++++++++++++
 net/vmw_vsock/Kconfig                        |  10 +
 tools/testing/vsock/.gitignore               |   2 +
 tools/testing/vsock/README                   |  36 ++
 21 files changed, 1360 insertions(+), 73 deletions(-)
 create mode 100644 tools/testing/vsock/Makefile
 create mode 100644 include/uapi/linux/vm_sockets_diag.h
 create mode 100644 tools/testing/vsock/control.h
 create mode 100644 tools/testing/vsock/timeout.h
 create mode 100644 net/vmw_vsock/diag.c
 create mode 100644 tools/testing/vsock/control.c
 create mode 100644 tools/testing/vsock/timeout.c
 create mode 100644 tools/testing/vsock/vsock_diag_test.c
 create mode 100644 tools/testing/vsock/.gitignore
 create mode 100644 tools/testing/vsock/README

-- 
2.13.6

* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: Kalderon, Michal @ 2017-10-05 20:27 UTC
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel
In-Reply-To: <20171005.120629.2161199733119811102.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

From: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Sent: Thursday, October 5, 2017 10:06 PM
>> From: Kalderon, Michal
>> Sent: Tuesday, October 3, 2017 9:05 PM
>> To: David Miller
>>>From: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>>>Sent: Tuesday, October 3, 2017 8:17 PM
>>>>> @@ -423,6 +423,41 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
>>>>>  }
>>>>>
>>>>>  static int
>>>>> +qed_ll2_handle_slowpath(struct qed_hwfn *p_hwfn,
>>>>> +                     struct qed_ll2_info *p_ll2_conn,
>>>>> +                     union core_rx_cqe_union *p_cqe,
>>>>> +                     unsigned long *p_lock_flags)
>>>>> +{
>>>>...
>>>>> +     spin_unlock_irqrestore(&p_rx->lock, *p_lock_flags);
>>>>> +
>>>>
>>>>You can't drop this lock.
>>>>
>>>>Another thread can enter the loop of our caller and process RX queue
>>>>entries, then we would return from here and try to process the same
>>>>entries again.
>>>
>>>The lock is there to synchronize access to chains between qed_ll2_rxq_completion
>>>and qed_ll2_post_rx_buffer. qed_ll2_rxq_completion can't be called from
>>>different threads, the light l2 uses the single sp status block we have.
>>>The reason we release the lock is to avoid a deadlock: as a result of the call, the
>>>upper-layer driver may post additional rx-buffers.
>>
>> Dave, is there anything else needed from me on this?
>> Noticed the series is still in "Changes Requested".
>
>I'm still not convinced that the lock dropping is legitimate.  What if a
>spurious interrupt arrives?
We're in the context of a dedicated tasklet here. So even if there is a spurious
interrupt, we're covered.

>
>If the execution path in the caller is serialized for some reason, why
>are you using a spinlock and don't use that serialization for the mutual
>exclusion necessary for these queue indexes?
Posting of rx-buffers back to the light-l2 is not always serialized and can be
called from different threads depending on the light-l2 client.
Unlocking before calling the callback enables the cb function to post rx buffers;
in that case, serialization protects us. The spinlock is required for the case
where rx buffers are posted from a different thread, which could run
simultaneously with rxq_completion.
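A small userspace model may help separate the two concerns in this thread. It shows only the re-entry argument: if the completion path called the client callback with the lock held, a callback that posts a buffer would try to take the same lock again. Everything here (struct rxq_model, post_rx_buffer(), rxq_complete_one(), the single-threaded driver) is hypothetical, and an error-checking pthread mutex stands in for the spinlock so the self-deadlock is reported as EDEADLK instead of hanging. It deliberately does not address David's concern about a second context entering the completion loop while the lock is dropped.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* The lock protects the posted-buffer count, touched by both paths. */
struct rxq_model {
	pthread_mutex_t lock;
	int posted;      /* buffers available to hardware */
	int completed;   /* completions handed to the client */
};

/* Client "post buffer" path: always takes the lock itself, since it may
 * also be called from unrelated threads. Returns 0 or an errno value. */
static int post_rx_buffer(struct rxq_model *q)
{
	int rc = pthread_mutex_lock(&q->lock);

	if (rc)
		return rc;   /* EDEADLK when the caller already holds it */
	q->posted++;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Upper-layer callback that immediately reposts a buffer. */
static int client_cb(struct rxq_model *q)
{
	return post_rx_buffer(q);
}

/* Completion path: consume one entry under the lock, then either drop the
 * lock around the callback or (incorrectly) keep holding it. */
static int rxq_complete_one(struct rxq_model *q, int drop_lock)
{
	int rc;

	pthread_mutex_lock(&q->lock);
	q->posted--;
	q->completed++;
	if (drop_lock) {
		pthread_mutex_unlock(&q->lock);
		rc = client_cb(q);
	} else {
		rc = client_cb(q);   /* re-enters the lock we hold */
		pthread_mutex_unlock(&q->lock);
	}
	return rc;
}

static void rxq_init(struct rxq_model *q, int posted)
{
	pthread_mutexattr_t attr;

	assert(posted >= 0);
	pthread_mutexattr_init(&attr);
	/* Error-checking type reports the self-deadlock instead of hanging. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&q->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	q->posted = posted;
	q->completed = 0;
}
```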

* [PATCH net-next 1/1] [net] bonding: Add NUMA notice
From: Patrick Talbert @ 2017-10-05 20:23 UTC
  To: netdev; +Cc: Patrick Talbert

Network performance can suffer when a load balancing bond uses slave
interfaces which are in different NUMA domains.

This compares the NUMA domain of a newly enslaved interface against any
existing enslaved interfaces and prints a warning if they do not match.
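The check reduces to "does any existing slave sit on a different node than the new one". A minimal userspace sketch of that predicate, assuming each interface has already been reduced to its numa_node value; numa_mismatch() is a hypothetical name, not part of the bonding driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the new interface's NUMA node differs from any existing
 * slave's node, i.e. the condition under which the patch warns. */
static bool numa_mismatch(int new_node, const int *slave_nodes, size_t n)
{
	assert(n == 0 || slave_nodes != NULL);
	for (size_t i = 0; i < n; i++)
		if (slave_nodes[i] != new_node)
			return true;   /* one warning is enough, like the patch's break */
	return false;
}
```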

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
---
:100644 100644 b19dc03... 250a969... M	drivers/net/bonding/bond_main.c
 drivers/net/bonding/bond_main.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b19dc03..250a969 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -55,6 +55,7 @@
 #include <asm/dma.h>
 #include <linux/uaccess.h>
 #include <linux/errno.h>
+#include <linux/device.h>
 #include <linux/netdevice.h>
 #include <linux/inetdevice.h>
 #include <linux/igmp.h>
@@ -1450,6 +1451,21 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		}
 	}
 
+	if (bond_has_slaves(bond)) {
+		struct list_head *iter;
+		struct slave *slave;
+
+		bond_for_each_slave(bond, slave, iter) {
+			if (slave_dev->dev.numa_node !=
+			    slave->dev->dev.numa_node) {
+				netdev_warn(bond_dev,
+					    "%s does not match NUMA domain of existing slaves. This could have a performance impact.",
+					    slave_dev->name);
+				break;
+			}
+		}
+	}
+
 	call_netdevice_notifiers(NETDEV_JOIN, slave_dev);
 
 	/* If this is the first slave, then we need to set the master's hardware
-- 
1.8.3.1

* Re: Regression in throughput between kvm guests over virtual bridge
From: Matthew Rosato @ 2017-10-05 20:07 UTC
  To: Jason Wang, netdev; +Cc: davem, mst
In-Reply-To: <78678f33-c9ba-bf85-7778-b2d0676b78dd@linux.vnet.ibm.com>

On 09/25/2017 04:18 PM, Matthew Rosato wrote:
> On 09/22/2017 12:03 AM, Jason Wang wrote:
>>
>>
>> On 2017年09月21日 03:38, Matthew Rosato wrote:
>>>> Seems to make some progress on wakeup mitigation. Previous patch tries
>>>> to reduce the unnecessary traversal of waitqueue during rx. Attached
>>>> patch goes even further which disables rx polling during processing tx.
>>>> Please try it to see if it has any difference.
>>> Unfortunately, this patch doesn't seem to have made a difference.  I
>>> tried runs with both this patch and the previous patch applied, as well
>>> as only this patch applied for comparison (numbers from vhost thread of
>>> sending VM):
>>>
>>> 4.12    4.13     patch1   patch2   patch1+2
>>> 2.00%   +3.69%   +2.55%   +2.81%   +2.69%   [...] __wake_up_sync_key
>>>
>>> In each case, the regression in throughput was still present.
>>
>> This probably means some other cases of the wakeups were missed. Could
>> you please record the callers of __wake_up_sync_key()?
>>
> 
> Hi Jason,
> 
> With your 2 previous patches applied, every call to __wake_up_sync_key
> (for both sender and server vhost threads) shows the following stack trace:
> 
>      vhost-11478-11520 [002] ....   312.927229: __wake_up_sync_key
> <-sock_def_readable
>      vhost-11478-11520 [002] ....   312.927230: <stack trace>
>  => dev_hard_start_xmit
>  => sch_direct_xmit
>  => __dev_queue_xmit
>  => br_dev_queue_push_xmit
>  => br_forward_finish
>  => __br_forward
>  => br_handle_frame_finish
>  => br_handle_frame
>  => __netif_receive_skb_core
>  => netif_receive_skb_internal
>  => tun_get_user
>  => tun_sendmsg
>  => handle_tx
>  => vhost_worker
>  => kthread
>  => kernel_thread_starter
>  => kernel_thread_starter
> 

Ping...  Jason, any other ideas or suggestions?

* Re: [PATCH 2/3 v2] net: phy: DP83822 initial driver submission
From: Dan Murphy @ 2017-10-05 20:06 UTC
  To: Andrew Lunn, Woojung.Huh; +Cc: f.fainelli, netdev, afd
In-Reply-To: <20171004235307.GD16612@lunn.ch>

Andrew

On 10/04/2017 06:53 PM, Andrew Lunn wrote:
> On Wed, Oct 04, 2017 at 10:44:36PM +0000, Woojung.Huh@microchip.com wrote:
>>> +static int dp83822_suspend(struct phy_device *phydev)
>>> +{
>>> +	int value;
>>> +
>>> +	mutex_lock(&phydev->lock);
>>> +	value = phy_read_mmd(phydev, DP83822_DEVADDR,
>>> MII_DP83822_WOL_CFG);
>>> +	mutex_unlock(&phydev->lock);
> 
>> Would we need mutex to access phy_read_mmd()?
>> phy_read_mmd() has mdio_lock for indirect access.
> 
> Hi Woojung
> 
> The mdio lock is not sufficient. It protects against two mdio
> accesses. But here we need to protect against two phy operations.
> There is a danger something else tries to access the phy during
> suspend.
> 
>>> +	if (!(value & DP83822_WOL_EN))
>>> +		genphy_suspend(phydev);
> 
> Releasing the lock before calling genphy_suspend() is not so nice.
> Maybe add a version which assumes the lock has already been taken?
> 

The Marvell driver does not take a lock and calls genphy_suspend/resume,
so I am wondering whether this driver needs to take a lock.

The at803x needs to take the lock because it does not call into the genphy
functions.

Dan

>       Andrew
> 
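Andrew's suggestion follows a common pattern: keep the lock across the whole "read WoL config, then maybe suspend" sequence and call a variant that expects the caller to hold the lock. A userspace sketch of that pattern, with a pthread mutex standing in for phydev->lock; struct phy_model, model_suspend() and model_suspend_locked() are hypothetical and not phylib functions.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct phy_model {
	pthread_mutex_t lock;
	bool wol_enabled;   /* stands in for the WoL config register bit */
	bool suspended;
};

/* Lock-held variant: the caller must already hold p->lock. On a default
 * (non-recursive) mutex, trylock by anyone returns EBUSY while it is held. */
static void model_suspend_locked(struct phy_model *p)
{
	assert(pthread_mutex_trylock(&p->lock) == EBUSY);
	p->suspended = true;
}

/* Driver suspend: read the config and act on it without dropping the lock
 * in between, so no other PHY operation can interleave. */
static void model_suspend(struct phy_model *p)
{
	pthread_mutex_lock(&p->lock);
	if (!p->wol_enabled)
		model_suspend_locked(p);
	pthread_mutex_unlock(&p->lock);
}
```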


-- 
------------------
Dan Murphy

* [PATCH] doc: Fix typo "8023.ad" in bonding documentation
From: Axel Beckert @ 2017-10-05 20:00 UTC
  To: David S. Miller; +Cc: netdev, Jonathan Corbet, Jiri Kosina

Should be "802.3ad" like everywhere else in the document.

Signed-off-by: Axel Beckert <abe@deuxchevaux.org>
---
 Documentation/networking/bonding.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 57f52cdce32e..9ba04c0bab8d 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -2387,7 +2387,7 @@ broadcast: Like active-backup, there is not much advantage to this
 	and packet type ID), so in a "gatewayed" configuration, all
 	outgoing traffic will generally use the same device.  Incoming
 	traffic may also end up on a single device, but that is
-	dependent upon the balancing policy of the peer's 8023.ad
+	dependent upon the balancing policy of the peer's 802.3ad
 	implementation.  In a "local" configuration, traffic will be
 	distributed across the devices in the bond.
 
-- 
2.14.2

* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Vinicius Costa Gomes @ 2017-10-05 19:57 UTC
  To: Jiri Pirko
  Cc: netdev, intel-wired-lan, jhs, xiyou.wangcong, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, henrik, levipearson, rodney.cummings
In-Reply-To: <20171004063650.GA1895@nanopsycho>

Hi Jiri,

Jiri Pirko <jiri@resnulli.us> writes:

> Wed, Oct 04, 2017 at 02:28:30AM CEST, vinicius.gomes@intel.com wrote:
>>This queueing discipline implements the shaper algorithm defined by
>>the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
>>
>>Its primary usage is to apply some bandwidth reservation to user
>>defined traffic classes, which are mapped to different queues via the
>>mqprio qdisc.
>>
>>Initially, it only supports offloading the traffic shaping work to
>>supporting controllers.
>>
>>Later, when a software implementation is added, the current dependency
>>on being installed "under" mqprio can be lifted.
>>
>>Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>>Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>---
>> include/linux/netdevice.h      |   1 +
>> include/net/pkt_sched.h        |   9 ++
>> include/uapi/linux/pkt_sched.h |  17 ++++
>> net/sched/Kconfig              |  11 ++
>> net/sched/Makefile             |   1 +
>> net/sched/sch_cbs.c            | 225 +++++++++++++++++++++++++++++++++++++++++
>> 6 files changed, 264 insertions(+)
>> create mode 100644 net/sched/sch_cbs.c
>>
>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>index e1d6ef130611..b8798adc214f 100644
>>--- a/include/linux/netdevice.h
>>+++ b/include/linux/netdevice.h
>>@@ -775,6 +775,7 @@ enum tc_setup_type {
>> 	TC_SETUP_CLSFLOWER,
>> 	TC_SETUP_CLSMATCHALL,
>> 	TC_SETUP_CLSBPF,
>>+	TC_SETUP_CBS,
>
> Please split this into 2 patches. One will introduce the new qdisc,
> second will add offload capabilities.
>

Of course.

> [...]
>
>
>>+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
>>+	.next		=	NULL,
>>+	.id		=	"cbs",
>>+	.priv_size	=	sizeof(struct cbs_sched_data),
>>+	.enqueue	=	cbs_enqueue,
>>+	.dequeue	=	qdisc_dequeue_head,
>>+	.peek		=	qdisc_peek_dequeued,
>>+	.init		=	cbs_init,
>>+	.reset		=	qdisc_reset_queue,
>>+	.destroy	=	cbs_destroy,
>>+	.change		=	cbs_change,
>>+	.dump		=	cbs_dump,
>>+	.owner		=	THIS_MODULE,
>>+};
>
> I don't see a software implementation for this. Looks like you are
> trying abuse tc subsystem to bypass kernel. Could you please explain
> this? The golden rule is: implement in kernel, then offload.

The reason was that we didn't have a use case for the software
implementation right now; it would be added in a later series.

But as that was requested (and it makes sense), I will add it for the
next version of this series (it is already written, just need to test it
better).
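For readers following the software-implementation discussion, the core of the shaper is simple credit arithmetic. The sketch below assumes the usual 802.1Q relationship sendSlope = idleSlope - portTransmitRate and that a frame may start only when credit is non-negative. It omits hiCredit/loCredit limits and the reset of positive credit when the queue empties, and all names are hypothetical rather than taken from sch_cbs.c.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/* Rates in bits per second, times in nanoseconds, credit in bits. */
struct cbs_model {
	int64_t credit;
	int64_t idleslope;   /* reserved rate, > 0 */
	int64_t port_rate;   /* link speed, > idleslope */
};

/* sendslope = idleslope - port_rate, which is negative. */
static int64_t sendslope(const struct cbs_model *c)
{
	return c->idleslope - c->port_rate;
}

/* A frame may start only when credit is non-negative. */
static int cbs_can_send(const struct cbs_model *c)
{
	return c->credit >= 0;
}

/* Sending 'bits' takes bits/port_rate seconds, during which credit drains
 * at sendslope. Returns the transmission time in ns. */
static int64_t cbs_send(struct cbs_model *c, int64_t bits)
{
	int64_t tx_ns = bits * NSEC_PER_SEC / c->port_rate;

	assert(cbs_can_send(c));
	c->credit += sendslope(c) * tx_ns / NSEC_PER_SEC;
	return tx_ns;
}

/* While frames wait, credit grows at idleslope; this is how long (rounded
 * up) the queue must wait before credit is back to zero. */
static int64_t cbs_wait_ns(const struct cbs_model *c)
{
	if (c->credit >= 0)
		return 0;
	return (-c->credit * NSEC_PER_SEC + c->idleslope - 1) / c->idleslope;
}
```

With idleslope at 25% of a 1 Gb/s port, a 1500-byte frame takes 12 µs to send and leaves the credit at -9000 bits, which takes 36 µs to recover; the class therefore averages 12000 bits every 48 µs, exactly the 250 Mb/s idleslope.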


Cheers,

* [PATCH] net/ipv6: remove unused err variable on icmpv6_push_pending_frames
From: Tim Hansen @ 2017-10-05 19:45 UTC
  To: davem; +Cc: kuznet, yoshfuji, netdev, linux-kernel, alexander.levin,
	devtimhansen

The variable err in icmpv6_push_pending_frames() is never changed from its initial value of 0; this patch removes it and makes the function return 0 directly.

git bisect shows this variable has been present since Linux was first imported into git, in commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2.

This was found by running make coccicheck M=net/ipv6/ on Linus' tree at commit 77ede3a014a32746002f7889211f0cecf4803163 (current HEAD as of this patch).

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
---
 net/ipv6/icmp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 5acb544..aeb49b4 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -255,7 +255,6 @@ int icmpv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 {
 	struct sk_buff *skb;
 	struct icmp6hdr *icmp6h;
-	int err = 0;
 
 	skb = skb_peek(&sk->sk_write_queue);
 	if (!skb)
@@ -288,7 +287,7 @@ int icmpv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 	}
 	ip6_push_pending_frames(sk);
 out:
-	return err;
+	return 0;
 }
 
 struct icmpv6_msg {
-- 
2.1.4

* Re: [PATCH] net: qcom/emac: make function emac_isr static
From: Timur Tabi @ 2017-10-05 19:31 UTC
  To: Colin King, netdev; +Cc: kernel-janitors, linux-kernel
In-Reply-To: <20171005091023.27781-1-colin.king@canonical.com>

On 10/05/2017 04:10 AM, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> The function emac_isr is local to the source and does not need to
> be in global scope, so make it static.
> 
> Cleans up sparse warnings:
> symbol 'emac_isr' was not declared. Should it be static?
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

ACK

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* [PATCH v2] isdn/gigaset: Convert timers to use timer_setup()
From: Kees Cook @ 2017-10-05 19:31 UTC
  To: Paul Bolle
  Cc: Karsten Keil, David S. Miller, Johan Hovold, linux-kernel,
	gigaset307x-common, netdev

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
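The converted callbacks recover their driver state from the timer pointer because the timer is embedded in that state. A userspace sketch of the idea, with simplified container_of()/from_timer() macros modeled on the kernel's (the real container_of() adds type checking) and a hypothetical struct card_state; timer_setup_model() and fire() are stand-ins, not the kernel timer API. __typeof__ is a GCC/Clang extension, as used by the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace versions of the kernel helpers. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
/* 'var' only supplies the type, so it may be the variable being declared. */
#define from_timer(var, timer_ptr, field) \
	container_of(timer_ptr, __typeof__(*var), field)

struct timer_list {
	void (*function)(struct timer_list *);
};

/* Hypothetical driver state with two embedded timers. */
struct card_state {
	int fired_ctrl;
	struct timer_list timer_ctrl;
	struct timer_list timer_atrdy;
};

static void ctrl_timeout(struct timer_list *t)
{
	/* Step back from the embedded timer to the enclosing structure. */
	struct card_state *cs = from_timer(cs, t, timer_ctrl);

	cs->fired_ctrl++;
}

static void timer_setup_model(struct timer_list *t,
			      void (*fn)(struct timer_list *))
{
	t->function = fn;
}

/* Simulated expiry: the "core" passes only the timer pointer. */
static void fire(struct timer_list *t)
{
	assert(t->function != NULL);
	t->function(t);
}
```

No unsigned long cast of the state pointer is needed any more, which is what lets the timer core pass struct timer_list * unconditionally.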

Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johan Hovold <johan@kernel.org>
Cc: gigaset307x-common@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
This requires commit 686fef928bba ("timer: Prepare to change timer
callback argument type") in v4.14-rc3, but should be otherwise
stand-alone.

v2:
- split kzalloc() into a separate patch; pebolle.
---
 drivers/isdn/gigaset/bas-gigaset.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c
index 33151f05e744..c990c6bbffc2 100644
--- a/drivers/isdn/gigaset/bas-gigaset.c
+++ b/drivers/isdn/gigaset/bas-gigaset.c
@@ -433,10 +433,11 @@ static void check_pending(struct bas_cardstate *ucs)
  * argument:
  *	controller state structure
  */
-static void cmd_in_timeout(unsigned long data)
+static void cmd_in_timeout(struct timer_list *t)
 {
-	struct cardstate *cs = (struct cardstate *) data;
-	struct bas_cardstate *ucs = cs->hw.bas;
+	struct bas_cardstate *ucs = from_timer(ucs, t, timer_cmd_in);
+	struct urb *urb = ucs->urb_int_in;
+	struct cardstate *cs = urb->context;
 	int rc;
 
 	if (!ucs->rcvbuf_size) {
@@ -639,10 +640,11 @@ static void int_in_work(struct work_struct *work)
  * argument:
  *	controller state structure
  */
-static void int_in_resubmit(unsigned long data)
+static void int_in_resubmit(struct timer_list *t)
 {
-	struct cardstate *cs = (struct cardstate *) data;
-	struct bas_cardstate *ucs = cs->hw.bas;
+	struct bas_cardstate *ucs = from_timer(ucs, t, timer_int_in);
+	struct urb *urb = ucs->urb_int_in;
+	struct cardstate *cs = urb->context;
 	int rc;
 
 	if (ucs->retry_int_in++ >= BAS_RETRY) {
@@ -1441,10 +1443,11 @@ static void read_iso_tasklet(unsigned long data)
  * argument:
  *	controller state structure
  */
-static void req_timeout(unsigned long data)
+static void req_timeout(struct timer_list *t)
 {
-	struct cardstate *cs = (struct cardstate *) data;
-	struct bas_cardstate *ucs = cs->hw.bas;
+	struct bas_cardstate *ucs = from_timer(ucs, t, timer_ctrl);
+	struct urb *urb = ucs->urb_int_in;
+	struct cardstate *cs = urb->context;
 	int pending;
 	unsigned long flags;
 
@@ -1837,10 +1840,11 @@ static void write_command_callback(struct urb *urb)
  * argument:
  *	controller state structure
  */
-static void atrdy_timeout(unsigned long data)
+static void atrdy_timeout(struct timer_list *t)
 {
-	struct cardstate *cs = (struct cardstate *) data;
-	struct bas_cardstate *ucs = cs->hw.bas;
+	struct bas_cardstate *ucs = from_timer(ucs, t, timer_atrdy);
+	struct urb *urb = ucs->urb_int_in;
+	struct cardstate *cs = urb->context;
 
 	dev_warn(cs->dev, "timeout waiting for HD_READY_SEND_ATDATA\n");
 
@@ -2213,10 +2217,10 @@ static int gigaset_initcshw(struct cardstate *cs)
 	}
 
 	spin_lock_init(&ucs->lock);
-	setup_timer(&ucs->timer_ctrl, req_timeout, (unsigned long) cs);
-	setup_timer(&ucs->timer_atrdy, atrdy_timeout, (unsigned long) cs);
-	setup_timer(&ucs->timer_cmd_in, cmd_in_timeout, (unsigned long) cs);
-	setup_timer(&ucs->timer_int_in, int_in_resubmit, (unsigned long) cs);
+	timer_setup(&ucs->timer_ctrl, req_timeout, 0);
+	timer_setup(&ucs->timer_atrdy, atrdy_timeout, 0);
+	timer_setup(&ucs->timer_cmd_in, cmd_in_timeout, 0);
+	timer_setup(&ucs->timer_int_in, int_in_resubmit, 0);
 	init_waitqueue_head(&ucs->waitqueue);
 	INIT_WORK(&ucs->int_in_wq, int_in_work);
 
-- 
2.7.4


-- 
Kees Cook
Pixel Security

* [PATCH] isdn/gigaset: Use kzalloc instead of open-coded field zeroing
From: Kees Cook @ 2017-10-05 19:30 UTC
  To: Paul Bolle
  Cc: Karsten Keil, David S. Miller, Johan Hovold, linux-kernel,
	gigaset307x-common, netdev

This replaces a kmalloc followed by a bunch of per-field zeroing with a
single kzalloc call, reducing the lines of code.

Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johan Hovold <johan@kernel.org>
Cc: gigaset307x-common@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/isdn/gigaset/bas-gigaset.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c
index 2da3ff650e1d..33151f05e744 100644
--- a/drivers/isdn/gigaset/bas-gigaset.c
+++ b/drivers/isdn/gigaset/bas-gigaset.c
@@ -2200,7 +2200,7 @@ static int gigaset_initcshw(struct cardstate *cs)
 {
 	struct bas_cardstate *ucs;
 
-	cs->hw.bas = ucs = kmalloc(sizeof *ucs, GFP_KERNEL);
+	cs->hw.bas = ucs = kzalloc(sizeof(*ucs), GFP_KERNEL);
 	if (!ucs) {
 		pr_err("out of memory\n");
 		return -ENOMEM;
@@ -2212,15 +2212,7 @@ static int gigaset_initcshw(struct cardstate *cs)
 		return -ENOMEM;
 	}
 
-	ucs->urb_cmd_in = NULL;
-	ucs->urb_cmd_out = NULL;
-	ucs->rcvbuf = NULL;
-	ucs->rcvbuf_size = 0;
-
 	spin_lock_init(&ucs->lock);
-	ucs->pending = 0;
-
-	ucs->basstate = 0;
 	setup_timer(&ucs->timer_ctrl, req_timeout, (unsigned long) cs);
 	setup_timer(&ucs->timer_atrdy, atrdy_timeout, (unsigned long) cs);
 	setup_timer(&ucs->timer_cmd_in, cmd_in_timeout, (unsigned long) cs);
-- 
2.7.4


-- 
Kees Cook
Pixel Security

* Re: [PATCH net-next v3 2/2] libbpf: use map_flags when creating maps
From: Daniel Borkmann @ 2017-10-05 19:26 UTC
  To: Craig Gallek, Alexei Starovoitov, Jesper Dangaard Brouer,
	David S . Miller
  Cc: Chonggang Li, netdev
In-Reply-To: <20171005144158.14860-3-kraigatgoog@gmail.com>

On 10/05/2017 04:41 PM, Craig Gallek wrote:
> From: Craig Gallek <kraig@google.com>
>
> This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type
> which requires flags.
>
> Signed-off-by: Craig Gallek <kraig@google.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

* Re: [PATCH net-next v3 1/2] libbpf: parse maps sections of varying size
From: Daniel Borkmann @ 2017-10-05 19:25 UTC
  To: Craig Gallek, Alexei Starovoitov, Jesper Dangaard Brouer,
	David S . Miller
  Cc: Chonggang Li, netdev
In-Reply-To: <20171005144158.14860-2-kraigatgoog@gmail.com>

On 10/05/2017 04:41 PM, Craig Gallek wrote:
> From: Craig Gallek <kraig@google.com>
>
> This library previously assumed a fixed-size map options structure.
> Any new options were ignored.  In order to allow the options structure
> to grow and to support parsing older programs, this patch updates
> the maps section parsing to handle varying sizes.
>
> Object files with maps sections smaller than expected will have the new
> fields initialized to zero.  Object files which have larger than expected
> maps sections will be rejected unless all of the unrecognized data is zero.
>
> This change still assumes that each map definition in the maps section
> is the same size.
>
> Signed-off-by: Craig Gallek <kraig@google.com>

Thanks,

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
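The parsing rule quoted above (zero-fill missing trailing fields, reject extra trailing bytes unless they are all zero) can be sketched as follows. struct map_def and parse_map_def() are hypothetical illustrations of the rule, not libbpf's code, and the sketch handles a single definition whose size is already known.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "current" layout; older objects may carry a prefix of it,
 * newer ones may append fields this parser does not know about. */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/* Copy one map definition of 'sz' bytes into 'out'. Missing trailing fields
 * become zero; extra trailing bytes are accepted only if all zero. */
static int parse_map_def(const void *data, size_t sz, struct map_def *out)
{
	const unsigned char *p = data;

	assert(data != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (sz <= sizeof(*out)) {
		memcpy(out, p, sz);
		return 0;
	}
	for (size_t i = sizeof(*out); i < sz; i++)
		if (p[i] != 0)
			return -EINVAL;   /* an unknown option is actually in use */
	memcpy(out, p, sizeof(*out));
	return 0;
}
```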

* Re: [PATCH] isdn/gigaset: Convert timers to use timer_setup()
From: Kees Cook @ 2017-10-05 19:17 UTC
  To: Paul Bolle
  Cc: Karsten Keil, David S. Miller, Johan Hovold, gigaset307x-common,
	Network Development, Thomas Gleixner, LKML
In-Reply-To: <1507190336.2167.5.camel@tiscali.nl>

On Thu, Oct 5, 2017 at 12:58 AM, Paul Bolle <pebolle@tiscali.nl> wrote:
> Hi Kees,
>
> On Wed, 2017-10-04 at 17:52 -0700, Kees Cook wrote:
>> Also uses kzalloc to replace open-coded field assignments to NULL and zero.
>
> If I'm allowed to whine (chances that I'm allowed to do that are not so great
> as Dave tends to apply gigaset patches before I even have a chance to look at
> them properly!): I'd prefer it if that was done separately in a preceding
> patch. Would that bother you?

Sure, that's fine, I'll split it and re-send.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security

* RE: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Rodney Cummings @ 2017-10-05 19:17 UTC
  To: David Miller
  Cc: levipearson@gmail.com, jiri@resnulli.us, vinicius.gomes@intel.com,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com,
	andre.guedes@intel.com, ivan.briano@intel.com,
	jesus.sanchez-palencia@intel.com, boon.leong.ong@intel.com,
	richardcochran@gmail.com, henrik@austad.us
In-Reply-To: <20171005.120508.2267452751875787466.davem@davemloft.net>

No excuse. If the software cannot meet the standard's requirements, it is non-conformant,
which means it cannot be called a standard credit-based shaper.

But... I have no objection if someone wants to try software-only. I'm just saying that it
is a waste of time for me.

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, October 5, 2017 2:05 PM
> To: Rodney Cummings <rodney.cummings@ni.com>
> Cc: levipearson@gmail.com; jiri@resnulli.us; vinicius.gomes@intel.com;
> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org;
> jhs@mojatatu.com; xiyou.wangcong@gmail.com; andre.guedes@intel.com;
> ivan.briano@intel.com; jesus.sanchez-palencia@intel.com;
> boon.leong.ong@intel.com; richardcochran@gmail.com; henrik@austad.us
> Subject: Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based
> Shaper (CBS) qdisc
> 
> From: Rodney Cummings <rodney.cummings@ni.com>
> Date: Thu, 5 Oct 2017 18:41:48 +0000
> 
> > The IEEE Std 802.1Q specs for credit-based shaper require precise
> > transmit decisions within a 125 microsecond window of time.
> >
> > Even with the Preempt RT patch or similar enhancements, that isn't very
> > practical as software-only. I doubt that software would conform to the
> > standard's requirements.
> >
> > This is analogous to memory, or CPU.
> 
> I feel like this is looking for an excuse to not have to at least try to
> implement the software version of CBS.

* Re: [PATCH] mwifiex: Use put_unaligned_le32
From: Himanshu Jha @ 2017-10-05 19:07 UTC
  To: Brian Norris
  Cc: Kalle Valo, amitkarwar, nishants, gbhat, huxm, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20171005180248.GA94139@google.com>

On Thu, Oct 05, 2017 at 11:02:50AM -0700, Brian Norris wrote:
> On Thu, Oct 05, 2017 at 08:52:33PM +0530, Himanshu Jha wrote:
> > There are various instances where a function used in file say for eg
> > int func_align (void* a)
> > is used and it is defined in align.h
> > But many files don't *directly* include align.h and rather include
> > any other header which includes align.h
> 
> I believe the general rule is that you should include headers for all
> symbols you use, and not rely on implicit includes.
> 
> The modification to the general rule is that not all headers are
> intended to be included directly, and in such cases there's likely a
> parent header that is the more appropriate target.
> 
> In this case, the key is CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. It
> seems that asm-generic/unaligned.h is set up to include different
> headers, based on the expected architecture behavior.
>
Yes, asm-generic/unaligned.h looks more appropriate and is the most generic
implementation of unaligned accesses and arc specific.

Let's see what Kalle Valo recommends! And then I will send v2 of the
patch.

Thanks for the information!

Himanshu Jha

> I wonder if include/linux/unaligned/access_ok.h should have a safety
> check (e.g., raise an #error if
> !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS?).
> 
> > Is compiling the file the only way to check whether the appropriate
> > header is included, or is there some other way to check for it?
> 
> I believe it's mostly manual. Implicit includes have been a problem for
> anyone who refactors header files.
> 
> Brian
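Conceptually, put_unaligned_le32() stores a 32-bit value in little-endian byte order at an address that need not be aligned. The sketch below is a portable, byte-wise userspace equivalent; the function names are hypothetical and this is not the kernel's implementation. On architectures with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS the kernel headers can use a direct access instead, which is why picking the right parent header matters.

```c
#include <stdint.h>

/* Hypothetical helper: store v at p in little-endian order.
 * Byte-wise stores are valid at any alignment on any architecture. */
static void store_le32_unaligned(void *p, uint32_t v)
{
	uint8_t *b = p;

	b[0] = (uint8_t)v;
	b[1] = (uint8_t)(v >> 8);
	b[2] = (uint8_t)(v >> 16);
	b[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: the matching little-endian unaligned load. */
static uint32_t load_le32_unaligned(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```

Storing 0x12345678 at buf + 1 writes 0x78, 0x56, 0x34, 0x12 into buf[1]..buf[4] regardless of host endianness or alignment.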

* Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned
From: Marcelo Ricardo Leitner @ 2017-10-05 19:07 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, luto, baijiaju1990, nhorman, vyasevich, kvalo,
	linux-crypto, netdev, linux-sctp, linux-wireless
In-Reply-To: <20171005131631.GA1553@gondor.apana.org.au>

On Thu, Oct 05, 2017 at 09:16:31PM +0800, Herbert Xu wrote:
> On Thu, Oct 05, 2017 at 06:16:20PM +0800, Herbert Xu wrote:
> >
> > That was my point.  Functions like sctp_pack_cookie shouldn't be
> > setting the key in the first place.  The setkey should happen at
> > the point when the key is generated.  That's sctp_endpoint_init
> > which AFAICS only gets called in GFP_KERNEL context.
> > 
> > Or is there a code-path where sctp_endpoint_init is called in
> > softirq context?
> 
> OK, there are indeed code paths where the key is derived in softirq
> context.  Notably sctp_auth_calculate_hmac.
> 
> So I think this patch is the correct fix and I will push it upstream
> as well as back to stable.

Okay, thanks.

  Marcelo

* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: David Miller @ 2017-10-05 19:06 UTC (permalink / raw)
  To: Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA
In-Reply-To: <CY1PR0701MB2012A2F8E3E923D98B1E1A6488700-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>

From: "Kalderon, Michal" <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Date: Thu, 5 Oct 2017 18:59:04 +0000

> From: Kalderon, Michal
> Sent: Tuesday, October 3, 2017 9:05 PM
> To: David Miller
>>From: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>>Sent: Tuesday, October 3, 2017 8:17 PM
>>>> @@ -423,6 +423,41 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
>>>>  }
>>>>
>>>>  static int
>>>> +qed_ll2_handle_slowpath(struct qed_hwfn *p_hwfn,
>>>> +                     struct qed_ll2_info *p_ll2_conn,
>>>> +                     union core_rx_cqe_union *p_cqe,
>>>> +                     unsigned long *p_lock_flags)
>>>> +{
>>>...
>>>> +     spin_unlock_irqrestore(&p_rx->lock, *p_lock_flags);
>>>> +
>>>
>>>You can't drop this lock.
>>>
>>>Another thread can enter the loop of our caller and process RX queue
>>>entries, then we would return from here and try to process the same
>>>entries again.
>>
>>The lock is there to synchronize access to the chains between qed_ll2_rxq_completion
>>and qed_ll2_post_rx_buffer. qed_ll2_rxq_completion can't be called from
>>different threads; the light L2 uses the single sp status block we have.
>>We release the lock to avoid a deadlock: calling into the upper-layer driver
>>may result in it posting additional rx-buffers, which takes the same lock.
> 
> Dave, is there anything else needed from me on this? 
> Noticed the series is still in "Changes Requested". 

I'm still not convinced that the lock dropping is legitimate.  What if a
spurious interrupt arrives?

If the execution path in the caller is serialized for some reason, why
are you using a spinlock instead of relying on that serialization for the
mutual exclusion these queue indexes need?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: David Miller @ 2017-10-05 19:05 UTC (permalink / raw)
  To: rodney.cummings
  Cc: levipearson, jiri, vinicius.gomes, netdev, intel-wired-lan, jhs,
	xiyou.wangcong, andre.guedes, ivan.briano, jesus.sanchez-palencia,
	boon.leong.ong, richardcochran, henrik
In-Reply-To: <CY1PR0401MB1536A44D0AB459BB9618664A92700@CY1PR0401MB1536.namprd04.prod.outlook.com>

From: Rodney Cummings <rodney.cummings@ni.com>
Date: Thu, 5 Oct 2017 18:41:48 +0000

> The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
> within a 125 microsecond window of time.
> 
> Even with the Preempt RT patch or similar enhancements, that isn't very practical
> as software-only. I doubt that software would conform to the standard's
> requirements.
> 
> This is analogous to memory, or CPU.

I feel like this is looking for an excuse to not have to at least try to implement
the software version of CBS.

* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: Kalderon, Michal @ 2017-10-05 18:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel
In-Reply-To: <CY1PR0701MB20128130D21FD3C54E45B5A188720-UpKza+2NMNLHMJvQ0dyT705OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>

From: Kalderon, Michal
Sent: Tuesday, October 3, 2017 9:05 PM
To: David Miller
>From: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>Sent: Tuesday, October 3, 2017 8:17 PM
>>> @@ -423,6 +423,41 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
>>>  }
>>>
>>>  static int
>>> +qed_ll2_handle_slowpath(struct qed_hwfn *p_hwfn,
>>> +                     struct qed_ll2_info *p_ll2_conn,
>>> +                     union core_rx_cqe_union *p_cqe,
>>> +                     unsigned long *p_lock_flags)
>>> +{
>>...
>>> +     spin_unlock_irqrestore(&p_rx->lock, *p_lock_flags);
>>> +
>>
>>You can't drop this lock.
>>
>>Another thread can enter the loop of our caller and process RX queue
>>entries, then we would return from here and try to process the same
>>entries again.
>
>The lock is there to synchronize access to the chains between qed_ll2_rxq_completion
>and qed_ll2_post_rx_buffer. qed_ll2_rxq_completion can't be called from
>different threads; the light L2 uses the single sp status block we have.
>We release the lock to avoid a deadlock: calling into the upper-layer driver
>may result in it posting additional rx-buffers, which takes the same lock.

Dave, is there anything else needed from me on this? 
Noticed the series is still in "Changes Requested". 

thanks,
Michal


* Re: [PATCH] ipv6: gso: fix payload length when gso_size is zero
From: Duyck, Alexander H @ 2017-10-05 18:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org, alexey.kodanev@oracle.com
  Cc: davem@davemloft.net, steffen.klassert@secunet.com
In-Reply-To: <1507223207-17557-1-git-send-email-alexey.kodanev@oracle.com>

On Thu, 2017-10-05 at 20:06 +0300, Alexey Kodanev wrote:
> When gso_size is reset to zero for the tail segment in skb_segment(), later
> in ipv6_gso_segment() we get an incorrect payload_len for that segment.
> inet_gso_segment() already checks gso_size before calculating the
> payload length, so only the IPv6 part needs fixing.
> 
> The issue was found with LTP vxlan & gre tests over ixgbe NIC.
> 
> Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/ipv6/ip6_offload.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index cdb3728..4a87f94 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -105,7 +105,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
>  
>  	for (skb = segs; skb; skb = skb->next) {
>  		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
> -		if (gso_partial)
> +		if (gso_partial && skb_is_gso(skb))
>  			payload_len = skb_shinfo(skb)->gso_size +
>  				      SKB_GSO_CB(skb)->data_offset +
>  				      skb->head - (unsigned char *)(ipv6h + 1);
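The effect of the one-line fix can be modelled in isolation. This is a simplified, hypothetical sketch: struct seg_model and ipv6_payload_len() are invented names, hdrs_after_ipv6 stands in for the SKB_GSO_CB(skb)->data_offset + skb->head - (ipv6h + 1) term, and the fallback branch (length taken from the segment's real size) is an assumption about the surrounding code that the diff context does not show.

```c
#include <stdbool.h>

#define IPV6_HDR_LEN 40u	/* fixed IPv6 header size */

/* Hypothetical, simplified view of one segment returned by skb_segment(). */
struct seg_model {
	unsigned int len;		/* like skb->len: bytes from the MAC header */
	unsigned int nhoff;		/* offset of the IPv6 header */
	unsigned int gso_size;		/* 0 once the segment is no longer GSO */
	unsigned int hdrs_after_ipv6;	/* header bytes between IPv6 header and payload */
};

static unsigned int ipv6_payload_len(const struct seg_model *s, bool gso_partial)
{
	/* Mirrors "gso_partial && skb_is_gso(skb)": skb_is_gso() is true
	 * exactly when gso_size is non-zero. */
	if (gso_partial && s->gso_size != 0)
		return s->gso_size + s->hdrs_after_ipv6;

	/* Assumed fallback: derive the length from the segment itself. */
	return s->len - s->nhoff - IPV6_HDR_LEN;
}
```

For a 214-byte tail segment with nhoff 14 and gso_size 0 this yields 160; without the gso_size check it would report 0 + hdrs_after_ipv6, i.e. only the encapsulation headers.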

So, looking over this change, it looks good to me. I'm just wondering if
you have looked at the code in __skb_udp_tunnel_segment or
gre_gso_segment? It seems like if you needed this change here, you
would need to make similar changes to those functions as well. I'm
wondering if we just aren't seeing issues there because the segments are
already MSS sized before being handed to us for segmentation.

- Alex

* RE: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Rodney Cummings @ 2017-10-05 18:41 UTC (permalink / raw)
  To: David Miller, levipearson@gmail.com
  Cc: jiri@resnulli.us, vinicius.gomes@intel.com,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com,
	andre.guedes@intel.com, ivan.briano@intel.com,
	jesus.sanchez-palencia@intel.com, boon.leong.ong@intel.com,
	richardcochran@gmail.com, henrik@austad.us
In-Reply-To: <20171005.112909.2052593524154643514.davem@davemloft.net>

The IEEE Std 802.1Q specs for credit-based shaper require precise transmit decisions
within a 125 microsecond window of time.

Even with the Preempt RT patch or similar enhancements, that isn't very practical
as software-only. I doubt that software would conform to the standard's
requirements.

This is analogous to memory, or CPU.

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, October 5, 2017 1:29 PM
> To: levipearson@gmail.com
> Cc: jiri@resnulli.us; vinicius.gomes@intel.com; netdev@vger.kernel.org;
> intel-wired-lan@lists.osuosl.org; jhs@mojatatu.com;
> xiyou.wangcong@gmail.com; andre.guedes@intel.com; ivan.briano@intel.com;
> jesus.sanchez-palencia@intel.com; boon.leong.ong@intel.com;
> richardcochran@gmail.com; henrik@austad.us; Rodney Cummings
> <rodney.cummings@ni.com>
> Subject: Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based
> Shaper (CBS) qdisc
> 
> From: Levi Pearson <levipearson@gmail.com>
> Date: Thu, 5 Oct 2017 12:09:32 -0600
> 
> > It would be a shame if this were blocked due to a missing software
> > implementation.
> 
> Quite the contrary, I think a software implementation is a minimum
> requirement for inclusion of this feature.
> 
> Without a software implementation, there is no clear definition of
> what is supposed to happen, and no clear way for people to test those
> expectations unless they have the specific hardware.
> 
> I completely agree with Jiri.  Hardware offload first is _not_ how
> we do things in Linux networking.

* Re: [PATCH net-next v2 0/3] ethtool: support for forward error correction mode setting on a link
From: Jakub Kicinski @ 2017-10-05 18:30 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: davem@davemloft.net, John W. Linville, netdev@vger.kernel.org,
	Vidya Sagar Ravipati, Dustin Byford, Dave Olson, Casey Leedom,
	Gal Pressman, Andrew Lunn, Manoj Malviya, Santosh Rastapur,
	yuval.mintz, odedw, Ariel Almog, Jeff Kirsher, Dirk van der Merwe
In-Reply-To: <CAJieiUgM=t0GhnRXP5YYAZytFoKYRutpM0ZiVsDMjrpLZwCHsQ@mail.gmail.com>

On Fri, 28 Jul 2017 23:28:26 -0700, Roopa Prabhu wrote:
> On Fri, Jul 28, 2017 at 9:46 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> > On Fri, 28 Jul 2017 07:53:01 -0700, Roopa Prabhu wrote:  
> >> On Thu, Jul 27, 2017 at 7:33 PM, Jakub Kicinski <kubakici@wp.pl> wrote:  
> >> > On Thu, 27 Jul 2017 16:47:25 -0700, Roopa Prabhu wrote:  
> >> >> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> >> >>
> >> >> Forward Error Correction (FEC) modes, i.e. Base-R
> >> >> and Reed-Solomon modes, are introduced in the 25G/40G/100G standards
> >> >> to provide good BER at high speeds. Various networking devices
> >> >> that support 25G/40G/100G provide the ability to manage supported FEC
> >> >> modes, and the lack of FEC encoding control and reporting today is a
> >> >> source of interoperability issues for many vendors.
> >> >> FEC capability as well as a specific FEC mode, i.e. Base-R
> >> >> or RS, can be requested or advertised through bits D44:47 of the base
> >> >> link codeword.
> >> >>
> >> >> This patch set intends to provide an option under ethtool to manage and
> >> >> report FEC encoding settings for networking devices as per the IEEE
> >> >> 802.3bj, 802.3bm and 802.3by specs.
> >> >>
> >> >> v2 :
> >> >>         - minor patch format fixes and typos pointed out by Andrew
> >> >>         - there was a pending discussion on the use of 'auto' vs
> >> >>           'automatic' for fec settings. I have left it as 'auto'
> >> >>           because in most cases today auto is used in place of
> >> >>           automatic to represent automatically generated values.
> >> >>           We use it in other networking config too. I would prefer
> >> >>           leaving it as auto.  
> >> >
> >> > On the subject of resetting the values when module is replugged I
> >> > assume what was previously described remains:
> >> >  - we always allow users to set the FEC regardless of the module type;
> >> >  - if user set an incorrect FEC for the module type (or module gets
> >> >    swapped) the link will be administratively taken down by either
> >> >    the driver or FW.
> >> >
> >> > Is that correct?  Am I misremembering?  
> >>
> >> yes, correct. And possible future sfp hotplug events can give user-space
> >> more info to react to module type changes etc.  
> >
> > OK, if nobody else objects and we go with that - let's make sure we
> > clearly document that those are the expected semantics :)  My concern is
> > that if there is ever a 10G + RS FEC standard, we don't want to end up in a
> > situation where some drivers silently ignore FEC settings at 10G and others apply them.
> > So let's make it clear what the intended Linux behaviour is.  It could
> > be in the ethtool man page, or the kernel somewhere.  
> 
> sure :), ack. We will document it in the ethtool manpage.

Hi Roopa!  Did you ever publish the ethtool user space patches at all?
I can't find them...

* Re: [next-queue PATCH v4 3/4] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: David Miller @ 2017-10-05 18:29 UTC (permalink / raw)
  To: levipearson
  Cc: jiri, vinicius.gomes, netdev, intel-wired-lan, jhs,
	xiyou.wangcong, andre.guedes, ivan.briano, jesus.sanchez-palencia,
	boon.leong.ong, richardcochran, henrik, rodney.cummings
In-Reply-To: <CAEYbN3RjUXGMyxo0t88-ASNVEVQdfXkMzBbMtMHAhqWScOO=Cg@mail.gmail.com>

From: Levi Pearson <levipearson@gmail.com>
Date: Thu, 5 Oct 2017 12:09:32 -0600

> It would be a shame if this were blocked due to a missing software
> implementation.

Quite the contrary, I think a software implementation is a minimum
requirement for inclusion of this feature.

Without a software implementation, there is no clear definition of
what is supposed to happen, and no clear way for people to test those
expectations unless they have the specific hardware.

I completely agree with Jiri.  Hardware offload first is _not_ how
we do things in Linux networking.
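As a reference point for what a software CBS has to model, the per-class credit arithmetic from IEEE 802.1Q can be sketched in a few lines. This is a hedged sketch, not the proposed qdisc code: the struct and function names are hypothetical, integer units (bits, bits per second, nanoseconds) are an assumption, overflow is not handled, and the hiCredit/loCredit bounds are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical model of one 802.1Q credit-based shaper class.
 * Rates are in bits per second, credit is in bits, time in ns. */
struct cbs_model {
	int64_t idle_slope;	/* bandwidth reserved for the class */
	int64_t port_rate;	/* link transmit rate */
	int64_t credit;		/* current credit */
};

/* sendSlope = idleSlope - portTransmitRate, so it is negative. */
static int64_t cbs_send_slope(const struct cbs_model *q)
{
	return q->idle_slope - q->port_rate;
}

/* A queued frame may start transmission only when credit >= 0. */
static int cbs_may_send(const struct cbs_model *q)
{
	return q->credit >= 0;
}

/* Credit drains at sendSlope for the frame's time on the wire. */
static void cbs_transmit(struct cbs_model *q, int64_t frame_bits)
{
	int64_t tx_ns = frame_bits * NSEC_PER_SEC / q->port_rate;

	q->credit += cbs_send_slope(q) * tx_ns / NSEC_PER_SEC;
}

/* While frames are waiting, credit refills at idleSlope. */
static void cbs_wait(struct cbs_model *q, int64_t ns)
{
	q->credit += q->idle_slope * ns / NSEC_PER_SEC;
}

/* If the queue empties while credit is positive, credit resets to 0. */
static void cbs_queue_empty(struct cbs_model *q)
{
	if (q->credit > 0)
		q->credit = 0;
}

/* Nanoseconds until credit is back to >= 0, rounded up. */
static int64_t cbs_wait_ns(const struct cbs_model *q)
{
	assert(q->idle_slope > 0);
	if (q->credit >= 0)
		return 0;
	return (-q->credit * NSEC_PER_SEC + q->idle_slope - 1) / q->idle_slope;
}
```

With a 100 Mbit/s link and 20 Mbit/s reserved, sending one 1500-byte frame (12000 bits) drives credit to -9600 bits, and the class must then wait 480 microseconds before its credit is non-negative again.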
