* Re: [[PATCH iproute2/net-next v2] 1/4] tc: flower: remove references to eth_type in manpage
From: Stephen Hemminger @ 2016-12-02 23:05 UTC
To: Simon Horman; +Cc: netdev, Paul Blakey
In-Reply-To: <1480668321-20875-2-git-send-email-simon.horman@netronome.com>
On Fri, 2 Dec 2016 09:45:18 +0100
Simon Horman <simon.horman@netronome.com> wrote:
> Remove references to eth_type and ether_type (spelling error) in
> the tc flower manpage.
>
> Also correct formatting of boldface text with whitespace.
>
> Cc: Paul Blakey <paulb@mellanox.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Applied this one. The later ones still need a rebase.
* [PATCH net-next] liquidio: 'imply' ptp instead of 'select'
From: Arnd Bergmann @ 2016-12-02 23:04 UTC
To: David S. Miller
Cc: Arnd Bergmann, Felix Manlunas, Thomas Gleixner, David Daney,
Satanand Burla, Raghu Vatsavayi, Nicolas Pitre, Sunil Goutham,
netdev, linux-kernel
ptp now depends on the optional POSIX_TIMERS setting and fails to build
if we select it without that:
warning: (LIQUIDIO_VF && TI_CPTS) selects PTP_1588_CLOCK which has unmet direct dependencies (NET && POSIX_TIMERS)
warning: (LIQUIDIO_VF && TI_CPTS) selects PTP_1588_CLOCK which has unmet direct dependencies (NET && POSIX_TIMERS)
ERROR: "posix_clock_unregister" [drivers/ptp/ptp.ko] undefined!
ERROR: "posix_clock_register" [drivers/ptp/ptp.ko] undefined!
ERROR: "pps_unregister_source" [drivers/ptp/ptp.ko] undefined!
ERROR: "pps_event" [drivers/ptp/ptp.ko] undefined!
ERROR: "pps_register_source" [drivers/ptp/ptp.ko] undefined!
It seems that two patches have collided here; the build failure
is the result of their combination. Changing the new option to 'imply'
as well fixes it.
Fixes: 111fc64a237f ("liquidio CN23XX: VF registration")
Fixes: d1cbfd771ce8 ("ptp_clock: Allow for it to be optional")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/ethernet/cavium/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
index bbc8bd16cb97..dcbce6cac63e 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -77,7 +77,7 @@ config OCTEON_MGMT_ETHERNET
config LIQUIDIO_VF
tristate "Cavium LiquidIO VF support"
depends on 64BIT && PCI_MSI
- select PTP_1588_CLOCK
+ imply PTP_1588_CLOCK
---help---
This driver supports Cavium LiquidIO Intelligent Server Adapter
based on CN23XX chips.
--
2.9.0
* [PATCH net-next] phy: add phy fixup unregister functions
From: Woojung.Huh @ 2016-12-02 23:04 UTC
To: davem, f.fainelli, andrew; +Cc: netdev
From: Woojung Huh <woojung.huh@microchip.com>
Add functions to unregister phy fixups, for use by modules.
phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
Unregister the phy fixup matching bus_id, phy_uid and phy_uid_mask
from phy_fixup_list.
Return 0 if a matching entry is found and removed from the list.
Return -ENODEV if no matching entry is found.
phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
Unregister a phy fixup from phy_fixup_list.
Use it for fixups registered by phy_register_fixup_for_uid().
Return 0 if a matching entry is found and removed from the list.
Return -ENODEV if no matching entry is found.
phy_unregister_fixup_for_id(const char *bus_id)
Unregister a phy fixup from phy_fixup_list.
Use it for fixups registered by phy_register_fixup_for_id().
Return 0 if a matching entry is found and removed from the list.
Return -ENODEV if no matching entry is found.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
---
Documentation/networking/phy.txt | 9 ++++++++
drivers/net/phy/phy_device.c | 47 ++++++++++++++++++++++++++++++++++++++++
include/linux/phy.h | 4 ++++
3 files changed, 60 insertions(+)
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index e017d93..16f90d8 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -407,6 +407,15 @@ Board Fixups
The stubs set one of the two matching criteria, and set the other one to
match anything.
+ When phy_register_fixup() or *_for_uid()/*_for_id() is called from a module,
+ the fixup must be unregistered and its allocated memory freed.
+
+ Call one of the following functions before unloading the module.
+
+ int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask);
+ int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
+ int phy_unregister_fixup_for_id(const char *bus_id);
+
Standards
IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index aeaf1bc..32fa7c7 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -235,6 +235,53 @@ int phy_register_fixup_for_id(const char *bus_id,
}
EXPORT_SYMBOL(phy_register_fixup_for_id);
+/**
+ * phy_unregister_fixup - remove a phy_fixup from the list
+ * @bus_id: A string that matches fixup->bus_id (or PHY_ANY_ID) in phy_fixup_list
+ * @phy_uid: A phy id that matches fixup->phy_uid (or PHY_ANY_UID) in phy_fixup_list
+ * @phy_uid_mask: Applied to phy_uid and fixup->phy_uid before comparison
+ */
+int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
+{
+ struct list_head *pos, *n;
+ struct phy_fixup *fixup;
+ int ret;
+
+ ret = -ENODEV;
+
+ mutex_lock(&phy_fixup_lock);
+ list_for_each_safe(pos, n, &phy_fixup_list) {
+ fixup = list_entry(pos, struct phy_fixup, list);
+
+ if ((!strcmp(fixup->bus_id, bus_id)) &&
+ ((fixup->phy_uid & phy_uid_mask) ==
+ (phy_uid & phy_uid_mask))) {
+ list_del(&fixup->list);
+ kfree(fixup);
+ ret = 0;
+ break;
+ }
+ }
+ mutex_unlock(&phy_fixup_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL(phy_unregister_fixup);
+
+/* Unregisters a fixup of any PHY with the UID in phy_uid */
+int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
+{
+ return phy_unregister_fixup(PHY_ANY_ID, phy_uid, phy_uid_mask);
+}
+EXPORT_SYMBOL(phy_unregister_fixup_for_uid);
+
+/* Unregisters a fixup of the PHY with id string bus_id */
+int phy_unregister_fixup_for_id(const char *bus_id)
+{
+ return phy_unregister_fixup(bus_id, PHY_ANY_UID, 0xffffffff);
+}
+EXPORT_SYMBOL(phy_unregister_fixup_for_id);
+
/* Returns 1 if fixup matches phydev in bus_id and phy_uid.
* Fixups can be set to match any in one or more fields.
*/
diff --git a/include/linux/phy.h b/include/linux/phy.h
index b53177f..745661d 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -859,6 +859,10 @@ int phy_register_fixup_for_id(const char *bus_id,
int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
int (*run)(struct phy_device *));
+int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask);
+int phy_unregister_fixup_for_id(const char *bus_id);
+int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
+
int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable);
int phy_get_eee_err(struct phy_device *phydev);
int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data);
--
2.7.4
* Re: [PATCH net-next v2 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
From: Yuchung Cheng @ 2016-12-02 23:02 UTC
To: Paul Gortmaker
Cc: David Miller, Soheil Hassas Yeganeh, Francis Yan, netdev,
Neal Cardwell, Eric Dumazet, linux-next@vger.kernel.org
In-Reply-To: <CAP=VYLov97ujr_mmjb2HbsgBhsex2cBaAsp92DDU+HdmyC_gtQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 14342 bytes --]
On Fri, Dec 2, 2016 at 2:30 PM, Paul Gortmaker
<paul.gortmaker@windriver.com> wrote:
> On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
>> From: Francis Yan <francisyyan@gmail.com>
>>
>> This patch exports the sender chronograph stats via the socket
>> SO_TIMESTAMPING channel. Currently we can instrument how long a
>> particular application unit of data was queued in TCP by tracking
>> SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
>
> Seems a new linux-next failure leads back to here; I did not run a
> full bisect, since the variable seems confined to this commit:
>
> net/socket.c:701: error: 'SCM_TIMESTAMPING_OPT_STATS' undeclared
> (first use in this function)
>
> http://kisskb.ellerman.id.au/kisskb/buildresult/12875981/
Oops, I didn't patch the avr32 arch. Does this fix work?
>
> Paul.
> --
>
>> these sender chronograph stats exported simultaneously along with
>> these timestamps allow further breaking down the various sender
>> limitations. For example, a video server can tell if a particular
>> chunk of video on a connection takes a long time to deliver because
>> TCP was experiencing a small receive window. This was not possible to
>> tell before this patch without packet traces.
>>
>> To prepare these stats, the user needs to set
>> SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
>> while requesting other SOF_TIMESTAMPING TX timestamps. When the
>> timestamps are available in the error queue, the stats are returned
>> in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
>> as a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY,
>> TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Units are microseconds.
>>
>> Signed-off-by: Francis Yan <francisyyan@gmail.com>
>> Signed-off-by: Yuchung Cheng <ycheng@google.com>
>> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
>> ---
>> ChangeLog since v1:
>> - fix build break if CONFIG_INET is not defined
>>
>> Documentation/networking/timestamping.txt | 10 ++++++++++
>> arch/alpha/include/uapi/asm/socket.h | 2 ++
>> arch/frv/include/uapi/asm/socket.h | 2 ++
>> arch/ia64/include/uapi/asm/socket.h | 2 ++
>> arch/m32r/include/uapi/asm/socket.h | 2 ++
>> arch/mips/include/uapi/asm/socket.h | 2 ++
>> arch/mn10300/include/uapi/asm/socket.h | 2 ++
>> arch/parisc/include/uapi/asm/socket.h | 2 ++
>> arch/powerpc/include/uapi/asm/socket.h | 2 ++
>> arch/s390/include/uapi/asm/socket.h | 2 ++
>> arch/sparc/include/uapi/asm/socket.h | 2 ++
>> arch/xtensa/include/uapi/asm/socket.h | 2 ++
>> include/linux/tcp.h | 2 ++
>> include/uapi/asm-generic/socket.h | 2 ++
>> include/uapi/linux/net_tstamp.h | 3 ++-
>> include/uapi/linux/tcp.h | 8 ++++++++
>> net/core/skbuff.c | 14 +++++++++++---
>> net/core/sock.c | 7 +++++++
>> net/ipv4/tcp.c | 20 ++++++++++++++++++++
>> net/socket.c | 7 ++++++-
>> 20 files changed, 90 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
>> index 671cccf..96f5069 100644
>> --- a/Documentation/networking/timestamping.txt
>> +++ b/Documentation/networking/timestamping.txt
>> @@ -182,6 +182,16 @@ SOF_TIMESTAMPING_OPT_TSONLY:
>> the timestamp even if sysctl net.core.tstamp_allow_data is 0.
>> This option disables SOF_TIMESTAMPING_OPT_CMSG.
>>
>> +SOF_TIMESTAMPING_OPT_STATS:
>> +
>> + Optional stats that are obtained along with the transmit timestamps.
>> + It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
>> + transmit timestamp is available, the stats are available in a
>> + separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
>> + list of TLVs (struct nlattr). These stats allow the
>> + application to associate various transport layer stats with
>> + the transmit timestamps, such as how long a certain block of
>> + data was limited by the peer's receive window.
>>
>> New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
>> disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>> index 9e46d6e..afc901b 100644
>> --- a/arch/alpha/include/uapi/asm/socket.h
>> +++ b/arch/alpha/include/uapi/asm/socket.h
>> @@ -97,4 +97,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _UAPI_ASM_SOCKET_H */
>> diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
>> index afbc98f0..81e0353 100644
>> --- a/arch/frv/include/uapi/asm/socket.h
>> +++ b/arch/frv/include/uapi/asm/socket.h
>> @@ -90,5 +90,7 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _ASM_SOCKET_H */
>>
>> diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
>> index 0018fad..57feb0c 100644
>> --- a/arch/ia64/include/uapi/asm/socket.h
>> +++ b/arch/ia64/include/uapi/asm/socket.h
>> @@ -99,4 +99,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _ASM_IA64_SOCKET_H */
>> diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
>> index 5fe42fc..5853f8e9 100644
>> --- a/arch/m32r/include/uapi/asm/socket.h
>> +++ b/arch/m32r/include/uapi/asm/socket.h
>> @@ -90,4 +90,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _ASM_M32R_SOCKET_H */
>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>> index 2027240a..566ecdc 100644
>> --- a/arch/mips/include/uapi/asm/socket.h
>> +++ b/arch/mips/include/uapi/asm/socket.h
>> @@ -108,4 +108,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _UAPI_ASM_SOCKET_H */
>> diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
>> index 5129f23..0e12527 100644
>> --- a/arch/mn10300/include/uapi/asm/socket.h
>> +++ b/arch/mn10300/include/uapi/asm/socket.h
>> @@ -90,4 +90,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _ASM_SOCKET_H */
>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>> index 9c935d7..7a109b7 100644
>> --- a/arch/parisc/include/uapi/asm/socket.h
>> +++ b/arch/parisc/include/uapi/asm/socket.h
>> @@ -89,4 +89,6 @@
>>
>> #define SO_CNX_ADVICE 0x402E
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 0x402F
>> +
>> #endif /* _UAPI_ASM_SOCKET_H */
>> diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
>> index 1672e33..44583a5 100644
>> --- a/arch/powerpc/include/uapi/asm/socket.h
>> +++ b/arch/powerpc/include/uapi/asm/socket.h
>> @@ -97,4 +97,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _ASM_POWERPC_SOCKET_H */
>> diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
>> index 41b51c2..b24a64c 100644
>> --- a/arch/s390/include/uapi/asm/socket.h
>> +++ b/arch/s390/include/uapi/asm/socket.h
>> @@ -96,4 +96,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _ASM_SOCKET_H */
>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>> index 31aede3..a25dc32 100644
>> --- a/arch/sparc/include/uapi/asm/socket.h
>> +++ b/arch/sparc/include/uapi/asm/socket.h
>> @@ -86,6 +86,8 @@
>>
>> #define SO_CNX_ADVICE 0x0037
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 0x0038
>> +
>> /* Security levels - as per NRL IPv6 - don't actually do anything */
>> #define SO_SECURITY_AUTHENTICATION 0x5001
>> #define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
>> diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
>> index 81435d9..9fdbe1f 100644
>> --- a/arch/xtensa/include/uapi/asm/socket.h
>> +++ b/arch/xtensa/include/uapi/asm/socket.h
>> @@ -101,4 +101,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* _XTENSA_SOCKET_H */
>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> index d5d3bd8..00e0ee8 100644
>> --- a/include/linux/tcp.h
>> +++ b/include/linux/tcp.h
>> @@ -428,4 +428,6 @@ static inline void tcp_saved_syn_free(struct tcp_sock *tp)
>> tp->saved_syn = NULL;
>> }
>>
>> +struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk);
>> +
>> #endif /* _LINUX_TCP_H */
>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>> index 67d632f..2c748dd 100644
>> --- a/include/uapi/asm-generic/socket.h
>> +++ b/include/uapi/asm-generic/socket.h
>> @@ -92,4 +92,6 @@
>>
>> #define SO_CNX_ADVICE 53
>>
>> +#define SCM_TIMESTAMPING_OPT_STATS 54
>> +
>> #endif /* __ASM_GENERIC_SOCKET_H */
>> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
>> index 264e515..464dcca 100644
>> --- a/include/uapi/linux/net_tstamp.h
>> +++ b/include/uapi/linux/net_tstamp.h
>> @@ -25,8 +25,9 @@ enum {
>> SOF_TIMESTAMPING_TX_ACK = (1<<9),
>> SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
>> SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
>> + SOF_TIMESTAMPING_OPT_STATS = (1<<12),
>>
>> - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TSONLY,
>> + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS,
>> SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>> SOF_TIMESTAMPING_LAST
>> };
>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>> index 2863b66..c53de26 100644
>> --- a/include/uapi/linux/tcp.h
>> +++ b/include/uapi/linux/tcp.h
>> @@ -220,6 +220,14 @@ struct tcp_info {
>> __u64 tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
>> };
>>
>> +/* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
>> +enum {
>> + TCP_NLA_PAD,
>> + TCP_NLA_BUSY, /* Time (usec) busy sending data */
>> + TCP_NLA_RWND_LIMITED, /* Time (usec) limited by receive window */
>> + TCP_NLA_SNDBUF_LIMITED, /* Time (usec) limited by send buffer */
>> +};
>> +
>> /* for TCP_MD5SIG socket option */
>> #define TCP_MD5SIG_MAXKEYLEN 80
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index d1d1a5a..ea6fa95 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3839,10 +3839,18 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>> if (!skb_may_tx_timestamp(sk, tsonly))
>> return;
>>
>> - if (tsonly)
>> - skb = alloc_skb(0, GFP_ATOMIC);
>> - else
>> + if (tsonly) {
>> +#ifdef CONFIG_INET
>> + if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
>> + sk->sk_protocol == IPPROTO_TCP &&
>> + sk->sk_type == SOCK_STREAM)
>> + skb = tcp_get_timestamping_opt_stats(sk);
>> + else
>> +#endif
>> + skb = alloc_skb(0, GFP_ATOMIC);
>> + } else {
>> skb = skb_clone(orig_skb, GFP_ATOMIC);
>> + }
>> if (!skb)
>> return;
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 14e6145..d8c7f8c 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -854,6 +854,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>> sk->sk_tskey = 0;
>> }
>> }
>> +
>> + if (val & SOF_TIMESTAMPING_OPT_STATS &&
>> + !(val & SOF_TIMESTAMPING_OPT_TSONLY)) {
>> + ret = -EINVAL;
>> + break;
>> + }
>> +
>> sk->sk_tsflags = val;
>> if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
>> sock_enable_timestamp(sk,
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index cdde20f..1149b48 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -2841,6 +2841,26 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
>> }
>> EXPORT_SYMBOL_GPL(tcp_get_info);
>>
>> +struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk)
>> +{
>> + const struct tcp_sock *tp = tcp_sk(sk);
>> + struct sk_buff *stats;
>> + struct tcp_info info;
>> +
>> + stats = alloc_skb(3 * nla_total_size_64bit(sizeof(u64)), GFP_ATOMIC);
>> + if (!stats)
>> + return NULL;
>> +
>> + tcp_get_info_chrono_stats(tp, &info);
>> + nla_put_u64_64bit(stats, TCP_NLA_BUSY,
>> + info.tcpi_busy_time, TCP_NLA_PAD);
>> + nla_put_u64_64bit(stats, TCP_NLA_RWND_LIMITED,
>> + info.tcpi_rwnd_limited, TCP_NLA_PAD);
>> + nla_put_u64_64bit(stats, TCP_NLA_SNDBUF_LIMITED,
>> + info.tcpi_sndbuf_limited, TCP_NLA_PAD);
>> + return stats;
>> +}
>> +
>> static int do_tcp_getsockopt(struct sock *sk, int level,
>> int optname, char __user *optval, int __user *optlen)
>> {
>> diff --git a/net/socket.c b/net/socket.c
>> index e2584c5..e631894 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -693,9 +693,14 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>> (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
>> ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
>> empty = 0;
>> - if (!empty)
>> + if (!empty) {
>> put_cmsg(msg, SOL_SOCKET,
>> SCM_TIMESTAMPING, sizeof(tss), &tss);
>> +
>> + if (skb->len && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS))
>> + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_OPT_STATS,
>> + skb->len, skb->data);
>> + }
>> }
>> EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
>>
>> --
>> 2.8.0.rc3.226.g39d4020
>>
[-- Attachment #2: ts-avr32-fix.diff --]
[-- Type: text/x-patch, Size: 335 bytes --]
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..5a65042 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
#define SO_CNX_ADVICE 53
+#define SCM_TIMESTAMPING_OPT_STATS 54
+
#endif /* _UAPI__ASM_AVR32_SOCKET_H */
* [PATCH 3/3] uapi: export nf_log.h
From: Stephen Hemminger @ 2016-12-02 22:50 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
In-Reply-To: <20161202225050.30048-1-sthemmin@microsoft.com>
The file is in the uapi directory but is not being copied by
make headers_install.
Fixes commit 4ec9c8fbbc22 ("netfilter: nft_log: complete
NFTA_LOG_FLAGS attr support").
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
include/uapi/linux/netfilter/Kbuild | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
index cd26d7a..03f194a 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -5,6 +5,7 @@ header-y += nf_conntrack_ftp.h
header-y += nf_conntrack_sctp.h
header-y += nf_conntrack_tcp.h
header-y += nf_conntrack_tuple_common.h
+header-y += nf_log.h
header-y += nf_tables.h
header-y += nf_tables_compat.h
header-y += nf_nat.h
--
2.10.2
* [PATCH 1/3] uapi: export tc tunnel key file
From: Stephen Hemminger @ 2016-12-02 22:50 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
In-Reply-To: <20161202225050.30048-1-sthemmin@microsoft.com>
Fixes commit 21609ae32aaf6c6fab0e ("net/sched: Introduce act_tunnel_key")
The file is needed by the iproute2 headers but was not being
copied by make headers_install.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
include/uapi/linux/tc_act/Kbuild | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index e3969bd..9611c7b 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -11,3 +11,4 @@ header-y += tc_vlan.h
header-y += tc_bpf.h
header-y += tc_connmark.h
header-y += tc_ife.h
+header-y += tc_tunnel_key.h
--
2.10.2
* [PATCH 2/3] uapi: export tc_skbmod.h
From: Stephen Hemminger @ 2016-12-02 22:50 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
In-Reply-To: <20161202225050.30048-1-sthemmin@microsoft.com>
Fixes commit 735cffe5d800 ("net_sched: Introduce skbmod action")
Not used by iproute2 yet, but it may be in the future.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
include/uapi/linux/tc_act/Kbuild | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index 9611c7b..721433e 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -12,3 +12,4 @@ header-y += tc_bpf.h
header-y += tc_connmark.h
header-y += tc_ife.h
header-y += tc_tunnel_key.h
+header-y += tc_sbkmod.h
--
2.10.2
* [PATCH 0/3] UAPI export missing headers
From: Stephen Hemminger @ 2016-12-02 22:50 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
Files not being exported by make headers_install
Stephen Hemminger (3):
tc: export tunnel key file
uapi: export tc_skbmod.h
uapi: export nf_log.h
include/uapi/linux/netfilter/Kbuild | 1 +
include/uapi/linux/tc_act/Kbuild | 2 ++
2 files changed, 3 insertions(+)
--
2.10.2
* [PATCH 3/3] uapi: export nf_log.h
From: Stephen Hemminger @ 2016-12-02 22:54 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
In-Reply-To: <20161202225400.32751-1-sthemmin@microsoft.com>
The file is in the uapi directory but is not being copied by
make headers_install.
Fixes commit 4ec9c8fbbc22 ("netfilter: nft_log: complete
NFTA_LOG_FLAGS attr support").
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
include/uapi/linux/netfilter/Kbuild | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
index cd26d7a..03f194a 100644
--- a/include/uapi/linux/netfilter/Kbuild
+++ b/include/uapi/linux/netfilter/Kbuild
@@ -5,6 +5,7 @@ header-y += nf_conntrack_ftp.h
header-y += nf_conntrack_sctp.h
header-y += nf_conntrack_tcp.h
header-y += nf_conntrack_tuple_common.h
+header-y += nf_log.h
header-y += nf_tables.h
header-y += nf_tables_compat.h
header-y += nf_nat.h
--
2.10.2
* [PATCH 2/3] uapi: export tc_skbmod.h
From: Stephen Hemminger @ 2016-12-02 22:53 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
In-Reply-To: <20161202225400.32751-1-sthemmin@microsoft.com>
Fixes commit 735cffe5d800 ("net_sched: Introduce skbmod action")
Not used by iproute2 yet, but it may be in the future.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
include/uapi/linux/tc_act/Kbuild | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index 9611c7b..e3db740 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -12,3 +12,4 @@ header-y += tc_bpf.h
header-y += tc_connmark.h
header-y += tc_ife.h
header-y += tc_tunnel_key.h
+header-y += tc_skbmod.h
--
2.10.2
* [PATCH 1/3] uapi: export tc tunnel key file
From: Stephen Hemminger @ 2016-12-02 22:53 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
In-Reply-To: <20161202225400.32751-1-sthemmin@microsoft.com>
Fixes commit 21609ae32aaf6c6fab0e ("net/sched: Introduce act_tunnel_key")
The file is needed by the iproute2 headers but was not being
copied by make headers_install.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
include/uapi/linux/tc_act/Kbuild | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index e3969bd..9611c7b 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -11,3 +11,4 @@ header-y += tc_vlan.h
header-y += tc_bpf.h
header-y += tc_connmark.h
header-y += tc_ife.h
+header-y += tc_tunnel_key.h
--
2.10.2
* [PATCH v2 0/3] uapi: add kbuild for some files
From: Stephen Hemminger @ 2016-12-02 22:53 UTC
To: davem; +Cc: netdev, pablo, jhs, Stephen Hemminger
Some files are in uapi but are not being copied
by make headers_install.
Stephen Hemminger (3):
tc: export tunnel key file
uapi: export tc_skbmod.h
uapi: export nf_log.h
include/uapi/linux/netfilter/Kbuild | 1 +
include/uapi/linux/tc_act/Kbuild | 2 ++
2 files changed, 3 insertions(+)
V2 - fix typo: s/sbkmod/skbmod/
--
2.10.2
* [PATCH net v3] tcp: warn on bogus MSS and try to amend it
From: Marcelo Ricardo Leitner @ 2016-12-02 22:51 UTC
To: netdev
Cc: Jon Maxwell, Alex Sidorenko, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, tlfalcon, Brian King,
Eric Dumazet, davem, marcelo.leitner
There have been some reports lately about TCP connection stalls caused
by NIC drivers that aren't setting gso_size on aggregated packets on the
rx path. This causes TCP to assume that the MSS is actually the size of
the aggregated packet, which is invalid.
Although the proper fix belongs in each driver, it is often hard and
cumbersome to debug the problem, find its root cause, and report or fix
it.
This patch amends this situation in two ways. First, it adds a warning
when this situation occurs, giving a hint to those trying to debug it.
Second, it limits the maximum probed MSS to the advertised MSS, as it
should never be any higher than that.
The result is that the connection may not have the best performance ever,
but it shouldn't stall, and the admin will have a hint about what to look
for.
Tested with virtio by forcing gso_size to 0.
v2: updated msg per David's suggestion
v3: use skb_iif to find the interface and also log its name, per Eric
Dumazet's suggestion. As the skb may be backlogged and the interface
gone by then, we need to check whether the index still refers to a device.
Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
net/ipv4/tcp_input.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a27b9c0e27c08b4e4aeaff3d0bfdf3ae561ba4d8..042a8a895e97d04afbdc377830537e8fd3b15d1e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -144,7 +144,21 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
*/
len = skb_shinfo(skb)->gso_size ? : skb->len;
if (len >= icsk->icsk_ack.rcv_mss) {
- icsk->icsk_ack.rcv_mss = len;
+ static bool __once __read_mostly;
+
+ icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
+ tcp_sk(sk)->advmss);
+ if (icsk->icsk_ack.rcv_mss != len && !__once) {
+ struct net_device *dev;
+
+ __once = true;
+
+ rcu_read_lock();
+ dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
+ pr_warn_once("%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
+ dev ? dev->name : "Unknown driver");
+ rcu_read_unlock();
+ }
} else {
/* Otherwise, we make more careful check taking into account,
* that SACKs block is variable.
--
2.9.3
* Re: [PATCH net v2] tcp: warn on bogus MSS and try to amend it
From: Marcelo Ricardo Leitner @ 2016-12-02 22:43 UTC
To: Eric Dumazet
Cc: netdev, Jon Maxwell, Alex Sidorenko, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, tlfalcon,
Brian King, davem
In-Reply-To: <1480689924.18162.356.camel@edumazet-glaptop3.roam.corp.google.com>
On Fri, Dec 02, 2016 at 06:45:24AM -0800, Eric Dumazet wrote:
> On Fri, 2016-12-02 at 08:55 -0200, Marcelo Ricardo Leitner wrote:
> > There have been some reports lately about TCP connection stalls caused
> > by NIC drivers that aren't setting gso_size on aggregated packets on rx
> > path. This causes TCP to assume that the MSS is actually the size of the
> > aggregated packet, which is invalid.
> >
> > Although the proper fix belongs in each driver, it is often hard and
> > cumbersome to debug the problem, find its root cause, and report or fix
> > it.
> >
> > This patch amends this situation in two ways. First, it adds a warning
> > when this situation occurs, giving a hint to those trying to debug it.
> > Second, it limits the maximum probed MSS to the advertised MSS, as it
> > should never be any higher than that.
> >
> > The result is that the connection may not have the best performance ever
> > but it shouldn't stall, and the admin will have a hint on what to look
> > for.
> >
> > Tested with virtio by forcing gso_size to 0.
> >
> > Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > ---
> > v2: Updated msg as suggested by David.
> >
> > net/ipv4/tcp_input.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index a27b9c0e27c08b4e4aeaff3d0bfdf3ae561ba4d8..fd619eb93749b6de56a41669248b337c051d9fe2 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -144,7 +144,10 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> > */
> > len = skb_shinfo(skb)->gso_size ? : skb->len;
> > if (len >= icsk->icsk_ack.rcv_mss) {
> > - icsk->icsk_ack.rcv_mss = len;
> > + icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > + tcp_sk(sk)->advmss);
> > + if (icsk->icsk_ack.rcv_mss != len)
> > + pr_warn_once("Driver has suspect GRO implementation, TCP performance may be compromised.\n");
> > } else {
> > /* Otherwise, we make more careful check taking into account,
> > * that SACKs block is variable.
>
>
> skb->dev is indeed NULL, but it might be worth getting back the device
> using skb->skb_iif maybe ?
>
Yes, then it's possible. But I have to add an extra check, because it
involves a lookup (iif -> net_device) and I can't wrap that inside
pr_warn_once(). I hope it doesn't get too cluttered then. Posting v3
shortly. Thanks
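The following is a rough userspace model of the v2 hunk combined with the v3 direction discussed above. It clamps the measured length to the advertised MSS. It also resolves the interface index to a name only on the first hit, outside any print-once macro.

The function names `clamp_rcv_mss()` and `warn_suspect_gro_once()` are hypothetical. `if_indextoname()` stands in for the in-kernel ifindex to net_device lookup; in the kernel that would presumably be something like `dev_get_by_index_rcu()` under RCU. This is a sketch, not the v3 patch.

```c
#include <net/if.h>
#include <stdbool.h>
#include <stdio.h>

/* Model of the v2 change: the receive MSS estimate never exceeds the
 * advertised MSS. *suspect is set when the packet was larger than advmss,
 * which is the case the warning is meant to flag. */
static unsigned int clamp_rcv_mss(unsigned int len, unsigned int advmss,
				  bool *suspect)
{
	unsigned int mss = len < advmss ? len : advmss;

	*suspect = (mss != len);
	return mss;
}

/* Warn once. The ifindex -> name lookup happens only on the first hit,
 * because it cannot be folded into a print-once macro.
 * Returns true if this call emitted the warning (not thread-safe). */
static bool warn_suspect_gro_once(int ifindex)
{
	static bool warned;
	char name[IF_NAMESIZE];
	const char *label;

	if (warned)
		return false;
	warned = true;

	label = if_indextoname(ifindex, name) ? name : "unknown device";
	fprintf(stderr, "%s: Driver has suspect GRO implementation, "
		"TCP performance may be compromised.\n", label);
	return true;
}
```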
^ permalink raw reply
* Re: [PATCH net-next v2 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
From: Paul Gortmaker @ 2016-12-02 22:30 UTC (permalink / raw)
To: Yuchung Cheng
Cc: David Miller, soheil, francisyyan, netdev, ncardwell, edumazet,
linux-next@vger.kernel.org
In-Reply-To: <1480316838-154141-7-git-send-email-ycheng@google.com>
On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch exports the sender chronograph stats via the socket
> SO_TIMESTAMPING channel. Currently we can instrument how long a
> particular application unit of data was queued in TCP by tracking
> SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
Seems a new linux-next failure leads back to here; I did not run a
full bisect, since the symbol seems confined to this commit:
net/socket.c:701: error: 'SCM_TIMESTAMPING_OPT_STATS' undeclared (first use in this function)
http://kisskb.ellerman.id.au/kisskb/buildresult/12875981/
Paul.
--
> these sender chronograph stats exported simultaneously along with
> these timestamps allows further breaking down the various sender
> limitations. For example, a video server can tell if a particular
> chunk of video on a connection takes a long time to deliver because
> TCP was experiencing a small receive window. Before this patch, that
> could not be determined without packet traces.
>
> To prepare these stats, the user needs to set
> SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
> while requesting other SOF_TIMESTAMPING TX timestamps. When the
> timestamps are available in the error queue, the stats are returned
> in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
> as a list of TLVs (struct nlattr) of types TCP_NLA_BUSY,
> TCP_NLA_RWND_LIMITED and TCP_NLA_SNDBUF_LIMITED. Units are microseconds.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---
> ChangeLog since v1:
> - fix build break if CONFIG_INET is not defined
>
> Documentation/networking/timestamping.txt | 10 ++++++++++
> arch/alpha/include/uapi/asm/socket.h | 2 ++
> arch/frv/include/uapi/asm/socket.h | 2 ++
> arch/ia64/include/uapi/asm/socket.h | 2 ++
> arch/m32r/include/uapi/asm/socket.h | 2 ++
> arch/mips/include/uapi/asm/socket.h | 2 ++
> arch/mn10300/include/uapi/asm/socket.h | 2 ++
> arch/parisc/include/uapi/asm/socket.h | 2 ++
> arch/powerpc/include/uapi/asm/socket.h | 2 ++
> arch/s390/include/uapi/asm/socket.h | 2 ++
> arch/sparc/include/uapi/asm/socket.h | 2 ++
> arch/xtensa/include/uapi/asm/socket.h | 2 ++
> include/linux/tcp.h | 2 ++
> include/uapi/asm-generic/socket.h | 2 ++
> include/uapi/linux/net_tstamp.h | 3 ++-
> include/uapi/linux/tcp.h | 8 ++++++++
> net/core/skbuff.c | 14 +++++++++++---
> net/core/sock.c | 7 +++++++
> net/ipv4/tcp.c | 20 ++++++++++++++++++++
> net/socket.c | 7 ++++++-
> 20 files changed, 90 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
> index 671cccf..96f5069 100644
> --- a/Documentation/networking/timestamping.txt
> +++ b/Documentation/networking/timestamping.txt
> @@ -182,6 +182,16 @@ SOF_TIMESTAMPING_OPT_TSONLY:
> the timestamp even if sysctl net.core.tstamp_allow_data is 0.
> This option disables SOF_TIMESTAMPING_OPT_CMSG.
>
> +SOF_TIMESTAMPING_OPT_STATS:
> +
> + Optional stats that are obtained along with the transmit timestamps.
> + It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
> + transmit timestamp is available, the stats are available in a
> + separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
> + list of TLVs (struct nlattr) of types. These stats allow the
> + application to associate various transport layer stats with
> + the transmit timestamps, such as how long a certain block of
> + data was limited by peer's receiver window.
>
> New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
> disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index 9e46d6e..afc901b 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -97,4 +97,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _UAPI_ASM_SOCKET_H */
> diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
> index afbc98f0..81e0353 100644
> --- a/arch/frv/include/uapi/asm/socket.h
> +++ b/arch/frv/include/uapi/asm/socket.h
> @@ -90,5 +90,7 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _ASM_SOCKET_H */
>
> diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
> index 0018fad..57feb0c 100644
> --- a/arch/ia64/include/uapi/asm/socket.h
> +++ b/arch/ia64/include/uapi/asm/socket.h
> @@ -99,4 +99,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _ASM_IA64_SOCKET_H */
> diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
> index 5fe42fc..5853f8e9 100644
> --- a/arch/m32r/include/uapi/asm/socket.h
> +++ b/arch/m32r/include/uapi/asm/socket.h
> @@ -90,4 +90,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _ASM_M32R_SOCKET_H */
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index 2027240a..566ecdc 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -108,4 +108,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _UAPI_ASM_SOCKET_H */
> diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
> index 5129f23..0e12527 100644
> --- a/arch/mn10300/include/uapi/asm/socket.h
> +++ b/arch/mn10300/include/uapi/asm/socket.h
> @@ -90,4 +90,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _ASM_SOCKET_H */
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index 9c935d7..7a109b7 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -89,4 +89,6 @@
>
> #define SO_CNX_ADVICE 0x402E
>
> +#define SCM_TIMESTAMPING_OPT_STATS 0x402F
> +
> #endif /* _UAPI_ASM_SOCKET_H */
> diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
> index 1672e33..44583a5 100644
> --- a/arch/powerpc/include/uapi/asm/socket.h
> +++ b/arch/powerpc/include/uapi/asm/socket.h
> @@ -97,4 +97,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _ASM_POWERPC_SOCKET_H */
> diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
> index 41b51c2..b24a64c 100644
> --- a/arch/s390/include/uapi/asm/socket.h
> +++ b/arch/s390/include/uapi/asm/socket.h
> @@ -96,4 +96,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _ASM_SOCKET_H */
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 31aede3..a25dc32 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -86,6 +86,8 @@
>
> #define SO_CNX_ADVICE 0x0037
>
> +#define SCM_TIMESTAMPING_OPT_STATS 0x0038
> +
> /* Security levels - as per NRL IPv6 - don't actually do anything */
> #define SO_SECURITY_AUTHENTICATION 0x5001
> #define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
> diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
> index 81435d9..9fdbe1f 100644
> --- a/arch/xtensa/include/uapi/asm/socket.h
> +++ b/arch/xtensa/include/uapi/asm/socket.h
> @@ -101,4 +101,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* _XTENSA_SOCKET_H */
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index d5d3bd8..00e0ee8 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -428,4 +428,6 @@ static inline void tcp_saved_syn_free(struct tcp_sock *tp)
> tp->saved_syn = NULL;
> }
>
> +struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk);
> +
> #endif /* _LINUX_TCP_H */
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 67d632f..2c748dd 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -92,4 +92,6 @@
>
> #define SO_CNX_ADVICE 53
>
> +#define SCM_TIMESTAMPING_OPT_STATS 54
> +
> #endif /* __ASM_GENERIC_SOCKET_H */
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index 264e515..464dcca 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -25,8 +25,9 @@ enum {
> SOF_TIMESTAMPING_TX_ACK = (1<<9),
> SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
> SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
> + SOF_TIMESTAMPING_OPT_STATS = (1<<12),
>
> - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TSONLY,
> + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS,
> SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
> SOF_TIMESTAMPING_LAST
> };
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 2863b66..c53de26 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -220,6 +220,14 @@ struct tcp_info {
> __u64 tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
> };
>
> +/* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
> +enum {
> + TCP_NLA_PAD,
> + TCP_NLA_BUSY, /* Time (usec) busy sending data */
> + TCP_NLA_RWND_LIMITED, /* Time (usec) limited by receive window */
> + TCP_NLA_SNDBUF_LIMITED, /* Time (usec) limited by send buffer */
> +};
> +
> /* for TCP_MD5SIG socket option */
> #define TCP_MD5SIG_MAXKEYLEN 80
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d1d1a5a..ea6fa95 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3839,10 +3839,18 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> if (!skb_may_tx_timestamp(sk, tsonly))
> return;
>
> - if (tsonly)
> - skb = alloc_skb(0, GFP_ATOMIC);
> - else
> + if (tsonly) {
> +#ifdef CONFIG_INET
> + if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
> + sk->sk_protocol == IPPROTO_TCP &&
> + sk->sk_type == SOCK_STREAM)
> + skb = tcp_get_timestamping_opt_stats(sk);
> + else
> +#endif
> + skb = alloc_skb(0, GFP_ATOMIC);
> + } else {
> skb = skb_clone(orig_skb, GFP_ATOMIC);
> + }
> if (!skb)
> return;
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 14e6145..d8c7f8c 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -854,6 +854,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
> sk->sk_tskey = 0;
> }
> }
> +
> + if (val & SOF_TIMESTAMPING_OPT_STATS &&
> + !(val & SOF_TIMESTAMPING_OPT_TSONLY)) {
> + ret = -EINVAL;
> + break;
> + }
> +
> sk->sk_tsflags = val;
> if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
> sock_enable_timestamp(sk,
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index cdde20f..1149b48 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2841,6 +2841,26 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
> }
> EXPORT_SYMBOL_GPL(tcp_get_info);
>
> +struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk)
> +{
> + const struct tcp_sock *tp = tcp_sk(sk);
> + struct sk_buff *stats;
> + struct tcp_info info;
> +
> + stats = alloc_skb(3 * nla_total_size_64bit(sizeof(u64)), GFP_ATOMIC);
> + if (!stats)
> + return NULL;
> +
> + tcp_get_info_chrono_stats(tp, &info);
> + nla_put_u64_64bit(stats, TCP_NLA_BUSY,
> + info.tcpi_busy_time, TCP_NLA_PAD);
> + nla_put_u64_64bit(stats, TCP_NLA_RWND_LIMITED,
> + info.tcpi_rwnd_limited, TCP_NLA_PAD);
> + nla_put_u64_64bit(stats, TCP_NLA_SNDBUF_LIMITED,
> + info.tcpi_sndbuf_limited, TCP_NLA_PAD);
> + return stats;
> +}
> +
> static int do_tcp_getsockopt(struct sock *sk, int level,
> int optname, char __user *optval, int __user *optlen)
> {
> diff --git a/net/socket.c b/net/socket.c
> index e2584c5..e631894 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -693,9 +693,14 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
> empty = 0;
> - if (!empty)
> + if (!empty) {
> put_cmsg(msg, SOL_SOCKET,
> SCM_TIMESTAMPING, sizeof(tss), &tss);
> +
> + if (skb->len && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS))
> + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_OPT_STATS,
> + skb->len, skb->data);
> + }
> }
> EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
>
> --
> 2.8.0.rc3.226.g39d4020
>
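The following is a userspace sketch of consuming the new control message. It assumes the attribute numbering from this patch's include/uapi/linux/tcp.h hunk and the flag bits from its net_tstamp.h hunk. Both are redefined locally under hypothetical names so the sketch builds against older headers.

It also mirrors the sock_setsockopt() rule that OPT_STATS requires OPT_TSONLY. PAD attributes, which nla_put_u64_64bit() may emit for alignment, are skipped. Values are copied with memcpy() so the parser does not assume the payload is 8-byte aligned.

```c
#include <linux/netlink.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Attribute numbering as added by this patch (local, hypothetical names). */
enum { STATS_PAD, STATS_BUSY, STATS_RWND_LIMITED, STATS_SNDBUF_LIMITED };

/* Flag bits as defined by this patch in net_tstamp.h (local names). */
#define TSONLY_FLAG (1u << 11)
#define STATS_FLAG  (1u << 12)

struct opt_stats {
	uint64_t busy_us;
	uint64_t rwnd_limited_us;
	uint64_t sndbuf_limited_us;
};

/* Mirrors the setsockopt check: OPT_STATS without OPT_TSONLY is rejected. */
static bool tsflags_valid(unsigned int val)
{
	return !(val & STATS_FLAG) || (val & TSONLY_FLAG);
}

/* Walk the nlattr list from an SCM_TIMESTAMPING_OPT_STATS cmsg payload.
 * Returns 0 on success, -1 on a malformed attribute. */
static int parse_opt_stats(const void *data, size_t len, struct opt_stats *out)
{
	const unsigned char *p = data;

	memset(out, 0, sizeof(*out));
	while (len >= NLA_HDRLEN) {
		struct nlattr nla;
		uint64_t *slot = NULL;
		size_t step;

		memcpy(&nla, p, sizeof(nla));
		if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len)
			return -1;

		switch (nla.nla_type & NLA_TYPE_MASK) {
		case STATS_BUSY:           slot = &out->busy_us; break;
		case STATS_RWND_LIMITED:   slot = &out->rwnd_limited_us; break;
		case STATS_SNDBUF_LIMITED: slot = &out->sndbuf_limited_us; break;
		default:                   break; /* PAD or unknown: ignore */
		}
		if (slot) {
			if (nla.nla_len < NLA_HDRLEN + sizeof(*slot))
				return -1;
			memcpy(slot, p + NLA_HDRLEN, sizeof(*slot));
		}

		step = NLA_ALIGN(nla.nla_len);
		if (step >= len)
			break;
		p += step;
		len -= step;
	}
	return 0;
}
```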
^ permalink raw reply
* Re: [PATCH iproute2/net-next] ss: initialise variables outside of for loop
From: Stephen Hemminger @ 2016-12-02 22:18 UTC (permalink / raw)
To: Simon Horman; +Cc: netdev
In-Reply-To: <1480679765-9014-1-git-send-email-simon.horman@netronome.com>
On Fri, 2 Dec 2016 12:56:05 +0100
Simon Horman <simon.horman@netronome.com> wrote:
> Initialise variables outside of for loops. GCC flags this as being
> out of spec unless C99 or C11 mode is used.
>
> With this change the entire tree appears to compile cleanly with -Wall.
>
> $ gcc --version
> gcc (Debian 4.9.2-10) 4.9.2
> ...
> $ make
> ...
> ss.c: In function ‘unix_show_sock’:
> ss.c:3128:4: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
> ...
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Applied.
Note: I used to have -Wall in the Makefile, but old GCC versions were broken and
would give aliasing warnings.
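The kind of change described is sketched below. Declaring the loop counter at the top of the block, rather than in the `for` statement, lets the code build with gcc 4.9's default gnu89 dialect as well as C99/C11. The function and data are hypothetical, not taken from ss.c.

```c
/* Accepted by gcc -std=gnu89 as well as C99/C11: the loop counter is
 * declared at the top of the block, not in the for statement. */
static int count_nonzero(const int *vals, int n)
{
	int i;		/* was: for (int i = 0; i < n; i++) */
	int count = 0;

	for (i = 0; i < n; i++)
		if (vals[i] != 0)
			count++;
	return count;
}
```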
^ permalink raw reply
* Re: [PATCH iproute2 V5 0/3] tc: Support for ip tunnel metadata set/unset/classify
From: Stephen Hemminger @ 2016-12-02 22:16 UTC (permalink / raw)
To: Amir Vadai; +Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Roi Dayan
In-Reply-To: <20161202112515.11705-1-amir@vadai.me>
On Fri, 2 Dec 2016 13:25:12 +0200
Amir Vadai <amir@vadai.me> wrote:
> Hi,
>
> This short series adds support for matching and setting metadata for ip tunnel
> shared devices using the TC system, introduced in kernel 4.9 [1].
>
> Applied and tested on top of commit b6c7fc61faab ("ss: print new tcp_info
> fields: busy, rwnd-limited, sndbuf-limited times")
>
>
> Example usage:
>
> $ tc filter add dev vxlan0 protocol ip parent ffff: \
> flower \
> enc_src_ip 11.11.0.2 \
> enc_dst_ip 11.11.0.1 \
> enc_key_id 11 \
> dst_ip 11.11.11.1 \
> action mirred egress redirect dev vnet0
>
> $ tc filter add dev net0 protocol ip parent ffff: \
> flower \
> ip_proto 1 \
> dst_ip 11.11.11.2 \
> action tunnel_key set \
> src_ip 11.11.0.1 \
> dst_ip 11.11.0.2 \
> id 11 \
> action mirred egress redirect dev vxlan0
>
> [1] - d1ba24feb466 ("Merge branch 'act_tunnel_key'")
>
> Thanks,
> Amir
>
> Changes from V4:
> - Fix rebase conflicts for net-next
>
> Changes from V3:
> - Fix bad wording in the man page about the use of the 'unset' operation
>
> Changes from V2:
> - Use const where needed
> - Don't lose return value
> - Introduce rta_getattr_be16() and rta_getattr_be32()
>
> Changes from V1:
> - Updated Patch 2/2 ("tc/act_tunnel: Introduce ip tunnel action") commit log
> and the man page tc-tunnel_key to reflect the fact that the 'unset' operation
> is not mandatory, and to describe when it might be needed.
> - Rename the 'release' operation to 'unset'
>
> Amir Vadai (3):
> libnetlink: Introduce rta_getattr_be*()
> tc/cls_flower: Classify packet in ip tunnels
> tc/act_tunnel: Introduce ip tunnel action
>
> bridge/fdb.c | 4 +-
> include/libnetlink.h | 9 ++
> include/linux/tc_act/tc_tunnel_key.h | 42 ++++++
> ip/iplink_geneve.c | 2 +-
> ip/iplink_vxlan.c | 2 +-
> man/man8/tc-flower.8 | 17 ++-
> man/man8/tc-tunnel_key.8 | 112 +++++++++++++++
> tc/Makefile | 1 +
> tc/f_flower.c | 84 +++++++++++-
> tc/m_tunnel_key.c | 258 +++++++++++++++++++++++++++++++++++
> 10 files changed, 522 insertions(+), 9 deletions(-)
> create mode 100644 include/linux/tc_act/tc_tunnel_key.h
> create mode 100644 man/man8/tc-tunnel_key.8
> create mode 100644 tc/m_tunnel_key.c
>
Series applied
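The series introduces rta_getattr_be*() in libnetlink. The idea behind such helpers is sketched below under hypothetical names; this is not iproute2's actual code.

Tunnel metadata such as the key ID travels in network byte order. The helper therefore reads the raw payload and converts it to host order with ntohs()/ntohl().

```c
#include <arpa/inet.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>

/* Copy a fixed-size payload out of an attribute without alignment assumptions. */
static uint16_t attr_raw_u16(const struct rtattr *rta)
{
	uint16_t v;

	memcpy(&v, RTA_DATA(rta), sizeof(v));
	return v;
}

static uint32_t attr_raw_u32(const struct rtattr *rta)
{
	uint32_t v;

	memcpy(&v, RTA_DATA(rta), sizeof(v));
	return v;
}

/* Hypothetical stand-ins for rta_getattr_be16()/rta_getattr_be32():
 * the payload is big-endian on the wire, so convert to host order. */
static uint16_t sketch_getattr_be16(const struct rtattr *rta)
{
	return ntohs(attr_raw_u16(rta));
}

static uint32_t sketch_getattr_be32(const struct rtattr *rta)
{
	return ntohl(attr_raw_u32(rta));
}
```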
^ permalink raw reply
* Re: [PATCH net-next 3/4] tcp: tsq: add shortcut in tcp_tasklet_func()
From: Eric Dumazet @ 2016-12-02 22:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, netdev
In-Reply-To: <1480703159-2327-4-git-send-email-edumazet@google.com>
On Fri, 2016-12-02 at 10:25 -0800, Eric Dumazet wrote:
> Under high stress, I've seen tcp_tasklet_func() consuming
> ~700 usec, handling ~150 tcp sockets.
>
> By setting TCP_TSQ_DEFERRED in tcp_wfree(), we give a chance
> for other cpus/threads entering tcp_write_xmit() to grab it,
> allowing tcp_tasklet_func() to skip sockets that already did
> an xmit cycle.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
...
> @@ -884,7 +884,7 @@ void tcp_wfree(struct sk_buff *skb)
> if (!(oval & TSQF_THROTTLED) || (oval & TSQF_QUEUED))
> goto out;
>
> - nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED;
> + nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCP_TSQ_DEFERRED;
Typo here.
It should be:
nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCPF_TSQ_DEFERRED;
> nval = cmpxchg(&tp->tsq_flags, oval, nval);
> if (nval != oval)
> continue;
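The one-letter difference matters, as this small model shows. It assumes the flags are declared as bit numbers in one enum (TSQ_*, TCP_TSQ_DEFERRED) and as masks in another (TSQF_*, TCPF_TSQ_DEFERRED). The exact bit positions used here are an assumption, and the M_ prefix marks the names as hypothetical.

With this layout, OR-ing the bit number of TCP_TSQ_DEFERRED (2) into nval only sets the QUEUED mask again. The deferred bit itself is never set.

```c
/* Bit numbers (assumed ordering). */
enum model_tsq_bits {
	M_TSQ_THROTTLED,	/* bit 0 */
	M_TSQ_QUEUED,		/* bit 1 */
	M_TCP_TSQ_DEFERRED,	/* bit 2 */
};

/* Masks derived from the bit numbers. */
enum model_tsq_masks {
	M_TSQF_THROTTLED	= 1UL << M_TSQ_THROTTLED,
	M_TSQF_QUEUED		= 1UL << M_TSQ_QUEUED,
	M_TCPF_TSQ_DEFERRED	= 1UL << M_TCP_TSQ_DEFERRED,
};

/* nval as computed in the posted patch: ORs in a bit number, not a mask. */
static unsigned long nval_typo(unsigned long oval)
{
	return (oval & ~M_TSQF_THROTTLED) | M_TSQF_QUEUED | M_TCP_TSQ_DEFERRED;
}

/* nval with the correction above: ORs in the mask. */
static unsigned long nval_fixed(unsigned long oval)
{
	return (oval & ~M_TSQF_THROTTLED) | M_TSQF_QUEUED | M_TCPF_TSQ_DEFERRED;
}
```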
^ permalink raw reply
* Re: [iproute PATCH v2 00/18] ss: Minor code review
From: Stephen Hemminger @ 2016-12-02 22:09 UTC (permalink / raw)
To: Phil Sutter; +Cc: netdev
In-Reply-To: <20161202104002.17310-1-phil@nwl.cc>
On Fri, 2 Dec 2016 11:39:44 +0100
Phil Sutter <phil@nwl.cc> wrote:
> This is a series of misc changes to ss code which happened as fall-out
> when working on a unified output formatter (still unfinished).
>
> Changes since v1:
> - Rebased onto current upstream, resolved conflicts in patch 4 generated
> by previously added SCTP socket support.
>
> Phil Sutter (18):
> ss: Mark fall through in arg parsing switch()
> ss: Drop empty lines in UDP output
> ss: Add missing tab when printing UNIX details
> ss: Use sockstat->type in all socket types
> ss: introduce proc_ctx_print()
> ss: Drop list traversal from unix_stats_print()
> ss: Eliminate unix_use_proc()
> ss: Turn generic_proc_open() wrappers into macros
> ss: Make tmr_name local to tcp_timer_print()
> ss: Make user_ent_hash_build_init local to user_ent_hash_build()
> ss: Make some variables function-local
> ss: Make slabstat_ids local to get_slabstat()
> ss: Get rid of useless goto in handle_follow_request()
> ss: Get rid of single-fielded struct snmpstat
> ss: Make unix_state_map local to unix_show()
> ss: Make sstate_name local to sock_state_print()
> ss: Make sstate_namel local to scan_state()
> ss: unix_show: No need to initialize members of calloc'ed structs
>
> misc/ss.c | 532 ++++++++++++++++++++++++++------------------------------------
> 1 file changed, 224 insertions(+), 308 deletions(-)
>
Applied, thanks
^ permalink raw reply
* Re: [PATCH iproute2 1/1] tc: updated man page to reflect handle-id use in filter GET command.
From: Stephen Hemminger @ 2016-12-02 22:06 UTC (permalink / raw)
To: Roman Mashak; +Cc: netdev, sathya.perla
In-Reply-To: <1480623644-27533-1-git-send-email-mrv@mojatatu.com>
On Thu, 1 Dec 2016 15:20:44 -0500
Roman Mashak <mrv@mojatatu.com> wrote:
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> ---
> man/man8/tc.8 | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/man/man8/tc.8 b/man/man8/tc.8
> index 8a47a2b..d957ffa 100644
> --- a/man/man8/tc.8
> +++ b/man/man8/tc.8
> @@ -32,7 +32,9 @@ class-id ] qdisc
> DEV
> .B [ parent
> qdisc-id
> -.B | root ] protocol
> +.B | root ] [ handle
> +handle-id ]
> +.B protocol
> protocol
> .B prio
> priority filtertype
> @@ -577,7 +579,7 @@ it is created.
>
> .TP
> get
> -Displays a single filter given the interface, parent ID, priority, protocol and handle ID.
> +Displays a single filter given the interface, qdisc-id, priority, protocol and handle-id.
>
> .TP
> show
The proper syntax for the usage section of a man page is to put keywords in bold and any value
that is variable in italics.
I know this whole man page doesn't do this correctly. But that doesn't mean that new
additions should continue with the mistake.
Please revise and resubmit. Extra bonus points for fixing the other bits.
^ permalink raw reply
* Re: [PATCHv2 net-next 4/4] net: dsa: mv88e6xxx: Refactor CPU and DSA port setup
From: Vivien Didelot @ 2016-12-02 22:03 UTC (permalink / raw)
To: Andrew Lunn; +Cc: David Miller, netdev
In-Reply-To: <20161202211806.GF30716@lunn.ch>
Hi Andrew,
Andrew Lunn <andrew@lunn.ch> writes:
>> The port's EgressMode, FrameMode and EtherType are really tied together
>> to compose the mode of the port.
>
> Setting the EtherType is somewhat separate. It is only needed on ports
> using EDSA. And that can only happen on a CPU port. Humm, actually, I
> set it when I should not. But putting this in a wrapper actually hides
> this.
Wrong. The datasheet says:
> This Ether Type is used for many features depending upon the mode
> of the port (as defined by the port’s EgressMode and FrameMode
> bits – in Port Control, port offset 0x04).
It says that in Normal Network mode, this register can be used to trap,
mirror, etc. It is also used in Provider and EDSA modes.
That is why it would be better to wrap them together to ensure correct
values when configuring a port's mode.
>
>> Could you add a helper in chip.c like:
>>
>> static int mv88e6xxx_set_port_mode(struct mv88e6xxx_chip *chip, int port,
>> enum mv88e6xxx_frame_mode frame_mode,
>> u16 egress_mode, bool egress_unknown,
>> u16 ethertype)
>> {
>> int err;
>>
>> if (chip->info->ops->port_set_frame_mode) {
>> err = chip->info->ops->port_set_frame_mode(chip, port, frame_mode);
>> if (err)
>> return err;
>> }
>
> Ignoring that it is not implemented here is wrong. It must be
> implemented, or the device is not going to work. It is a question of,
> do we want an oops, or return an error code.
Since that is done at setup time, returning an error is enough IMO to
inform the DSA layer that something went wrong.
Thanks,
Vivien
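The following sketches the error-returning variant argued for above. Every name in it is hypothetical and reduced from the driver: `struct chip`, `struct chip_ops`, `set_port_mode()`, the fake callbacks and the frame_mode values.

All three callbacks are checked before any is called. A chip lacking one of them therefore fails setup with -EOPNOTSUPP, rather than being left partially configured or dereferencing a NULL pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum frame_mode { FRAME_MODE_NORMAL, FRAME_MODE_DSA, FRAME_MODE_PROVIDER, FRAME_MODE_ETHERTYPE };

struct chip;

struct chip_ops {
	int (*port_set_frame_mode)(struct chip *chip, int port, enum frame_mode mode);
	int (*port_set_egress_mode)(struct chip *chip, int port, uint16_t mode, bool unknown);
	int (*port_set_ethertype)(struct chip *chip, int port, uint16_t etype);
};

struct chip {
	const struct chip_ops *ops;
	int calls;	/* counts callback invocations, for the fakes below */
};

/* Configure the port mode as one unit; a missing op is a setup error. */
static int set_port_mode(struct chip *chip, int port, enum frame_mode frame_mode,
			 uint16_t egress_mode, bool egress_unknown, uint16_t ethertype)
{
	const struct chip_ops *ops = chip->ops;
	int err;

	if (!ops->port_set_frame_mode || !ops->port_set_egress_mode ||
	    !ops->port_set_ethertype)
		return -EOPNOTSUPP;

	err = ops->port_set_frame_mode(chip, port, frame_mode);
	if (err)
		return err;
	err = ops->port_set_egress_mode(chip, port, egress_mode, egress_unknown);
	if (err)
		return err;
	return ops->port_set_ethertype(chip, port, ethertype);
}

/* Trivial fake backend so the helper can be exercised without hardware. */
static int fake_frame_mode(struct chip *c, int port, enum frame_mode m)
{ (void)port; (void)m; c->calls++; return 0; }
static int fake_egress_mode(struct chip *c, int port, uint16_t e, bool u)
{ (void)port; (void)e; (void)u; c->calls++; return 0; }
static int fake_ethertype(struct chip *c, int port, uint16_t t)
{ (void)port; (void)t; c->calls++; return 0; }

static const struct chip_ops fake_ops = {
	fake_frame_mode, fake_egress_mode, fake_ethertype,
};
```

Whether EtherType always belongs in the port mode is exactly the point being debated above. The sketch simply follows the "wrap them together" position.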
^ permalink raw reply
* [PATCH net-next v5] ipv6 addrconf: Implemented enhanced DAD (RFC7527)
From: Erik Nordmark @ 2016-12-02 22:00 UTC (permalink / raw)
To: davem; +Cc: netdev, hannes, Erik Nordmark, Bob Gilligan
Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
Enhanced DAD is enabled by default. It can be disabled by setting both
conf/{all,interface}/enhanced_dad to zero.
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
v2: renamed sysctl and made it default to true, plus minor code review fixes
v3: respun with later net-next; fixed whitespace issues
v4: fixed kbuild test robot report for route.c; added Reviewed-by
v5: using %pM for printk of nonce
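The nonce handling is spread over addrconf.c and ndisc.c. Here is a standalone model of the parts in the diff below:

- the option layout: type 14, length 1 in 8-octet units, then a 6-byte nonce;
- the loopback test in ndisc_recv_ns();
- the all/interface sysctl rule.

Helper names are hypothetical; only the byte layout and logic follow the diff. As in the patch, the nonce lives in the first six bytes of a zeroed u64 on both ends. Sender and receiver therefore agree without any byte-order conversion.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ND_OPT_NONCE_TYPE 14	/* ND_OPT_NONCE in this patch (RFC7527) */
#define ND_OPT_NONCE_LEN  8	/* 2-byte header + 6-byte nonce = one 8-octet unit */

/* Encode as ndisc_send_ns() does: type, length in 8-octet units, and the
 * first 6 bytes of the u64 nonce. */
static void put_nonce_opt(uint8_t opt[ND_OPT_NONCE_LEN], uint64_t nonce)
{
	opt[0] = ND_OPT_NONCE_TYPE;
	opt[1] = ND_OPT_NONCE_LEN >> 3;
	memcpy(opt + 2, &nonce, 6);
}

/* Decode as ndisc_recv_ns() does: 6 bytes into a zeroed u64. */
static uint64_t get_nonce_opt(const uint8_t opt[ND_OPT_NONCE_LEN])
{
	uint64_t nonce = 0;

	memcpy(&nonce, opt + 2, 6);
	return nonce;
}

/* A DAD NS carrying the nonce we sent is our own probe looped back. */
static bool dad_ns_is_looped_back(uint64_t our_nonce, uint64_t rx_nonce)
{
	return rx_nonce != 0 && rx_nonce == our_nonce;
}

/* Enhanced DAD is used unless both the interface and "all" knobs are 0. */
static bool enhanced_dad_enabled(int iface_cnf, int all_cnf)
{
	return iface_cnf || all_cnf;
}
```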
Documentation/networking/ip-sysctl.txt | 9 +++++++++
include/linux/ipv6.h | 1 +
include/net/if_inet6.h | 1 +
include/net/ndisc.h | 5 ++++-
include/uapi/linux/ipv6.h | 1 +
net/ipv6/addrconf.c | 22 +++++++++++++++++++++-
net/ipv6/ndisc.c | 29 ++++++++++++++++++++++++++---
net/ipv6/route.c | 2 +-
8 files changed, 64 insertions(+), 6 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5af48dd..d9ef566 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1729,6 +1729,15 @@ drop_unsolicited_na - BOOLEAN
By default this is turned off.
+enhanced_dad - BOOLEAN
+ Include a nonce option in the IPv6 neighbor solicitation messages used for
+ duplicate address detection per RFC7527. A received DAD NS will only signal
+ a duplicate address if the nonce is different. This avoids any false
+ detection of duplicates due to loopback of the NS messages that we send.
+ The nonce option will be sent on an interface unless both of
+ conf/{all,interface}/enhanced_dad are set to FALSE.
+ Default: TRUE
+
icmp/*:
ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 3f95233..671d014 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -68,6 +68,7 @@ struct ipv6_devconf {
#ifdef CONFIG_IPV6_SEG6_HMAC
__s32 seg6_require_hmac;
#endif
+ __u32 enhanced_dad;
struct ctl_table_header *sysctl_header;
};
diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index b0576cb..0fa4c32 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -55,6 +55,7 @@ struct inet6_ifaddr {
__u8 stable_privacy_retry;
__u16 scope;
+ __u64 dad_nonce;
unsigned long cstamp; /* created timestamp */
unsigned long tstamp; /* updated timestamp */
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index be1fe228..d562a2f 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -31,6 +31,7 @@ enum {
ND_OPT_PREFIX_INFO = 3, /* RFC2461 */
ND_OPT_REDIRECT_HDR = 4, /* RFC2461 */
ND_OPT_MTU = 5, /* RFC2461 */
+ ND_OPT_NONCE = 14, /* RFC7527 */
__ND_OPT_ARRAY_MAX,
ND_OPT_ROUTE_INFO = 24, /* RFC4191 */
ND_OPT_RDNSS = 25, /* RFC5006 */
@@ -121,6 +122,7 @@ struct ndisc_options {
#define nd_opts_pi_end nd_opt_array[__ND_OPT_PREFIX_INFO_END]
#define nd_opts_rh nd_opt_array[ND_OPT_REDIRECT_HDR]
#define nd_opts_mtu nd_opt_array[ND_OPT_MTU]
+#define nd_opts_nonce nd_opt_array[ND_OPT_NONCE]
#define nd_802154_opts_src_lladdr nd_802154_opt_array[ND_OPT_SOURCE_LL_ADDR]
#define nd_802154_opts_tgt_lladdr nd_802154_opt_array[ND_OPT_TARGET_LL_ADDR]
@@ -398,7 +400,8 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons
int ndisc_rcv(struct sk_buff *skb);
void ndisc_send_ns(struct net_device *dev, const struct in6_addr *solicit,
- const struct in6_addr *daddr, const struct in6_addr *saddr);
+ const struct in6_addr *daddr, const struct in6_addr *saddr,
+ u64 nonce);
void ndisc_send_rs(struct net_device *dev,
const struct in6_addr *saddr, const struct in6_addr *daddr);
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 53561be..eaf65dc 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -181,6 +181,7 @@ enum {
DEVCONF_RTR_SOLICIT_MAX_INTERVAL,
DEVCONF_SEG6_ENABLED,
DEVCONF_SEG6_REQUIRE_HMAC,
+ DEVCONF_ENHANCED_DAD,
DEVCONF_MAX
};
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4c387dc..c1e124b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -242,6 +242,7 @@ static bool ipv6_chk_same_addr(struct net *net, const struct in6_addr *addr,
#ifdef CONFIG_IPV6_SEG6_HMAC
.seg6_require_hmac = 0,
#endif
+ .enhanced_dad = 1,
};
static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -292,6 +293,7 @@ static bool ipv6_chk_same_addr(struct net *net, const struct in6_addr *addr,
#ifdef CONFIG_IPV6_SEG6_HMAC
.seg6_require_hmac = 0,
#endif
+ .enhanced_dad = 1,
};
/* Check if a valid qdisc is available */
@@ -3735,12 +3737,21 @@ static void addrconf_dad_kick(struct inet6_ifaddr *ifp)
{
unsigned long rand_num;
struct inet6_dev *idev = ifp->idev;
+ u64 nonce;
if (ifp->flags & IFA_F_OPTIMISTIC)
rand_num = 0;
else
rand_num = prandom_u32() % (idev->cnf.rtr_solicit_delay ? : 1);
+ nonce = 0;
+ if (idev->cnf.enhanced_dad ||
+ dev_net(idev->dev)->ipv6.devconf_all->enhanced_dad) {
+ do
+ get_random_bytes(&nonce, 6);
+ while (nonce == 0);
+ }
+ ifp->dad_nonce = nonce;
ifp->dad_probes = idev->cnf.dad_transmits;
addrconf_mod_dad_work(ifp, rand_num);
}
@@ -3918,7 +3929,8 @@ static void addrconf_dad_work(struct work_struct *w)
/* send a neighbour solicitation for our addr */
addrconf_addr_solict_mult(&ifp->addr, &mcaddr);
- ndisc_send_ns(ifp->idev->dev, &ifp->addr, &mcaddr, &in6addr_any);
+ ndisc_send_ns(ifp->idev->dev, &ifp->addr, &mcaddr, &in6addr_any,
+ ifp->dad_nonce);
out:
in6_ifa_put(ifp);
rtnl_unlock();
@@ -4962,6 +4974,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
#ifdef CONFIG_IPV6_SEG6_HMAC
array[DEVCONF_SEG6_REQUIRE_HMAC] = cnf->seg6_require_hmac;
#endif
+ array[DEVCONF_ENHANCED_DAD] = cnf->enhanced_dad;
}
static inline size_t inet6_ifla6_size(void)
@@ -6070,6 +6083,13 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct ctl_table *ctl,
},
#endif
{
+ .procname = "enhanced_dad",
+ .data = &ipv6_devconf.enhanced_dad,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
/* sentinel */
}
};
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index d8e6714..7ebac63 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -233,6 +233,7 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev,
case ND_OPT_SOURCE_LL_ADDR:
case ND_OPT_TARGET_LL_ADDR:
case ND_OPT_MTU:
+ case ND_OPT_NONCE:
case ND_OPT_REDIRECT_HDR:
if (ndopts->nd_opt_array[nd_opt->nd_opt_type]) {
ND_PRINTK(2, warn,
@@ -568,7 +569,8 @@ static void ndisc_send_unsol_na(struct net_device *dev)
}
void ndisc_send_ns(struct net_device *dev, const struct in6_addr *solicit,
- const struct in6_addr *daddr, const struct in6_addr *saddr)
+ const struct in6_addr *daddr, const struct in6_addr *saddr,
+ u64 nonce)
{
struct sk_buff *skb;
struct in6_addr addr_buf;
@@ -588,6 +590,8 @@ void ndisc_send_ns(struct net_device *dev, const struct in6_addr *solicit,
if (inc_opt)
optlen += ndisc_opt_addr_space(dev,
NDISC_NEIGHBOUR_SOLICITATION);
+ if (nonce != 0)
+ optlen += 8;
skb = ndisc_alloc_skb(dev, sizeof(*msg) + optlen);
if (!skb)
@@ -605,6 +609,13 @@ void ndisc_send_ns(struct net_device *dev, const struct in6_addr *solicit,
ndisc_fill_addr_option(skb, ND_OPT_SOURCE_LL_ADDR,
dev->dev_addr,
NDISC_NEIGHBOUR_SOLICITATION);
+ if (nonce != 0) {
+ u8 *opt = skb_put(skb, 8);
+
+ opt[0] = ND_OPT_NONCE;
+ opt[1] = 8 >> 3;
+ memcpy(opt + 2, &nonce, 6);
+ }
ndisc_send_skb(skb, daddr, saddr);
}
@@ -693,12 +704,12 @@ static void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb)
"%s: trying to ucast probe in NUD_INVALID: %pI6\n",
__func__, target);
}
- ndisc_send_ns(dev, target, target, saddr);
+ ndisc_send_ns(dev, target, target, saddr, 0);
} else if ((probes -= NEIGH_VAR(neigh->parms, APP_PROBES)) < 0) {
neigh_app_ns(neigh);
} else {
addrconf_addr_solict_mult(target, &mcaddr);
- ndisc_send_ns(dev, target, &mcaddr, saddr);
+ ndisc_send_ns(dev, target, &mcaddr, saddr, 0);
}
}
@@ -742,6 +753,7 @@ static void ndisc_recv_ns(struct sk_buff *skb)
int dad = ipv6_addr_any(saddr);
bool inc;
int is_router = -1;
+ u64 nonce = 0;
if (skb->len < sizeof(struct nd_msg)) {
ND_PRINTK(2, warn, "NS: packet too short\n");
@@ -786,6 +798,8 @@ static void ndisc_recv_ns(struct sk_buff *skb)
return;
}
}
+ if (ndopts.nd_opts_nonce)
+ memcpy(&nonce, (u8 *)(ndopts.nd_opts_nonce + 1), 6);
inc = ipv6_addr_is_multicast(daddr);
@@ -794,6 +808,15 @@ static void ndisc_recv_ns(struct sk_buff *skb)
have_ifp:
if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
if (dad) {
+ if (nonce != 0 && ifp->dad_nonce == nonce) {
+ u8 *np = (u8 *)&nonce;
+ /* Matching nonce if looped back */
+ ND_PRINTK(2, notice,
+ "%s: IPv6 DAD loopback for address %pI6c nonce %pM ignored\n",
+ ifp->idev->dev->name,
+ &ifp->addr, np);
+ goto out;
+ }
/*
* We are colliding with another node
* who is doing DAD
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b317bb1..aac7818 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -527,7 +527,7 @@ static void rt6_probe_deferred(struct work_struct *w)
container_of(w, struct __rt6_probe_work, work);
addrconf_addr_solict_mult(&work->target, &mcaddr);
- ndisc_send_ns(work->dev, &work->target, &mcaddr, NULL);
+ ndisc_send_ns(work->dev, &work->target, &mcaddr, NULL, 0);
dev_put(work->dev);
kfree(work);
}
--
1.8.1.4
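The NS nonce handling in the patch above can be illustrated outside the kernel. This is a minimal userspace sketch, not kernel code. `SKETCH_ND_OPT_NONCE` assumes the Nonce option type is 14 (as assigned in RFC 3971), and every `sketch_` helper is hypothetical. Like the patch, the sketch writes an 8-byte option whose length field counts 8-octet units (`8 >> 3 == 1`) and carries 6 nonce bytes copied in host memory order. It also includes a loopback test that only fires for a nonzero nonce equal to the sender's own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: Nonce option type 14, as in RFC 3971; hypothetical name. */
#define SKETCH_ND_OPT_NONCE    14
#define SKETCH_NONCE_OPT_BYTES 8    /* 2-byte header + 6 nonce bytes */

/* Sender side: write the option at buf, return the bytes used (8). */
static size_t sketch_put_nonce_opt(uint8_t *buf, uint64_t nonce)
{
    buf[0] = SKETCH_ND_OPT_NONCE;
    buf[1] = SKETCH_NONCE_OPT_BYTES >> 3;  /* length in 8-octet units -> 1 */
    memcpy(buf + 2, &nonce, 6);            /* first 6 bytes of the u64 in memory */
    return SKETCH_NONCE_OPT_BYTES;
}

/* Receiver side: skip the 2-byte type/length header, copy 6 bytes back. */
static uint64_t sketch_get_nonce_opt(const uint8_t *opt)
{
    uint64_t nonce = 0;

    memcpy(&nonce, opt + 2, 6);
    return nonce;
}

/* Same test as ndisc_recv_ns(): a nonzero nonce equal to ours is our own NS looped back. */
static bool sketch_is_dad_loopback(uint64_t rx_nonce, uint64_t own_nonce)
{
    return rx_nonce != 0 && rx_nonce == own_nonce;
}
```

Only 6 of the 8 bytes travel on the wire, so `sketch_get_nonce_opt()` gives back the sender's value only when the remaining two bytes of its `u64` are zero. The usage below builds the nonce that way.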
* Re: [PATCHv2 net-next 2/4] net: dsa: mv88e6xxx: Monitor and Management tables
From: Vivien Didelot @ 2016-12-02 21:53 UTC (permalink / raw)
To: Andrew Lunn; +Cc: David Miller, netdev
In-Reply-To: <20161202205656.GD30716@lunn.ch>
Hi Andrew,
Andrew Lunn <andrew@lunn.ch> writes:
> On Fri, Dec 02, 2016 at 02:32:39PM -0500, Vivien Didelot wrote:
>> Hi Andrew,
>>
>> Andrew Lunn <andrew@lunn.ch> writes:
>>
>> > @@ -3184,6 +3186,8 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
>> > .stats_get_sset_count = mv88e6095_stats_get_sset_count,
>> > .stats_get_strings = mv88e6095_stats_get_strings,
>> > .stats_get_stats = mv88e6095_stats_get_stats,
>> > + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
>> > + .g1_set_egress_port = mv88e6095_g1_set_egress_port,
>> > };
>>
>> I like the implementation in this version better. But please explain me
>> why you are prefixing these operations with g1_?
>
> The prefix gives some basic grouping. port_ indicates it operates on a
> port, and is likely to be found in port.c. stats_ indicates it
> operates on statistics, ppu_ that it operates on the PHY polling unit.
Yes, port_* operations operate on ports. But the port.c file is there to
implement access to the "Port Registers". "Port" can be confusing here, but
it refers to the SMI internal device at address 0xsomething.
The "port_", "ppu_" and "stats_" prefixes in the mv88e6xxx_ops structure just give
implicit namespaces for the **features**, not their location!
> We are going to have some things which don't fall into a simple
> category, like these two. However, it would be nice to group them,
> so I picked the register bank they are in. These operations are
> always in g1. It is a useful hint as to where to find the different
> variants.
Absolutely not!
.set_egress_port = mv88e6095_g1_set_egress_port,
                             ^^
That is the useful hint!
At the higher level of chip.c, we don't care where the switch MAC setter
is implemented. We just have to call the correctly defined
.set_switch_mac routine.
However, if you do care to know, the name of the function its
_ops.set_switch_mac pointer refers to will tell you (_g1 vs _g2 prefix).
>> But let's imagine we can set the CPU port in some Global 2 registers.
>> You are going to wrap this in chip.c with something like:
>>
>> int mv88e6xxx_set_cpu_port(struct mv88e6xxx_chip *chip, int port)
>> {
>> if (chip->info->ops->g2_set_cpu_port)
>> return chip->info->ops->g2_set_cpu_port(chip, port);
>> else if (chip->info->ops->g1_set_cpu_port)
>> return chip->info->ops->g1_set_cpu_port(chip, port);
>> else
>> return -EOPNOTSUPP;
>> }
>
> I answered in one of my other emails. Frames with reserved MAC
> addresses can be forwarded to the CPU. For most devices, this is a g2
> operation. However, for 6390, it is a g1. In that case, my code does
> not use a prefix. Not having a prefix, when all the others do, also
> gives you information. It means the ops are spread around and you need
> to make a bigger effort to go find them.
Again, absolutely not. This is your interpretation of having a prefix or
not. A chip has only one way to access a feature, not two. Since you
seem to be focused on the Rsvd2CPU feature, here's an example with it:
What's the point of writing this:
/* Consider the given MAC as MGMT */
int mv88e6xxx_reserve_mac(struct mv88e6xxx_chip *chip, u8 *addr)
{
if (mac_is_0x(addr)) {
if (chip->info->ops->g1_set_rsvd2cpu0)
return chip->info->ops->g1_set_rsvd2cpu0(...);
else if (chip->info->ops->g2_set_rsvd2cpu0)
return chip->info->ops->g2_set_rsvd2cpu0(...);
} else if (mac_is_2x(addr)) {
if (chip->info->ops->g1_set_rsvd2cpu2)
return chip->info->ops->g1_set_rsvd2cpu2(...);
else if (chip->info->ops->g2_set_rsvd2cpu2)
return chip->info->ops->g2_set_rsvd2cpu2(...);
}
return mv88e6xxx_atu_load(chip, addr, MGMT);
}
Compared to this:
/* Consider the given MAC as MGMT */
int mv88e6xxx_reserve_mac(struct mv88e6xxx_chip *chip, u8 *addr)
{
if (mac_is_0x(addr)) {
if (chip->info->ops->set_rsvd2cpu0)
return chip->info->ops->set_rsvd2cpu0(...);
} else if (mac_is_2x(addr)) {
if (chip->info->ops->set_rsvd2cpu2)
return chip->info->ops->set_rsvd2cpu2(...);
}
return mv88e6xxx_atu_load(chip, addr, MGMT);
}
Your higher level API (chip.c) doesn't need to know where a given feature
is implemented. It just needs to know whether it is supported.
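Vivien's argument boils down to naming the op after the feature and letting the implementation's name carry the register bank. The following is a minimal, self-contained sketch of that pattern. Every `sketch_` type and function is hypothetical and only stands in for `struct mv88e6xxx_ops` and friends. It is not driver code.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_chip;

/* The op is named after the feature, not after the register bank. */
struct sketch_ops {
    int (*set_cpu_port)(struct sketch_chip *chip, int port);
};

struct sketch_chip {
    const struct sketch_ops *ops;
    int cpu_port_reg;  /* stand-in for the register that would be written */
    int last_bank;     /* 1 or 2: which bank's implementation ran (illustration only) */
};

/* The implementation names carry the location hint (g1 vs g2). */
static int sketch_g1_set_cpu_port(struct sketch_chip *chip, int port)
{
    chip->cpu_port_reg = port;
    chip->last_bank = 1;
    return 0;
}

static int sketch_g2_set_cpu_port(struct sketch_chip *chip, int port)
{
    chip->cpu_port_reg = port;
    chip->last_bank = 2;
    return 0;
}

/* Per-chip tables: each chip has exactly one way to do it, or none. */
static const struct sketch_ops sketch_chip_a_ops = { .set_cpu_port = sketch_g1_set_cpu_port };
static const struct sketch_ops sketch_chip_b_ops = { .set_cpu_port = sketch_g2_set_cpu_port };
static const struct sketch_ops sketch_chip_c_ops = { .set_cpu_port = NULL };

/* chip.c-level wrapper: only asks "is it supported?", never "where is it?". */
static int sketch_set_cpu_port(struct sketch_chip *chip, int port)
{
    if (!chip->ops->set_cpu_port)
        return -EOPNOTSUPP;
    return chip->ops->set_cpu_port(chip, port);
}
```

The wrapper needs no g1/g2 fallback chain. Adding a chip that implements the feature in a different bank only touches that chip's ops table.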
Thanks,
Vivien
* [PATCH] adm80211: add checks for dma mapping errors
From: Alexey Khoroshilov @ 2016-12-02 21:52 UTC (permalink / raw)
To: Kalle Valo
Cc: Alexey Khoroshilov, linux-wireless, netdev, linux-kernel,
ldv-project
The driver does not check whether DMA memory mapping succeeded.
The patch adds the checks and failure handling.
Found by the Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
---
drivers/net/wireless/admtek/adm8211.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/admtek/adm8211.c b/drivers/net/wireless/admtek/adm8211.c
index 70ecd82d674d..2b4a3eb38dfa 100644
--- a/drivers/net/wireless/admtek/adm8211.c
+++ b/drivers/net/wireless/admtek/adm8211.c
@@ -413,6 +413,13 @@ static void adm8211_interrupt_rci(struct ieee80211_hw *dev)
skb_tail_pointer(newskb),
RX_PKT_SIZE,
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(priv->pdev,
+ priv->rx_buffers[entry].mapping)) {
+ priv->rx_buffers[entry].skb = NULL;
+ dev_kfree_skb(newskb);
+ skb = NULL;
+ /* TODO: update rx dropped stats */
+ }
} else {
skb = NULL;
/* TODO: update rx dropped stats */
@@ -1450,6 +1457,12 @@ static int adm8211_init_rings(struct ieee80211_hw *dev)
skb_tail_pointer(rx_info->skb),
RX_PKT_SIZE,
PCI_DMA_FROMDEVICE);
+ if (pci_dma_mapping_error(priv->pdev, rx_info->mapping)) {
+ dev_kfree_skb(rx_info->skb);
+ rx_info->skb = NULL;
+ break;
+ }
+
desc->buffer1 = cpu_to_le32(rx_info->mapping);
desc->status = cpu_to_le32(RDES0_STATUS_OWN | RDES0_STATUS_SQL);
}
@@ -1613,7 +1626,7 @@ static void adm8211_calc_durations(int *dur, int *plcp, size_t payload_len, int
}
/* Transmit skb w/adm8211_tx_hdr (802.11 header created by hardware) */
-static void adm8211_tx_raw(struct ieee80211_hw *dev, struct sk_buff *skb,
+static int adm8211_tx_raw(struct ieee80211_hw *dev, struct sk_buff *skb,
u16 plcp_signal,
size_t hdrlen)
{
@@ -1625,6 +1638,8 @@ static void adm8211_tx_raw(struct ieee80211_hw *dev, struct sk_buff *skb,
mapping = pci_map_single(priv->pdev, skb->data, skb->len,
PCI_DMA_TODEVICE);
+ if (pci_dma_mapping_error(priv->pdev, mapping))
+ return -ENOMEM;
spin_lock_irqsave(&priv->lock, flags);
@@ -1657,6 +1672,8 @@ static void adm8211_tx_raw(struct ieee80211_hw *dev, struct sk_buff *skb,
/* Trigger transmit poll */
ADM8211_CSR_WRITE(TDR, 0);
+
+ return 0;
}
/* Put adm8211_tx_hdr on skb and transmit */
@@ -1710,7 +1727,10 @@ static void adm8211_tx(struct ieee80211_hw *dev,
txhdr->retry_limit = info->control.rates[0].count;
- adm8211_tx_raw(dev, skb, plcp_signal, hdrlen);
+ if (adm8211_tx_raw(dev, skb, plcp_signal, hdrlen)) {
+ /* Drop packet */
+ ieee80211_free_txskb(dev, skb);
+ }
}
static int adm8211_alloc_rings(struct ieee80211_hw *dev)
--
2.7.4
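The transmit-side fix has three parts: check the mapping, report failure to the caller, and let the caller drop the frame. That shape can be sketched in plain C. This is a userspace mock, not the driver. `sketch_map()`, `sketch_mapping_error()` and the `SKETCH_MAP_ERROR` sentinel are hypothetical stand-ins for `pci_map_single()`/`pci_dma_mapping_error()`, and `free()` stands in for `ieee80211_free_txskb()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: just a heap buffer. */
struct sketch_buf {
    unsigned char *data;
    size_t len;
};

#define SKETCH_MAP_ERROR ((uintptr_t)0)  /* hypothetical "mapping failed" sentinel */

static bool sketch_fail_next_map;  /* test hook: make the next mapping fail */
static int sketch_dropped;         /* frames dropped because mapping failed */

/* Pretend DMA mapping: returns a handle, or SKETCH_MAP_ERROR on failure. */
static uintptr_t sketch_map(const struct sketch_buf *b)
{
    if (sketch_fail_next_map) {
        sketch_fail_next_map = false;
        return SKETCH_MAP_ERROR;
    }
    return (uintptr_t)b->data;
}

static bool sketch_mapping_error(uintptr_t mapping)
{
    return mapping == SKETCH_MAP_ERROR;
}

/* Like the patched adm8211_tx_raw(): fail with -ENOMEM instead of queueing an unmapped buffer. */
static int sketch_tx_raw(struct sketch_buf *b, uintptr_t *ring_slot)
{
    uintptr_t mapping = sketch_map(b);

    if (sketch_mapping_error(mapping))
        return -ENOMEM;    /* nothing queued; the caller still owns b */

    *ring_slot = mapping;  /* hand the buffer to the (pretend) hardware */
    return 0;
}

/* Like the patched adm8211_tx(): on failure, drop the frame. */
static void sketch_tx(struct sketch_buf *b, uintptr_t *ring_slot)
{
    if (sketch_tx_raw(b, ring_slot)) {
        free(b->data);     /* stands in for ieee80211_free_txskb() */
        b->data = NULL;
        sketch_dropped++;
    }
}
```

Returning an error rather than freeing inside `sketch_tx_raw()` keeps ownership in one place. The caller that handed over the frame is also the one that drops it.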
* Re: [PATCH next] dctcp: update cwnd on congestion event
From: Florian Westphal @ 2016-12-02 21:49 UTC (permalink / raw)
To: Neal Cardwell
Cc: Florian Westphal, Netdev, Lawrence Brakmo, Andrew Shewmaker,
Glenn Judd, Daniel Borkmann, Yuchung Cheng, Eric Dumazet,
Soheil Hassas Yeganeh
In-Reply-To: <CADVnQymNZ+FQ5xJ92HuSkheAJfOTUyh-PsA11bxRWERZkD5zdQ@mail.gmail.com>
Neal Cardwell <ncardwell@google.com> wrote:
> On Mon, Nov 14, 2016 at 10:42 AM, Florian Westphal <fw@strlen.de> wrote:
> >
> > draft-ietf-tcpm-dctcp-02 says:
> >
> > ... when the sender receives an indication of congestion
> > (ECE), the sender SHOULD update cwnd as follows:
> >
> > cwnd = cwnd * (1 - DCTCP.Alpha / 2)
> >
> > So, let's do this and reduce cwnd more smoothly (and faster), as per
> > the current congestion estimate.
>
> AFAICT this is doing a multiplicative decrease of cwnd on every ACK
> that has an ECE bit.
>
> If I am reading the code correctly, then I would have two concerns:
>
> 1) Has that been tested? That seems like an extremely dramatic
> decrease in cwnd. For example, if the cwnd is 80, and there are 40
> ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
> calculations seem to suggest that after just 11 ACKs the cwnd would be
> down to a minimal value of 2:
>
> ack 1 cwnd=60
> ack 2 cwnd=45
> ack 3 cwnd=33
[..]
You are assuming alpha = 0.5?
Then, yes, that looks correct. Since some of these ACKs will most likely
also end an observation window, they might also change alpha.
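Neal's back-of-the-envelope numbers can be reproduced with a few lines of C. This minimal sketch applies cwnd = cwnd * (1 - alpha / 2) on every ECE-marked ACK. It assumes alpha stays fixed (here 0.5), cwnd is truncated to whole segments, and cwnd is floored at 2, the "minimal value" from the quoted mail. The function names are hypothetical and this is not the Linux DCTCP code.

```c
#include <stdint.h>

/* One per-ACK reduction: cwnd * (1 - alpha / 2), truncated, floored at 2.
 * Assumes 0 < alpha <= 1, as for DCTCP's alpha. */
static uint32_t dctcp_sketch_reduce(uint32_t cwnd, double alpha)
{
    uint32_t next = (uint32_t)(cwnd * (1.0 - alpha / 2.0));

    return next < 2 ? 2 : next;
}

/* How many consecutive ECE-marked ACKs until cwnd reaches the floor of 2.
 * Returns -1 for an out-of-range alpha or if a step makes no progress. */
static int dctcp_sketch_acks_to_floor(uint32_t cwnd, double alpha)
{
    int acks = 0;

    if (alpha <= 0.0 || alpha > 1.0)
        return -1;

    while (cwnd > 2) {
        uint32_t next = dctcp_sketch_reduce(cwnd, alpha);

        if (next >= cwnd)
            return -1;  /* alpha too small to change cwnd */
        cwnd = next;
        acks++;
    }
    return acks;
}
```

With these assumptions the sequence starts 60, 45, 33 and `dctcp_sketch_acks_to_floor(80, 0.5)` returns 11, matching the figures quoted above.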
> 2) That seems to contradict another passage in the draft (v 02 or 03). Consider
> https://tools.ietf.org/html/draft-ietf-tcpm-dctcp-03
> where it says
>
> Just as specified in [RFC3168], DCTCP does not react to congestion
> indications more than once for every window of data.
>
> So the draft seems to advocate not reacting to congestion indications
> more than once per window. Yet this patch reacts on every ECE-marked
> ACK within a window.
>
> Am I reading something incorrectly?
No, I will raise this on tcpm next Monday (if you want, you
can of course do this yourself).
It would be easy to make this cwnd update happen only once per
observation cycle, but it would be even better to get input
from the draft authors.
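One way to apply the update at most once per window of data, as RFC 3168 prescribes, is sketched below. This is a hypothetical sketch, not the Linux DCTCP code. It counts sequence numbers in whole segments, ignores sequence wraparound, and remembers `snd_nxt` at the time of a reduction, ignoring further ECE marks until that point has been ACKed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; sequence numbers in segments, no wraparound. */
struct dctcp_win_sketch {
    uint32_t cwnd;
    uint32_t snd_nxt;    /* next segment to be sent */
    uint32_t recover;    /* snd_nxt when the last reduction happened */
    bool in_reduction;
    double alpha;        /* assumed 0 < alpha <= 1 */
};

static void dctcp_win_sketch_on_ack(struct dctcp_win_sketch *s, uint32_t ack, bool ece)
{
    /* The window that triggered the last reduction is fully ACKed. */
    if (s->in_reduction && ack > s->recover)
        s->in_reduction = false;

    /* React to ECE at most once per window of data. */
    if (ece && !s->in_reduction) {
        uint32_t next = (uint32_t)(s->cwnd * (1.0 - s->alpha / 2.0));

        s->cwnd = next < 2 ? 2 : next;
        s->recover = s->snd_nxt;
        s->in_reduction = true;
    }
}
```

In Neal's scenario (cwnd 80, 40 ACKs, every other one ECE-marked), this leaves cwnd at 60 instead of driving it down to 2.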
Thanks Neal!