* Re: [PATCH net-next] net: dsa: remove tag_protocol from dsa_switch
From: David Miller @ 2016-04-21 17:44 UTC (permalink / raw)
To: vivien.didelot; +Cc: netdev, linux-kernel, kernel, f.fainelli, andrew
In-Reply-To: <1461018244-5371-1-git-send-email-vivien.didelot@savoirfairelinux.com>
From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date: Mon, 18 Apr 2016 18:24:04 -0400
> Having the tag protocol in dsa_switch_driver for setup time and in
> dsa_switch_tree for runtime is enough. Remove dsa_switch's one.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Applied.
^ permalink raw reply
* Re: [PATCHv2 net] openvswitch: Orphan skbs before IPv6 defrag
From: David Miller @ 2016-04-21 17:42 UTC (permalink / raw)
To: joe; +Cc: netdev, fw, netfilter-devel, diproiettod, pablo
In-Reply-To: <1461016307-14098-1-git-send-email-joe@ovn.org>
From: Joe Stringer <joe@ovn.org>
Date: Mon, 18 Apr 2016 14:51:47 -0700
> This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always
> orphan skbs inside ip_defrag()").
>
> Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
> clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
> cloned (implicitly orphaning) prior to queueing for reassembly. As such,
> when the IPv6 message is eventually reassembled, the skb->sk for all
> fragments would be NULL. After that commit was introduced, rather than
> cloning, the original skbs were queued directly without orphaning. The
> end result is that all frags except for the first and last may have a
> socket attached.
>
> This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
> prevent BUG_ON(skb->sk) during a later call to ip6_fragment().
...
> Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone
> operations")
> Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
> Signed-off-by: Joe Stringer <joe@ovn.org>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
From: Edward Cree @ 2016-04-21 17:24 UTC (permalink / raw)
To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers
In-Reply-To: <5716347D.3030808@solarflare.com>
On 19/04/16 14:37, Edward Cree wrote:
> Also involved adding a way to run a netfilter hook over a list of packets.
Turns out, this breaks the build if netfilter is *disabled*, because I forgot to
add a stub in that case. Next version of patch (if there is one) will have the
following under #ifndef CONFIG_NETFILTER:
static inline void
NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
struct sk_buff_head *list, struct sk_buff_head *sublist,
struct net_device *in, struct net_device *out,
int (*okfn)(struct net *, struct sock *, struct sk_buff *))
{
__skb_queue_head_init(sublist);
/* Move everything to the sublist */
skb_queue_splice_init(list, sublist);
}
-Ed
^ permalink raw reply
* [PATCH] ixgbevf: Fix relaxed order settings in VF driver
From: Babu Moger @ 2016-04-21 17:21 UTC (permalink / raw)
To: jeffrey.t.kirsher, jesse.brandeburg, shannon.nelson,
carolyn.wyborny, donald.c.skidmore, bruce.w.allan, john.ronciak,
mitch.a.williams
Cc: intel-wired-lan, netdev, linux-kernel, babu.moger,
sowmini.varadhan
Current code writes the tx/rx relaxed order without reading it first.
This can lead to unintended consequences as we are forcibly writing
other bits.
We noticed this problem while testing VF driver on sparc. Relaxed
order settings for rx queue were all messed up which was causing
performance drop with VF interface.
Fixed it by reading the registers first and setting the specific
bit of interest. With this change we are able to match the bandwidth
equivalent to PF interface.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
---
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 0ea14c0..51abff1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1545,6 +1545,7 @@ static inline void ixgbevf_irq_enable(struct ixgbevf_adapter *adapter)
static void ixgbevf_configure_tx_ring(struct ixgbevf_adapter *adapter,
struct ixgbevf_ring *ring)
{
+ u32 regval;
struct ixgbe_hw *hw = &adapter->hw;
u64 tdba = ring->dma;
int wait_loop = 10;
@@ -1565,8 +1566,10 @@ static void ixgbevf_configure_tx_ring(struct ixgbevf_adapter *adapter,
IXGBE_WRITE_REG(hw, IXGBE_VFTDWBAL(reg_idx), 0);
/* enable relaxed ordering */
+ regval = IXGBE_READ_REG(hw, IXGBE_VFDCA_TXCTRL(reg_idx));
+
IXGBE_WRITE_REG(hw, IXGBE_VFDCA_TXCTRL(reg_idx),
- (IXGBE_DCA_TXCTRL_DESC_RRO_EN |
+ (regval | IXGBE_DCA_TXCTRL_DESC_RRO_EN |
IXGBE_DCA_TXCTRL_DATA_RRO_EN));
/* reset head and tail pointers */
@@ -1734,6 +1737,7 @@ static void ixgbevf_setup_vfmrqc(struct ixgbevf_adapter *adapter)
static void ixgbevf_configure_rx_ring(struct ixgbevf_adapter *adapter,
struct ixgbevf_ring *ring)
{
+ u32 regval;
struct ixgbe_hw *hw = &adapter->hw;
u64 rdba = ring->dma;
u32 rxdctl;
@@ -1749,8 +1753,9 @@ static void ixgbevf_configure_rx_ring(struct ixgbevf_adapter *adapter,
ring->count * sizeof(union ixgbe_adv_rx_desc));
/* enable relaxed ordering */
+ regval = IXGBE_READ_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx));
IXGBE_WRITE_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx),
- IXGBE_DCA_RXCTRL_DESC_RRO_EN);
+ regval | IXGBE_DCA_RXCTRL_DESC_RRO_EN);
/* reset head and tail pointers */
IXGBE_WRITE_REG(hw, IXGBE_VFRDH(reg_idx), 0);
--
1.7.1
^ permalink raw reply related
* [PATCH net-next v2 2/4] rtnl: use the new API to align IFLA_STATS*
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
net/core/rtnetlink.c | 22 +++++-----------------
1 file changed, 5 insertions(+), 17 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4a47a9aceb1d..5ec059d52823 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1051,14 +1051,9 @@ static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
{
struct rtnl_link_stats64 *sp;
struct nlattr *attr;
- int err;
-
- err = nla_align_64bit(skb, IFLA_PAD);
- if (err)
- return err;
- attr = nla_reserve(skb, IFLA_STATS64,
- sizeof(struct rtnl_link_stats64));
+ attr = nla_reserve_64bit(skb, IFLA_STATS64,
+ sizeof(struct rtnl_link_stats64), IFLA_PAD);
if (!attr)
return -EMSGSIZE;
@@ -3469,17 +3464,10 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) {
struct rtnl_link_stats64 *sp;
- int err;
-
- /* if necessary, add a zero length NOP attribute so that
- * IFLA_STATS_LINK_64 will be 64-bit aligned
- */
- err = nla_align_64bit(skb, IFLA_STATS_UNSPEC);
- if (err)
- goto nla_put_failure;
- attr = nla_reserve(skb, IFLA_STATS_LINK_64,
- sizeof(struct rtnl_link_stats64));
+ attr = nla_reserve_64bit(skb, IFLA_STATS_LINK_64,
+ sizeof(struct rtnl_link_stats64),
+ IFLA_STATS_UNSPEC);
if (!attr)
goto nla_put_failure;
--
2.4.2
^ permalink raw reply related
* [PATCH net-next v2 4/4] ip6mr: align RTA_MFC_STATS on 64-bit
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
net/ipv6/ip6mr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index a10e77103c88..bf678324fd52 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2268,7 +2268,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
mfcs.mfcs_packets = c->mfc_un.res.pkt;
mfcs.mfcs_bytes = c->mfc_un.res.bytes;
mfcs.mfcs_wrong_if = c->mfc_un.res.wrong_if;
- if (nla_put(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs) < 0)
+ if (nla_put_64bit(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs, RTA_PAD) < 0)
return -EMSGSIZE;
rtm->rtm_type = RTN_MULTICAST;
@@ -2411,7 +2411,7 @@ static int mr6_msgsize(bool unresolved, int maxvif)
+ nla_total_size(0) /* RTA_MULTIPATH */
+ maxvif * NLA_ALIGN(sizeof(struct rtnexthop))
/* RTA_MFC_STATS */
- + nla_total_size(sizeof(struct rta_mfc_stats))
+ + nla_total_size_64bit(sizeof(struct rta_mfc_stats))
;
return len;
--
2.4.2
^ permalink raw reply related
* [PATCH net-next v2 3/4] ipmr: align RTA_MFC_STATS on 64-bit
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/uapi/linux/rtnetlink.h | 1 +
net/ipv4/ipmr.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cc885c4e9065..a94e0b69c769 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -317,6 +317,7 @@ enum rtattr_type_t {
RTA_ENCAP_TYPE,
RTA_ENCAP,
RTA_EXPIRES,
+ RTA_PAD,
__RTA_MAX
};
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 395e2814a46d..21a38e296fe2 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2104,7 +2104,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
mfcs.mfcs_packets = c->mfc_un.res.pkt;
mfcs.mfcs_bytes = c->mfc_un.res.bytes;
mfcs.mfcs_wrong_if = c->mfc_un.res.wrong_if;
- if (nla_put(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs) < 0)
+ if (nla_put_64bit(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs, RTA_PAD) < 0)
return -EMSGSIZE;
rtm->rtm_type = RTN_MULTICAST;
@@ -2237,7 +2237,7 @@ static size_t mroute_msgsize(bool unresolved, int maxvif)
+ nla_total_size(0) /* RTA_MULTIPATH */
+ maxvif * NLA_ALIGN(sizeof(struct rtnexthop))
/* RTA_MFC_STATS */
- + nla_total_size(sizeof(struct rta_mfc_stats))
+ + nla_total_size_64bit(sizeof(struct rta_mfc_stats))
;
return len;
--
2.4.2
^ permalink raw reply related
* [PATCH net-next v2 1/4] libnl: add more helpers to align attributes on 64-bit
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/netlink.h | 39 +++++++++++++++-----
lib/nlattr.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 130 insertions(+), 8 deletions(-)
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 3c1fd92a52c8..6f51a8a06498 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -244,13 +244,21 @@ int nla_memcpy(void *dest, const struct nlattr *src, int count);
int nla_memcmp(const struct nlattr *nla, const void *data, size_t size);
int nla_strcmp(const struct nlattr *nla, const char *str);
struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen);
+struct nlattr *__nla_reserve_64bit(struct sk_buff *skb, int attrtype,
+ int attrlen, int padattr);
void *__nla_reserve_nohdr(struct sk_buff *skb, int attrlen);
struct nlattr *nla_reserve(struct sk_buff *skb, int attrtype, int attrlen);
+struct nlattr *nla_reserve_64bit(struct sk_buff *skb, int attrtype,
+ int attrlen, int padattr);
void *nla_reserve_nohdr(struct sk_buff *skb, int attrlen);
void __nla_put(struct sk_buff *skb, int attrtype, int attrlen,
const void *data);
+void __nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+ const void *data, int padattr);
void __nla_put_nohdr(struct sk_buff *skb, int attrlen, const void *data);
int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data);
+int nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+ const void *data, int padattr);
int nla_put_nohdr(struct sk_buff *skb, int attrlen, const void *data);
int nla_append(struct sk_buff *skb, int attrlen, const void *data);
@@ -1231,6 +1239,27 @@ static inline int nla_validate_nested(const struct nlattr *start, int maxtype,
}
/**
+ * nla_need_padding_for_64bit - test 64-bit alignment of the next attribute
+ * @skb: socket buffer the message is stored in
+ *
+ * Return true if padding is needed to align the next attribute (nla_data()) to
+ * a 64-bit aligned area.
+ */
+static inline bool nla_need_padding_for_64bit(struct sk_buff *skb)
+{
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+ /* The nlattr header is 4 bytes in size, that's why we test
+ * if the skb->data _is_ aligned. A NOP attribute, plus
+ * nlattr header for next attribute, will make nla_data()
+ * 8-byte aligned.
+ */
+ if (IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8))
+ return true;
+#endif
+ return false;
+}
+
+/**
* nla_align_64bit - 64-bit align the nla_data() of next attribute
* @skb: socket buffer the message is stored in
* @padattr: attribute type for the padding
@@ -1244,16 +1273,10 @@ static inline int nla_validate_nested(const struct nlattr *start, int maxtype,
*/
static inline int nla_align_64bit(struct sk_buff *skb, int padattr)
{
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- /* The nlattr header is 4 bytes in size, that's why we test
- * if the skb->data _is_ aligned. This NOP attribute, plus
- * nlattr header for next attribute, will make nla_data()
- * 8-byte aligned.
- */
- if (IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8) &&
+ if (nla_need_padding_for_64bit(skb) &&
!nla_reserve(skb, padattr, 0))
return -EMSGSIZE;
-#endif
+
return 0;
}
diff --git a/lib/nlattr.c b/lib/nlattr.c
index f5907d23272d..2b82f1e2ebc2 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -355,6 +355,29 @@ struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
EXPORT_SYMBOL(__nla_reserve);
/**
+ * __nla_reserve_64bit - reserve room for attribute on the skb and align it
+ * @skb: socket buffer to reserve room on
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ *
+ * Adds a netlink attribute header to a socket buffer and reserves
+ * room for the payload but does not copy it. It also ensure that this
+ * attribute will be 64-bit aign.
+ *
+ * The caller is responsible to ensure that the skb provides enough
+ * tailroom for the attribute header and payload.
+ */
+struct nlattr *__nla_reserve_64bit(struct sk_buff *skb, int attrtype,
+ int attrlen, int padattr)
+{
+ if (nla_need_padding_for_64bit(skb))
+ nla_align_64bit(skb, padattr);
+
+ return __nla_reserve(skb, attrtype, attrlen);
+}
+EXPORT_SYMBOL(__nla_reserve_64bit);
+
+/**
* __nla_reserve_nohdr - reserve room for attribute without header
* @skb: socket buffer to reserve room on
* @attrlen: length of attribute payload
@@ -397,6 +420,35 @@ struct nlattr *nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
EXPORT_SYMBOL(nla_reserve);
/**
+ * nla_reserve_64bit - reserve room for attribute on the skb and align it
+ * @skb: socket buffer to reserve room on
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ *
+ * Adds a netlink attribute header to a socket buffer and reserves
+ * room for the payload but does not copy it. It also ensure that this
+ * attribute will be 64-bit aign.
+ *
+ * Returns NULL if the tailroom of the skb is insufficient to store
+ * the attribute header and payload.
+ */
+struct nlattr *nla_reserve_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+ int padattr)
+{
+ size_t len;
+
+ if (nla_need_padding_for_64bit(skb))
+ len = nla_total_size_64bit(attrlen);
+ else
+ len = nla_total_size(attrlen);
+ if (unlikely(skb_tailroom(skb) < len))
+ return NULL;
+
+ return __nla_reserve_64bit(skb, attrtype, attrlen, padattr);
+}
+EXPORT_SYMBOL(nla_reserve_64bit);
+
+/**
* nla_reserve_nohdr - reserve room for attribute without header
* @skb: socket buffer to reserve room on
* @attrlen: length of attribute payload
@@ -436,6 +488,26 @@ void __nla_put(struct sk_buff *skb, int attrtype, int attrlen,
EXPORT_SYMBOL(__nla_put);
/**
+ * __nla_put_64bit - Add a netlink attribute to a socket buffer and align it
+ * @skb: socket buffer to add attribute to
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ * @data: head of attribute payload
+ *
+ * The caller is responsible to ensure that the skb provides enough
+ * tailroom for the attribute header and payload.
+ */
+void __nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+ const void *data, int padattr)
+{
+ struct nlattr *nla;
+
+ nla = __nla_reserve_64bit(skb, attrtype, attrlen, padattr);
+ memcpy(nla_data(nla), data, attrlen);
+}
+EXPORT_SYMBOL(__nla_put_64bit);
+
+/**
* __nla_put_nohdr - Add a netlink attribute without header
* @skb: socket buffer to add attribute to
* @attrlen: length of attribute payload
@@ -474,6 +546,33 @@ int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
EXPORT_SYMBOL(nla_put);
/**
+ * nla_put_64bit - Add a netlink attribute to a socket buffer and align it
+ * @skb: socket buffer to add attribute to
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ * @data: head of attribute payload
+ *
+ * Returns -EMSGSIZE if the tailroom of the skb is insufficient to store
+ * the attribute header and payload.
+ */
+int nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+ const void *data, int padattr)
+{
+ size_t len;
+
+ if (nla_need_padding_for_64bit(skb))
+ len = nla_total_size_64bit(attrlen);
+ else
+ len = nla_total_size(attrlen);
+ if (unlikely(skb_tailroom(skb) < len))
+ return -EMSGSIZE;
+
+ __nla_put_64bit(skb, attrtype, attrlen, data, padattr);
+ return 0;
+}
+EXPORT_SYMBOL(nla_put_64bit);
+
+/**
* nla_put_nohdr - Add a netlink attribute without header
* @skb: socket buffer to add attribute to
* @attrlen: length of attribute payload
--
2.4.2
^ permalink raw reply related
* [PATCH net-next v2 0/4] libnl: enhance API to ease 64bit alignment for attribute
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs
In-Reply-To: <1461142655-5067-1-git-send-email-nicolas.dichtel@6wind.com>
Here is a proposal to add more helpers in the libnetlink to manage 64-bit
alignment issues.
Note that this series was only tested on x86 by tweeking
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and adding some traces.
The first patch adds helpers for 64bit alignment and other patches
use them.
We could also add helpers for nla_put_u64() and its variants if needed.
v1 -> v2:
- remove patch #1
- split patch #2 (now #1 and #2)
- add nla_need_padding_for_64bit()
include/net/netlink.h | 39 +++++++++++++----
include/uapi/linux/rtnetlink.h | 1 +
lib/nlattr.c | 99 ++++++++++++++++++++++++++++++++++++++++++
net/core/rtnetlink.c | 22 +++-------
net/ipv4/ipmr.c | 4 +-
net/ipv6/ip6mr.c | 4 +-
6 files changed, 140 insertions(+), 29 deletions(-)
Comments are welcomed,
Regards,
Nicolas
^ permalink raw reply
* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
From: Martin KaFai Lau @ 2016-04-21 16:56 UTC (permalink / raw)
To: Soheil Hassas Yeganeh
Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
Yuchung Cheng, Kernel Team
In-Reply-To: <CACSApvbyD7wFy+y2MrShKRMCTg-zF6UkP-Le867N=ssCYa-Yng@mail.gmail.com>
On Wed, Apr 20, 2016 at 04:04:54PM -0400, Soheil Hassas Yeganeh wrote:
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index a6e4a83..96bdf98 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2494,6 +2494,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
> > * packet counting does not break.
> > */
> > TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked & TCPCB_EVER_RETRANS;
> > + TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
> >
> > /* changed transmit queue under us so clear hints */
> > tcp_clear_retrans_hints_partial(tp);
> > @@ -2545,6 +2546,9 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
> > if (!tcp_can_collapse(sk, skb))
> > break;
> >
> > + if (TCP_SKB_CB(to)->eor)
> > + break;
> > +
>
> nit: Perhaps a better place to check for eor is right after entering
> the loop? to skip a few instructions and tcp_can_collapse, in an
> unlikely case eor is set.
hmm... Not sure I understand it.
You meant moving the unlikely case before (or after?) the more likely
cases which may have a better chance to break the loop sooner?
^ permalink raw reply
* [RFC PATCH v3 2/2] ppp: add rtnetlink device creation support
From: Guillaume Nault @ 2016-04-21 16:22 UTC (permalink / raw)
To: netdev
Cc: linux-ppp, David Miller, Paul Mackerras, Stephen Hemminger,
walter harms
In-Reply-To: <cover.1461251392.git.g.nault@alphalink.fr>
Define PPP device handler for use with rtnetlink.
The only PPP specific attribute is IFLA_PPP_DEV_FD. It is mandatory and
contains the file descriptor of the associated /dev/ppp instance (the
file descriptor which would have been used for ioctl(PPPIOCNEWUNIT) in
the ioctl-based API). The PPP device is removed when this file
descriptor is released (same behaviour as with ioctl based PPP
devices).
PPP devices created with the rtnetlink API behave like the ones created
with ioctl(PPPIOCNEWUNIT). In particular existing ioctls work the same
way, no matter how the PPP device was created.
The rtnl callbacks are also assigned to ioctl based PPP devices. This
way, rtnl messages have the same effect on any PPP devices.
The immediate effect is that all PPP devices, even ioctl-based
ones, can now be removed with "ip link del".
A minor difference still exists between ioctl and rtnl based PPP
interfaces: in the device name, the number following the "ppp" prefix
corresponds to the PPP unit number for ioctl based devices, while it is
just an unrelated incrementing index for rtnl ones.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
drivers/net/ppp/ppp_generic.c | 115 ++++++++++++++++++++++++++++++++++++++++--
include/uapi/linux/if_link.h | 8 +++
2 files changed, 120 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index f581a77..8b3db3a 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -46,6 +46,7 @@
#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/slab.h>
+#include <linux/file.h>
#include <asm/unaligned.h>
#include <net/slhc_vj.h>
#include <linux/atomic.h>
@@ -186,6 +187,7 @@ struct channel {
struct ppp_config {
struct file *file;
s32 unit;
+ bool ifname_is_set;
};
/*
@@ -286,6 +288,7 @@ static int unit_get(struct idr *p, void *ptr);
static int unit_set(struct idr *p, void *ptr, int n);
static void unit_put(struct idr *p, int n);
static void *unit_find(struct idr *p, int n);
+static void ppp_setup(struct net_device *dev);
static const struct net_device_ops ppp_netdev_ops;
@@ -964,7 +967,7 @@ static struct pernet_operations ppp_net_ops = {
.size = sizeof(struct ppp_net),
};
-static int ppp_unit_register(struct ppp *ppp, int unit)
+static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
{
struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
int ret;
@@ -994,7 +997,8 @@ static int ppp_unit_register(struct ppp *ppp, int unit)
}
ppp->file.index = ret;
- snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+ if (!ifname_is_set)
+ snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
ret = register_netdevice(ppp->dev);
if (ret < 0)
@@ -1043,7 +1047,7 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
ppp->active_filter = NULL;
#endif /* CONFIG_PPP_FILTER */
- err = ppp_unit_register(ppp, conf->unit);
+ err = ppp_unit_register(ppp, conf->unit, conf->ifname_is_set);
if (err < 0)
return err;
@@ -1052,6 +1056,99 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
return 0;
}
+static const struct nla_policy ppp_nl_policy[IFLA_PPP_MAX + 1] = {
+ [IFLA_PPP_DEV_FD] = { .type = NLA_S32 },
+};
+
+static int ppp_nl_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+ if (!data)
+ return -EINVAL;
+
+ if (!data[IFLA_PPP_DEV_FD])
+ return -EINVAL;
+ if (nla_get_s32(data[IFLA_PPP_DEV_FD]) < 0)
+ return -EBADF;
+
+ return 0;
+}
+
+static int ppp_nl_newlink(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[])
+{
+ struct ppp_config conf = {
+ .unit = -1,
+ .ifname_is_set = true,
+ };
+ struct file *file;
+ int err;
+
+ file = fget(nla_get_s32(data[IFLA_PPP_DEV_FD]));
+ if (!file)
+ return -EBADF;
+
+ /* rtnl_lock is already held here, but ppp_create_interface() locks
+ * ppp_mutex before holding rtnl_lock. Using mutex_trylock() avoids
+ * possible deadlock due to lock order inversion, at the cost of
+ * pushing the problem back to userspace.
+ */
+ if (!mutex_trylock(&ppp_mutex)) {
+ err = -EBUSY;
+ goto out;
+ }
+
+ if (file->f_op != &ppp_device_fops || file->private_data) {
+ err = -EBADF;
+ goto out_unlock;
+ }
+
+ conf.file = file;
+ err = ppp_dev_configure(src_net, dev, &conf);
+
+out_unlock:
+ mutex_unlock(&ppp_mutex);
+out:
+ fput(file);
+
+ return err;
+}
+
+static void ppp_nl_dellink(struct net_device *dev, struct list_head *head)
+{
+ unregister_netdevice_queue(dev, head);
+}
+
+static size_t ppp_nl_get_size(const struct net_device *dev)
+{
+ return 0;
+}
+
+static int ppp_nl_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+ return 0;
+}
+
+static struct net *ppp_nl_get_link_net(const struct net_device *dev)
+{
+ struct ppp *ppp = netdev_priv(dev);
+
+ return ppp->ppp_net;
+}
+
+static struct rtnl_link_ops ppp_link_ops __read_mostly = {
+ .kind = "ppp",
+ .maxtype = IFLA_PPP_MAX,
+ .policy = ppp_nl_policy,
+ .priv_size = sizeof(struct ppp),
+ .setup = ppp_setup,
+ .validate = ppp_nl_validate,
+ .newlink = ppp_nl_newlink,
+ .dellink = ppp_nl_dellink,
+ .get_size = ppp_nl_get_size,
+ .fill_info = ppp_nl_fill_info,
+ .get_link_net = ppp_nl_get_link_net,
+};
+
#define PPP_MAJOR 108
/* Called at boot time if ppp is compiled into the kernel,
@@ -1080,11 +1177,19 @@ static int __init ppp_init(void)
goto out_chrdev;
}
+ err = rtnl_link_register(&ppp_link_ops);
+ if (err) {
+ pr_err("failed to register rtnetlink PPP handler\n");
+ goto out_class;
+ }
+
/* not a big deal if we fail here :-) */
device_create(ppp_class, NULL, MKDEV(PPP_MAJOR, 0), NULL, "ppp");
return 0;
+out_class:
+ class_destroy(ppp_class);
out_chrdev:
unregister_chrdev(PPP_MAJOR, "ppp");
out_net:
@@ -2829,6 +2934,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
struct ppp_config conf = {
.file = file,
.unit = *unit,
+ .ifname_is_set = false,
};
struct ppp *ppp;
struct net_device *dev;
@@ -2840,6 +2946,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
goto err;
}
dev_net_set(dev, net);
+ dev->rtnl_link_ops = &ppp_link_ops;
rtnl_lock();
@@ -3046,6 +3153,7 @@ static void __exit ppp_cleanup(void)
/* should never happen */
if (atomic_read(&ppp_unit_count) || atomic_read(&channel_count))
pr_err("PPP: removing module but units remain!\n");
+ rtnl_link_unregister(&ppp_link_ops);
unregister_chrdev(PPP_MAJOR, "ppp");
device_destroy(ppp_class, MKDEV(PPP_MAJOR, 0));
class_destroy(ppp_class);
@@ -3104,4 +3212,5 @@ EXPORT_SYMBOL(ppp_register_compressor);
EXPORT_SYMBOL(ppp_unregister_compressor);
MODULE_LICENSE("GPL");
MODULE_ALIAS_CHARDEV(PPP_MAJOR, 0);
+MODULE_ALIAS_RTNL_LINK("ppp");
MODULE_ALIAS("devname:ppp");
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ba69d44..6b2ec17 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -517,6 +517,14 @@ enum {
};
#define IFLA_GENEVE_MAX (__IFLA_GENEVE_MAX - 1)
+/* PPP section */
+enum {
+ IFLA_PPP_UNSPEC,
+ IFLA_PPP_DEV_FD,
+ __IFLA_PPP_MAX
+};
+#define IFLA_PPP_MAX (__IFLA_PPP_MAX - 1)
+
/* Bonding section */
enum {
--
2.8.1
^ permalink raw reply related
* [RFC PATCH v3 1/2] ppp: define reusable device creation functions
From: Guillaume Nault @ 2016-04-21 16:22 UTC (permalink / raw)
To: netdev
Cc: linux-ppp, David Miller, Paul Mackerras, Stephen Hemminger,
walter harms
In-Reply-To: <cover.1461251392.git.g.nault@alphalink.fr>
Move PPP device initialisation and registration out of
ppp_create_interface().
This prepares code for device registration with rtnetlink.
While there, simplify the prototype of ppp_create_interface():
* Since ppp_dev_configure() takes care of setting file->private_data,
there's no need to return a ppp structure to ppp_unattached_ioctl()
anymore.
* The unit parameter is made read/write so that ppp_create_interface()
can tell which unit number has been assigned.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
drivers/net/ppp/ppp_generic.c | 206 ++++++++++++++++++++++++------------------
1 file changed, 118 insertions(+), 88 deletions(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index f572b31..f581a77 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -183,6 +183,11 @@ struct channel {
#endif /* CONFIG_PPP_MULTILINK */
};
+struct ppp_config {
+ struct file *file;
+ s32 unit;
+};
+
/*
* SMP locking issues:
* Both the ppp.rlock and ppp.wlock locks protect the ppp.channels
@@ -269,8 +274,7 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
static void ppp_ccp_closed(struct ppp *ppp);
static struct compressor *find_compressor(int type);
static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
-static struct ppp *ppp_create_interface(struct net *net, int unit,
- struct file *file, int *retp);
+static int ppp_create_interface(struct net *net, struct file *file, int *unit);
static void init_ppp_file(struct ppp_file *pf, int kind);
static void ppp_destroy_interface(struct ppp *ppp);
static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
@@ -853,12 +857,12 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
/* Create a new ppp unit */
if (get_user(unit, p))
break;
- ppp = ppp_create_interface(net, unit, file, &err);
- if (!ppp)
+ err = ppp_create_interface(net, file, &unit);
+ if (err < 0)
break;
- file->private_data = &ppp->file;
+
err = -EFAULT;
- if (put_user(ppp->file.index, p))
+ if (put_user(unit, p))
break;
err = 0;
break;
@@ -960,6 +964,94 @@ static struct pernet_operations ppp_net_ops = {
.size = sizeof(struct ppp_net),
};
+static int ppp_unit_register(struct ppp *ppp, int unit)
+{
+ struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
+ int ret;
+
+ mutex_lock(&pn->all_ppp_mutex);
+
+ if (unit < 0) {
+ ret = unit_get(&pn->units_idr, ppp);
+ if (ret < 0)
+ goto err;
+ } else {
+ /* Caller asked for a specific unit number. Fail with -EEXIST
+ * if unavailable. For backward compatibility, return -EEXIST
+ * too if idr allocation fails; this makes pppd retry without
+ * requesting a specific unit number.
+ */
+ if (unit_find(&pn->units_idr, unit)) {
+ ret = -EEXIST;
+ goto err;
+ }
+ ret = unit_set(&pn->units_idr, ppp, unit);
+ if (ret < 0) {
+ /* Rewrite error for backward compatibility */
+ ret = -EEXIST;
+ goto err;
+ }
+ }
+ ppp->file.index = ret;
+
+ snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+
+ ret = register_netdevice(ppp->dev);
+ if (ret < 0)
+ goto err_unit;
+
+ atomic_inc(&ppp_unit_count);
+
+ mutex_unlock(&pn->all_ppp_mutex);
+
+ return 0;
+
+err_unit:
+ unit_put(&pn->units_idr, ppp->file.index);
+err:
+ mutex_unlock(&pn->all_ppp_mutex);
+
+ return ret;
+}
+
+static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
+ const struct ppp_config *conf)
+{
+ struct ppp *ppp = netdev_priv(dev);
+ int indx;
+ int err;
+
+ ppp->dev = dev;
+ ppp->ppp_net = src_net;
+ ppp->mru = PPP_MRU;
+ ppp->owner = conf->file;
+
+ init_ppp_file(&ppp->file, INTERFACE);
+ ppp->file.hdrlen = PPP_HDRLEN - 2; /* don't count proto bytes */
+
+ for (indx = 0; indx < NUM_NP; ++indx)
+ ppp->npmode[indx] = NPMODE_PASS;
+ INIT_LIST_HEAD(&ppp->channels);
+ spin_lock_init(&ppp->rlock);
+ spin_lock_init(&ppp->wlock);
+#ifdef CONFIG_PPP_MULTILINK
+ ppp->minseq = -1;
+ skb_queue_head_init(&ppp->mrq);
+#endif /* CONFIG_PPP_MULTILINK */
+#ifdef CONFIG_PPP_FILTER
+ ppp->pass_filter = NULL;
+ ppp->active_filter = NULL;
+#endif /* CONFIG_PPP_FILTER */
+
+ err = ppp_unit_register(ppp, conf->unit);
+ if (err < 0)
+ return err;
+
+ conf->file->private_data = &ppp->file;
+
+ return 0;
+}
+
#define PPP_MAJOR 108
/* Called at boot time if ppp is compiled into the kernel,
@@ -2732,102 +2824,40 @@ ppp_get_stats(struct ppp *ppp, struct ppp_stats *st)
* or if there is already a unit with the requested number.
* unit == -1 means allocate a new number.
*/
-static struct ppp *ppp_create_interface(struct net *net, int unit,
- struct file *file, int *retp)
+static int ppp_create_interface(struct net *net, struct file *file, int *unit)
{
+ struct ppp_config conf = {
+ .file = file,
+ .unit = *unit,
+ };
struct ppp *ppp;
- struct ppp_net *pn;
- struct net_device *dev = NULL;
- int ret = -ENOMEM;
- int i;
+ struct net_device *dev;
+ int err;
dev = alloc_netdev(sizeof(struct ppp), "", NET_NAME_ENUM, ppp_setup);
- if (!dev)
- goto out1;
-
- pn = ppp_pernet(net);
-
- ppp = netdev_priv(dev);
- ppp->dev = dev;
- ppp->mru = PPP_MRU;
- init_ppp_file(&ppp->file, INTERFACE);
- ppp->file.hdrlen = PPP_HDRLEN - 2; /* don't count proto bytes */
- ppp->owner = file;
- for (i = 0; i < NUM_NP; ++i)
- ppp->npmode[i] = NPMODE_PASS;
- INIT_LIST_HEAD(&ppp->channels);
- spin_lock_init(&ppp->rlock);
- spin_lock_init(&ppp->wlock);
-#ifdef CONFIG_PPP_MULTILINK
- ppp->minseq = -1;
- skb_queue_head_init(&ppp->mrq);
-#endif /* CONFIG_PPP_MULTILINK */
-#ifdef CONFIG_PPP_FILTER
- ppp->pass_filter = NULL;
- ppp->active_filter = NULL;
-#endif /* CONFIG_PPP_FILTER */
-
- /*
- * drum roll: don't forget to set
- * the net device is belong to
- */
+ if (!dev) {
+ err = -ENOMEM;
+ goto err;
+ }
dev_net_set(dev, net);
rtnl_lock();
- mutex_lock(&pn->all_ppp_mutex);
-
- if (unit < 0) {
- unit = unit_get(&pn->units_idr, ppp);
- if (unit < 0) {
- ret = unit;
- goto out2;
- }
- } else {
- ret = -EEXIST;
- if (unit_find(&pn->units_idr, unit))
- goto out2; /* unit already exists */
- /*
- * if caller need a specified unit number
- * lets try to satisfy him, otherwise --
- * he should better ask us for new unit number
- *
- * NOTE: yes I know that returning EEXIST it's not
- * fair but at least pppd will ask us to allocate
- * new unit in this case so user is happy :)
- */
- unit = unit_set(&pn->units_idr, ppp, unit);
- if (unit < 0)
- goto out2;
- }
-
- /* Initialize the new ppp unit */
- ppp->file.index = unit;
- sprintf(dev->name, "ppp%d", unit);
- ret = register_netdevice(dev);
- if (ret != 0) {
- unit_put(&pn->units_idr, unit);
- netdev_err(ppp->dev, "PPP: couldn't register device %s (%d)\n",
- dev->name, ret);
- goto out2;
- }
-
- ppp->ppp_net = net;
+ err = ppp_dev_configure(net, dev, &conf);
+ if (err < 0)
+ goto err_dev;
+ ppp = netdev_priv(dev);
+ *unit = ppp->file.index;
- atomic_inc(&ppp_unit_count);
- mutex_unlock(&pn->all_ppp_mutex);
rtnl_unlock();
- *retp = 0;
- return ppp;
+ return 0;
-out2:
- mutex_unlock(&pn->all_ppp_mutex);
+err_dev:
rtnl_unlock();
free_netdev(dev);
-out1:
- *retp = ret;
- return NULL;
+err:
+ return err;
}
/*
--
2.8.1
^ permalink raw reply related
* [RFC PATCH v3 0/2] ppp: add rtnetlink support
From: Guillaume Nault @ 2016-04-21 16:22 UTC (permalink / raw)
To: netdev
Cc: linux-ppp, David Miller, Paul Mackerras, Stephen Hemminger,
walter harms
PPP devices lack the ability to be customised at creation time. In
particular they can't be created in a given netns or with a particular
name. Moving or renaming the device after creation is possible, but
creates undesirable transient effects on servers where PPP devices are
constantly created and removed, as users connect and disconnect.
Implementing rtnetlink support solves this problem.
The rtnetlink handlers implemented in this series are minimal, and can
only replace the PPPIOCNEWUNIT ioctl. The rest of PPP ioctls remains
necessary for any other operation on channels and units.
It is perfectly possible to mix PPP devices created by rtnl
and by ioctl(PPPIOCNEWUNIT). Devices will behave in the same way.
If necessary, rtnetlink support could be extended to provide some of
the functionalities brought by ppp_net_ioctl() and ppp_ioctl(). This
would let external programs, like "ip link", set or retrieve PPP device
information. However, I haven't made my mind on the usefulness of this
approach, so this isn't implemented in this series.
This series doesn't try to invert lock ordering between ppp_mutex and
rtnl_lock. mutex_trylock() is used instead, which greatly simplifies
things.
A user visible difference brought by this series is that old PPP
interfaces (those created with ioctl(PPPIOCNEWUNIT)), can now be
removed by "ip link del", just like new rtnl based PPP devices.
Changes since v2:
- Define ->rtnl_link_ops for ioctl based PPP devices, so they can
handle rtnl messages just like rtnl based ones (suggested by
Stephen Hemminger).
- Move back to original lock ordering between ppp_mutex and rtnl_lock
to simplify patch series. Handle lock inversion issue using
mutex_trylock() (suggested by Stephen Hemminger).
- Do file descriptor lookup directly in ppp_nl_newlink(), to simplify
ppp_dev_configure().
Changes since v1:
- Rebase on net-next.
- Invert locking order wrt. ppp_mutex and rtnl_lock and protect
file->private_data with ppp_mutex.
Guillaume Nault (2):
ppp: define reusable device creation functions
ppp: add rtnetlink device creation support
drivers/net/ppp/ppp_generic.c | 315 ++++++++++++++++++++++++++++++------------
include/uapi/linux/if_link.h | 8 ++
2 files changed, 235 insertions(+), 88 deletions(-)
--
2.8.1
^ permalink raw reply
* Re: [net-next 00/17][pull request] 100GbE Intel Wired LAN Driver Updates 2016-04-20
From: David Miller @ 2016-04-21 16:21 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene, john.ronciak
In-Reply-To: <1461218969-68578-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 20 Apr 2016 23:09:12 -0700
> This series contains updates to fm10k only.
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
From: Alexander Duyck @ 2016-04-21 16:02 UTC (permalink / raw)
To: Steffen Klassert; +Cc: Eric Dumazet, Sowmini Varadhan, Netdev
In-Reply-To: <20160421074042.GF3347@gauss.secunet.com>
On Thu, Apr 21, 2016 at 12:40 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> This partly reverts the below mentioned patch because on
> forwarding, such skbs can't be offloaded to a NIC.
>
> We need this to get IPsec GRO for forwarding to work properly,
> otherwise the GRO aggregated packets get segmented again by
> the GSO layer. Although discovered when implementing IPsec GRO,
> this is a general problem in the forwarding path.
I'm confused as to why you would need this to get IPsec GRO forwarding
to work. Are you having to go through a device that doesn't have
NETIF_F_FRAGLIST defined? Also what is the issue with having to go
through the GSO layer on segmentation? It seems like we might be able
to do something like what we did with GSO partial to split frames so
that they are in chunks that wouldn't require NETIF_F_FRAGLIST. Then
you could get the best of both worlds in that the stack would only
process one super-frame, and the transmitter could TSO a series of
frames that are some fixed MSS in size.
- Alex
^ permalink raw reply
* Re: [PATCH net-next 0/7] net: network drivers should not depend on geneve/vxlan
From: David Miller @ 2016-04-21 15:53 UTC (permalink / raw)
To: hannes; +Cc: netdev, jesse
In-Reply-To: <1461224591.4101216.585253121.498EBE7C@webmail.messagingengine.com>
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 21 Apr 2016 09:43:11 +0200
> could you look at this again? If you have further feedback I am happy to
> incooperate those.
I will.
^ permalink raw reply
* Re: [PATCH V2] net: stmmac: socfpga: Remove re-registration of reset controller
From: Dinh Nguyen @ 2016-04-21 14:52 UTC (permalink / raw)
To: Marek Vasut, netdev
Cc: peppe.cavallaro, alexandre.torgue, Matthew Gerlach,
David S . Miller
In-Reply-To: <5718BEDB.4010303@denx.de>
On 04/21/2016 06:51 AM, Marek Vasut wrote:
>>>>
>>>> if you modify the patch to call stmmac_dvr_probe() before calling
>>>> socfpga_dwmac_init(), then you would already have the reset control
>>>> information.
>>>
>>> I was under the impression that you must call socfpga_dwmac_init()
>>> before stmmac_dvr_probe() for whatever hardware-related reason. If
>>> you are absolutely certain this is not necessary, then that's just
>>> perfect and the patch can be simplified even further -- just remove
>>> the call to socfpga_dwmac_init() from probe altogether , the dwmac
>>> core code will call plat_dat->init at the end of probe .
>>>
>>> So shall we do that ? I am happy to spin V3 like that if you confirm
>>> that it's legal to do things in the aforementioned order.
>>>
>>
>> AFAICT, I don't see any reason why the socfpga_dwmac_init() has to go
>> before the stmmac_dvr_probe(). I tested this by using U-Boot to put the
>> PHY modes in the system manager into a different mode, and put the
>> ethernet IP into reset. Linux was able to use ethernet just fine with
>> the aformentioned order.
>
> Ha, the dwmac code does not call the ->init, so you have to call it from
> the socfpga-dwmac probe afterall. I will send a V3 now and do
> it just like you suggested. I will send a RFC for calling ->init and
> ->exit from the stmmac common code too.
>
Yes, I saw this too. Was going to wait until after this patch to
follow-up. Looks like you already did! Thanks!
Dinh
^ permalink raw reply
* Re: [PATCH V3] net: stmmac: socfpga: Remove re-registration of reset controller
From: Dinh Nguyen @ 2016-04-21 14:39 UTC (permalink / raw)
To: Marek Vasut, netdev
Cc: peppe.cavallaro, alexandre.torgue, Matthew Gerlach,
David S . Miller
In-Reply-To: <1461240710-5381-1-git-send-email-marex@denx.de>
On 04/21/2016 07:11 AM, Marek Vasut wrote:
> Both socfpga_dwmac_parse_data() in dwmac-socfpga.c and stmmac_dvr_probe()
> in stmmac_main.c functions call devm_reset_control_get() to register an
> reset controller for the stmmac. This results in an attempt to register
> two reset controllers for the same non-shared reset line.
>
> The first attempt to register the reset controller works fine. The second
> attempt fails with warning from the reset controller core, see below.
> The warning is produced because the reset line is non-shared and thus
> it is allowed to have only up-to one reset controller associated with
> that reset line, not two or more.
>
> The solution has multiple parts. First, the original socfpga_dwmac_init()
> is tweaked to use reset controller pointer from the stmmac_priv (private
> data of the stmmac core) instead of the local instance, which was used
> before. The local re-registration of the reset controller is removed.
>
> Next, the socfpga_dwmac_init() is moved after stmmac_dvr_probe() in the
> probe function. This order is legal according to Altera and it makes the
> code much easier, since there is no need to temporarily register and
> unregister the reset controller ; the reset controller is already registered
> by the stmmac_dvr_probe().
>
> Finally, plat_dat->exit and socfpga_dwmac_exit() is no longer necessary,
> since the functionality is already performed by the stmmac core.
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at drivers/reset/core.c:187 __of_reset_control_get+0x218/0x270
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160419-00015-gabb2477-dirty #4
> Hardware name: Altera SOCFPGA
> [<c010f290>] (unwind_backtrace) from [<c010b82c>] (show_stack+0x10/0x14)
> [<c010b82c>] (show_stack) from [<c0373da4>] (dump_stack+0x94/0xa8)
> [<c0373da4>] (dump_stack) from [<c011bcc0>] (__warn+0xec/0x104)
> [<c011bcc0>] (__warn) from [<c011bd88>] (warn_slowpath_null+0x20/0x28)
> [<c011bd88>] (warn_slowpath_null) from [<c03a6eb4>] (__of_reset_control_get+0x218/0x270)
> [<c03a6eb4>] (__of_reset_control_get) from [<c03a701c>] (__devm_reset_control_get+0x54/0x90)
> [<c03a701c>] (__devm_reset_control_get) from [<c041fa30>] (stmmac_dvr_probe+0x1b4/0x8e8)
> [<c041fa30>] (stmmac_dvr_probe) from [<c04298c8>] (socfpga_dwmac_probe+0x1b8/0x28c)
> [<c04298c8>] (socfpga_dwmac_probe) from [<c03d6ffc>] (platform_drv_probe+0x4c/0xb0)
> [<c03d6ffc>] (platform_drv_probe) from [<c03d54ec>] (driver_probe_device+0x224/0x2bc)
> [<c03d54ec>] (driver_probe_device) from [<c03d5630>] (__driver_attach+0xac/0xb0)
> [<c03d5630>] (__driver_attach) from [<c03d382c>] (bus_for_each_dev+0x6c/0xa0)
> [<c03d382c>] (bus_for_each_dev) from [<c03d4ad4>] (bus_add_driver+0x1a4/0x21c)
> [<c03d4ad4>] (bus_add_driver) from [<c03d60ac>] (driver_register+0x78/0xf8)
> [<c03d60ac>] (driver_register) from [<c0101760>] (do_one_initcall+0x40/0x170)
> [<c0101760>] (do_one_initcall) from [<c0800e38>] (kernel_init_freeable+0x1dc/0x27c)
> [<c0800e38>] (kernel_init_freeable) from [<c05d1bd4>] (kernel_init+0x8/0x114)
> [<c05d1bd4>] (kernel_init) from [<c01076f8>] (ret_from_fork+0x14/0x3c)
> ---[ end trace 059d2fbe87608fa9 ]---
>
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Matthew Gerlach <mgerlach@opensource.altera.com>
> Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
> V2: Add missing stmmac_rst = NULL; into socfpga_dwmac_init_probe()
> V3: Greatly simplify the code by moving socfpga_dwmac_init() after
> stmmac_dvr_probe(), which is legal. This removes the need for
> temporary registration of the reset controller, which is super.
>
Tested-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Thanks,
Dinh
^ permalink raw reply
* Re: [PATCH v2] MAINTAINERS: net: add entry for TI Ethernet Switch drivers
From: Tony Lindgren @ 2016-04-21 14:33 UTC (permalink / raw)
To: Mugunthan V N
Cc: Grygorii Strashko, netdev, linux-kernel, David S. Miller,
Sekhar Nori, linux-omap, Richard Cochran
In-Reply-To: <5718B6EF.5000207@ti.com>
* Mugunthan V N <mugunthanvnm@ti.com> [160421 04:19]:
> On Thursday 21 April 2016 03:43 PM, Grygorii Strashko wrote:
> > Add record for TI Ethernet Switch Driver CPSW/CPDMA/MDIO HW
> > (am33/am43/am57/dr7/davinci) to ensure that related patches
> > will go through dedicated linux-omap list.
> >
> > Also add Mugunthan as maintainer and myself as the reviewer.
> >
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Tony Lindgren <tony@atomide.com>
> > Cc: Mugunthan V N <mugunthanvnm@ti.com>
> > Cc: Richard Cochran <richardcochran@gmail.com>
> > Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
>
> Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Looks good to me too thanks:
Acked-by: Tony Lindgren <tony@atomide.com>
^ permalink raw reply
* Re: [PATCH net-next] perf, bpf: minimize the size of perf_trace_() tracepoint handler
From: Steven Rostedt @ 2016-04-21 14:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Alexei Starovoitov, David S . Miller, Ingo Molnar,
Daniel Borkmann, Arnaldo Carvalho de Melo, Wang Nan, Josef Bacik,
Brendan Gregg, netdev, linux-kernel, kernel-team
In-Reply-To: <20160421140255.GH3430@twins.programming.kicks-ass.net>
On Thu, 21 Apr 2016 16:02:55 +0200
Peter Zijlstra <peterz@infradead.org> wrote:
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> David, please take through the net tree as this depends on prior patches
> by Alexei that are already in your tree.
Yes please!
Acked-by: Steven Rostedt <rostedt@goodmis.org>
-- Steve
^ permalink raw reply
* Re: [PATCH net-next] perf, bpf: minimize the size of perf_trace_() tracepoint handler
From: Peter Zijlstra @ 2016-04-21 14:02 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: David S . Miller, Steven Rostedt, Ingo Molnar, Daniel Borkmann,
Arnaldo Carvalho de Melo, Wang Nan, Josef Bacik, Brendan Gregg,
netdev, linux-kernel, kernel-team
In-Reply-To: <1461035510-2810305-1-git-send-email-ast@fb.com>
On Mon, Apr 18, 2016 at 08:11:50PM -0700, Alexei Starovoitov wrote:
> move trace_call_bpf() into helper function to minimize the size
> of perf_trace_*() tracepoint handlers.
> text data bss dec hex filename
> 10541679 5526646 2945024 19013349 1221ee5 vmlinux_before
> 10509422 5526646 2945024 18981092 121a0e4 vmlinux_after
>
> It may seem that perf_fetch_caller_regs() can also be moved,
> but that is incorrect, since ip/sp will be wrong.
>
> bpf+tracepoint performance is not affected, since
> perf_swevent_put_recursion_context() is now inlined.
> export_symbol_gpl can also be dropped.
>
> No measurable change in normal perf tracepoints.
>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
> include/linux/trace_events.h | 5 +++++
> include/trace/perf.h | 13 +++----------
> kernel/events/core.c | 20 +++++++++++++++++++-
> 3 files changed, 27 insertions(+), 11 deletions(-)
>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
David, please take through the net tree as this depends on prior patches
by Alexei that are already in your tree.
^ permalink raw reply
* [PATCH net-next v1 1/1] tipc: fix stale links after re-enabling bearer
From: Parthasarathy Bhuvaragan @ 2016-04-21 13:51 UTC (permalink / raw)
To: netdev; +Cc: jon.maloy, tipc-discussion
Commit 42b18f605fea ("tipc: refactor function tipc_link_timeout()"),
introduced a bug which prevents sending of probe messages during
link synchronization phase. This leads to hanging links, if the
bearer is disabled/enabled after links are up.
In this commit, we send the probe messages correctly.
Fixes: 42b18f605fea ("tipc: refactor function tipc_link_timeout()")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
---
net/tipc/link.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 2e28a7d7e802..7059c94f33c5 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -721,8 +721,7 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
mtyp = STATE_MSG;
state = bc_acked != bc_snt;
probe = l->silent_intv_cnt;
- if (probe)
- l->silent_intv_cnt++;
+ l->silent_intv_cnt++;
break;
case LINK_RESET:
setup = l->rst_cnt++ <= 4;
--
2.1.4
------------------------------------------------------------------------------
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
^ permalink raw reply related
* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
From: Hannes Frederic Sowa @ 2016-04-21 13:49 UTC (permalink / raw)
To: Eric Dumazet, Valdis.Kletnieks; +Cc: David S. Miller, netdev, linux-kernel
In-Reply-To: <1461245496.7627.17.camel@edumazet-glaptop3.roam.corp.google.com>
On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>> to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an nfs socket? Do you by accident know if this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
>
>
> Looks like the following patch is needed, can you try it please ?
>
> Thanks !
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
> {
> struct sock *sk = (struct sock *)csk;
>
> - return lockdep_is_held(&sk->sk_lock) ||
> + return !debug_locks ||
> + lockdep_is_held(&sk->sk_lock) ||
> lockdep_is_held(&sk->sk_lock.slock);
> }
> #endif
I would prefer to add debug_locks at the WARN_ON level, like
WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
this fixes the initial splat.
Thanks Eric!
^ permalink raw reply
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Or Gerlitz @ 2016-04-21 13:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linux Netdev List, Sinan Kaya,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
timur-sgV2jX0FEOL9JmXXK+q4OQ, cov-sgV2jX0FEOL9JmXXK+q4OQ,
Yishai Hadas, Haggai Abramovsky, Eran Ben Elisha
In-Reply-To: <20160421133903.GA19633-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
On Thu, Apr 21, 2016 at 4:39 PM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> On Thu, Apr 21, 2016 at 04:37:42PM +0300, Or Gerlitz wrote:
>> On Wed, Apr 20, 2016 at 9:40 PM, Eran Ben Elisha
>> <eranlinuxmellanox-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> >> It is been 1.5 years since I reported the problem. We came up with three
>> >> different solutions this week. I'd like to see a version of the solution
>> >> to get merged until Mellanox comes up with a better solution with another
>> >> patch. My proposal is to use this one.
>>
>> > We will post our suggestion here in the following days.
>>
>> To update, Haggai A from our driver team is working on a patch. He is
>> providing a copy for
>> testing over ARM to the folks that reported on the problem and will
>> post it here early next week.
>
> Any chance you could give feedback to the patch I posted this week?
I missed your patch, looking on it now down this thread, I think
Haggai's patch is very much similar to yours, he's also touching some
more code in the IB driver.
Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Christoph Hellwig @ 2016-04-21 13:39 UTC (permalink / raw)
To: Or Gerlitz
Cc: Linux Netdev List, Sinan Kaya, Christoph Hellwig,
linux-rdma@vger.kernel.org, timur, cov, Yishai Hadas,
Haggai Abramovsky, Eran Ben Elisha
In-Reply-To: <CAJ3xEMiZ-L5OSogx_3fTSOt-chvYJw6ANtbxYJi5TFR-_H4U-g@mail.gmail.com>
On Thu, Apr 21, 2016 at 04:37:42PM +0300, Or Gerlitz wrote:
> On Wed, Apr 20, 2016 at 9:40 PM, Eran Ben Elisha
> <eranlinuxmellanox@gmail.com> wrote:
> >> It is been 1.5 years since I reported the problem. We came up with three
> >> different solutions this week. I'd like to see a version of the solution
> >> to get merged until Mellanox comes up with a better solution with another
> >> patch. My proposal is to use this one.
>
> > We will post our suggestion here in the following days.
>
> To update, Haggai A from our driver team is working on a patch. He is
> providing a copy for
> testing over ARM to the folks that reported on the problem and will
> post it here early next week.
Any chance you could give feedback to the patch I posted this week?
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox