* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Thomas Graf @ 2014-12-04 1:15 UTC (permalink / raw)
To: Jesse Gross
Cc: Michael S. Tsirkin, Du, Fan, Jason Wang, netdev@vger.kernel.org,
davem@davemloft.net, fw@strlen.de, dev@openvswitch.org,
Pravin Shelar
In-Reply-To: <CAEP_g=-y_-=r+BHk3VfzxMNwH=B-26DvQrzQ7gkKG8fDG7Mx3g@mail.gmail.com>
On 12/03/14 at 04:54pm, Jesse Gross wrote:
> I don't think that we actually need a bit. I would expect ICMP
> generation to be coupled with routing (although this requires being
> able to know what the ultimate MTU is at the time of routing the inner
> packet). If that's the case, you just need to steer between L2 and L3
> processing in the same way that you would today and ICMP would just
> come in the right cases.
I think the MTU awareness is solvable, but how do you steer between
L2 and L3? How do you differentiate between an L3 ACL rule in L2 mode
and an actual L3-based forward? dec_ttl? This is what drove me to
the user-controlled bit, and it became appealing because it allows
enabling/disabling PMTU per guest or even per flow/route.
^ permalink raw reply
* [PATCH] e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
From: Rafael J. Wysocki @ 2014-12-04 1:15 UTC (permalink / raw)
To: e1000-devel
Cc: Jeff Kirsher, netdev, Linux NICS, Jesse Brandeburg, Bruce Allan,
Linux Kernel Mailing List, Linux PM list
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME within #ifdef blocks depending on
CONFIG_PM may be dropped now.
Do that in the e1000e and igb network drivers.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
Note: This depends on commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if
PM_SLEEP is selected) which is only in linux-next at the moment (via the
linux-pm tree).
Please let me know if it is OK to take this one into linux-pm.
---
drivers/net/ethernet/intel/e1000e/netdev.c | 2 --
drivers/net/ethernet/intel/igb/igb_main.c | 6 +-----
2 files changed, 1 insertion(+), 7 deletions(-)
Index: linux-pm/drivers/net/ethernet/intel/e1000e/netdev.c
===================================================================
--- linux-pm.orig/drivers/net/ethernet/intel/e1000e/netdev.c
+++ linux-pm/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6372,7 +6372,6 @@ static int e1000e_pm_resume(struct devic
}
#endif /* CONFIG_PM_SLEEP */
-#ifdef CONFIG_PM_RUNTIME
static int e1000e_pm_runtime_idle(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
@@ -6432,7 +6431,6 @@ static int e1000e_pm_runtime_suspend(str
return 0;
}
-#endif /* CONFIG_PM_RUNTIME */
#endif /* CONFIG_PM */
static void e1000_shutdown(struct pci_dev *pdev)
Index: linux-pm/drivers/net/ethernet/intel/igb/igb_main.c
===================================================================
--- linux-pm.orig/drivers/net/ethernet/intel/igb/igb_main.c
+++ linux-pm/drivers/net/ethernet/intel/igb/igb_main.c
@@ -186,11 +186,9 @@ static int igb_pci_enable_sriov(struct p
static int igb_suspend(struct device *);
#endif
static int igb_resume(struct device *);
-#ifdef CONFIG_PM_RUNTIME
static int igb_runtime_suspend(struct device *dev);
static int igb_runtime_resume(struct device *dev);
static int igb_runtime_idle(struct device *dev);
-#endif
static const struct dev_pm_ops igb_pm_ops = {
SET_SYSTEM_SLEEP_PM_OPS(igb_suspend, igb_resume)
SET_RUNTIME_PM_OPS(igb_runtime_suspend, igb_runtime_resume,
@@ -7441,7 +7439,6 @@ static int igb_resume(struct device *dev
return 0;
}
-#ifdef CONFIG_PM_RUNTIME
static int igb_runtime_idle(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
@@ -7478,8 +7475,7 @@ static int igb_runtime_resume(struct dev
{
return igb_resume(dev);
}
-#endif /* CONFIG_PM_RUNTIME */
-#endif
+#endif /* CONFIG_PM */
static void igb_shutdown(struct pci_dev *pdev)
{
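The redundancy this patch removes can be sketched in plain C. This is only an illustration under stated assumptions: the DEMO_* macros are hypothetical stand-ins for the Kconfig symbols, and the functions are placeholders, not the e1000e or igb callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for CONFIG_PM and CONFIG_PM_RUNTIME. */
#define DEMO_CONFIG_PM 1

/* Models the Kconfig change: PM_RUNTIME is set whenever PM is set. */
#ifdef DEMO_CONFIG_PM
#define DEMO_CONFIG_PM_RUNTIME 1
#endif

#ifdef DEMO_CONFIG_PM
static int demo_resume(void) { return 0; }

/* This used to sit inside an inner #ifdef DEMO_CONFIG_PM_RUNTIME.
 * Inside the outer block that inner test can never be false, so the
 * guard can be dropped without changing what gets compiled.
 */
static int demo_runtime_idle(void) { return 0; }
#endif /* DEMO_CONFIG_PM */

/* Reports whether the runtime PM placeholder was compiled in. */
static bool demo_runtime_pm_built(void)
{
#ifdef DEMO_CONFIG_PM
	assert(demo_resume() == 0);
	return demo_runtime_idle() == 0;
#else
	return false;
#endif
}
```

With DEMO_CONFIG_PM undefined, neither placeholder is built, which matches the behaviour of the outer guard alone.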
^ permalink raw reply
* [PATCH net-next] net: avoid two atomic operations in fast clones
From: Eric Dumazet @ 2014-12-04 1:04 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Chris Mason, Sabrina Dubroca, Vijay Subramanian
From: Eric Dumazet <edumazet@google.com>
Commit ce1a4ea3f125 ("net: avoid one atomic operation in skb_clone()")
took the wrong way to save one atomic operation.
It is actually possible to avoid two atomic operations, if we
do not change skb->fclone values, and only rely on fclone_ref
content to signal if the clone is available or not.
skb_clone() can simply use the fast clone if fclone_ref is 1.
kfree_skbmem() can avoid the atomic_dec_and_test() if fclone_ref is 1.
Note that because we usually free the clone before the original skb,
this particular attempt is only done for the original skb to have better
branch prediction.
SKB_FCLONE_FREE is removed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
---
include/linux/skbuff.h | 3 +--
net/core/skbuff.c | 35 ++++++++++++++++++-----------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7691ad5b4771..27b64d5f7c94 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -344,7 +344,6 @@ enum {
SKB_FCLONE_UNAVAILABLE, /* skb has no fclone (from head_cache) */
SKB_FCLONE_ORIG, /* orig skb (from fclone_cache) */
SKB_FCLONE_CLONE, /* companion fclone skb (from fclone_cache) */
- SKB_FCLONE_FREE, /* this companion fclone skb is available */
};
enum {
@@ -818,7 +817,7 @@ static inline bool skb_fclone_busy(const struct sock *sk,
fclones = container_of(skb, struct sk_buff_fclones, skb1);
return skb->fclone == SKB_FCLONE_ORIG &&
- fclones->skb2.fclone == SKB_FCLONE_CLONE &&
+ atomic_read(&fclones->fclone_ref) > 1 &&
fclones->skb2.sk == sk;
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 92116dfe827c..7a338fb55cc4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -265,7 +265,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
skb->fclone = SKB_FCLONE_ORIG;
atomic_set(&fclones->fclone_ref, 1);
- fclones->skb2.fclone = SKB_FCLONE_FREE;
+ fclones->skb2.fclone = SKB_FCLONE_CLONE;
fclones->skb2.pfmemalloc = pfmemalloc;
}
out:
@@ -541,26 +541,27 @@ static void kfree_skbmem(struct sk_buff *skb)
switch (skb->fclone) {
case SKB_FCLONE_UNAVAILABLE:
kmem_cache_free(skbuff_head_cache, skb);
- break;
+ return;
case SKB_FCLONE_ORIG:
fclones = container_of(skb, struct sk_buff_fclones, skb1);
- if (atomic_dec_and_test(&fclones->fclone_ref))
- kmem_cache_free(skbuff_fclone_cache, fclones);
- break;
-
- case SKB_FCLONE_CLONE:
- fclones = container_of(skb, struct sk_buff_fclones, skb2);
- /* The clone portion is available for
- * fast-cloning again.
+ /* We usually free the clone (TX completion) before original skb
+ * This test would have no chance to be true for the clone,
+ * while here, branch prediction will be good.
*/
- skb->fclone = SKB_FCLONE_FREE;
+ if (atomic_read(&fclones->fclone_ref) == 1)
+ goto fastpath;
+ break;
- if (atomic_dec_and_test(&fclones->fclone_ref))
- kmem_cache_free(skbuff_fclone_cache, fclones);
+ default: /* SKB_FCLONE_CLONE */
+ fclones = container_of(skb, struct sk_buff_fclones, skb2);
break;
}
+ if (!atomic_dec_and_test(&fclones->fclone_ref))
+ return;
+fastpath:
+ kmem_cache_free(skbuff_fclone_cache, fclones);
}
static void skb_release_head_state(struct sk_buff *skb)
@@ -872,15 +873,15 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
struct sk_buff_fclones *fclones = container_of(skb,
struct sk_buff_fclones,
skb1);
- struct sk_buff *n = &fclones->skb2;
+ struct sk_buff *n;
if (skb_orphan_frags(skb, gfp_mask))
return NULL;
if (skb->fclone == SKB_FCLONE_ORIG &&
- n->fclone == SKB_FCLONE_FREE) {
- n->fclone = SKB_FCLONE_CLONE;
- atomic_inc(&fclones->fclone_ref);
+ atomic_read(&fclones->fclone_ref) == 1) {
+ n = &fclones->skb2;
+ atomic_set(&fclones->fclone_ref, 2);
} else {
if (skb_pfmemalloc(skb))
gfp_mask |= __GFP_MEMALLOC;
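The reference-count scheme described in the commit message can be modelled in userspace with C11 atomics. This is a rough sketch, not kernel code: the struct and the model_* helpers are hypothetical names, the "freed" flag stands in for kmem_cache_free(), and the slow-path allocation is reduced to returning NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_buf { int id; };

/* Userspace model of the orig/clone pair; names only mirror the patch. */
struct fclones_model {
	struct demo_buf skb1;   /* original */
	struct demo_buf skb2;   /* companion fast clone */
	atomic_int fclone_ref;  /* 1: clone available, 2: clone in use */
	bool freed;             /* stands in for kmem_cache_free() */
};

static void model_alloc(struct fclones_model *f)
{
	atomic_init(&f->fclone_ref, 1);
	f->freed = false;
}

/* Hands out the companion clone if fclone_ref is 1, else NULL
 * (the real code would fall back to a regular allocation).
 */
static struct demo_buf *model_clone(struct fclones_model *f)
{
	if (atomic_load(&f->fclone_ref) == 1) {
		atomic_store(&f->fclone_ref, 2);
		return &f->skb2;
	}
	return NULL;
}

static void model_free(struct fclones_model *f, bool is_orig)
{
	assert(!f->freed);
	/* Fast path: the original is freed while no clone is in use,
	 * so a plain read replaces the atomic decrement.
	 */
	if (is_orig && atomic_load(&f->fclone_ref) == 1) {
		f->freed = true;
		return;
	}
	/* atomic_fetch_sub() returns the old value; 1 means we hit zero. */
	if (atomic_fetch_sub(&f->fclone_ref, 1) == 1)
		f->freed = true;
}
```

In this model the usual order (clone freed first, then the original) performs one atomic decrement in total, and the original takes the read-only fast path.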
^ permalink raw reply related
* Re: [PATCH net-next 3/3] ip: Add support for IP_CHECKSUM cmsg
From: Eric Dumazet @ 2014-12-04 0:56 UTC (permalink / raw)
To: Tom Herbert, Michael Kerrisk; +Cc: davem, netdev
In-Reply-To: <1417653868-14922-4-git-send-email-therbert@google.com>
CC Michael Kerrisk <mtk.manpages@gmail.com> for man pages...
On Wed, 2014-12-03 at 16:44 -0800, Tom Herbert wrote:
> New cmsg type is IP_CHECKSUM under SOL_IP. Enabled by standard
> setsockopt.
>
> The value returned is the unfolded 32 bit checksum of the packet
> being received starting from the first byte returned in recvmsg
> through the end of the packet (truncation is disregarded).
>
> Modified UDP to postpull checksum beyond UDP header before returning
> checksum for UDP data to userspace.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
> include/net/inet_sock.h | 1 +
> include/uapi/linux/in.h | 1 +
> net/ipv4/ip_sockglue.c | 34 +++++++++++++++++++++++++++++++++-
> net/ipv4/udp.c | 10 +++++++++-
> 4 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 4091fab..2823fc0 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -203,6 +203,7 @@ struct inet_sock {
> #define IP_CMSG_RETOPTS (1 << 4)
> #define IP_CMSG_PASSSEC (1 << 5)
> #define IP_CMSG_ORIGDSTADDR (1 << 6)
> +#define IP_CMSG_CHECKSUM (1 << 7)
>
> static inline struct inet_sock *inet_sk(const struct sock *sk)
> {
> diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
> index c33a65e..589ced0 100644
> --- a/include/uapi/linux/in.h
> +++ b/include/uapi/linux/in.h
> @@ -109,6 +109,7 @@ struct in_addr {
>
> #define IP_MINTTL 21
> #define IP_NODEFRAG 22
> +#define IP_CHECKSUM 23
>
> /* IP_MTU_DISCOVER values */
> #define IP_PMTUDISC_DONT 0 /* Never send DF frames */
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index d4406aa..054280f 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -96,6 +96,14 @@ static void ip_cmsg_recv_retopts(struct msghdr *msg, struct sk_buff *skb)
> put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data);
> }
>
> +static void ip_cmsg_recv_checksum(struct msghdr *msg, struct sk_buff *skb)
> +{
> + if (skb->ip_summed != CHECKSUM_COMPLETE)
> + return;
> +
> + put_cmsg(msg, SOL_IP, IP_CHECKSUM, sizeof(__wsum), &skb->csum);
> +}
> +
> static void ip_cmsg_recv_security(struct msghdr *msg, struct sk_buff *skb)
> {
> char *secdata;
> @@ -190,9 +198,16 @@ void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
> return;
> }
>
> - if (flags & IP_CMSG_ORIGDSTADDR)
> + if (flags & IP_CMSG_ORIGDSTADDR) {
> ip_cmsg_recv_dstaddr(msg, skb);
>
> + flags &= ~IP_CMSG_ORIGDSTADDR;
> + if (!flags)
> + return;
> + }
> +
> + if (flags & IP_CMSG_CHECKSUM)
> + ip_cmsg_recv_checksum(msg, skb);
> }
> EXPORT_SYMBOL(ip_cmsg_recv);
>
> @@ -512,6 +527,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
> case IP_MULTICAST_ALL:
> case IP_MULTICAST_LOOP:
> case IP_RECVORIGDSTADDR:
> + case IP_CHECKSUM:
> if (optlen >= sizeof(int)) {
> if (get_user(val, (int __user *) optval))
> return -EFAULT;
> @@ -609,6 +625,19 @@ static int do_ip_setsockopt(struct sock *sk, int level,
> else
> inet->cmsg_flags &= ~IP_CMSG_ORIGDSTADDR;
> break;
> + case IP_CHECKSUM:
> + if (val) {
> + if (!(inet->cmsg_flags & IP_CMSG_CHECKSUM)) {
> + inet_inc_convert_csum(sk);
> + inet->cmsg_flags |= IP_CMSG_CHECKSUM;
> + }
> + } else {
> + if (inet->cmsg_flags & IP_CMSG_CHECKSUM) {
> + inet_dec_convert_csum(sk);
> + inet->cmsg_flags &= ~IP_CMSG_CHECKSUM;
> + }
> + }
> + break;
> case IP_TOS: /* This sets both TOS and Precedence */
> if (sk->sk_type == SOCK_STREAM) {
> val &= ~INET_ECN_MASK;
> @@ -1212,6 +1241,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
> case IP_RECVORIGDSTADDR:
> val = (inet->cmsg_flags & IP_CMSG_ORIGDSTADDR) != 0;
> break;
> + case IP_CHECKSUM:
> + val = (inet->cmsg_flags & IP_CMSG_CHECKSUM) != 0;
> + break;
> case IP_TOS:
> val = inet->tos;
> break;
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 221b53f..bba2e06 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1315,8 +1315,16 @@ try_again:
> memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
> *addr_len = sizeof(*sin);
> }
> - if (inet->cmsg_flags)
> + if (inet->cmsg_flags) {
> + /* Pull checksum past UDP header in case we are providing
> + * checksum in cmsg.
> + */
> + if (inet->cmsg_flags & IP_CMSG_CHECKSUM)
> + skb_postpull_rcsum(skb, skb->data,
> + sizeof(struct udphdr));
> +
> ip_cmsg_recv(msg, skb);
> + }
>
> err = copied;
> if (flags & MSG_TRUNC)
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2014-12-04 0:54 UTC (permalink / raw)
To: Thomas Graf
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, Michael S. Tsirkin,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Wang,
Du, Fan, fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org
In-Reply-To: <20141203230551.GC8822-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
On Wed, Dec 3, 2014 at 3:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 12/03/14 at 02:51pm, Jesse Gross wrote:
>> My proposal would be something like this:
>> * For L2, reduce the VM MTU to the lowest common denominator on the segment.
>> * For L3, use path MTU discovery or fragment inner packet (i.e.
>> normal routing behavior).
>> * As a last resort (such as if using an old version of virtio in the
>> guest), fragment the tunnel packet.
>
> That's what I had in mind as well although using a differentiator bit
> which indicates to the output path whether the packet is to be
> considered switched or routed and thus send ICMPs. The bit would be set
> per flow, thus allowing arbitrary granularity of behaviour.
I don't think that we actually need a bit. I would expect ICMP
generation to be coupled with routing (although this requires being
able to know what the ultimate MTU is at the time of routing the inner
packet). If that's the case, you just need to steer between L2 and L3
processing in the same way that you would today and ICMP would just
come in the right cases.
^ permalink raw reply
* [PATCH net-next 3/3] ip: Add support for IP_CHECKSUM cmsg
From: Tom Herbert @ 2014-12-04 0:44 UTC (permalink / raw)
To: davem, netdev
In-Reply-To: <1417653868-14922-1-git-send-email-therbert@google.com>
New cmsg type is IP_CHECKSUM under SOL_IP. Enabled by standard
setsockopt.
The value returned is the unfolded 32 bit checksum of the packet
being received starting from the first byte returned in recvmsg
through the end of the packet (truncation is disregarded).
Modified UDP to postpull checksum beyond UDP header before returning
checksum for UDP data to userspace.
Signed-off-by: Tom Herbert <therbert@google.com>
---
include/net/inet_sock.h | 1 +
include/uapi/linux/in.h | 1 +
net/ipv4/ip_sockglue.c | 34 +++++++++++++++++++++++++++++++++-
net/ipv4/udp.c | 10 +++++++++-
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 4091fab..2823fc0 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -203,6 +203,7 @@ struct inet_sock {
#define IP_CMSG_RETOPTS (1 << 4)
#define IP_CMSG_PASSSEC (1 << 5)
#define IP_CMSG_ORIGDSTADDR (1 << 6)
+#define IP_CMSG_CHECKSUM (1 << 7)
static inline struct inet_sock *inet_sk(const struct sock *sk)
{
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index c33a65e..589ced0 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -109,6 +109,7 @@ struct in_addr {
#define IP_MINTTL 21
#define IP_NODEFRAG 22
+#define IP_CHECKSUM 23
/* IP_MTU_DISCOVER values */
#define IP_PMTUDISC_DONT 0 /* Never send DF frames */
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d4406aa..054280f 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -96,6 +96,14 @@ static void ip_cmsg_recv_retopts(struct msghdr *msg, struct sk_buff *skb)
put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data);
}
+static void ip_cmsg_recv_checksum(struct msghdr *msg, struct sk_buff *skb)
+{
+ if (skb->ip_summed != CHECKSUM_COMPLETE)
+ return;
+
+ put_cmsg(msg, SOL_IP, IP_CHECKSUM, sizeof(__wsum), &skb->csum);
+}
+
static void ip_cmsg_recv_security(struct msghdr *msg, struct sk_buff *skb)
{
char *secdata;
@@ -190,9 +198,16 @@ void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
return;
}
- if (flags & IP_CMSG_ORIGDSTADDR)
+ if (flags & IP_CMSG_ORIGDSTADDR) {
ip_cmsg_recv_dstaddr(msg, skb);
+ flags &= ~IP_CMSG_ORIGDSTADDR;
+ if (!flags)
+ return;
+ }
+
+ if (flags & IP_CMSG_CHECKSUM)
+ ip_cmsg_recv_checksum(msg, skb);
}
EXPORT_SYMBOL(ip_cmsg_recv);
@@ -512,6 +527,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
case IP_MULTICAST_ALL:
case IP_MULTICAST_LOOP:
case IP_RECVORIGDSTADDR:
+ case IP_CHECKSUM:
if (optlen >= sizeof(int)) {
if (get_user(val, (int __user *) optval))
return -EFAULT;
@@ -609,6 +625,19 @@ static int do_ip_setsockopt(struct sock *sk, int level,
else
inet->cmsg_flags &= ~IP_CMSG_ORIGDSTADDR;
break;
+ case IP_CHECKSUM:
+ if (val) {
+ if (!(inet->cmsg_flags & IP_CMSG_CHECKSUM)) {
+ inet_inc_convert_csum(sk);
+ inet->cmsg_flags |= IP_CMSG_CHECKSUM;
+ }
+ } else {
+ if (inet->cmsg_flags & IP_CMSG_CHECKSUM) {
+ inet_dec_convert_csum(sk);
+ inet->cmsg_flags &= ~IP_CMSG_CHECKSUM;
+ }
+ }
+ break;
case IP_TOS: /* This sets both TOS and Precedence */
if (sk->sk_type == SOCK_STREAM) {
val &= ~INET_ECN_MASK;
@@ -1212,6 +1241,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_RECVORIGDSTADDR:
val = (inet->cmsg_flags & IP_CMSG_ORIGDSTADDR) != 0;
break;
+ case IP_CHECKSUM:
+ val = (inet->cmsg_flags & IP_CMSG_CHECKSUM) != 0;
+ break;
case IP_TOS:
val = inet->tos;
break;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 221b53f..bba2e06 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1315,8 +1315,16 @@ try_again:
memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
*addr_len = sizeof(*sin);
}
- if (inet->cmsg_flags)
+ if (inet->cmsg_flags) {
+ /* Pull checksum past UDP header in case we are providing
+ * checksum in cmsg.
+ */
+ if (inet->cmsg_flags & IP_CMSG_CHECKSUM)
+ skb_postpull_rcsum(skb, skb->data,
+ sizeof(struct udphdr));
+
ip_cmsg_recv(msg, skb);
+ }
err = copied;
if (flags & MSG_TRUNC)
--
2.2.0.rc0.207.ga3a616c
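On the receiving side, userspace would look for the new control message after recvmsg(). The following is a minimal sketch, assuming the option is enabled with setsockopt() as described above and that the payload is a single 32-bit value; find_ip_checksum() is a hypothetical helper name, and the fallback defines only mirror the values used in this patch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef IP_CHECKSUM
#define IP_CHECKSUM 23          /* value taken from this patch */
#endif
#ifndef SOL_IP
#define SOL_IP IPPROTO_IP
#endif

/* Walks the control messages of a filled-in msghdr.
 * Returns 1 and stores the checksum if an IP_CHECKSUM cmsg is found,
 * 0 otherwise (for example when the skb was not CHECKSUM_COMPLETE).
 */
static int find_ip_checksum(struct msghdr *msg, uint32_t *csum)
{
	struct cmsghdr *cm;

	assert(msg != NULL && csum != NULL);
	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == SOL_IP &&
		    cm->cmsg_type == IP_CHECKSUM &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(*csum))) {
			memcpy(csum, CMSG_DATA(cm), sizeof(*csum));
			return 1;
		}
	}
	return 0;
}
```

A caller would pass the same msghdr it handed to recvmsg(), with msg_control pointing at a buffer of at least CMSG_SPACE(sizeof(uint32_t)) bytes.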
^ permalink raw reply related
* [PATCH net-next 1/3] ip: Move checksum convert defines to inet
From: Tom Herbert @ 2014-12-04 0:44 UTC (permalink / raw)
To: davem, netdev
In-Reply-To: <1417653868-14922-1-git-send-email-therbert@google.com>
Move convert_csum from udp_sock to inet_sock. This allows the
possibility that we can use convert checksum for different types
of sockets and also allows convert checksum to be enabled from
inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).
Signed-off-by: Tom Herbert <therbert@google.com>
---
include/linux/udp.h | 16 +---------------
include/net/inet_sock.h | 17 +++++++++++++++++
net/ipv4/fou.c | 2 +-
net/ipv4/udp.c | 2 +-
net/ipv4/udp_tunnel.c | 2 +-
net/ipv6/udp.c | 2 +-
6 files changed, 22 insertions(+), 19 deletions(-)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index ee32775..247cfdc 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -49,11 +49,7 @@ struct udp_sock {
unsigned int corkflag; /* Cork is required */
__u8 encap_type; /* Is this an Encapsulation socket? */
unsigned char no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
- no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
- convert_csum:1;/* On receive, convert checksum
- * unnecessary to checksum complete
- * if possible.
- */
+ no_check6_rx:1;/* Allow zero UDP6 checksums on RX? */
/*
* Following member retains the information to create a UDP header
* when the socket is uncorked.
@@ -102,16 +98,6 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
return udp_sk(sk)->no_check6_rx;
}
-static inline void udp_set_convert_csum(struct sock *sk, bool val)
-{
- udp_sk(sk)->convert_csum = val;
-}
-
-static inline bool udp_get_convert_csum(struct sock *sk)
-{
- return udp_sk(sk)->convert_csum;
-}
-
#define udp_portaddr_for_each_entry(__sk, node, list) \
hlist_nulls_for_each_entry(__sk, node, list, __sk_common.skc_portaddr_node)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index a829b77..360b110 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -184,6 +184,7 @@ struct inet_sock {
mc_all:1,
nodefrag:1;
__u8 rcv_tos;
+ __u8 convert_csum;
int uc_index;
int mc_index;
__be32 mc_addr;
@@ -250,4 +251,20 @@ static inline __u8 inet_sk_flowi_flags(const struct sock *sk)
return flags;
}
+static inline void inet_inc_convert_csum(struct sock *sk)
+{
+ inet_sk(sk)->convert_csum++;
+}
+
+static inline void inet_dec_convert_csum(struct sock *sk)
+{
+ if (inet_sk(sk)->convert_csum > 0)
+ inet_sk(sk)->convert_csum--;
+}
+
+static inline bool inet_get_convert_csum(struct sock *sk)
+{
+ return !!inet_sk(sk)->convert_csum;
+}
+
#endif /* _INET_SOCK_H */
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index b986298..2197c36 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -490,7 +490,7 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
sk->sk_user_data = fou;
fou->sock = sock;
- udp_set_convert_csum(sk, true);
+ inet_inc_convert_csum(sk);
sk->sk_allocation = GFP_ATOMIC;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b2d6068..221b53f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1793,7 +1793,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
if (sk != NULL) {
int ret;
- if (udp_sk(sk)->convert_csum && uh->check && !IS_UDPLITE(sk))
+ if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
inet_compute_pseudo);
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 1671263..9996e63 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -63,7 +63,7 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
inet_sk(sk)->mc_loop = 0;
/* Enable CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion */
- udp_set_convert_csum(sk, true);
+ inet_inc_convert_csum(sk);
rcu_assign_sk_user_data(sk, cfg->sk_user_data);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7cfb5d7..5d9900c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -896,7 +896,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
goto csum_error;
}
- if (udp_sk(sk)->convert_csum && uh->check && !IS_UDPLITE(sk))
+ if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
ip6_compute_pseudo);
--
2.2.0.rc0.207.ga3a616c
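The switch from a one-bit flag to a small counter can be illustrated with a userspace model. This is a sketch under assumptions: struct demo_inet_sock and the demo_* helpers are hypothetical names that mirror the inline helpers added above, and the overflow assertion is an addition for the demo that the patch itself does not contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the convert_csum member of inet_sock. */
struct demo_inet_sock {
	uint8_t convert_csum;
};

static void demo_inc_convert_csum(struct demo_inet_sock *inet)
{
	/* Demo-only guard: an 8-bit counter wraps after 255 users. */
	assert(inet->convert_csum < UINT8_MAX);
	inet->convert_csum++;
}

static void demo_dec_convert_csum(struct demo_inet_sock *inet)
{
	/* Never go below zero, as in the patch. */
	if (inet->convert_csum > 0)
		inet->convert_csum--;
}

static bool demo_get_convert_csum(const struct demo_inet_sock *inet)
{
	return inet->convert_csum != 0;
}
```

With a counter, two independent users (for instance a tunnel setup path and a socket option) can each enable and later disable conversion without turning it off for the other, which a single bit could not express.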
^ permalink raw reply related
* [PATCH net-next 0/3] ip: Support checksum returned in cmsg
From: Tom Herbert @ 2014-12-04 0:44 UTC (permalink / raw)
To: davem, netdev
This patch set allows the packet checksum for a datagram socket
to be returned in csum data in recvmsg. This allows userspace
to implement its own checksum over the data; for instance, if an
IP tunnel were implemented in user space, the inner checksum
could be validated.
Changes in this patch set:
- Move checksum conversion to inet_sock from udp_sock. This
generalizes checksum conversion for use with other protocols.
- Move IP cmsg constants to a header file and make processing
of the flags more efficient in ip_cmsg_recv.
- Return checksum value in cmsg. This is specifically the unfolded
32 bit checksum of the full packet starting from the first byte
returned in recvmsg.
Tested: Wrote a little server to get checksums in cmsg for UDP and
verified that the correct checksum is returned.
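Since the value handed to userspace is an unfolded 32-bit sum, a consumer would typically fold it to 16 bits before comparing it with anything. The following sketch shows the standard ones-complement arithmetic only; the demo_* names are hypothetical, and byte-order handling of the value the kernel actually returns is not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones-complement accumulator down to 16 bits. */
static uint16_t demo_csum_fold(uint32_t sum)
{
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);  /* absorb a possible carry */
	return (uint16_t)sum;
}

/* Add the 16-bit big-endian words of a buffer to a running 32-bit sum.
 * An odd trailing byte is treated as the high byte of a final word.
 */
static uint32_t demo_csum_partial(const uint8_t *data, size_t len,
				  uint32_t sum)
{
	uint64_t acc = sum;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)
		acc += (uint32_t)data[i] << 8;
	/* Wrap carries out of bit 31 back into the low bits. */
	while (acc >> 32)
		acc = (acc & 0xffffffffu) + (acc >> 32);
	return (uint32_t)acc;
}
```

A userspace tunnel could, under these assumptions, subtract the sum of the outer bytes it strips and fold the remainder to validate an inner checksum.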
Tom Herbert (3):
ip: Move checksum convert defines to inet
ip: IP cmsg cleanup
ip: Add support for IP_CHECKSUM cmsg
include/linux/udp.h | 16 +--------
include/net/inet_sock.h | 27 ++++++++++++++
include/uapi/linux/in.h | 1 +
net/ipv4/fou.c | 2 +-
net/ipv4/ip_sockglue.c | 96 +++++++++++++++++++++++++++++++++++--------------
net/ipv4/udp.c | 12 +++++--
net/ipv4/udp_tunnel.c | 2 +-
net/ipv6/udp.c | 2 +-
8 files changed, 111 insertions(+), 47 deletions(-)
--
2.2.0.rc0.207.ga3a616c
^ permalink raw reply
* Re: [patch net-next v4 8/9] net: move vlan pop/push functions into common code
From: Pravin Shelar @ 2014-12-04 0:33 UTC (permalink / raw)
To: Jiri Benc
Cc: Jiri Pirko, netdev, David Miller, Jamal Hadi Salim, Tom Herbert,
Eric Dumazet, Willem de Bruijn, Daniel Borkmann, mst, fw,
Paul.Durrant, Thomas Graf, Cong Wang
In-Reply-To: <20141121202933.2970b7ea@griffin>
On Fri, Nov 21, 2014 at 11:29 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Fri, 21 Nov 2014 10:41:27 -0800, Pravin Shelar wrote:
>> There is a bug in the openvswitch code where vlan-pop does a mac-length
>> reset rather than subtracting VLAN_HLEN from mac_len. The MPLS code
>> depends on this. Now this patch moves the bug to common code. I am
>> fine with this patch and I can send a fix to change the vlan-pop code
>> to adjust mac_len later on.
>> But I am worried about the comment made by Jiri Benc on an earlier
>> version of this patch, where he mentioned that subtracting VLAN_HLEN
>> can break the common case for vlan-pop. In that case we cannot change
>> this common function. Jiri Benc, can you point me to the code, so that
>> we can work on a solution to fix this issue?
>
> In such case, a skb that enters the __pop_vlan_tci function looks like
> this:
>
> +--------+--------+-------------+-----+---------+-------
> | dstmac | srcmac | ETH_P_8021Q | tci | ETH_P_x | data
> +--------+--------+-------------+-----+---------+-------
> ^ ^
> mac_header network_header
>
> skb->protocol = ETH_P_8021Q
> skb->mac_len = 14
>
> After the function finishes, it's supposed to look like this:
>
> +--------+--------+---------+-------
> | dstmac | srcmac | ETH_P_x | data
> +--------+--------+---------+-------
> ^ ^
> mac_header network_header
>
> skb->protocol = ETH_P_x
> skb->mac_len = 14
>
> Blindly decrementing mac_len wouldn't work here, you'd end up with
> mac_len = 10. You'd either need to ensure this case doesn't happen (it
> does with openvswitch currently) or detect this case.
>
OVS correctly sets the mac header length in the case of a vlan header. Can you
give me an OVS test case to reproduce this issue?
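The arithmetic behind the quoted diagrams can be written out as a small sketch. The DEMO_* constants mirror ETH_HLEN and VLAN_HLEN, and demo_pop_guarded() is a hypothetical way to "detect this case"; neither side of the thread proposed this exact check.

```c
#include <assert.h>

#define DEMO_ETH_HLEN  14u  /* dstmac + srcmac + ethertype */
#define DEMO_VLAN_HLEN  4u  /* tci + encapsulated ethertype */

/* Blind adjustment: always subtract the VLAN header length. */
static unsigned int demo_pop_blind(unsigned int mac_len)
{
	assert(mac_len >= DEMO_VLAN_HLEN);
	return mac_len - DEMO_VLAN_HLEN;
}

/* Hypothetical guarded adjustment: only subtract when mac_len
 * actually covers the VLAN header in addition to the Ethernet header.
 */
static unsigned int demo_pop_guarded(unsigned int mac_len)
{
	if (mac_len >= DEMO_ETH_HLEN + DEMO_VLAN_HLEN)
		return mac_len - DEMO_VLAN_HLEN;
	return mac_len;
}
```

For the skb in the diagrams (mac_len = 14 on entry), the blind variant yields 10, while the guarded variant leaves 14 untouched and still reduces an 18-byte mac_len to 14.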
^ permalink raw reply
* Re: [PATCH iproute2] ss: Use rtnl_dump_filter in handle_netlink_request
From: Vadim Kochan @ 2014-12-04 0:00 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20141203232819.GA12656@angus-think.lan>
Of course, I think it is possible to add a check for protocol ==
NETLINK_INET_DIAG
and not print the error message in lib/libnetlink.c if errno == ENOENT,
but I am not sure if you will like such an approach ...
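A rough sketch of that idea, outside of libnetlink, could look like the following. Everything here is an assumption for illustration: DEMO_NETLINK_INET_DIAG is a local stand-in for the real constant, the helpers are hypothetical, and the error is modelled as a positive errno value rather than the negative value carried in a netlink error message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in, assumed to mirror NETLINK_INET_DIAG. */
#define DEMO_NETLINK_INET_DIAG 4

/* ENOENT on a diag socket is treated as "not supported by this
 * kernel" and stays silent; every other error is reported.
 */
static bool demo_should_print_error(int protocol, int err)
{
	return !(protocol == DEMO_NETLINK_INET_DIAG && err == ENOENT);
}

/* Mimics the tail of an NLMSG_ERROR handler: optionally print,
 * then signal failure to the caller.
 */
static int demo_report_error(int protocol, int err, FILE *out)
{
	assert(out != NULL);
	if (demo_should_print_error(protocol, err))
		fprintf(out, "RTNETLINK answers: %s\n", strerror(err));
	return -1;
}
```

The return value stays -1 in both cases, so callers that fall back to another data source on failure would behave as the original ss.c handler did.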
On Thu, Dec 4, 2014 at 1:28 AM, <vadim4j@gmail.com> wrote:
> On Wed, Dec 03, 2014 at 03:07:48PM -0800, Stephen Hemminger wrote:
>> On Thu, 4 Dec 2014 00:15:54 +0200
>> Vadim Kochan <vadim4j@gmail.com> wrote:
>>
>> > I established a simple OpenVPN connection (at least from the logs and
>> > ss I see that the peer is connected), but
>> > I don't get any errors with ss.
>> >
>> > So can you please provide some more info on how you tested this patch?
>> > I am surprised that this is caused only
>> > by these changes ...
>> >
>> > On Wed, Dec 3, 2014 at 7:45 PM, <vadim4j@gmail.com> wrote:
>> > > On Wed, Dec 03, 2014 at 09:51:23AM -0800, Stephen Hemminger wrote:
>> > >> On Wed, 3 Dec 2014 19:20:10 +0200
>> > >> vadim4j@gmail.com wrote:
>> > >>
>> > >> > On Wed, Dec 03, 2014 at 09:21:29AM -0800, Stephen Hemminger wrote:
>> > >> > > On Tue, 2 Dec 2014 16:53:04 +0200
>> > >> > > Vadim Kochan <vadim4j@gmail.com> wrote:
>> > >> > >
>> > >> > > > Replaced handling netlink messages by rtnl_dump_filter
>> > >> > > > from lib/libnetlink.c, also:
>> > >> > > >
>> > >> > > > - removed unused dump_fp arg;
>> > >> > > > - added MAGIC_SEQ #define for 123456 seq id
>> > >> > > >
>> > >> > > > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
>> > >> > >
>> > >> > > This doesn't work correctly.
>> > >> > >
>> > >> > > Simple test
>> > >> > > $ misc/ss >/dev/null
>> > >> > > RTNETLINK answers: No such file or directory
>> > >> > > RTNETLINK answers: No such file or directory
>> > >> > > RTNETLINK answers: No such file or directory
>> > >> >
>> > >> > Just tried, I did not get such errors.
>> > >>
>> > >> I have OpenVPN running.
>> > >
>> > > Hm, it is reproduced only with this patch ?
>> > > If so I will try to setup OpenVPN ... can't imagine how it can be
>> > > related ...
>>
>> I am running on 3.17.4 kernel if that helps.
>
> OK, thank you!
>
> I see why this was caused by this patch. The difference is in
> the way libnetlink/rtnl_* and the original ss.c handle
> NLMSG_ERROR: the ss.c netlink handler checks for errno ENOENT (No such file or directory)
> and silently closes the file and returns -1, but rtnl_* prints the error message.
>
> Meanwhile I don't know the real reason why the ENOENT error happened; maybe
> it is OK in the context of diagnostic messages. But I grepped over the
> commits and found that this ENOENT check was added by Eric in:
>
> commit a3fd8e58c1787af186f5c4b234ff974544f840b6
> Author: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon Jan 30 17:05:45 2012 +0100
>
> ss: should support CONFIG_INET_UDP_DIAG=n kernels
>
> ss -x currently fails if CONFIG_INET_UDP_DIAG=n or old kernels
>
> Also close file descriptors while we are at it.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
^ permalink raw reply
* Re: [PATCH iproute2] ss: Use rtnl_dump_filter in handle_netlink_request
From: vadim4j @ 2014-12-03 23:28 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20141203150748.001d481e@urahara>
On Wed, Dec 03, 2014 at 03:07:48PM -0800, Stephen Hemminger wrote:
> On Thu, 4 Dec 2014 00:15:54 +0200
> Vadim Kochan <vadim4j@gmail.com> wrote:
>
> > I established a simple OpenVPN connection (at least from the logs and
> > ss I see that the peer is connected), but
> > I don't get any errors with ss.
> >
> > So can you please provide some more info on how you tested this patch?
> > I am surprised that this is caused only
> > by these changes ...
> >
> > On Wed, Dec 3, 2014 at 7:45 PM, <vadim4j@gmail.com> wrote:
> > > On Wed, Dec 03, 2014 at 09:51:23AM -0800, Stephen Hemminger wrote:
> > >> On Wed, 3 Dec 2014 19:20:10 +0200
> > >> vadim4j@gmail.com wrote:
> > >>
> > >> > On Wed, Dec 03, 2014 at 09:21:29AM -0800, Stephen Hemminger wrote:
> > >> > > On Tue, 2 Dec 2014 16:53:04 +0200
> > >> > > Vadim Kochan <vadim4j@gmail.com> wrote:
> > >> > >
> > >> > > > Replaced handling netlink messages by rtnl_dump_filter
> > >> > > > from lib/libnetlink.c, also:
> > >> > > >
> > >> > > > - removed unused dump_fp arg;
> > >> > > > - added MAGIC_SEQ #define for 123456 seq id
> > >> > > >
> > >> > > > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> > >> > >
> > >> > > This doesn't work correctly.
> > >> > >
> > >> > > Simple test
> > >> > > $ misc/ss >/dev/null
> > >> > > RTNETLINK answers: No such file or directory
> > >> > > RTNETLINK answers: No such file or directory
> > >> > > RTNETLINK answers: No such file or directory
> > >> >
> > >> > Just tried, I did not get such errors.
> > >>
> > >> I have OpenVPN running.
> > >
> > > Hm, it is reproduced only with this patch ?
> > > If so I will try to setup OpenVPN ... can't imagine how it can be
> > > related ...
>
> I am running on 3.17.4 kernel if that helps.
OK, thank you!
I see why this was caused by this patch. The difference is in
the way libnetlink/rtnl_* and the original ss.c handle
NLMSG_ERROR: the ss.c netlink handler checks for errno ENOENT (No such file or directory)
and silently closes the file and returns -1, but rtnl_* prints the error message.
Meanwhile I don't know the real reason why the ENOENT error happened; maybe
it is OK in the context of diagnostic messages. But I grepped over the
commits and found that this ENOENT check was added by Eric in:
commit a3fd8e58c1787af186f5c4b234ff974544f840b6
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon Jan 30 17:05:45 2012 +0100
ss: should support CONFIG_INET_UDP_DIAG=n kernels
ss -x currently fails if CONFIG_INET_UDP_DIAG=n or old kernels
Also close file descriptors while we are at it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
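The difference in NLMSG_ERROR handling described above can be sketched
roughly as follows. This is a minimal illustration only, not the code from
ss.c or lib/libnetlink.c: the function name handle_nlmsg_error() and the
quiet_enoent flag are hypothetical, and only the struct layouts and macros
from <linux/netlink.h> are real.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper (not from iproute2): inspect one netlink message.
 * Returns 0 if the message is not an error (or is a plain ACK),
 * -1 if it carries an error. With quiet_enoent set, ENOENT is swallowed
 * silently, which mimics what the old ss.c handler is described as doing;
 * without it, the error is printed, as rtnl_* is described as doing.
 */
static int handle_nlmsg_error(const struct nlmsghdr *h, int quiet_enoent)
{
	const struct nlmsgerr *err;

	if (h->nlmsg_type != NLMSG_ERROR)
		return 0;

	if (h->nlmsg_len < NLMSG_LENGTH(sizeof(*err))) {
		fprintf(stderr, "ERROR truncated\n");
		return -1;
	}

	err = (const struct nlmsgerr *)NLMSG_DATA(h);
	if (err->error == 0)
		return 0;		/* error code 0 is an ACK */

	errno = -err->error;		/* kernel reports a negative errno */
	if (quiet_enoent && errno == ENOENT)
		return -1;		/* e.g. diag module not available */

	fprintf(stderr, "RTNETLINK answers: %s\n", strerror(errno));
	return -1;
}
```

With quiet_enoent set to 0 the same message would produce the
"RTNETLINK answers: No such file or directory" lines seen in the report.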
^ permalink raw reply
* Re: [PATCH net] Update old iproute2 and Xen Remus links
From: Stephen Hemminger @ 2014-12-03 23:34 UTC (permalink / raw)
To: Andrew Shewmaker; +Cc: linux-doc, linux-kernel, netdev, corbet, jhs, davem
In-Reply-To: <1417644451-7305-1-git-send-email-agshew@gmail.com>
On Wed, 3 Dec 2014 14:07:31 -0800
Andrew Shewmaker <agshew@gmail.com> wrote:
> Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
> ---
> Documentation/Changes | 2 +-
> net/sched/Kconfig | 7 ++++---
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/Changes b/Documentation/Changes
> index 1de131b..74bdda9 100644
> --- a/Documentation/Changes
> +++ b/Documentation/Changes
> @@ -383,7 +383,7 @@ o <http://www.iptables.org/downloads.html>
>
> Ip-route2
> ---------
> -o <ftp://ftp.tux.org/pub/net/ip-routing/iproute2-2.2.4-now-ss991023.tar.gz>
> +o <https://www.kernel.org/pub/linux/utils/net/iproute2/>
>
> OProfile
> --------
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index a1a8e29..d17053d 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -22,8 +22,9 @@ menuconfig NET_SCHED
> This code is considered to be experimental.
>
> To administer these schedulers, you'll need the user-level utilities
> - from the package iproute2+tc at <ftp://ftp.tux.org/pub/net/ip-routing/>.
> - That package also contains some documentation; for more, check out
> + from the package iproute2+tc at
> + <https://www.kernel.org/pub/linux/utils/net/iproute2/>. That package
> + also contains some documentation; for more, check out
> <http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2>.
>
> This Quality of Service (QoS) support will enable you to use
> @@ -336,7 +337,7 @@ config NET_SCH_PLUG
> of virtual machines by allowing the generated network output to be rolled
> back if needed.
>
> - For more information, please refer to http://wiki.xensource.com/xenwiki/Remus
> + For more information, please refer to <http://wiki.xenproject.org/wiki/Remus>
>
> Say Y here if you are using this kernel for Xen dom0 and
> want to protect Xen guests with Remus.
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
^ permalink raw reply
* Re: [patch net-next 3/6] net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcu
From: Daniel Borkmann @ 2014-12-03 23:15 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Jiri Pirko, netdev, davem
In-Reply-To: <547F7D6D.3000709@mojatatu.com>
On 12/03/2014 10:15 PM, Jamal Hadi Salim wrote:
...
> I am not an rcu officionado. So if the control path is doing a non-rcu
> get + rcu-del/change (update) then as long as the fastpath is (read)
> rcu locking we are fine and nothing will actually happen until the
> fastpath releases and rcu grace period ends, correct?
So if the control path does a list_del_rcu() and call_rcu() under
lock and iterates over list_for_each_entry{,safe}() [which we do
in cls_bpf_delete()], while the fast path [cls_bpf_classify()] uses
a rcu_read_{un,}lock() with *_rcu list iterator, that's fine then,
as it's still guaranteed to be deleted after the grace period. So
Jiri's change looks good to me.
Acked-by: Daniel Borkmann <dborkman@redhat.com>
^ permalink raw reply
* Re: [PATCH iproute2] ss: Use rtnl_dump_filter in handle_netlink_request
From: Stephen Hemminger @ 2014-12-03 23:07 UTC (permalink / raw)
To: Vadim Kochan; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAMw6YJ+W0KCvGrfij990+BU2EsW6E4SEWJRsrRgk=vnq-okjzQ@mail.gmail.com>
On Thu, 4 Dec 2014 00:15:54 +0200
Vadim Kochan <vadim4j@gmail.com> wrote:
> I established some simple OpenVPN connection (at least by the logs and
> ss I see that peer is connected), but
> I dont get any errors with ss.
>
> So can you please provide some more info how did you tested this patch
> ? I am surprised that this is caused only
> by these changes ...
>
> On Wed, Dec 3, 2014 at 7:45 PM, <vadim4j@gmail.com> wrote:
> > On Wed, Dec 03, 2014 at 09:51:23AM -0800, Stephen Hemminger wrote:
> >> On Wed, 3 Dec 2014 19:20:10 +0200
> >> vadim4j@gmail.com wrote:
> >>
> >> > On Wed, Dec 03, 2014 at 09:21:29AM -0800, Stephen Hemminger wrote:
> >> > > On Tue, 2 Dec 2014 16:53:04 +0200
> >> > > Vadim Kochan <vadim4j@gmail.com> wrote:
> >> > >
> >> > > > Replaced handling netlink messages by rtnl_dump_filter
> >> > > > from lib/libnetlink.c, also:
> >> > > >
> >> > > > - removed unused dump_fp arg;
> >> > > > - added MAGIC_SEQ #define for 123456 seq id
> >> > > >
> >> > > > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> >> > >
> >> > > This doesn't work correctly.
> >> > >
> >> > > Simple test
> >> > > $ misc/ss >/dev/null
> >> > > RTNETLINK answers: No such file or directory
> >> > > RTNETLINK answers: No such file or directory
> >> > > RTNETLINK answers: No such file or directory
> >> >
> >> > Just tried, I did not get such errors.
> >>
> >> I have OpenVPN running.
> >
> > Hm, it is reproduced only with this patch ?
> > If so I will try to setup OpenVPN ... can't imagine how it can be
> > related ...
I am running on 3.17.4 kernel if that helps.
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Thomas Graf @ 2014-12-03 23:05 UTC (permalink / raw)
To: Jesse Gross
Cc: Michael S. Tsirkin, Du, Fan, Jason Wang, netdev@vger.kernel.org,
davem@davemloft.net, fw@strlen.de, dev@openvswitch.org,
Pravin Shelar
In-Reply-To: <CAEP_g=90nC2HhmBNKh-hKJ5MJ85Z-_ER14roDxMsZAKog+dFhw@mail.gmail.com>
On 12/03/14 at 02:51pm, Jesse Gross wrote:
> My proposal would be something like this:
> * For L2, reduce the VM MTU to the lowest common denominator on the segment.
> * For L3, use path MTU discovery or fragment inner packet (i.e.
> normal routing behavior).
> * As a last resort (such as if using an old version of virtio in the
> guest), fragment the tunnel packet.
That's what I had in mind as well, although using a differentiator bit
which indicates to the output path whether the packet is to be
considered switched or routed, and thus whether to send ICMPs. The bit
would be set per flow, thus allowing arbitrary granularity of behaviour.
I haven't fully thought this through yet though.
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2014-12-03 22:51 UTC (permalink / raw)
To: Thomas Graf
Cc: Michael S. Tsirkin, Du, Fan, Jason Wang, netdev@vger.kernel.org,
davem@davemloft.net, fw@strlen.de, dev@openvswitch.org,
Pravin Shelar
In-Reply-To: <20141203220244.GA8822@casper.infradead.org>
On Wed, Dec 3, 2014 at 2:02 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 12/03/14 at 11:38am, Jesse Gross wrote:
>> On Wed, Dec 3, 2014 at 10:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > Both approaches seem strange. You are sending 1 packet an hour to
>> > some destination behind 100 tunnels. Why would you want to
>> > cut down your MTU for all packets? On the other hand,
>> > doubling the amount of packets because your MTU is off
>> > by a couple of bytes will hurt performance significantly.
>> >
>> > Still, if you want to cut down the MTU within guest,
>> > that's only an ifconfig away.
>> > Most people would not want to bother, I think it's a good
>> > idea to make PMTU work properly for them.
>>
>> I care about correctness first, which means that an Ethernet link
>> being exposed to the guest should behave like Ethernet. So, yes, IPX
>> should work if somebody chooses to do that.
>>
>> Your comments are about performance optimization. That's fine but
>> without a correct base to start from it seems like putting the cart
>> before the horse and is hard to reason about.
>
> I agree with Jesse in particular about correctnes but Michael has a
> point (which I thing nobod objects to) which is that it may not always
> make sense to force the MTU onto the guest. It clearly makes sense for
> the edge server connected to an overlay but it may not be ideal if
> WAN traffic is VXLAN encapped and local DC traffic is put onto a VLAN.
The question is whether you would do this in a single L2 segment. It's
possible, of course, but probably not a great idea and I'm not sure
that it's really worth optimizing for. We do have one existing example
of this type of MTU reduction - the bridge device when you attach
multiple devices with varying MTUs (including a VXLAN device). In that
case, the bridge device is effectively acting as a connection point,
similar to virtio in a VM.
My proposal would be something like this:
* For L2, reduce the VM MTU to the lowest common denominator on the segment.
* For L3, use path MTU discovery or fragment inner packet (i.e.
normal routing behavior).
* As a last resort (such as if using an old version of virtio in the
guest), fragment the tunnel packet.
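One possible reading of the three rules above, written as a decision
function. This is only a sketch under assumptions: the enum names, the
function name and the idea of passing the forwarding mode in as a
parameter are all hypothetical and do not correspond to any existing
kernel or OVS interface.

```c
/* Hypothetical names, for illustration only. */
enum fwd_mode { MODE_L2, MODE_L3 };

enum oversize_action {
	ACT_FORWARD,		/* packet fits, nothing to do */
	ACT_ICMP_FRAG_NEEDED,	/* L3: path MTU discovery */
	ACT_FRAGMENT_INNER,	/* L3: normal routing fragmentation */
	ACT_FRAGMENT_TUNNEL	/* last resort: fragment the outer packet */
};

/*
 * inner_len: length of the inner packet
 * path_mtu:  MTU available to the inner packet after encapsulation
 * df_set:    non-zero if the inner IPv4 header has Don't Fragment set
 */
static enum oversize_action
oversize_policy(enum fwd_mode mode, unsigned int inner_len,
		unsigned int path_mtu, int df_set)
{
	if (inner_len <= path_mtu)
		return ACT_FORWARD;

	if (mode == MODE_L3)
		return df_set ? ACT_ICMP_FRAG_NEEDED : ACT_FRAGMENT_INNER;

	/*
	 * L2: the guest MTU should already have been reduced, so an
	 * oversized frame means the guest could not be told (for example
	 * an old virtio); fall back to fragmenting the tunnel packet.
	 */
	return ACT_FRAGMENT_TUNNEL;
}
```

The sketch leaves open how the mode is determined per packet, which is
the open question in this thread.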
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Michael S. Tsirkin @ 2014-12-03 22:50 UTC (permalink / raw)
To: Thomas Graf
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Wang,
Du, Fan, fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org
In-Reply-To: <20141203220244.GA8822-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>
On Wed, Dec 03, 2014 at 10:02:44PM +0000, Thomas Graf wrote:
> On 12/03/14 at 11:38am, Jesse Gross wrote:
> > On Wed, Dec 3, 2014 at 10:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > Both approaches seem strange. You are sending 1 packet an hour to
> > > some destination behind 100 tunnels. Why would you want to
> > > cut down your MTU for all packets? On the other hand,
> > > doubling the amount of packets because your MTU is off
> > > by a couple of bytes will hurt performance significantly.
> > >
> > > Still, if you want to cut down the MTU within guest,
> > > that's only an ifconfig away.
> > > Most people would not want to bother, I think it's a good
> > > idea to make PMTU work properly for them.
> >
> > I care about correctness first, which means that an Ethernet link
> > being exposed to the guest should behave like Ethernet. So, yes, IPX
> > should work if somebody chooses to do that.
> >
> > Your comments are about performance optimization. That's fine but
> > without a correct base to start from it seems like putting the cart
> > before the horse and is hard to reason about.
>
> I agree with Jesse in particular about correctnes but Michael has a
> point (which I thing nobod objects to) which is that it may not always
> make sense to force the MTU onto the guest. It clearly makes sense for
> the edge server connected to an overlay but it may not be ideal if
> WAN traffic is VXLAN encapped and local DC traffic is put onto a VLAN.
And it's not like tweaking local MTU for one interface will
magically fix everything.
> That said, I think it is fair to assume that the host knows what role
> it plays and can be configured accordingly, i.e. a Netlink API which
> exposes the encap overhead so libvirt can max() over it force it onto
> the guest or something along those lines.
I'd say let's try to at least fix IP traffic properly.
--
MST
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* Re: [PATCH net-next v4] rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()
From: Thomas Graf @ 2014-12-03 22:25 UTC (permalink / raw)
To: Mahesh Bandewar
Cc: netdev, David Miller, Eric Dumazet, Roopa Prabhu, Toshiaki Makita
In-Reply-To: <1417643184-23440-1-git-send-email-maheshb@google.com>
On 12/03/14 at 01:46pm, Mahesh Bandewar wrote:
> The commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to
> until after ndo_uninit") tried to do this ealier but while doing so
> it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
> delayed call to fill_info(). So this translated into asking driver
> to remove private state and then query it's private state. This
> could have catastropic consequences.
>
[...]
>
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
> Cc: David S. Miller <davem@davemloft.net>
LGTM, thanks!
Acked-by: Thomas Graf <tgraf@suug.ch>
^ permalink raw reply
* Re: [PATCH iproute2] ss: Use rtnl_dump_filter in handle_netlink_request
From: Vadim Kochan @ 2014-12-03 22:15 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20141203174506.GA12511@angus-think.wlc.globallogic.com>
I established some simple OpenVPN connection (at least by the logs and
ss I see that peer is connected), but
I dont get any errors with ss.
So can you please provide some more info how did you tested this patch
? I am surprised that this is caused only
by these changes ...
On Wed, Dec 3, 2014 at 7:45 PM, <vadim4j@gmail.com> wrote:
> On Wed, Dec 03, 2014 at 09:51:23AM -0800, Stephen Hemminger wrote:
>> On Wed, 3 Dec 2014 19:20:10 +0200
>> vadim4j@gmail.com wrote:
>>
>> > On Wed, Dec 03, 2014 at 09:21:29AM -0800, Stephen Hemminger wrote:
>> > > On Tue, 2 Dec 2014 16:53:04 +0200
>> > > Vadim Kochan <vadim4j@gmail.com> wrote:
>> > >
>> > > > Replaced handling netlink messages by rtnl_dump_filter
>> > > > from lib/libnetlink.c, also:
>> > > >
>> > > > - removed unused dump_fp arg;
>> > > > - added MAGIC_SEQ #define for 123456 seq id
>> > > >
>> > > > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
>> > >
>> > > This doesn't work correctly.
>> > >
>> > > Simple test
>> > > $ misc/ss >/dev/null
>> > > RTNETLINK answers: No such file or directory
>> > > RTNETLINK answers: No such file or directory
>> > > RTNETLINK answers: No such file or directory
>> >
>> > Just tried, I did not get such errors.
>>
>> I have OpenVPN running.
>
> Hm, it is reproduced only with this patch ?
> If so I will try to setup OpenVPN ... can't imagine how it can be
> related ...
^ permalink raw reply
* Re: [PATCH net-next v4] rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()
From: Eric Dumazet @ 2014-12-03 22:07 UTC (permalink / raw)
To: Mahesh Bandewar
Cc: netdev, David Miller, Eric Dumazet, Roopa Prabhu, Toshiaki Makita
In-Reply-To: <1417643184-23440-1-git-send-email-maheshb@google.com>
On Wed, 2014-12-03 at 13:46 -0800, Mahesh Bandewar wrote:
> The commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to
> until after ndo_uninit") tried to do this ealier but while doing so
> it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
> delayed call to fill_info(). So this translated into asking driver
> to remove private state and then query it's private state. This
> could have catastropic consequences.
>
> This change breaks the rtmsg_ifinfo() into two parts - one takes the
> precise snapshot of the device by called fill_info() before calling
> the ndo_uninit() and the second part sends the notification using
> collected snapshot.
>
> It was brought to notice when last link is deleted from an ipvlan device
> when it has free-ed the port and the subsequent .fill_info() call is
> trying to get the info from the port.
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* [PATCH net] Update old iproute2 and Xen Remus links
From: Andrew Shewmaker @ 2014-12-03 22:07 UTC (permalink / raw)
To: linux-doc; +Cc: linux-kernel, netdev, corbet, jhs, davem, Andrew Shewmaker
Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
---
Documentation/Changes | 2 +-
net/sched/Kconfig | 7 ++++---
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/Documentation/Changes b/Documentation/Changes
index 1de131b..74bdda9 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -383,7 +383,7 @@ o <http://www.iptables.org/downloads.html>
Ip-route2
---------
-o <ftp://ftp.tux.org/pub/net/ip-routing/iproute2-2.2.4-now-ss991023.tar.gz>
+o <https://www.kernel.org/pub/linux/utils/net/iproute2/>
OProfile
--------
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a1a8e29..d17053d 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -22,8 +22,9 @@ menuconfig NET_SCHED
This code is considered to be experimental.
To administer these schedulers, you'll need the user-level utilities
- from the package iproute2+tc at <ftp://ftp.tux.org/pub/net/ip-routing/>.
- That package also contains some documentation; for more, check out
+ from the package iproute2+tc at
+ <https://www.kernel.org/pub/linux/utils/net/iproute2/>. That package
+ also contains some documentation; for more, check out
<http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2>.
This Quality of Service (QoS) support will enable you to use
@@ -336,7 +337,7 @@ config NET_SCH_PLUG
of virtual machines by allowing the generated network output to be rolled
back if needed.
- For more information, please refer to http://wiki.xensource.com/xenwiki/Remus
+ For more information, please refer to <http://wiki.xenproject.org/wiki/Remus>
Say Y here if you are using this kernel for Xen dom0 and
want to protect Xen guests with Remus.
--
2.1.0
^ permalink raw reply related
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Thomas Graf @ 2014-12-03 22:02 UTC (permalink / raw)
To: Jesse Gross
Cc: Michael S. Tsirkin, Du, Fan, Jason Wang, netdev@vger.kernel.org,
davem@davemloft.net, fw@strlen.de, dev@openvswitch.org,
Pravin Shelar
In-Reply-To: <CAEP_g=_Y2YQg0wDRJLXsNH6p3fOc5G0KSJb12x_b4OE238ophg@mail.gmail.com>
On 12/03/14 at 11:38am, Jesse Gross wrote:
> On Wed, Dec 3, 2014 at 10:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Both approaches seem strange. You are sending 1 packet an hour to
> > some destination behind 100 tunnels. Why would you want to
> > cut down your MTU for all packets? On the other hand,
> > doubling the amount of packets because your MTU is off
> > by a couple of bytes will hurt performance significantly.
> >
> > Still, if you want to cut down the MTU within guest,
> > that's only an ifconfig away.
> > Most people would not want to bother, I think it's a good
> > idea to make PMTU work properly for them.
>
> I care about correctness first, which means that an Ethernet link
> being exposed to the guest should behave like Ethernet. So, yes, IPX
> should work if somebody chooses to do that.
>
> Your comments are about performance optimization. That's fine but
> without a correct base to start from it seems like putting the cart
> before the horse and is hard to reason about.
I agree with Jesse, in particular about correctness, but Michael has a
point (which I think nobody objects to), which is that it may not always
make sense to force the MTU onto the guest. It clearly makes sense for
the edge server connected to an overlay but it may not be ideal if
WAN traffic is VXLAN encapped and local DC traffic is put onto a VLAN.
That said, I think it is fair to assume that the host knows what role
it plays and can be configured accordingly, e.g. a Netlink API which
exposes the encap overhead so libvirt can max() over it and force it
onto the guest, or something along those lines.
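The max() idea could look something like the following on the management
side. This is a sketch only: no such Netlink API exists in this thread,
the function name is hypothetical, and the overhead values a caller
passes in are assumed to come from wherever the host would expose them.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the underlay MTU and the per-encapsulation
 * overheads (in bytes) of every tunnel type the guest's traffic may
 * take, return the MTU to force onto the guest. Returns 0 if the
 * largest overhead leaves no room at all.
 */
static unsigned int guest_mtu(unsigned int underlay_mtu,
			      const unsigned int *overhead, size_t n)
{
	unsigned int max_overhead = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (overhead[i] > max_overhead)
			max_overhead = overhead[i];

	if (max_overhead >= underlay_mtu)
		return 0;

	return underlay_mtu - max_overhead;
}
```

For example, with a 1500 byte underlay and overheads of 0 (unencapsulated
traffic) and 50 (VXLAN over IPv4), the guest would be given 1450, which is
exactly the "cut down the MTU for all packets" cost Michael points out.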
^ permalink raw reply
* Re: [PATCH] Documentation: bindings: net: DPAA corenet binding document
From: Scott Wood @ 2014-12-03 22:01 UTC (permalink / raw)
To: Bucur Madalin-Cristian-B32716
Cc: devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
netdev@vger.kernel.org, Medve Emilian-EMMEDVE1,
Liberman Igal-B31950, galak@codeaurora.org, Shaohui Xie
In-Reply-To: <1417561420.15957.221.camel@freescale.com>
On Tue, 2014-12-02 at 17:03 -0600, Scott Wood wrote:
> On Tue, 2014-12-02 at 06:12 -0600, Bucur Madalin-Cristian-B32716 wrote:
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, December 02, 2014 6:40 AM
> > >
> > > No need for the <SoC> part. As we previously discussed, the only
> > > purpose of this node is backwards compatibility with the U-Boot MAC
> > > address fixup -- if U-Boot doesn't look for the <SoC> version, then
> > > don't complicate things.
> > >
> > > Though, I can't find where U-Boot references this node. Are you sure
> > > it's not using the ethernet%d aliases like everything else, in which
> > > case why do we need this node at all?
> > >
> > > -Scott
> > >
> >
> > The initial (Freescale SDK) binding document contained those compatibles,
> > not sure what the initial intent was for the <SoC> variants.
> >
> > The "fsl,dpaa" node is of interest to the DPAA Ethernet because it is
> > the parent of the "fsl,dpa-ethernet" nodes.
>
> I'm not interested in what the SDK binding says, or what the SDK kernel
> does. I'm interested in whether there's a U-Boot compatibility issue,
> as was previously alleged. If there isn't, then there's no need for
> fsl,dpaa *or* fsl,dpa-ethernet.
OK, I found the U-Boot fixup in question. It's not for MAC addresses,
but for marking disabled ports as disabled. It marks the dpa-ethernet
node as disabled, based on it having an fsl,fman-mac property that
points to the MAC node.
U-Boot also disables the MAC node itself, so it doesn't matter if it
doesn't find fsl,fman-mac -- except for the special case of fm1-dtsec1,
which is always kept enabled because it's used for MDIO for all ports.
Based on http://patchwork.ozlabs.org/patch/410770/ there's a separate
node for mdio, so why can't we mark the MAC disabled? Assuming that
there's no real problem in marking the fm1-dtsec1 MAC node disabled, we
can consider this to be a bug in U-Boot which can be worked around by
having the fm1-dtsec1 mac node have an fsl,fman-mac property that points
to itself. This property would only go on the fm1-dtsec1 mac node and
would only be in device trees for SoCs that are supported by U-Boot
versions old enough to predate the fix.
-Scott
^ permalink raw reply
* [PATCH net-next v4] rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()
From: Mahesh Bandewar @ 2014-12-03 21:46 UTC (permalink / raw)
To: netdev
Cc: David Miller, Eric Dumazet, Roopa Prabhu, Toshiaki Makita,
Mahesh Bandewar
The commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to
until after ndo_uninit") tried to do this earlier but while doing so
it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
delayed the call to fill_info(). So this translated into asking the
driver to remove its private state and then querying that private state.
This could have catastrophic consequences.
This change breaks rtmsg_ifinfo() into two parts - one takes a
precise snapshot of the device by calling fill_info() before calling
ndo_uninit() and the second part sends the notification using the
collected snapshot.
It was noticed when the last link is deleted from an ipvlan device:
the device has already freed the port and the subsequent .fill_info()
call tries to get the info from the port.
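The ordering problem and the build/send split can be illustrated with a
small userspace analogy. This is not kernel code: every toy_* name below
is made up for the illustration, and the "private state" is just a
heap-allocated int standing in for the ipvlan port.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_dev {
	char name[16];
	int *priv;		/* private state, freed by toy_uninit() */
};

struct toy_msg {
	char text[64];
};

/* Analogue of the "build" half: reads private state, so it must run
 * before toy_uninit(). Returns NULL if there is nothing to snapshot. */
static struct toy_msg *toy_build_msg(const struct toy_dev *dev)
{
	struct toy_msg *m;

	if (!dev->priv)
		return NULL;
	m = malloc(sizeof(*m));
	if (!m)
		return NULL;
	snprintf(m->text, sizeof(m->text), "DELLINK %s port=%d",
		 dev->name, *dev->priv);
	return m;
}

/* Analogue of ndo_uninit(): tears down the private state. */
static void toy_uninit(struct toy_dev *dev)
{
	free(dev->priv);
	dev->priv = NULL;
}

/* Analogue of the "send" half: only touches the snapshot. */
static void toy_send(struct toy_msg *m, char *out, size_t outlen)
{
	snprintf(out, outlen, "%s", m->text);
	free(m);
}

static void toy_unregister(struct toy_dev *dev, char *out, size_t outlen)
{
	struct toy_msg *m = toy_build_msg(dev);	/* snapshot first */

	toy_uninit(dev);			/* then tear down */
	if (m)
		toy_send(m, out, outlen);	/* notify afterwards */
}
```

Calling toy_build_msg() after toy_uninit() instead would find no private
state to report, which is the userspace equivalent of the failure shown
in the trace below.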
kernel: [ 255.139429] ------------[ cut here ]------------
kernel: [ 255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
kernel: [ 255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
kernel: [ 255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
kernel: [ 255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
kernel: [ 255.139515] 0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
kernel: [ 255.139516] 0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
kernel: [ 255.139518] 00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
kernel: [ 255.139520] Call Trace:
kernel: [ 255.139527] [<ffffffff815d87f4>] dump_stack+0x46/0x58
kernel: [ 255.139531] [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0
kernel: [ 255.139540] [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20
kernel: [ 255.139544] [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110
kernel: [ 255.139547] [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0
kernel: [ 255.139549] [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0
kernel: [ 255.139551] [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110
kernel: [ 255.139553] [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240
kernel: [ 255.139557] [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80
kernel: [ 255.139558] [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20
kernel: [ 255.139562] [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0
kernel: [ 255.139563] [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40
kernel: [ 255.139565] [<ffffffff8152c398>] netlink_unicast+0x178/0x230
kernel: [ 255.139567] [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420
kernel: [ 255.139571] [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0
kernel: [ 255.139575] [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130
kernel: [ 255.139577] [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0
kernel: [ 255.139578] [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310
kernel: [ 255.139581] [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0
kernel: [ 255.139585] [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70
kernel: [ 255.139589] [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500
kernel: [ 255.139597] [<ffffffff811e8336>] ? dput+0xb6/0x190
kernel: [ 255.139606] [<ffffffff811f05f6>] ? mntput+0x26/0x40
kernel: [ 255.139611] [<ffffffff811d2b94>] ? __fput+0x174/0x1e0
kernel: [ 255.139613] [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90
kernel: [ 255.139615] [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20
kernel: [ 255.139617] [<ffffffff815df092>] system_call_fastpath+0x12/0x17
kernel: [ 255.139619] ---[ end trace 5e6703e87d984f6b ]---
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David S. Miller <davem@davemloft.net>
---
v1:
Initial version
v2:
Keep the rtmsg_ifinfo() return type as it is but break the function into
two minimizing the changes all over places
v3:
Corrected an error in the code.
v4:
Removed EXPORT_SYMBOL() for both the new functions.
include/linux/rtnetlink.h | 5 +++++
net/core/dev.c | 12 +++++++++---
net/core/rtnetlink.c | 25 +++++++++++++++++++++----
3 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 3b0419072f88..5db76a32fcab 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -17,6 +17,11 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
u32 id, long expires, u32 error);
void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t flags);
+struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
+ unsigned change, gfp_t flags);
+void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
+ gfp_t flags);
+
/* RTNL is used as a global lock for all changes to network configuration */
extern void rtnl_lock(void);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0814a560e5f3..dd3bf582e6f0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5925,6 +5925,8 @@ static void rollback_registered_many(struct list_head *head)
synchronize_net();
list_for_each_entry(dev, head, unreg_list) {
+ struct sk_buff *skb = NULL;
+
/* Shutdown queueing discipline. */
dev_shutdown(dev);
@@ -5934,6 +5936,11 @@ static void rollback_registered_many(struct list_head *head)
*/
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+ if (!dev->rtnl_link_ops ||
+ dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
+ skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U,
+ GFP_KERNEL);
+
/*
* Flush the unicast and multicast chains
*/
@@ -5943,9 +5950,8 @@ static void rollback_registered_many(struct list_head *head)
if (dev->netdev_ops->ndo_uninit)
dev->netdev_ops->ndo_uninit(dev);
- if (!dev->rtnl_link_ops ||
- dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
- rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
+ if (skb)
+ rtmsg_ifinfo_send(skb, dev, GFP_KERNEL);
/* Notifier chain MUST detach us all upper devices. */
WARN_ON(netdev_has_any_upper_dev(dev));
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 61cb7e7cc3c7..a9be2c161702 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2245,8 +2245,8 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
return skb->len;
}
-void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
- gfp_t flags)
+struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
+ unsigned int change, gfp_t flags)
{
struct net *net = dev_net(dev);
struct sk_buff *skb;
@@ -2264,11 +2264,28 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
kfree_skb(skb);
goto errout;
}
- rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
- return;
+ return skb;
errout:
if (err < 0)
rtnl_set_sk_err(net, RTNLGRP_LINK, err);
+ return NULL;
+}
+
+void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev, gfp_t flags)
+{
+ struct net *net = dev_net(dev);
+
+ rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
+}
+
+void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
+ gfp_t flags)
+{
+ struct sk_buff *skb;
+
+ skb = rtmsg_ifinfo_build_skb(type, dev, change, flags);
+ if (skb)
+ rtmsg_ifinfo_send(skb, dev, flags);
}
EXPORT_SYMBOL(rtmsg_ifinfo);
--
2.2.0.rc0.207.ga3a616c
^ permalink raw reply related
* Re: [PATCHv11 net-next 1/2] openvswitch: Refactor ovs_nla_fill_match().
From: Joe Stringer @ 2014-12-03 21:42 UTC (permalink / raw)
To: Pravin Shelar; +Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, netdev, LKML
In-Reply-To: <CALnjE+pNbwtGPsHfTJ0FXxdSpXT=AKgtfHPqT5matCx+PGcXbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 3 December 2014 at 11:37, Pravin Shelar <pshelar@nicira.com> wrote:
> On Tue, Dec 2, 2014 at 6:56 PM, Joe Stringer <joestringer@nicira.com> wrote:
>> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>> index 332b5a0..b2a3796 100644
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -462,10 +462,8 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>> 0, upcall_info->cmd);
>> upcall->dp_ifindex = dp_ifindex;
>>
>> - nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY);
>> - err = ovs_nla_put_flow(key, key, user_skb);
>> + err = ovs_nla_put_flow(key, key, OVS_PACKET_ATTR_KEY, false, user_skb);
>
> We need different name here, since it does not operate on flow. maybe
> __ovs_nla_put_key(). we can move the function definition to
> flow_netlink.h
OK sure. I'll fix this up for the next version.
^ permalink raw reply