* RE: [PATCH net-next 7/8] net: Add calculation of non folded IPV6 pseudo header checksum
From: David Laight @ 2014-10-30 17:09 UTC (permalink / raw)
To: 'Or Gerlitz'
Cc: David S. Miller, netdev@vger.kernel.org, Matan Barak, Amir Vadai,
Saeed Mahameed, Shani Michaeli
In-Reply-To: <54526D3C.2040604@mellanox.com>
> OK, talking to Matan, he came up with this super-quick (compile tested
> only) alternative. Is this what you were advocating for?
> {
> + sum = csum_partial(saddr, sizeof(saddr->in6_u.u6_addr32), sum);
> + sum = csum_partial(daddr, sizeof(daddr->in6_u.u6_addr32), sum);
I'm pretty sure your 'saddr' and 'daddr' are adjacent.
Whether you can prove that is another matter.
> + sum = csum_add(sum, (__force __wsum)htonl(len));
> + sum = csum_add(sum, (__force __wsum)htons(proto));
> +
> + return sum;
> +}
On 64 bit systems you probably want to end up with something akin to:
__u64 a, b, c, d;
a = saddr[0] + saddr[1];
b = saddr[2] + saddr[3];
c = daddr[0] + daddr[1];
d = daddr[2] + daddr[3];
a += b;
c += d;
a += c;
and then collapse down the 64bit value.
However, if you write the above in proper C, gcc will probably
convert it to a long dependency chain.
An architecture specific csum_partial() probably manages to DTRT.
David
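A minimal user-space sketch of the 64-bit accumulation described above, assuming each address is available as an array of four 32-bit words. The helper names (fold64_to_32, fold32_to_16, addr_pair_sum) are hypothetical and are not kernel APIs; an architecture-specific csum_partial() may well do better than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collapse a 64-bit one's-complement accumulator to 32 bits. */
static uint32_t fold64_to_32(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits now */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* absorb the last carry */
	return (uint32_t)sum;
}

/* Collapse a 32-bit one's-complement accumulator to 16 bits. */
static uint16_t fold32_to_16(uint32_t sum)
{
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Sum the eight 32-bit words of two IPv6 addresses.
 * The four pair sums are independent of each other, so they can be
 * computed in parallel before being combined.
 */
static uint32_t addr_pair_sum(const uint32_t saddr[4], const uint32_t daddr[4])
{
	uint64_t a, b, c, d;

	assert(saddr != NULL && daddr != NULL);

	a = (uint64_t)saddr[0] + saddr[1];
	b = (uint64_t)saddr[2] + saddr[3];
	c = (uint64_t)daddr[0] + daddr[1];
	d = (uint64_t)daddr[2] + daddr[3];

	return fold64_to_32((a + b) + (c + d));
}
```

Grouping the additions as (a + b) + (c + d) keeps the first four sums independent; whether the compiler preserves that shape is not guaranteed, which is the concern raised above.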
* Re: [PATCH net-next 4/8] net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages
From: Eric Dumazet @ 2014-10-30 17:15 UTC (permalink / raw)
To: Or Gerlitz
Cc: David S. Miller, netdev, Matan Barak, Amir Vadai, Saeed Mahameed,
Shani Michaeli, Ido Shamay
In-Reply-To: <1414685216-28907-5-git-send-email-ogerlitz@mellanox.com>
On Thu, 2014-10-30 at 18:06 +0200, Or Gerlitz wrote:
> From: Ido Shamay <idos@mellanox.com>
>
> Needed in order to get cache cold pages (L3 flushed) for HW scatter.
>
> Otherwise memory may flush those entries when the packet comes from
> PCI, causing back pressure resulting in BW decrease.
>
> Signed-off-by: Ido Shamay <idos@mellanox.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 4cb716f..9d616a8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -54,7 +54,7 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
> dma_addr_t dma;
>
> for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
> - gfp_t gfp = _gfp;
> + gfp_t gfp = _gfp | __GFP_COLD;
This should be set by callers, to avoid extra code
GFP_ATOMIC | __GFP_COLD
or
GFP_KERNEL | __GFP_COLD
>
> if (order)
> gfp |= __GFP_COMP | __GFP_NOWARN;
* Re: [PATCH net-next] tcp: Correction to RFC number in comment
From: Eric Dumazet @ 2014-10-30 17:19 UTC (permalink / raw)
To: Sowmini Varadhan; +Cc: davem, netdev
In-Reply-To: <20141030164808.GH650@oracle.com>
On Thu, 2014-10-30 at 12:48 -0400, Sowmini Varadhan wrote:
> Challenge ACK is described in RFC 5961, fix typo.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
> net/ipv4/tcp_input.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index a12b455..d285962 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5028,7 +5028,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> /* step 3: check security and precedence [ignored] */
>
> /* step 4: Check for a SYN
> - * RFC 5691 4.2 : Send a challenge ack
> + * RFC 5961 4.2 : Send a challenge ack
> */
> if (th->syn) {
> syn_challenge:
Acked-by: Eric Dumazet <edumazet@google.com>
Thanks !
* Re: [PATCH iproute2] ss: Use generic handle_netlink_request for packet
From: vadim4j @ 2014-10-30 17:11 UTC (permalink / raw)
To: netdev
In-Reply-To: <1413470432-28588-1-git-send-email-vadim4j@gmail.com>
On Thu, Oct 16, 2014 at 05:40:32PM +0300, Vadim Kochan wrote:
> Get rid of self-handling and creating of Netlink socket for show packet
> socket stats.
>
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> ---
> misc/ss.c | 80 +++------------------------------------------------------------
> 1 file changed, 3 insertions(+), 77 deletions(-)
>
> diff --git a/misc/ss.c b/misc/ss.c
> index 2420b51..b3cc455 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -2777,17 +2777,11 @@ static int packet_show_sock(struct nlmsghdr *nlh, struct filter *f)
>
> static int packet_show_netlink(struct filter *f, FILE *dump_fp)
> {
> - int fd;
> struct {
> struct nlmsghdr nlh;
> struct packet_diag_req r;
> - } req;
> - char buf[8192];
> -
> - if ((fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_INET_DIAG)) < 0)
> - return -1;
> + } req = {};
>
> - memset(&req, 0, sizeof(req));
> req.nlh.nlmsg_len = sizeof(req);
> req.nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
> req.nlh.nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST;
> @@ -2796,76 +2790,8 @@ static int packet_show_netlink(struct filter *f, FILE *dump_fp)
> req.r.sdiag_family = AF_PACKET;
> req.r.pdiag_show = PACKET_SHOW_INFO | PACKET_SHOW_MEMINFO | PACKET_SHOW_FILTER;
>
> - if (send(fd, &req, sizeof(req), 0) < 0) {
> - close(fd);
> - return -1;
> - }
> -
> - while (1) {
> - ssize_t status;
> - struct nlmsghdr *h;
> - struct sockaddr_nl nladdr;
> - socklen_t slen = sizeof(nladdr);
> -
> - status = recvfrom(fd, buf, sizeof(buf), 0,
> - (struct sockaddr *) &nladdr, &slen);
> - if (status < 0) {
> - if (errno == EINTR)
> - continue;
> - perror("OVERRUN");
> - continue;
> - }
> - if (status == 0) {
> - fprintf(stderr, "EOF on netlink\n");
> - goto close_it;
> - }
> -
> - if (dump_fp)
> - fwrite(buf, 1, NLMSG_ALIGN(status), dump_fp);
> -
> - h = (struct nlmsghdr*)buf;
> - while (NLMSG_OK(h, status)) {
> - int err;
> -
> - if (h->nlmsg_seq != 123456)
> - goto skip_it;
> -
> - if (h->nlmsg_type == NLMSG_DONE)
> - goto close_it;
> -
> - if (h->nlmsg_type == NLMSG_ERROR) {
> - struct nlmsgerr *err = (struct nlmsgerr*)NLMSG_DATA(h);
> - if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr))) {
> - fprintf(stderr, "ERROR truncated\n");
> - } else {
> - errno = -err->error;
> - if (errno != ENOENT)
> - fprintf(stderr, "UDIAG answers %d\n", errno);
> - }
> - close(fd);
> - return -1;
> - }
> - if (!dump_fp) {
> - err = packet_show_sock(h, f);
> - if (err < 0) {
> - close(fd);
> - return err;
> - }
> - }
> -
> -skip_it:
> - h = NLMSG_NEXT(h, status);
> - }
> -
> - if (status) {
> - fprintf(stderr, "!!!Remnant of size %zd\n", status);
> - exit(1);
> - }
> - }
> -
> -close_it:
> - close(fd);
> - return 0;
> + return handle_netlink_request(f, dump_fp, &req.nlh, sizeof(req),
> + packet_show_sock);
> }
>
>
> --
> 2.1.0
>
This patch conflicts with the master branch; I will re-send a v2 later.
Regards,
* [PATCH] net: skb_fclone_busy() needs to detect orphaned skb
From: Eric Dumazet @ 2014-10-30 17:32 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Neal Cardwell
From: Eric Dumazet <edumazet@google.com>
Some drivers are unable to perform TX completions in a bounded time.
They instead call skb_orphan().
The problem is that skb_fclone_busy() has to detect this case; otherwise
we block TCP retransmits and can freeze unlucky TCP sessions on
mostly idle hosts.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
---
This is a stable candidate.
This problem is known to hurt users of linux-3.16 kernels used as guest kernels.
David, I can provide backports if you want.
Thanks !
include/linux/skbuff.h | 8 ++++++--
net/ipv4/tcp_output.c | 2 +-
net/xfrm/xfrm_policy.c | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5884f95ff0e9..6c8b6f604e76 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -799,15 +799,19 @@ struct sk_buff_fclones {
* @skb: buffer
*
* Returns true is skb is a fast clone, and its clone is not freed.
+ * Some drivers call skb_orphan() in their ndo_start_xmit(),
+ * so we also check that this didn't happen.
*/
-static inline bool skb_fclone_busy(const struct sk_buff *skb)
+static inline bool skb_fclone_busy(const struct sock *sk,
+ const struct sk_buff *skb)
{
const struct sk_buff_fclones *fclones;
fclones = container_of(skb, struct sk_buff_fclones, skb1);
return skb->fclone == SKB_FCLONE_ORIG &&
- fclones->skb2.fclone == SKB_FCLONE_CLONE;
+ fclones->skb2.fclone == SKB_FCLONE_CLONE &&
+ fclones->skb2.sk == sk;
}
static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3af21296d967..a3d453b94747 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2126,7 +2126,7 @@ bool tcp_schedule_loss_probe(struct sock *sk)
static bool skb_still_in_host_queue(const struct sock *sk,
const struct sk_buff *skb)
{
- if (unlikely(skb_fclone_busy(skb))) {
+ if (unlikely(skb_fclone_busy(sk, skb))) {
NET_INC_STATS_BH(sock_net(sk),
LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
return true;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4c4e457e7888..88bf289abdc9 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1962,7 +1962,7 @@ static int xdst_queue_output(struct sock *sk, struct sk_buff *skb)
struct xfrm_policy *pol = xdst->pols[0];
struct xfrm_policy_queue *pq = &pol->polq;
- if (unlikely(skb_fclone_busy(skb))) {
+ if (unlikely(skb_fclone_busy(sk, skb))) {
kfree_skb(skb);
return 0;
}
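The effect of the extra `skb2.sk == sk` test can be modelled in user space. This is a sketch only: the types and names below (sock_model, skb_model, fclones_model, skb_orphan_model, fclone_busy_model) are hypothetical stand-ins for the kernel structures, reduced to the fields the check reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fclone_state { FCLONE_UNAVAILABLE, FCLONE_ORIG, FCLONE_CLONE };

struct sock_model { int id; };

struct skb_model {
	enum fclone_state fclone;
	const struct sock_model *sk;	/* NULL once the skb is orphaned */
};

/* Original and fast clone are allocated side by side. */
struct fclones_model {
	struct skb_model skb1;	/* original */
	struct skb_model skb2;	/* fast clone */
};

/* Model of a driver orphaning the clone instead of completing TX. */
static void skb_orphan_model(struct skb_model *skb)
{
	assert(skb != NULL);
	skb->sk = NULL;
}

/* Busy only if the clone is live AND still owned by this socket. */
static bool fclone_busy_model(const struct sock_model *sk,
			      const struct skb_model *skb)
{
	const struct fclones_model *fclones;

	assert(skb != NULL);
	fclones = (const struct fclones_model *)
		((const char *)skb - offsetof(struct fclones_model, skb1));

	return skb->fclone == FCLONE_ORIG &&
	       fclones->skb2.fclone == FCLONE_CLONE &&
	       fclones->skb2.sk == sk;
}
```

In this model, once the clone has been orphaned the original is no longer reported as busy, so a retransmit would not be suppressed.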
* Re: [PATCH net-next 8/8] net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
From: Eric Dumazet @ 2014-10-30 18:22 UTC (permalink / raw)
To: Or Gerlitz, Tom Herbert
Cc: David S. Miller, netdev, Matan Barak, Amir Vadai, Saeed Mahameed,
Shani Michaeli, Jerry Chu
In-Reply-To: <1414685216-28907-9-git-send-email-ogerlitz@mellanox.com>
On Thu, 2014-10-30 at 18:06 +0200, Or Gerlitz wrote:
> From: Shani Michaeli <shanim@mellanox.com>
>
> When processing received traffic, pass CHECKSUM_COMPLETE status to the
> stack, with calculated checksum for non TCP/UDP packets (such
> as GRE or ICMP).
>
> Although the stack expects checksum which doesn't include the pseudo
> header, the HW adds it. To address that, we are subtracting the pseudo
> header checksum from the checksum value provided by the HW.
>
> In the IPv6 case, we also compute/add the IP header checksum which
> is not added by the HW for such packets.
>
> Cc: Jerry Chu <hkchu@google.com>
> Signed-off-by: Shani Michaeli <shanim@mellanox.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
Awesome !
* [PATCH v2 net 0/2] drivers/net,ipv6: Fix IPv6 fragment ID selection for virtio
From: Ben Hutchings @ 2014-10-30 18:26 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa, virtualization
The virtio net protocol supports UFO but does not provide for passing a
fragment ID for fragmentation of IPv6 packets. We used to generate a
fragment ID wherever such a packet was fragmented, but currently we
always use ID=0!
v2: Add blank lines after declarations
Ben.
Ben Hutchings (2):
drivers/net: Disable UFO through virtio
drivers/net,ipv6: Select IPv6 fragment idents for virtio UFO packets
drivers/net/macvtap.c | 16 ++++++++--------
drivers/net/tun.c | 25 ++++++++++++++++---------
drivers/net/virtio_net.c | 24 ++++++++++++++----------
include/net/ipv6.h | 2 ++
net/ipv6/output_core.c | 34 ++++++++++++++++++++++++++++++++++
5 files changed, 74 insertions(+), 27 deletions(-)
--
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.
* [PATCH v2 net 1/2] drivers/net: Disable UFO through virtio
From: Ben Hutchings @ 2014-10-30 18:27 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa, virtualization
In-Reply-To: <1414693592.16849.61.camel@decadent.org.uk>
IPv6 does not allow fragmentation by routers, so there is no
fragmentation ID in the fixed header. UFO for IPv6 requires the ID to
be passed separately, but there is no provision for this in the virtio
net protocol.
Until recently our software implementation of UFO/IPv6 generated a new
ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet
passed through a tap, which is even worse.
Unfortunately there is no distinction between UFO/IPv4 and v6
features, so disable UFO on taps and virtio_net completely until we
have a proper solution.
We cannot depend on VM managers respecting the tap feature flags, so
keep accepting UFO packets but log a warning the first time we do
this.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
---
drivers/net/macvtap.c | 13 +++++--------
drivers/net/tun.c | 19 +++++++++++--------
drivers/net/virtio_net.c | 24 ++++++++++++++----------
3 files changed, 30 insertions(+), 26 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 65e2892..2aeaa61 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -65,7 +65,7 @@ static struct cdev macvtap_cdev;
static const struct proto_ops macvtap_socket_ops;
#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
- NETIF_F_TSO6 | NETIF_F_UFO)
+ NETIF_F_TSO6)
#define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
#define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
@@ -569,6 +569,8 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
+ pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
+ current->comm);
gso_type = SKB_GSO_UDP;
break;
default:
@@ -614,8 +616,6 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
- else if (sinfo->gso_type & SKB_GSO_UDP)
- vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
else
BUG();
if (sinfo->gso_type & SKB_GSO_TCP_ECN)
@@ -950,9 +950,6 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
if (arg & TUN_F_TSO6)
feature_mask |= NETIF_F_TSO6;
}
-
- if (arg & TUN_F_UFO)
- feature_mask |= NETIF_F_UFO;
}
/* tun/tap driver inverts the usage for TSO offloads, where
@@ -963,7 +960,7 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
* When user space turns off TSO, we turn off GSO/LRO so that
* user-space will not receive TSO frames.
*/
- if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
+ if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
features |= RX_OFFLOADS;
else
features &= ~RX_OFFLOADS;
@@ -1064,7 +1061,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
case TUNSETOFFLOAD:
/* let the user check for future flags */
if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
- TUN_F_TSO_ECN | TUN_F_UFO))
+ TUN_F_TSO_ECN))
return -EINVAL;
rtnl_lock();
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 186ce54..280d3d2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -174,7 +174,7 @@ struct tun_struct {
struct net_device *dev;
netdev_features_t set_features;
#define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
- NETIF_F_TSO6|NETIF_F_UFO)
+ NETIF_F_TSO6)
int vnet_hdr_sz;
int sndbuf;
@@ -1149,8 +1149,18 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
+ {
+ static bool warned;
+
+ if (!warned) {
+ warned = true;
+ netdev_warn(tun->dev,
+ "%s: using disabled UFO feature; please fix this program\n",
+ current->comm);
+ }
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
break;
+ }
default:
tun->dev->stats.rx_frame_errors++;
kfree_skb(skb);
@@ -1251,8 +1261,6 @@ static ssize_t tun_put_user(struct tun_struct *tun,
gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
- else if (sinfo->gso_type & SKB_GSO_UDP)
- gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
else {
pr_err("unexpected GSO type: "
"0x%x, gso_size %d, hdr_len %d\n",
@@ -1762,11 +1770,6 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
features |= NETIF_F_TSO6;
arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
}
-
- if (arg & TUN_F_UFO) {
- features |= NETIF_F_UFO;
- arg &= ~TUN_F_UFO;
- }
}
/* This gives the user a way to test for new features in future by
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d75256bd..ec2a8b4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -491,8 +491,17 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
break;
case VIRTIO_NET_HDR_GSO_UDP:
+ {
+ static bool warned;
+
+ if (!warned) {
+ warned = true;
+ netdev_warn(dev,
+ "host using disabled UFO feature; please fix it\n");
+ }
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
break;
+ }
case VIRTIO_NET_HDR_GSO_TCPV6:
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
break;
@@ -881,8 +890,6 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
- else if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
- hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
else
BUG();
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
@@ -1705,7 +1712,7 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
- dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
+ dev->hw_features |= NETIF_F_TSO
| NETIF_F_TSO_ECN | NETIF_F_TSO6;
}
/* Individual feature bits: what can host handle? */
@@ -1715,11 +1722,9 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->hw_features |= NETIF_F_TSO6;
if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
dev->hw_features |= NETIF_F_TSO_ECN;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
- dev->hw_features |= NETIF_F_UFO;
if (gso)
- dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
+ dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
/* (!csum && gso) case will be fixed by register_netdev() */
}
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
@@ -1757,8 +1762,7 @@ static int virtnet_probe(struct virtio_device *vdev)
/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
- virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
- virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
+ virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
vi->big_packets = true;
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
@@ -1952,9 +1956,9 @@ static struct virtio_device_id id_table[] = {
static unsigned int features[] = {
VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
- VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
+ VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
- VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
+ VIRTIO_NET_F_GUEST_ECN,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
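The open-coded "static bool warned" blocks in the tun and virtio_net hunks above follow a simple warn-once pattern. A user-space sketch of that pattern, assuming a hypothetical helper name (warn_once_ufo) and stderr in place of the kernel log:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Print the warning the first time only; later calls are silent.
 * Returns true if the warning was emitted by this call.
 * Like the plain static flag in the patch, this is not synchronized;
 * a race could at worst print the message more than once.
 */
static bool warn_once_ufo(const char *comm)
{
	static bool warned;

	assert(comm != NULL);

	if (warned)
		return false;

	warned = true;
	fprintf(stderr,
		"%s: using disabled UFO feature; please fix this program\n",
		comm);
	return true;
}
```

The packet is still accepted in every case; only the logging is limited to the first occurrence.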
* [PATCH v2 net 2/2] drivers/net,ipv6: Select IPv6 fragment idents for virtio UFO packets
From: Ben Hutchings @ 2014-10-30 18:27 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa, virtualization
In-Reply-To: <1414693592.16849.61.camel@decadent.org.uk>
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway. Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
---
drivers/net/macvtap.c | 3 +++
drivers/net/tun.c | 6 +++++-
include/net/ipv6.h | 2 ++
net/ipv6/output_core.c | 34 ++++++++++++++++++++++++++++++++++
4 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 2aeaa61..6f226de 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -16,6 +16,7 @@
#include <linux/idr.h>
#include <linux/fs.h>
+#include <net/ipv6.h>
#include <net/net_namespace.h>
#include <net/rtnetlink.h>
#include <net/sock.h>
@@ -572,6 +573,8 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
current->comm);
gso_type = SKB_GSO_UDP;
+ if (skb->protocol == htons(ETH_P_IPV6))
+ ipv6_proxy_select_ident(skb);
break;
default:
return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 280d3d2..7302398 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -65,6 +65,7 @@
#include <linux/nsproxy.h>
#include <linux/virtio_net.h>
#include <linux/rcupdate.h>
+#include <net/ipv6.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>
#include <net/rtnetlink.h>
@@ -1139,6 +1140,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
break;
}
+ skb_reset_network_header(skb);
+
if (gso.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
pr_debug("GSO!\n");
switch (gso.gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -1159,6 +1162,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
current->comm);
}
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+ if (skb->protocol == htons(ETH_P_IPV6))
+ ipv6_proxy_select_ident(skb);
break;
}
default:
@@ -1189,7 +1194,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
}
- skb_reset_network_header(skb);
skb_probe_transport_header(skb, 0);
rxhash = skb_get_hash(skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 97f4720..4292929 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -671,6 +671,8 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
}
+void ipv6_proxy_select_ident(struct sk_buff *skb);
+
int ip6_dst_hoplimit(struct dst_entry *dst);
static inline int ip6_sk_dst_hoplimit(struct ipv6_pinfo *np, struct flowi6 *fl6,
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index fc24c39..97f41a3 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -3,11 +3,45 @@
* not configured or static. These functions are needed by GSO/GRO implementation.
*/
#include <linux/export.h>
+#include <net/ip.h>
#include <net/ipv6.h>
#include <net/ip6_fib.h>
#include <net/addrconf.h>
#include <net/secure_seq.h>
+/* This function exists only for tap drivers that must support broken
+ * clients requesting UFO without specifying an IPv6 fragment ID.
+ *
+ * This is similar to ipv6_select_ident() but we use an independent hash
+ * seed to limit information leakage.
+ *
+ * The network header must be set before calling this.
+ */
+void ipv6_proxy_select_ident(struct sk_buff *skb)
+{
+ static u32 ip6_proxy_idents_hashrnd __read_mostly;
+ struct in6_addr buf[2];
+ struct in6_addr *addrs;
+ u32 hash, id;
+
+ addrs = skb_header_pointer(skb,
+ skb_network_offset(skb) +
+ offsetof(struct ipv6hdr, saddr),
+ sizeof(buf), buf);
+ if (!addrs)
+ return;
+
+ net_get_random_once(&ip6_proxy_idents_hashrnd,
+ sizeof(ip6_proxy_idents_hashrnd));
+
+ hash = __ipv6_addr_jhash(&addrs[1], ip6_proxy_idents_hashrnd);
+ hash = __ipv6_addr_jhash(&addrs[0], hash);
+
+ id = ip_idents_reserve(hash, 1);
+ skb_shinfo(skb)->ip6_frag_id = htonl(id);
+}
+EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident);
+
int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
{
u16 offset = sizeof(struct ipv6hdr);
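The identifier selection added above hashes the destination and then the source address with a private seed, and reserves one ID from the slot the hash selects. The sketch below models that flow in user space under loud assumptions: mix_addr is an FNV-1a style stand-in for __ipv6_addr_jhash(), reserve_id is a plain per-bucket counter standing in for ip_idents_reserve(), and the table size is arbitrary. None of these names are kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_BUCKETS 2048u	/* arbitrary table size for this sketch */

static uint32_t id_counters[ID_BUCKETS];

/* Stand-in hash over one 16-byte address, chained through 'seed'. */
static uint32_t mix_addr(const uint8_t addr[16], uint32_t seed)
{
	uint32_t h = seed ^ 2166136261u;
	size_t i;

	for (i = 0; i < 16; i++) {
		h ^= addr[i];
		h *= 16777619u;
	}
	return h;
}

/* Hand out 'segs' consecutive IDs from the bucket chosen by 'hash'. */
static uint32_t reserve_id(uint32_t hash, uint32_t segs)
{
	uint32_t *slot = &id_counters[hash % ID_BUCKETS];
	uint32_t id = *slot;

	assert(segs > 0);
	*slot += segs;
	return id;
}

/* Same order as the patch: destination first, then source. */
static uint32_t proxy_select_ident(const uint8_t saddr[16],
				   const uint8_t daddr[16], uint32_t seed)
{
	uint32_t hash = mix_addr(daddr, seed);

	hash = mix_addr(saddr, hash);
	return reserve_id(hash, 1);
}
```

Because the seed is independent of the one used elsewhere, an observer of these IDs learns nothing directly about counters used for other traffic, which matches the stated intent of limiting information leakage. The kernel code additionally stores the result in network byte order via htonl().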
* Re: [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids
From: Eric W. Biederman @ 2014-10-30 18:35 UTC (permalink / raw)
To: Nicolas Dichtel
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, luto-kltTT9wpgjJwATOyAt5JVQ,
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
cwang-xCSkyg8dI+0RB7SZvlqPiA, linux-api-u79uwXL29TY76Z2rM5mHXA,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1414682728-4532-2-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> writes:
> With this patch, a user can define an id for a peer netns by providing a FD or a
> PID. These ids are local to a netns (i.e. valid only within one netns).
Scratches head. Do you actually find value in using the pid instead of
a file descriptor?
Doing things by pid was an early attempt to make things work, and has
been a bit klutzy. If you don't find value in it, I would recommend just
supporting getting/setting the network namespace by file descriptor.
Eric
> This will be useful for netlink messages when a x-netns interface is dumped.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
> ---
> MAINTAINERS | 1 +
> include/net/net_namespace.h | 5 ++
> include/uapi/linux/Kbuild | 1 +
> include/uapi/linux/netns.h | 38 +++++++++
> net/core/net_namespace.c | 195 ++++++++++++++++++++++++++++++++++++++++++++
> net/netlink/genetlink.c | 4 +
> 6 files changed, 244 insertions(+)
> create mode 100644 include/uapi/linux/netns.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 43898b1a8a2d..de7e6fcbd5c2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6382,6 +6382,7 @@ F: include/linux/netdevice.h
> F: include/uapi/linux/in.h
> F: include/uapi/linux/net.h
> F: include/uapi/linux/netdevice.h
> +F: include/uapi/linux/netns.h
> F: tools/net/
> F: tools/testing/selftests/net/
> F: lib/random32.c
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index e0d64667a4b3..0f1367a71b81 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -59,6 +59,7 @@ struct net {
> struct list_head exit_list; /* Use only net_mutex */
>
> struct user_namespace *user_ns; /* Owning user namespace */
> + struct idr netns_ids;
>
> unsigned int proc_inum;
>
> @@ -289,6 +290,10 @@ static inline struct net *read_pnet(struct net * const *pnet)
> #define __net_initconst __initconst
> #endif
>
> +int peernet2id(struct net *net, struct net *peer);
> +struct net *get_net_ns_by_id(struct net *net, int id);
> +int netns_genl_register(void);
> +
> struct pernet_operations {
> struct list_head list;
> int (*init)(struct net *net);
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index 6cad97485bad..d7f49c69585a 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -277,6 +277,7 @@ header-y += netfilter_decnet.h
> header-y += netfilter_ipv4.h
> header-y += netfilter_ipv6.h
> header-y += netlink.h
> +header-y += netns.h
> header-y += netrom.h
> header-y += nfc.h
> header-y += nfs.h
> diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
> new file mode 100644
> index 000000000000..2edf129377de
> --- /dev/null
> +++ b/include/uapi/linux/netns.h
> @@ -0,0 +1,38 @@
> +/* Copyright (c) 2014 6WIND S.A.
> + * Author: Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + */
> +#ifndef _UAPI_LINUX_NETNS_H_
> +#define _UAPI_LINUX_NETNS_H_
> +
> +/* Generic netlink messages */
> +
> +#define NETNS_GENL_NAME "netns"
> +#define NETNS_GENL_VERSION 0x1
> +
> +/* Commands */
> +enum {
> + NETNS_CMD_UNSPEC,
> + NETNS_CMD_NEWID,
> + NETNS_CMD_GETID,
> + __NETNS_CMD_MAX,
> +};
> +
> +#define NETNS_CMD_MAX (__NETNS_CMD_MAX - 1)
> +
> +/* Attributes */
> +enum {
> + NETNSA_NONE,
> +#define NETNSA_NSINDEX_UNKNOWN -1
> + NETNSA_NSID,
> + NETNSA_PID,
> + NETNSA_FD,
> + __NETNSA_MAX,
> +};
> +
> +#define NETNSA_MAX (__NETNSA_MAX - 1)
> +
> +#endif /* _UAPI_LINUX_NETNS_H_ */
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 7f155175bba8..4a5680ed42fb 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -15,6 +15,8 @@
> #include <linux/file.h>
> #include <linux/export.h>
> #include <linux/user_namespace.h>
> +#include <linux/netns.h>
> +#include <net/genetlink.h>
> #include <net/net_namespace.h>
> #include <net/netns/generic.h>
>
> @@ -144,6 +146,50 @@ static void ops_free_list(const struct pernet_operations *ops,
> }
> }
>
> +/* This function is used by idr_for_each(). If net is equal to peer, the
> + * function returns the id so that idr_for_each() stops. Because we cannot
> + * return the id 0 (idr_for_each() will not stop), we return the magic value
> + * -1 for it.
> + */
> +static int net_eq_idr(int id, void *net, void *peer)
> +{
> + if (net_eq(net, peer))
> + return id ? : -1;
> + return 0;
> +}
> +
> +/* returns NETNSA_NSINDEX_UNKNOWN if not found */
> +int peernet2id(struct net *net, struct net *peer)
> +{
> + int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
> +
> + ASSERT_RTNL();
> +
> + /* Magic value for id 0. */
> + if (id == -1)
> + return 0;
> + if (id == 0)
> + return NETNSA_NSINDEX_UNKNOWN;
> +
> + return id;
> +}
> +
> +struct net *get_net_ns_by_id(struct net *net, int id)
> +{
> + struct net *peer;
> +
> + if (id < 0)
> + return NULL;
> +
> + rcu_read_lock();
> + peer = idr_find(&net->netns_ids, id);
> + if (peer)
> + get_net(peer);
> + rcu_read_unlock();
> +
> + return peer;
> +}
> +
> /*
> * setup_net runs the initializers for the network namespace object.
> */
> @@ -158,6 +204,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
> atomic_set(&net->passive, 1);
> net->dev_base_seq = 1;
> net->user_ns = user_ns;
> + idr_init(&net->netns_ids);
>
> #ifdef NETNS_REFCNT_DEBUG
> atomic_set(&net->use_count, 0);
> @@ -288,6 +335,14 @@ static void cleanup_net(struct work_struct *work)
> list_for_each_entry(net, &net_kill_list, cleanup_list) {
> list_del_rcu(&net->list);
> list_add_tail(&net->exit_list, &net_exit_list);
> + for_each_net(tmp) {
> + int id = peernet2id(tmp, net);
> +
> + if (id >= 0)
> + idr_remove(&tmp->netns_ids, id);
> + }
> + idr_destroy(&net->netns_ids);
> +
> }
> rtnl_unlock();
>
> @@ -399,6 +454,146 @@ static struct pernet_operations __net_initdata net_ns_ops = {
> .exit = net_ns_net_exit,
> };
>
> +static struct genl_family netns_genl_family = {
> + .id = GENL_ID_GENERATE,
> + .name = NETNS_GENL_NAME,
> + .version = NETNS_GENL_VERSION,
> + .hdrsize = 0,
> + .maxattr = NETNSA_MAX,
> + .netnsok = true,
> +};
> +
> +static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
> + [NETNSA_NONE] = { .type = NLA_UNSPEC },
> + [NETNSA_NSID] = { .type = NLA_S32 },
> + [NETNSA_PID] = { .type = NLA_U32 },
> + [NETNSA_FD] = { .type = NLA_U32 },
> +};
> +
> +static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
> +{
> + struct net *net = genl_info_net(info);
> + struct net *peer;
> + int nsid, err;
> +
> + if (!info->attrs[NETNSA_NSID])
> + return -EINVAL;
> + nsid = nla_get_s32(info->attrs[NETNSA_NSID]);
> + if (nsid < 0)
> + return -EINVAL;
> +
> + if (info->attrs[NETNSA_PID])
> + peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
> + else if (info->attrs[NETNSA_FD])
> + peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
> + else
> + return -EINVAL;
> + if (IS_ERR(peer))
> + return PTR_ERR(peer);
> +
> + rtnl_lock();
> + if (peernet2id(net, peer) >= 0) {
> + err = -EEXIST;
> + goto out;
> + }
> +
> + err = idr_alloc(&net->netns_ids, peer, nsid, nsid + 1, GFP_KERNEL);
> + if (err >= 0)
> + err = 0;
> +out:
> + rtnl_unlock();
> + put_net(peer);
> + return err;
> +}
> +
> +static int netns_nl_get_size(void)
> +{
> + return nla_total_size(sizeof(s32)) /* NETNSA_NSID */
> + ;
> +}
> +
> +static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
> + int cmd, struct net *net, struct net *peer)
> +{
> + void *hdr;
> + int id;
> +
> + hdr = genlmsg_put(skb, portid, seq, &netns_genl_family, flags, cmd);
> + if (!hdr)
> + return -EMSGSIZE;
> +
> + rtnl_lock();
> + id = peernet2id(net, peer);
> + rtnl_unlock();
> + if (nla_put_s32(skb, NETNSA_NSID, id))
> + goto nla_put_failure;
> +
> + return genlmsg_end(skb, hdr);
> +
> +nla_put_failure:
> + genlmsg_cancel(skb, hdr);
> + return -EMSGSIZE;
> +}
> +
> +static int netns_nl_cmd_getid(struct sk_buff *skb, struct genl_info *info)
> +{
> + struct net *net = genl_info_net(info);
> + struct sk_buff *msg;
> + int err = -ENOBUFS;
> + struct net *peer;
> +
> + if (info->attrs[NETNSA_PID])
> + peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
> + else if (info->attrs[NETNSA_FD])
> + peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
> + else
> + return -EINVAL;
> +
> + if (IS_ERR(peer))
> + return PTR_ERR(peer);
> +
> + msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
> + if (!msg) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
> + NLM_F_ACK, NETNS_CMD_GETID, net, peer);
> + if (err < 0)
> + goto err_out;
> +
> + err = genlmsg_unicast(net, msg, info->snd_portid);
> + goto out;
> +
> +err_out:
> + nlmsg_free(msg);
> +out:
> + put_net(peer);
> + return err;
> +}
> +
> +static struct genl_ops netns_genl_ops[] = {
> + {
> + .cmd = NETNS_CMD_NEWID,
> + .policy = netns_nl_policy,
> + .doit = netns_nl_cmd_newid,
> + .flags = GENL_ADMIN_PERM,
> + },
> + {
> + .cmd = NETNS_CMD_GETID,
> + .policy = netns_nl_policy,
> + .doit = netns_nl_cmd_getid,
> + .flags = GENL_ADMIN_PERM,
> + },
> +};
> +
> +int netns_genl_register(void)
> +{
> + return genl_register_family_with_ops(&netns_genl_family,
> + netns_genl_ops);
> +}
> +
> static int __init net_ns_init(void)
> {
> struct net_generic *ng;
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 76393f2f4b22..c6f39e40c9f3 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -1029,6 +1029,10 @@ static int __init genl_init(void)
> if (err)
> goto problem;
>
> + err = netns_genl_register();
> + if (err < 0)
> + goto problem;
> +
> return 0;
>
> problem:
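
The "Magic value for id 0" handling in peernet2id() above follows from the idr_for_each() contract: iteration stops at the first non-zero callback return, and 0 is returned when no callback matched. The following is a minimal userspace sketch of that encoding, not the patch's actual code. The toy table, table_for_each(), match_peer(), peer_to_id() and the value of NSID_UNKNOWN are all assumptions made for illustration, and the callback is assumed to return -1 for a match on id 0, as the comment in the patch suggests.

```c
#include <stddef.h>
#include <stdio.h>

#define TABLE_SIZE 8
#define NSID_UNKNOWN (-1) /* assumed stand-in for NETNSA_NSINDEX_UNKNOWN */

/* Toy id table: the slot index is the id, NULL means "unused". */
struct id_table {
	void *slot[TABLE_SIZE];
};

/*
 * Mimics the idr_for_each() contract: stop at the first non-zero
 * callback result and return it; return 0 if every callback returned 0.
 */
static int table_for_each(const struct id_table *t,
			  int (*fn)(int id, void *p, void *data),
			  void *data)
{
	for (int id = 0; id < TABLE_SIZE; id++) {
		int ret;

		if (!t->slot[id])
			continue;
		ret = fn(id, t->slot[id], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Returning 0 would mean "keep iterating", so a match on id 0 is sent as -1. */
static int match_peer(int id, void *p, void *data)
{
	if (p != data)
		return 0;
	return id ? id : -1;
}

/* Decodes the iterator result the same way peernet2id() does. */
static int peer_to_id(const struct id_table *t, void *peer)
{
	int id = table_for_each(t, match_peer, peer);

	if (id == -1)	/* magic value: the peer was found at id 0 */
		return 0;
	if (id == 0)	/* no callback matched */
		return NSID_UNKNOWN;
	return id;
}
```

With entries stored at ids 0 and 3, peer_to_id() returns 0 and 3 for those entries and NSID_UNKNOWN for a pointer that was never stored. The cleanup path in the patch depends on the same property, since it only calls idr_remove() when the returned id is >= 0.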
^ permalink raw reply
* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
From: Eric W. Biederman @ 2014-10-30 18:41 UTC (permalink / raw)
To: Nicolas Dichtel
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, luto-kltTT9wpgjJwATOyAt5JVQ,
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
cwang-xCSkyg8dI+0RB7SZvlqPiA, linux-api-u79uwXL29TY76Z2rM5mHXA,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1414682728-4532-1-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> writes:
> The goal of this series is to be able to multicast netlink messages with an
> attribute that identifies a peer netns.
> This is needed by the userland to interpret some information contained in
> netlink messages (like IFLA_LINK value, but also some other attributes in case
> of x-netns netdevice (see also
> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>
> Ids of peer netns are set by userland via new genl messages. These ids are
> stored per netns and are local (ie only valid in the netns where they are set).
> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
> the id of a peer netns. Note that it will be possible to add a table (struct net
> -> id) later to optimize this lookup if needed.
>
> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
> Patches 2/4 and 3/4 implement an example of how to use these ids in rtnetlink
> messages. And patch 4/4 shows that the netlink messages can be symmetric between
> a GET and a SET.
>
> iproute2 patches are available, I can send them on demand.
A quick reply. I think this patchset is in the right general direction.
There are some details that seem odd/awkward to me, such as using
genetlink instead of rtnetlink to get and set the ids, and not having
ids if they are not set (that feels like a maintenance/usability challenge).
I would like to give your patches a deep review, but I won't be able to
do that for a couple of weeks. I am deep in the process of moving,
and will be mostly offline until about Nov 11th.
Eric
> Here is a small screenshot to show how it can be used by userland.
>
> First, setup netns and required ids:
> $ ip netns add foo
> $ ip netns del foo
> $ ip netns
> $ touch /var/run/netns/init_net
> $ mount --bind /proc/1/ns/net /var/run/netns/init_net
> $ ip netns add foo
> $ ip netns exec foo ip netns set init_net 0
> $ ip netns
> foo
> init_net
> $ ip netns exec foo ip netns
> foo
> init_net (id: 0)
>
> Now, add and display an ipip tunnel, with its link part in init_net (id 0 in
> netns foo) and the netdevice in foo:
> $ ip netns exec foo ip link add ipip1 link-netnsid 0 type ipip remote 10.16.0.121 local 10.16.0.249
> $ ip netns exec foo ip l ls ipip1
> 6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
> link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0
>
> The parameter link-netnsid shows us where the interface sends and receives
> packets (and thus we know where encapsulated addresses are set).
>
> RFCv3 -> v4:
> rebase on net-next
> add copyright text in the new netns.h file
>
> RFCv2 -> RFCv3:
> ids are now defined by userland (via netlink). Ids are stored in each netns
> (and they are local to this netns).
> add get_link_net support for ip6 tunnels
> netnsid is now a s32 instead of a u32
>
> RFCv1 -> RFCv2:
> remove useless ()
> ids are now stored in the user ns. It's possible to get an id for a peer netns
> only if the current netns and the peer netns have the same user ns parent.
>
> MAINTAINERS | 1 +
> include/net/ip6_tunnel.h | 1 +
> include/net/ip_tunnels.h | 1 +
> include/net/net_namespace.h | 5 ++
> include/net/rtnetlink.h | 2 +
> include/uapi/linux/Kbuild | 1 +
> include/uapi/linux/if_link.h | 1 +
> include/uapi/linux/netns.h | 38 +++++++++
> net/core/net_namespace.c | 195 +++++++++++++++++++++++++++++++++++++++++++
> net/core/rtnetlink.c | 38 ++++++++-
> net/ipv4/ip_gre.c | 2 +
> net/ipv4/ip_tunnel.c | 8 ++
> net/ipv4/ip_vti.c | 1 +
> net/ipv4/ipip.c | 1 +
> net/ipv6/ip6_gre.c | 1 +
> net/ipv6/ip6_tunnel.c | 9 ++
> net/ipv6/ip6_vti.c | 1 +
> net/ipv6/sit.c | 1 +
> net/netlink/genetlink.c | 4 +
> 19 files changed, 308 insertions(+), 3 deletions(-)
>
> Comments are welcome.
>
> Regards,
> Nicolas
^ permalink raw reply
* Re: [PATCH v2 net 1/2] drivers/net: Disable UFO through virtio
From: Eric Dumazet @ 2014-10-30 18:47 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev, Hannes Frederic Sowa, virtualization
In-Reply-To: <1414693632.16849.62.camel@decadent.org.uk>
On Thu, 2014-10-30 at 18:27 +0000, Ben Hutchings wrote:
> + {
> + static bool warned;
> +
> + if (!warned) {
> + warned = true;
> + netdev_warn(tun->dev,
> + "%s: using disabled UFO feature; please fix this program\n",
> + current->comm);
> + }
>
It might be time to add netdev_warn_once() ;)
Alternatively, you could use
pr_warn_once("%s: %s: using disabled UFO feature; please fix this program\n",
tun->dev->name, current->comm);
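
For readers unfamiliar with the pattern, here is a minimal userspace sketch of a "warn once" helper, assuming a static flag per call site, which is the same idea as the open-coded `static bool warned` in the quoted hunk. The names warn_once(), warnings_emitted and use_disabled_feature() are hypothetical and exist only for this illustration; in the kernel, pr_warn_once() already provides this behaviour.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter, present only so the effect can be observed. */
static int warnings_emitted;

/*
 * Each expansion gets its own static flag, so every call site warns at
 * most once, however often it is reached. Not thread-safe; this is a
 * sketch, not a drop-in replacement for the kernel helpers.
 */
#define warn_once(fmt, ...)					\
	do {							\
		static bool warned_;				\
		if (!warned_) {					\
			warned_ = true;				\
			warnings_emitted++;			\
			fprintf(stderr, fmt, __VA_ARGS__);	\
		}						\
	} while (0)

/* Stand-in for the driver path that detects the disabled feature. */
static void use_disabled_feature(const char *dev, const char *comm)
{
	warn_once("%s: %s: using disabled UFO feature; please fix this program\n",
		  dev, comm);
}
```

Calling use_disabled_feature() repeatedly prints the message only on the first call. Note that the format string needs one conversion per argument, so printing both the device name and the program name takes two %s specifiers.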
^ permalink raw reply
* Re: [PATCH v3 00/15] net: dsa: Fixes and enhancements
From: David Miller @ 2014-10-30 18:54 UTC (permalink / raw)
To: linux; +Cc: f.fainelli, netdev, andrew, linux-kernel
In-Reply-To: <20141029213928.GA28498@roeck-us.net>
From: Guenter Roeck <linux@roeck-us.net>
Date: Wed, 29 Oct 2014 14:39:28 -0700
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Series applied, thanks everyone.
^ permalink raw reply
* Re: [PATCH net-next 5/8] net/mlx4_en: Remove redundant code from RX/GRO path
From: Eric Dumazet @ 2014-10-30 19:00 UTC (permalink / raw)
To: Or Gerlitz
Cc: David S. Miller, netdev, Matan Barak, Amir Vadai, Saeed Mahameed,
Shani Michaeli
In-Reply-To: <1414685216-28907-6-git-send-email-ogerlitz@mellanox.com>
On Thu, 2014-10-30 at 18:06 +0200, Or Gerlitz wrote:
> Remove the code which goes through napi_gro_frags() on the RX path,
> use only napi_gro_receive().
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
Hmpff... napi_gro_frags() should be faster.
Have you benchmarked this ?
^ permalink raw reply
* Re: nfs stalls over loopback interface (no sk_data_ready events?)
From: Christoph Hellwig @ 2014-10-30 19:00 UTC (permalink / raw)
To: Jeff Layton
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
Linux NFS Mailing List, Bruce Fields, Trond Myklebust
In-Reply-To: <20141029102123.58f6c960-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
On Wed, Oct 29, 2014 at 10:21:23AM -0400, Jeff Layton wrote:
> Looks like some change that went into -rc2 has fixed the problem for me.
> Christoph, can you confirm that this no longer occurs with -rc2?
I can't reproduce it anymore on -rc2.
However:
generic/133 fails fairly reliably with:
Buffered writer, buffered reader
+pread64: Device or resource busy
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: Mistake in commit 0d961b3b52f566f823070ce2366511a7f64b928c breaks cpsw non dual_emac mode.
From: David Miller @ 2014-10-30 19:43 UTC (permalink / raw)
To: lsorense; +Cc: linux-kernel, hs, mugunthanvnm, netdev
In-Reply-To: <20141028170242.GA24112@csclub.uwaterloo.ca>
From: "Lennart Sorensen" <lsorense@csclub.uwaterloo.ca>
Date: Tue, 28 Oct 2014 13:02:42 -0400
> I believe commit 0d961b3b52f566f823070ce2366511a7f64b928c made a mistake
> while correcting a bug.
This patch submission is not properly formed.
Your subject line should be of the form:
subsystem: Description.
"subsystem" here would be "cpsw: " or something like that.
Secondly, you should not refer to a commit ID in the patch
Subject line, instead just describe exactly what is being
fixed in the most succinct yet complete manner that is
possible.
Thirdly, when you do refer to commit ID's in your commit
message body you must do so in the following format:
${SHA1_ID} ("Commit message header line text.")
The commit message body is also not a place to have a general
discussion. Please avoid saying things like "I think", for example.
State facts, and be exact about what the problem is and exactly
how you are fixing it, because this commit message will be read by
others looking at your change days, weeks, or years from now.
Thanks.
^ permalink raw reply
* Re: [PATCH net-next 0/3] reduce verifier memory consumption and add tests
From: David Miller @ 2014-10-30 19:45 UTC (permalink / raw)
To: ast; +Cc: hannes, edumazet, dborkman, luto, netdev, linux-kernel
In-Reply-To: <1414534303-9906-1-git-send-email-ast@plumgrid.com>
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Tue, 28 Oct 2014 15:11:40 -0700
> Small set of cleanups:
> - reduce verifier memory consumption
> - add verifier test to check register state propagation and state equivalence
> - add JIT test reduced from recent nmap triggered crash
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH net 0/3] r8152: patches for autosuspend
From: David Miller @ 2014-10-30 19:49 UTC (permalink / raw)
To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-69-Taiwan-albertk@realtek.com>
From: Hayes Wang <hayeswang@realtek.com>
Date: Wed, 29 Oct 2014 11:12:14 +0800
> There are unexpected processes when enabling autosuspend.
> These patches are used to fix them.
Series applied, thank you.
^ permalink raw reply
* Re: [PATCH 0/6 3.18] Fixes for iwlwifi drivers
From: Arend van Spriel @ 2014-10-30 19:51 UTC (permalink / raw)
To: Larry Finger
Cc: Luca Coelho, linville-2XuSBdqkA4R54TAoqtyWWQ,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, Murilo Opsfelder Araujo
In-Reply-To: <545258D6.20000-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
On 10/30/14 16:27, Larry Finger wrote:
> On 10/30/2014 06:08 AM, Luca Coelho wrote:
>> The cover-letter subject is wrong. :) I guess you meant
>> s/iwlwifi/rtlwifi/ ;)
>
> Yes, the changes were for rtlwifi, not iwlwifi. Sorry. (:
>
> My laptop has an Intel 7260 card built in, and it is working correctly
> on both 2.4 and 5G bands under mainline 3.18-rc2.
>
> Those types of errors are what I get for trying to "work" while on a
> family vacation. Unfortunately, I needed to submit those patches quickly
> to prevent a set of conflicting updates from being accepted, and I made
> a silly mistake.
Don't let it spoil your vacation ;-)
Regards,
Arend
> Larry
^ permalink raw reply
* Re: [PATCH 1/1 net-next] ipx: remove all unnecessary castings on ntohl
From: David Miller @ 2014-10-30 19:52 UTC (permalink / raw)
To: fabf; +Cc: linux-kernel, acme, netdev
In-Reply-To: <1414571502-9231-1-git-send-email-fabf@skynet.be>
From: Fabian Frederick <fabf@skynet.be>
Date: Wed, 29 Oct 2014 09:31:42 +0100
> Apply commit e0f36310f793
> ("ipx: remove unnecessary casting on ntohl")
> to all seq_printf/08lX
>
> Inspired-by: "David S. Miller" <davem@davemloft.net>
> Inspired-by: Joe Perches <joe@perches.com>
> Signed-off-by: Fabian Frederick <fabf@skynet.be>
Applied.
^ permalink raw reply
* Re: [PATCH/TRIVIAL 1/1 net-next] ipv6: spelling s/incomming/incoming
From: David Miller @ 2014-10-30 19:52 UTC (permalink / raw)
To: fabf; +Cc: linux-kernel, kuznet, jmorris, yoshfuji, kaber, trivial, netdev
In-Reply-To: <1414573227-10743-1-git-send-email-fabf@skynet.be>
From: Fabian Frederick <fabf@skynet.be>
Date: Wed, 29 Oct 2014 10:00:26 +0100
> Signed-off-by: Fabian Frederick <fabf@skynet.be>
Applied.
^ permalink raw reply
* Re: [PATCH 1/1 net-next] ipv6: remove inline on static in c file
From: David Miller @ 2014-10-30 19:52 UTC (permalink / raw)
To: fabf; +Cc: linux-kernel, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1414579097-11659-1-git-send-email-fabf@skynet.be>
From: Fabian Frederick <fabf@skynet.be>
Date: Wed, 29 Oct 2014 11:38:17 +0100
> remove __inline__ / inline and let compiler decide what to do
> with static functions
>
> Inspired-by: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Fabian Frederick <fabf@skynet.be>
Applied.
^ permalink raw reply
* Re: [PATCH 1/1 net-next] ipv6: remove assignment in if condition
From: David Miller @ 2014-10-30 19:52 UTC (permalink / raw)
To: fabf; +Cc: linux-kernel, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1414583871-14196-1-git-send-email-fabf@skynet.be>
From: Fabian Frederick <fabf@skynet.be>
Date: Wed, 29 Oct 2014 12:57:51 +0100
> Do assignment before if condition and test !skb like in rawv6_recvmsg()
>
> Signed-off-by: Fabian Frederick <fabf@skynet.be>
Applied.
^ permalink raw reply
* Re: [PATCH -next 0/2] net: allow setting ecn via routing table
From: David Miller @ 2014-10-30 19:59 UTC (permalink / raw)
To: fw; +Cc: netdev
In-Reply-To: <20141029122307.GA29253@breakpoint.cc>
From: Florian Westphal <fw@strlen.de>
Date: Wed, 29 Oct 2014 13:23:07 +0100
> We could do that, if you prefer.
>
> I tried to come up with a scenario though, where sysctl_tcp_ecn=0, and
> then we want to enable 'passive' ecn for incoming connections only on
> a particular route without announcing ecn to the peer. I haven't been
> able to find any -- I think if you deem 'route to x' safe for ecn it
> might as well be enabled for both initiator and responder. The original
> patch would be sufficient for that.
>
> IOW, is 'ecn from a to b but not b to a' a sensible requirement?
I think you have to apply the same logic for the sysctl (there's a
reason to only support ECN passively) as you do for the route feature
because you can logically look at the sysctl as applying to the
default route.
> Unrelated to this patch, but I'd like to see sysctl_tcp_ecn=1 as a
> default at one point (almost no routers set CE bit at this time, perhaps
> that would change if ecn usage is more widespread).
Now you're talking.
So, either passive ECN support makes sense or it does not. To me, no
matter what the argument, it doesn't matter what realm (whole system,
specific routes) you apply that argument to.
^ permalink raw reply
* Re: [PATCH v2] ip6_tunnel: allow to change mode for the ip6tnl0
From: David Miller @ 2014-10-30 20:09 UTC (permalink / raw)
To: alan; +Cc: netdev, edumazet
In-Reply-To: <1414569292-15384-1-git-send-email-alan@al-an.info>
From: "Alexey Andriyanov" <alan@al-an.info>
Date: Wed, 29 Oct 2014 10:54:52 +0300
> The fallback device is in ipv6 mode by default.
> The mode can not be changed in runtime, so there
> is no way to decapsulate ip4in6 packets coming from
> various sources without creating the specific tunnel
> ifaces for each peer.
>
> This allows to update the fallback tunnel device, but only
> the mode could be changed. Usual command should work for the
> fallback device: `ip -6 tun change ip6tnl0 mode any`
>
> The fallback device can not be hidden from the packet receiver
> as a regular tunnel, but there is no need for synchronization
> as long as we do single assignment.
>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Applied to net-next, thanks.
^ permalink raw reply