Netdev List
 help / color / mirror / Atom feed
* [PATCH net] vti: get rid of nf mark rule in prerouting
From: Christophe Gouault @ 2013-10-08 15:21 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, Saurabh Mohan, Christophe Gouault

This patch fixes and improves the use of vti interfaces (while
lightly changing the way of configuring them).

Currently:

- it is necessary to identify and mark inbound IPsec
  packets destined to each vti interface, via netfilter rules in
  the mangle table at prerouting hook.

- the vti module cannot retrieve the right tunnel in input since
  commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
  is done with flag TUNNEL_NO_KEY, so there no chance to retrieve them.

- the i_key is used by the outbound processing as a mark to lookup
  for the right SP and SA bundle.

This patch uses the o_key to store the vti mark (instead of i_key) and
enables:

- to avoid the need for previously marking the inbound skbuffs via a
  netfilter rule.
- to properly retrieve the right tunnel in input, only based on the IPsec
  packet outer addresses.
- to properly perform an inbound policy check (using the tunnel o_key
  as a mark).
- to properly perform an outbound SPD and SAD lookup (using the tunnel
  o_key as a mark).
- to keep the current mark of the skbuff. The skbuff mark is neither
  used nor changed by the vti interface. Only the vti interface o_key
  is used.

SAs have a wildcard mark.
SPs have a mark equal to the vti interface o_key.

The vti interface must be created as follows (i_key = 0, o_key = mark):

   ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1

The SPs attached to vti1 must be created as follows (mark = vti1 o_key):

   ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
      proto esp mode tunnel
   ip xfrm policy add dir in  mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
      proto esp mode tunnel

The SAs are created with the default wildcard mark. There is no
distinction between global vs. vti SAs. Just their addresses will
possibly link them to a vti interface:

   ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
                 enc "cbc(aes)" "azertyuiopqsdfgh"

   ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
                 enc "cbc(aes)" "sqbdhgqsdjqjsdfh"

To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
should no use the default wildcard mark, but explicitly match mark 0.

To avoid a double SPD lookup in input and output (in global and vti SPDs),
the NOPOLICY and NOXFRM options should be set on the vti interfaces:

   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy
   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm

The outgoing traffic is steered to vti1 by a route via the vti interface:

   ip route add 192.168.0.0/16 dev vti1

The incoming IPsec traffic is steered to vti1 because its outer addresses
match the vti1 tunnel configuration.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
This is is both a fix and enhancement patch. However, there are 2 ways
of fixing the inbound processing bug:
- either keep the current configuration model (ikey + netfilter rule)
  and change the tunnel lookup method. This patch would then be reverted
  by the enhancement (this sounds counterproductive).
- or directly change the configuration model (okey, no netfilter rule) and keep
  the current tunnel lookup method.

 net/ipv4/ip_vti.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index e805e7b..6e87f85 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -125,8 +125,17 @@ static int vti_rcv(struct sk_buff *skb)
 				  iph->saddr, iph->daddr, 0);
 	if (tunnel != NULL) {
 		struct pcpu_tstats *tstats;
+		u32 oldmark = skb->mark;
+		int ret;
 
-		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
+
+		/* temporarily mark the skb with the tunnel o_key, to
+		 * only match policies with this mark.
+		 */
+		skb->mark = be32_to_cpu(tunnel->parms.o_key);
+		ret = xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb);
+		skb->mark = oldmark;
+		if (!ret)
 			return -1;
 
 		tstats = this_cpu_ptr(tunnel->dev->tstats);
@@ -135,7 +144,6 @@ static int vti_rcv(struct sk_buff *skb)
 		tstats->rx_bytes += skb->len;
 		u64_stats_update_end(&tstats->syncp);
 
-		skb->mark = 0;
 		secpath_reset(skb);
 		skb->dev = tunnel->dev;
 		return 1;
@@ -167,7 +175,7 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	memset(&fl4, 0, sizeof(fl4));
 	flowi4_init_output(&fl4, tunnel->parms.link,
-			   be32_to_cpu(tunnel->parms.i_key), RT_TOS(tos),
+			   be32_to_cpu(tunnel->parms.o_key), RT_TOS(tos),
 			   RT_SCOPE_UNIVERSE,
 			   IPPROTO_IPIP, 0,
 			   dst, tiph->saddr, 0, 0);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH] rtlwifi: rtl8192cu: Fix error in pointer arithmetic
From: Larry Finger @ 2013-10-08 15:18 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Larry Finger, netdev, Mark Cave-Ayland, Stable

From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>

An error in calculating the offset in an skb causes the driver to read
essential device info from the wrong locations. The main effect is that
automatic gain calculations are nonsense.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [2.6.39+]
---
 drivers/net/wireless/rtlwifi/rtl8192cu/trx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/trx.c b/drivers/net/wireless/rtlwifi/rtl8192cu/trx.c
index 04c7e57..25e50ff 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/trx.c
@@ -343,7 +343,8 @@ bool rtl92cu_rx_query_desc(struct ieee80211_hw *hw,
 					(bool)GET_RX_DESC_PAGGR(pdesc));
 	rx_status->mactime = GET_RX_DESC_TSFL(pdesc);
 	if (phystatus) {
-		p_drvinfo = (struct rx_fwinfo_92c *)(pdesc + RTL_RX_DESC_SIZE);
+		p_drvinfo = (struct rx_fwinfo_92c *)(skb->data +
+						     stats->rx_bufshift);
 		rtl92c_translate_rx_signal_stuff(hw, skb, stats, pdesc,
 						 p_drvinfo);
 	}
-- 
1.8.1.4

^ permalink raw reply related

* [RFC] net: stmmac: keep RXC running in LPI mode to avoid system overload
From: Romain Baeriswyl @ 2013-10-08 15:06 UTC (permalink / raw)
  To: Giuseppe Cavallaro
  Cc: David S. Miller, netdev, linux-kernel, Romain Baeriswyl

In order to avoid system overload, the clock RXC from the Phy should not be 
stopped when in LPI mode.

With the RTL8211E PHY which support EEE mode and with Apple Airport Extreme that 
supports it also, the kernel get frozen as soon as some Ethernet transfers are 
on-going. System seems to be overloaded with too many interrupts. The 'top' 
command reports often around ~80% irq.

By letting the RXC clock running even in LPI mode as proposed below, the issue 
seems solved. Is it the right way to proceed?  

What is the power impact to not stop RXC in LPI mode?

---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index e9eab29..d40d26b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -314,7 +314,8 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
 	/* MAC core supports the EEE feature. */
 	if (priv->dma_cap.eee) {
 		/* Check if the PHY supports EEE */
-		if (phy_init_eee(priv->phydev, 1))
+		/* Keeps RXC running in LPI mode to avoid stability issue */
+		if (phy_init_eee(priv->phydev, 0))
 			goto out;
 
 		if (!priv->eee_active) {
@@ -770,7 +771,8 @@ static void stmmac_adjust_link(struct net_device *dev)
 	/* At this stage, it could be needed to setup the EEE or adjust some
 	 * MAC related HW registers.
 	 */
-	priv->eee_enabled = stmmac_eee_init(priv);
+	if (new_state)
+		priv->eee_enabled = stmmac_eee_init(priv);
 
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
-- 
1.7.1

^ permalink raw reply related

* [PATCH iproute2] iplink: update available type list
From: Nicolas Dichtel @ 2013-10-08 14:59 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Nicolas Dichtel

macvtap and vti were missing.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 ip/iplink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 16cb6fed9fcf..6cde731ac328 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -84,9 +84,9 @@ void iplink_usage(void)
 
 	if (iplink_have_newlink()) {
 		fprintf(stderr, "\n");
-		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can |\n");
-		fprintf(stderr, "          bridge | ipoib | ip6tnl | ipip | sit | vxlan |\n");
-		fprintf(stderr, "          gre | gretap | ip6gre | ip6gretap }\n");
+		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | macvtap |\n");
+		fprintf(stderr, "          can | bridge | ipoib | ip6tnl | ipip | sit | vxlan |\n");
+		fprintf(stderr, "          gre | gretap | ip6gre | ip6gretap | vti }\n");
 	}
 	exit(-1);
 }
-- 
1.8.4.1

^ permalink raw reply related

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
From: Hannes Frederic Sowa @ 2013-10-08 14:53 UTC (permalink / raw)
  To: Jon Mason
  Cc: Jiri Pirko, netdev, yoshfuji, David Miller, kuznet, jmorris,
	kaber, herbert, Eric Dumazet
In-Reply-To: <CAPoiz9wxxvHfmmMOCvPstzT51BafGfU1u5umxBQ8kqqCE=OzbQ@mail.gmail.com>

On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
> On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> >
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> >
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> 
> From my reading of the HW Spec and a quick look at the driver, it
> appears that the driver is using one entry in the TX ring for the
> header and another for the body of the packet to be fragmented (which
> is what the hardware wants).  I don't understand what you are saying,
> but if you are asking if simply appending a new header & data to the
> end of skb->data will get it out on the wire correct, I don't believe
> it will.

No this is not what I tried to say. I'll try to be more clear this
time. ;)

We start with an UDP socket which is corked. As soon as we write the
first few bytes (smaller than the mtu) onto this socket we put the
header in place and the rest of the data is just appended behind the
header directly in skb->data via plain ip_append_data.

Now a second write with a length > mtu happens: The ip(6)_append_data
will branch to ufo_append. This will fetch the first skb and append
to skb->frags.  gso_type and gso_size will be updated on this skb (this
currently does not happen but will with the patches discussed in this
thread).

If this packet is transmitted down to the device driver we have the udp
header in skb->data *and* also the payload from the first write. The
payload from the second write is appended as a frag and gso_type and
gso_size are set. This header+payload seem to be mapped just after the
ufo_in_band_v descriptor as the header in the first tx descriptor:

   4174         txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
   4175                                               frg_len, PCI_DMA_TODEVICE);

frg_len is set to skb_headlen(skb). This happens right after setting up
the descriptor for the in-band ufo data.

My guess is that this data isn't split currently by the neterion driver
(at least I could not find it in the driver as Eric showed it for bnx2x)
so it might reappear in the packets when the hardware fragments the
packet and places the first tx ring in front of every packet.

Before these changes we never updated the gso_type and gso_size even when
we did append via UFO. So we never had payload in an UFO marked skb->data,
only the headers. Now we could also end up with a some payload in the
first TX ring, which you said is only for the header.

> I do have hardware that I can try the patch on, if you can walk me
> through the use case (unless it is as easy as setup an IPv6 connection
> and ping).

Ok, testing this should not be that complicated:

We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/

--- >8 ---
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>

int test(int mtu)
{
        int fd;
        const int one = 1;
        const int off = 0;
        struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
        unsigned char buffer[3701];

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));

        write(fd, "    ", 4);
        write(fd, buffer, sizeof(buffer));
        write(fd, " ", 1);

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));

        close(fd);
}

int main() {
        test(1280);
}
--- >8 ---

I left out error handling so it is better observed with strace if
something went wrong.

You should change the port number and ip address to something reasonable
for your network. My guess would be that the spaces (0x20) of the first
write is now placed between UDP header and payload of every packet
fragmented by the hardware. Would be nice to hear that I am wrong. ;)

Be aware that the above program can cause memory corruption in the kernel
if you did not apply Jiri's patch.

Thanks for helping!

  Hannes

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Ard Biesheuvel @ 2013-10-08 14:52 UTC (permalink / raw)
  To: Johannes Berg
  Cc: David Laight,
	<linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	<patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	<linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
In-Reply-To: <1381239951.13359.13.camel-8Nb76shvtaUJvtFkdXX2HixXY32XiHfO@public.gmane.org>

On 8 October 2013 15:45, Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> wrote:
> On Tue, 2013-10-08 at 15:45 +0200, Johannes Berg wrote:
>> On Tue, 2013-10-08 at 15:41 +0200, Ard Biesheuvel wrote:
>>
>> > Actually, as the size is always the same, it should be feasible to
>> > alloc a couple of request structs at init time. would one for rx and
>> > one for tx be sufficient? or is this code more reentrant than that?
>>
>> TX can run concurrently on multiple (four) queues using the same key.
>
> And maybe even more with injection ... I wouldn't go there.
>

OK, clear.

If you think this patch has any merit at all, I am happy to modify it
so that it kmalloc()s the request struct with GFP_ATOMIC at every
invocation.
However, personally I don't think this should be necessary and in fact
my patch removes a stack allocation of u8[48] (from
ieee80211_crypto_ccmp_decrypt() and from ccmp_encrypt_skb() in wpa.c)
so it does even out a bit.

regards,
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH] xen-netback: Don't destroy the netdev until the vif is shut
From: Paul Durrant @ 2013-10-08 14:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, Wei Liu
In-Reply-To: <20131004.002248.789391156050342547.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: 04 October 2013 05:23
> To: Paul Durrant
> Cc: netdev@vger.kernel.org; Wei Liu
> Subject: Re: [PATCH] xen-netback: Don't destroy the netdev until the vif is
> shut
> 
> 
> Can one of you do -stable backports of this change for v3.11 and any
> other active -stable branch where this fix is relevant?
> 
> I gave it a shot, and it got outside my realm of expertise.
> 
> Thanks!

Ok. Here's a backport to 3.11.

  Paul

--8<--
>From 158011889a79ad7cebdd30d551adcfeae80b8989 Mon Sep 17 00:00:00 2001
From: Paul Durrant <paul.durrant@citrix.com>
Date: Tue, 8 Oct 2013 14:56:44 +0100
Subject: [PATCH] xen-netback: Don't destroy the netdev until the vif is shut
 down

[ upstream commit id: 279f438e36c0a70b23b86d2090aeec50155034a9 ]

Without this patch, if a frontend cycles through states Closing
and Closed (which Windows frontends need to do) then the netdev
will be destroyed and requires re-invocation of hotplug scripts
to restore state before the frontend can move to Connected. Thus
when udev is not in use the backend gets stuck in InitWait.

With this patch, the netdev is left alone whilst the backend is
still online and is only de-registered and freed just prior to
destroying the vif (which is also nicely symmetrical with the
netdev allocation and registration being done during probe) so
no re-invocation of hotplug scripts is required.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |   23 +++++++++--------------
 drivers/net/xen-netback/xenbus.c    |   17 ++++++++++++-----
 3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a4d77e..4d9a5e7 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -120,6 +120,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
 		   unsigned int rx_evtchn);
 void xenvif_disconnect(struct xenvif *vif);
+void xenvif_free(struct xenvif *vif);
 
 void xenvif_get(struct xenvif *vif);
 void xenvif_put(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 087d2db..73336c1 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -326,6 +326,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	}
 
 	netdev_dbg(dev, "Successfully created xenvif\n");
+
+	__module_get(THIS_MODULE);
+
 	return vif;
 }
 
@@ -413,12 +416,6 @@ void xenvif_carrier_off(struct xenvif *vif)
 
 void xenvif_disconnect(struct xenvif *vif)
 {
-	/* Disconnect funtion might get called by generic framework
-	 * even before vif connects, so we need to check if we really
-	 * need to do a module_put.
-	 */
-	int need_module_put = 0;
-
 	if (netif_carrier_ok(vif->dev))
 		xenvif_carrier_off(vif);
 
@@ -432,18 +429,16 @@ void xenvif_disconnect(struct xenvif *vif)
 			unbind_from_irqhandler(vif->tx_irq, vif);
 			unbind_from_irqhandler(vif->rx_irq, vif);
 		}
-		/* vif->irq is valid, we had a module_get in
-		 * xenvif_connect.
-		 */
-		need_module_put = 1;
 	}
 
-	unregister_netdev(vif->dev);
-
 	xen_netbk_unmap_frontend_rings(vif);
+}
+
+void xenvif_free(struct xenvif *vif)
+{
+	unregister_netdev(vif->dev);
 
 	free_netdev(vif->dev);
 
-	if (need_module_put)
-		module_put(THIS_MODULE);
+	module_put(THIS_MODULE);
 }
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 1fe48fe3..a53782e 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -42,7 +42,7 @@ static int netback_remove(struct xenbus_device *dev)
 	if (be->vif) {
 		kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
 		xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-		xenvif_disconnect(be->vif);
+		xenvif_free(be->vif);
 		be->vif = NULL;
 	}
 	kfree(be);
@@ -213,9 +213,18 @@ static void disconnect_backend(struct xenbus_device *dev)
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
 
+	if (be->vif)
+		xenvif_disconnect(be->vif);
+}
+
+static void destroy_backend(struct xenbus_device *dev)
+{
+	struct backend_info *be = dev_get_drvdata(&dev->dev);
+
 	if (be->vif) {
+		kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
 		xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-		xenvif_disconnect(be->vif);
+		xenvif_free(be->vif);
 		be->vif = NULL;
 	}
 }
@@ -246,14 +255,11 @@ static void frontend_changed(struct xenbus_device *dev,
 	case XenbusStateConnected:
 		if (dev->state == XenbusStateConnected)
 			break;
-		backend_create_xenvif(be);
 		if (be->vif)
 			connect(be);
 		break;
 
 	case XenbusStateClosing:
-		if (be->vif)
-			kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
 		disconnect_backend(dev);
 		xenbus_switch_state(dev, XenbusStateClosing);
 		break;
@@ -262,6 +268,7 @@ static void frontend_changed(struct xenbus_device *dev,
 		xenbus_switch_state(dev, XenbusStateClosed);
 		if (xenbus_dev_is_online(dev))
 			break;
+		destroy_backend(dev);
 		/* fall through if not online */
 	case XenbusStateUnknown:
 		device_unregister(&dev->dev);
-- 
1.7.10.4

^ permalink raw reply related

* RE: [PATCH] xen-netback: Don't destroy the netdev until the vif is shut
From: Paul Durrant @ 2013-10-08 14:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, Wei Liu
In-Reply-To: <20131004.002248.789391156050342547.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: 04 October 2013 05:23
> To: Paul Durrant
> Cc: netdev@vger.kernel.org; Wei Liu
> Subject: Re: [PATCH] xen-netback: Don't destroy the netdev until the vif is
> shut
> 
> 
> Can one of you do -stable backports of this change for v3.11 and any
> other active -stable branch where this fix is relevant?
> 
> I gave it a shot, and it got outside my realm of expertise.
> 
> Thanks!

Here is a patch for 3.10.

--8<--
>From e626fc73f445f05d976756d022614f06461cab74 Mon Sep 17 00:00:00 2001
From: Paul Durrant <paul.durrant@citrix.com>
Date: Tue, 8 Oct 2013 14:22:56 +0100
Subject: [PATCH] xen-netback: Don't destroy the netdev until the vif is shut
 down

[ upstream commit id: 279f438e36c0a70b23b86d2090aeec50155034a9 ]

Without this patch, if a frontend cycles through states Closing
and Closed (which Windows frontends need to do) then the netdev
will be destroyed and requires re-invocation of hotplug scripts
to restore state before the frontend can move to Connected. Thus
when udev is not in use the backend gets stuck in InitWait.

With this patch, the netdev is left alone whilst the backend is
still online and is only de-registered and freed just prior to
destroying the vif (which is also nicely symmetrical with the
netdev allocation and registration being done during probe) so
no re-invocation of hotplug scripts is required.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |   12 ++++++++++--
 drivers/net/xen-netback/xenbus.c    |   17 ++++++++++++-----
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 9d7f172..1a28508 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -115,6 +115,7 @@ struct xenvif *xenvif_alloc(struct device *parent,
 int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int evtchn);
 void xenvif_disconnect(struct xenvif *vif);
+void xenvif_free(struct xenvif *vif);
 
 void xenvif_get(struct xenvif *vif);
 void xenvif_put(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index d984141..3a294c2 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -304,6 +304,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	}
 
 	netdev_dbg(dev, "Successfully created xenvif\n");
+
+	__module_get(THIS_MODULE);
+
 	return vif;
 }
 
@@ -369,9 +372,14 @@ void xenvif_disconnect(struct xenvif *vif)
 	if (vif->irq)
 		unbind_from_irqhandler(vif->irq, vif);
 
-	unregister_netdev(vif->dev);
-
 	xen_netbk_unmap_frontend_rings(vif);
+}
+
+void xenvif_free(struct xenvif *vif)
+{
+	unregister_netdev(vif->dev);
 
 	free_netdev(vif->dev);
+
+	module_put(THIS_MODULE);
 }
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 410018c..abe24ff 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -42,7 +42,7 @@ static int netback_remove(struct xenbus_device *dev)
 	if (be->vif) {
 		kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
 		xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-		xenvif_disconnect(be->vif);
+		xenvif_free(be->vif);
 		be->vif = NULL;
 	}
 	kfree(be);
@@ -203,9 +203,18 @@ static void disconnect_backend(struct xenbus_device *dev)
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
 
+	if (be->vif)
+		xenvif_disconnect(be->vif);
+}
+
+static void destroy_backend(struct xenbus_device *dev)
+{
+	struct backend_info *be = dev_get_drvdata(&dev->dev);
+
 	if (be->vif) {
+		kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
 		xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-		xenvif_disconnect(be->vif);
+		xenvif_free(be->vif);
 		be->vif = NULL;
 	}
 }
@@ -237,14 +246,11 @@ static void frontend_changed(struct xenbus_device *dev,
 	case XenbusStateConnected:
 		if (dev->state == XenbusStateConnected)
 			break;
-		backend_create_xenvif(be);
 		if (be->vif)
 			connect(be);
 		break;
 
 	case XenbusStateClosing:
-		if (be->vif)
-			kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
 		disconnect_backend(dev);
 		xenbus_switch_state(dev, XenbusStateClosing);
 		break;
@@ -253,6 +259,7 @@ static void frontend_changed(struct xenbus_device *dev,
 		xenbus_switch_state(dev, XenbusStateClosed);
 		if (xenbus_dev_is_online(dev))
 			break;
+		destroy_backend(dev);
 		/* fall through if not online */
 	case XenbusStateUnknown:
 		device_unregister(&dev->dev);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net stable REPOST] ipv6: restrict neighbor entry creation to output flow
From: Jiri Pirko @ 2013-10-08 14:41 UTC (permalink / raw)
  To: netdev; +Cc: hannes, jiri, dbanerje, yoshfuji, mleitner, davem

From: Marcelo Ricardo Leitner <mleitner@redhat.com>

This patch is based on 3.2.y branch, the one used by reporter. Please let me
know if it should be different. Thanks.

The patch which introduced the regression was applied on stables:
3.0.64 3.4.31 3.7.8 3.2.39

The patch which introduced the regression was for stable trees only.

---8<---

Commit 0d6a77079c475033cb622c07c5a880b392ef664e "ipv6: do not create
neighbor entries for local delivery" introduced a regression on
which routes to local delivery would not work anymore. Like this:

    $ ip -6 route add local 2001::/64 dev lo
    $ ping6 -c1 2001::9
    PING 2001::9(2001::9) 56 data bytes
    ping: sendmsg: Invalid argument

As this is a local delivery, that commit would not allow the creation of a
neighbor entry and thus the packet cannot be sent.

But as TPROXY scenario actually needs to avoid the neighbor entry creation only
for input flow, this patch now limits previous patch to input flow, keeping
output as before that patch.

Reported-by: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/route.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 18ea73c..bc9103d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -791,7 +791,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
 }
 
 static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, int oif,
-				      struct flowi6 *fl6, int flags)
+				      struct flowi6 *fl6, int flags, bool input)
 {
 	struct fib6_node *fn;
 	struct rt6_info *rt, *nrt;
@@ -799,8 +799,11 @@ static struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	int attempts = 3;
 	int err;
 	int reachable = net->ipv6.devconf_all->forwarding ? 0 : RT6_LOOKUP_F_REACHABLE;
+	int local = RTF_NONEXTHOP;
 
 	strict |= flags & RT6_LOOKUP_F_IFACE;
+	if (input)
+		local |= RTF_LOCAL;
 
 relookup:
 	read_lock_bh(&table->tb6_lock);
@@ -820,7 +823,7 @@ restart:
 	read_unlock_bh(&table->tb6_lock);
 
 	if (!dst_get_neighbour_raw(&rt->dst)
-	    && !(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_LOCAL)))
+	    && !(rt->rt6i_flags & local))
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
 	else if (!(rt->dst.flags & DST_HOST))
 		nrt = rt6_alloc_clone(rt, &fl6->daddr);
@@ -864,7 +867,7 @@ out2:
 static struct rt6_info *ip6_pol_route_input(struct net *net, struct fib6_table *table,
 					    struct flowi6 *fl6, int flags)
 {
-	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags);
+	return ip6_pol_route(net, table, fl6->flowi6_iif, fl6, flags, true);
 }
 
 void ip6_route_input(struct sk_buff *skb)
@@ -890,7 +893,7 @@ void ip6_route_input(struct sk_buff *skb)
 static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
 					     struct flowi6 *fl6, int flags)
 {
-	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
+	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags, false);
 }
 
 struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,
-- 
1.8.3.1

^ permalink raw reply related

* Re: bug in passing file descriptors
From: Steve Rago @ 2013-10-08 14:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, luto, netdev, mtk.manpages, ebiederm
In-Reply-To: <87li24zo6x.fsf@tassilo.jf.intel.com>

On 10/07/2013 06:55 PM, Andi Kleen wrote:
> David Miller <davem@davemloft.net> writes:
>
>> From: Steve Rago <sar@nec-labs.com>
>> Date: Mon, 7 Oct 2013 16:29:15 -0400
>>
>>> On 10/07/2013 03:42 PM, David Miller wrote:
>>>> There is no compatability issue.
>>>>
>>>> 32-bit tasks will always see the 4-byte align/length.
>>>> 64-bit tasks will always see the 8-byte align/length.
>>>>
>>>
>>> Really?  So when I compile my application on a 32-bit Linux box and
>>> then try to run it on a 64-bit Linux box, you're not going to overrun
>>> my buffer when CMSG_SPACE led me to allocate an insufficient amount of
>>> memory needed to account for padding on the 64-bit platform?
>>
>> We have a compatability layer that gives 32-bit applications the
>> same behavior as if they had run on a 32-bit machine.
>>
>> Search around for the MSG_MSG_COMPAT flag and how that is used in
>> net/socket.c
>
> But it seems the compat layer doesn't handle this correctly,
> otherwise Steve's original test case would work.
>
> Must be a bug somewhere in the compat layer.
>
> -Andi
>

I did some research last night and I think the problem stems from an underspecified standard.  CMSG_LEN and CMSG_SPACE 
seem to have originated with RFC 2292, which has since been obsoleted by RFC 3542.  The difference is that CMSG_SPACE 
accounts for padding at the end, which is needed when you stuff multiple cmsghdr objects in the same buffer.  CMSG_LEN 
is required to be used to initialize the cmsg_len member of the structure.  When you only have one cmsghdr object in 
your call to recvmsg, it is unclear whether you need to have a buffer as large as CMSG_SPACE or CMSG_LEN.  Historically, 
BSD-based platforms never had these macros and didn't return the ancillary data if the space provided by the application 
wasn't big enough.  Linux, *which has a bug*, won't copy more bytes than cmsg_len specifies when the application uses 
CMSG_LEN instead of CMSG_SPACE, but then lies to the application by overwriting msg_controllen.  Look in put_cmsg() in 
net/core/scm.c: "cmlen" is calculated using CM_LEN, and then msg_controllen is checked against it to make sure you don't 
overwrite the user's buffer.  However, in recvmsg(), msg_controllen is overwritten by "msg_sys.msg_control - cmsg_ptr". 
  msg_control was incremented by CMSG_SPACE at the end of scm_detach_fds().

The Linux kernel code seems to copy the right amount of data, even in the compat case, as far as I can tell.  The only 
bug I see is that msg_controllen returns from recvmsg() with the wrong value.


Steve

^ permalink raw reply

* Re: IPv6 kernel warning
From: Neal Cardwell @ 2013-10-08 14:05 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: dormando, Michele Baldessari, Russell King - ARM Linux, netdev,
	Nandita Dukkipati
In-Reply-To: <CAK6E8=czoej81t=-J=gjjyQiGVbZ0qiNKBbeRVSWYtweXfSRNQ@mail.gmail.com>

On Mon, Oct 7, 2013 at 3:51 PM, Yuchung Cheng <ycheng@google.com> wrote:
> I suspect tcp_process_tlp_ack() should not revert state to Open
> directly, but calling tcp_try_keep_open() instead, similar to all the
> undo processing in the tcp_fastretrans_alert(): after
> tcp_end_cwnd_reduction(), the process (E) falls back to check other
> stats before moving to CA_Open.
>
>
> index 9c62257..9012b42 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
>                         tcp_init_cwnd_reduction(sk, true);
>                         tcp_set_ca_state(sk, TCP_CA_CWR);
>                         tcp_end_cwnd_reduction(sk);
> -                       tcp_set_ca_state(sk, TCP_CA_Open);
> +                       tcp_try_keep_open(sk);
>                         NET_INC_STATS_BH(sock_net(sk),
>                                          LINUX_MIB_TCPLOSSPROBERECOVERY);
>                 }

Yes, nice catch! This looks good to me. My testing confirms that this
definitely fixes a bug when this code fires and there are segments
SACKed out. Since it will stay in CA_Disorder if there are outstanding
retransmissions, I bet it will also fix the WARN_ON(tp->retrans_out !=
0) in state TCP_CA_Open that people are seeing.

neal

^ permalink raw reply

* RE: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-08 13:50 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008133045.GI28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:31
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6
> checksum offload from guest
> 
> On Tue, Oct 08, 2013 at 11:58:13AM +0100, Paul Durrant wrote:
> [...]
> > -/*
> > - * This is the amount of packet we copy rather than map, so that the
> > - * guest can't fiddle with the contents of the headers while we do
> > - * packet processing on them (netfilter, routing, etc).
> > +/* This is a miniumum size for the linear area to avoid lots of
> > + * calls to __pskb_pull_tail() as we set up checksum offsets.
> >   */
> > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > -			 VLAN_HLEN + \
> > -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > +#define PKT_PROT_LEN 128
> >
> 
> Where does 128 come from?
> 

It's just an arbitrary power of 2 that was chosen because it seems to cover most likely v6 headers and all v4 headers.

> >  static u16 frag_get_pending_idx(skb_frag_t *frag)
> >  {
> > @@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
> >  	return 0;
> >  }
> >
> > -static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> > +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
> >  {
> > -	struct iphdr *iph;
> > +	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> > +		int target = min_t(int, skb->len, len);
> > +		__pskb_pull_tail(skb, target - skb_headlen(skb));
> > +	}
> > +}
> > +
> > +static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> > +			     int recalculate_partial_csum)
> > +{
> [...]
> >
> >  		if (recalculate_partial_csum) {
> >  			struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct tcphdr) +
> > +				MAX_TCP_OPTION_SPACE;
> > +			maybe_pull_tail(skb, header_size);
> > +
> 
> I presume this function is checksum_setup stripped down to handle IPv4
> packet. What's the purpose of changing its behaviour? Why the pull_tail
> here?
> 

We have to make sure that the TCP header is in the linear area as we are about to write to the checksum field. In practice, the 128 byte pull should guarantee this but in case that is varied later I wanted to make sure things did not start to fail in an add way.

> > +static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> > +			       int recalculate_partial_csum)
> > +{
> > +	int err = -EPROTO;
> > +	struct ipv6hdr *ipv6h = (void *)skb->data;
> > +	u8 nexthdr;
> > +	unsigned int header_size;
> > +	unsigned int off;
> > +	bool done;
> > +
> > +	done = false;
> > +	off = sizeof(struct ipv6hdr);
> > +
> > +	header_size = skb->network_header + off;
> > +	maybe_pull_tail(skb, header_size);
> > +
> > +	nexthdr = ipv6h->nexthdr;
> > +
> > +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> &&
> > +	       !done) {
> > +		/* We only access at most the first 2 bytes of any option
> header
> > +		 * to determine the next one.
> > +		 */
> > +		header_size = skb->network_header + off + 2;
> > +		maybe_pull_tail(skb, header_size);
> > +
> 
> Will this cause performance problem? Is it possible that you pull too
> many times?
> 

I guess it means we may get two pulls for the TCP/UDP headers rather than one so could push the pulls into the individual cases if you think it will affect performance that badly.

> > +		switch (nexthdr) {
> > +		case IPPROTO_FRAGMENT: {
> > +			struct frag_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += 8;
> > +			break;
> > +		}
> > +		case IPPROTO_DSTOPTS:
> > +		case IPPROTO_HOPOPTS:
> > +		case IPPROTO_ROUTING: {
> > +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += ipv6_optlen(hp);
> > +			break;
> > +		}
> > +		case IPPROTO_AH: {
> > +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += (hp->hdrlen+2)<<2;
> > +			break;
> > +		}
> > +		default:
> > +			done = true;
> > +			break;
> 
> Can you make the logic explicit here?
> 
>    case IPPROTO_TCP:
>    case IPPROTO_UDP:
>         done = true;
> 	break;
>    default:
>         break;
> 
> Another minor suggestion is that change "done" to "found" because you're
> trying to find the two type of headers.
> 

Yes, I could code it that way if you prefer.

> > +	switch (nexthdr) {
> > +	case IPPROTO_TCP:
> > +		if (!skb_partial_csum_set(skb, off,
> > +					  offsetof(struct tcphdr, check)))
> > +			goto out;
> > +
> > +		if (recalculate_partial_csum) {
> > +			struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct tcphdr) +
> > +				MAX_TCP_OPTION_SPACE;
> > +			maybe_pull_tail(skb, header_size);
> > +
> 
> Same question as IPv4 counterpart, why do you need to pull here?
> 

Same answer as before ;-)

  Paul

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Jan Beulich @ 2013-10-08 13:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: David Vrabel, Ian Campbell, Wei Liu, xen-devel@lists.xen.org,
	netdev@vger.kernel.org
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD012F306@AMSPEX01CL01.citrite.net>

>>> On 08.10.13 at 15:42, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Wei Liu [mailto:wei.liu2@citrix.com]
>> Sent: 08 October 2013 14:32
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
>> Ian Campbell
>> Subject: Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO
>> packets from the guest
>> 
>> On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
>> > This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
>>             ^ adds            feature               advertise
>> 
>> > handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the
>> frontend
>> > passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6
>> added to
>> > netif.h.
>> 
>> Note the new type should be synced to Xen's netif.h tree with separate
>> patch for Xen.
>> 
> 
> Ok. Do you think it is reasonable to make incremental changes to netif.h in 
> this series and then submit a patch to xen-devel with the complete set once 
> this series is accepted?

I would think so.

Jan

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-08 13:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: David Laight, <linux-wireless@vger.kernel.org>,
	<netdev@vger.kernel.org>, <patches@linaro.org>,
	<linville@tuxdriver.com>
In-Reply-To: <1381239908.13359.12.camel@jlt4.sipsolutions.net>

On Tue, 2013-10-08 at 15:45 +0200, Johannes Berg wrote:
> On Tue, 2013-10-08 at 15:41 +0200, Ard Biesheuvel wrote:
> 
> > Actually, as the size is always the same, it should be feasible to
> > alloc a couple of request structs at init time. would one for rx and
> > one for tx be sufficient? or is this code more reentrant than that?
> 
> TX can run concurrently on multiple (four) queues using the same key.

And maybe even more with injection ... I wouldn't go there.

johannes

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-08 13:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: David Laight, <linux-wireless@vger.kernel.org>,
	<netdev@vger.kernel.org>, <patches@linaro.org>,
	<linville@tuxdriver.com>
In-Reply-To: <23D74823-0DE3-4A87-9E89-310F437A328D@linaro.org>

On Tue, 2013-10-08 at 15:41 +0200, Ard Biesheuvel wrote:

> Actually, as the size is always the same, it should be feasible to
> alloc a couple of request structs at init time. would one for rx and
> one for tx be sufficient? or is this code more reentrant than that?

TX can run concurrently on multiple (four) queues using the same key.

johannes

^ permalink raw reply

* RE: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Paul Durrant @ 2013-10-08 13:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008133142.GJ28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:32
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO
> packets from the guest
> 
> On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
> > This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
>             ^ adds            feature               advertise
> 
> > handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the
> frontend
> > passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6
> added to
> > netif.h.
> 
> Note the new type should be synced to Xen's netif.h tree with separate
> patch for Xen.
> 

Ok. Do you think it is reasonable to make incremental changes to netif.h in this series and then submit a patch to xen-devel with the complete set once this series is accepted?

  Paul

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Ard Biesheuvel @ 2013-10-08 13:41 UTC (permalink / raw)
  To: David Laight
  Cc: Johannes Berg,
	<linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	<patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	<linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B737B-CgBM+Bx2aUAnGFn1LkZF6NBPR1lH4CV8@public.gmane.org>



On 8 okt. 2013, at 15:01, "David Laight" <David.Laight-ZS65k/vG3HxXrIkS9f7CXA@public.gmane.org> wrote:

>> Hmm, thanks I guess. I'll need to review this in more detail, but I have
>> a question first:
>> 
>>> +    /* allocate the variable sized aead_request on the stack */
>>> +    int l = DIV_ROUND_UP(crypto_aead_reqsize(tfm),
>>> +                 sizeof(struct aead_request));
>>> +    struct aead_request req[1 + l];
>> 
>> This looks a bit odd, why round up first and then add one? Why even
>> bother using a struct array rather than some local struct like
> 
> Is it even a good idea to be allocating variable sized items
> on the kernel stack?
> 
> There has to be enough stack available for the maximum number
> of entries - so there is little point in dynamically sizing it.

Actually, as the size is always the same, it should be feasible to alloc a couple of request structs at init time. would one for rx and one for tx be sufficient? or is this code more reentrant than that?

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-08 13:40 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008133421.GK28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:34
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to
> the guest
> 
> On Tue, Oct 08, 2013 at 11:58:16AM +0100, Paul Durrant wrote:
> [...]
> >  	/* Data must not cross a page boundary. */
> >  	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
> > @@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif
> *vif, struct sk_buff *skb,
> >  		}
> >
> >  		/* Leave a gap for the GSO descriptor. */
> > -		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> > +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> > +			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> > +			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		else
> > +			gso_type = 0;
> 
> Should probably #define XEN_NETIF_GSO_TYPE_NONE 0 instead of using
> plain
> 0?

All I need is a bit shift that's not going to hit in the mask. Probably worth reserving the value in the header.

> 
> > +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else {
> > +		gso_type = 0;
> 
> Ditto.
> 
> > -	if (!vif->gso_prefix)
> > -		meta->gso_size = skb_shinfo(skb)->gso_size;
> > -	else
> > +	if ((1 << gso_type) & vif->gso_mask) {
> > +		meta->gso_type = gso_type;
> > +		meta->gso_size = gso_size;
> > +	} else {
> > +		meta->gso_type = 0;
> 
> Ditto.
> 
> >  		meta->gso_size = 0;
> [...]
> > @@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
> >  		val = 0;
> >  	vif->can_sg = !!val;
> >
> > +	vif->gso_mask = 0;
> > +	vif->gso_prefix_mask = 0;
> > +
> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->gso = !!val;
> > +	if (val)
> > +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> 
> Not using "|=" ?
> 
> >
> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-
> prefix",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->gso_prefix = !!val;
> > +	if (val)
> > +		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> > +
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
> > +			 "%d", &val) < 0)
> > +		val = 0;
> > +	if (val)
> > +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV6;
> > +
> 
> Not using "|="? Are feature-gso_tcpv{4,6} mutually exclusive?
> 
> Same question goes to the setting of gso_prefix_mask.
> 

That's very odd. You're correct and I could have sworn the code did have |= in these places; thanks for catching that.

  Paul

^ permalink raw reply

* Re: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Wei Liu @ 2013-10-08 13:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381229896-18657-6-git-send-email-paul.durrant@citrix.com>

On Tue, Oct 08, 2013 at 11:58:16AM +0100, Paul Durrant wrote:
[...]
>  	/* Data must not cross a page boundary. */
>  	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
> @@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>  		}
>  
>  		/* Leave a gap for the GSO descriptor. */
> -		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		else
> +			gso_type = 0;

Should probably #define XEN_NETIF_GSO_TYPE_NONE 0 instead of using plain
0?

> +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else {
> +		gso_type = 0;

Ditto.

> -	if (!vif->gso_prefix)
> -		meta->gso_size = skb_shinfo(skb)->gso_size;
> -	else
> +	if ((1 << gso_type) & vif->gso_mask) {
> +		meta->gso_type = gso_type;
> +		meta->gso_size = gso_size;
> +	} else {
> +		meta->gso_type = 0;

Ditto.

>  		meta->gso_size = 0;
[...]
> @@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
>  		val = 0;
>  	vif->can_sg = !!val;
>  
> +	vif->gso_mask = 0;
> +	vif->gso_prefix_mask = 0;
> +
>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->gso = !!val;
> +	if (val)
> +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;

Not using "|=" ?

>  
>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->gso_prefix = !!val;
> +	if (val)
> +		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> +
> +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
> +			 "%d", &val) < 0)
> +		val = 0;
> +	if (val)
> +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV6;
> +

Not using "|="? Are feature-gso_tcpv{4,6} mutually exclusive?

Same question goes to the setting of gso_prefix_mask.

Wei.

^ permalink raw reply

* Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Wei Liu @ 2013-10-08 13:31 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381229896-18657-5-git-send-email-paul.durrant@citrix.com>

On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
> This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
            ^ adds            feature               advertise

> handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the frontend
> passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6 added to
> netif.h.

Note the new type should be synced to Xen's netif.h tree with separate
patch for Xen.

Wei.

^ permalink raw reply

* Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
From: Wei Liu @ 2013-10-08 13:30 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381229896-18657-3-git-send-email-paul.durrant@citrix.com>

On Tue, Oct 08, 2013 at 11:58:13AM +0100, Paul Durrant wrote:
[...]
> -/*
> - * This is the amount of packet we copy rather than map, so that the
> - * guest can't fiddle with the contents of the headers while we do
> - * packet processing on them (netfilter, routing, etc).
> +/* This is a miniumum size for the linear area to avoid lots of
> + * calls to __pskb_pull_tail() as we set up checksum offsets.
>   */
> -#define PKT_PROT_LEN    (ETH_HLEN + \
> -			 VLAN_HLEN + \
> -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> +#define PKT_PROT_LEN 128
>  

Where does 128 come from?

>  static u16 frag_get_pending_idx(skb_frag_t *frag)
>  {
> @@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
>  	return 0;
>  }
>  
> -static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
>  {
> -	struct iphdr *iph;
> +	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> +		int target = min_t(int, skb->len, len);
> +		__pskb_pull_tail(skb, target - skb_headlen(skb));
> +	}
> +}
> +
> +static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> +			     int recalculate_partial_csum)
> +{
[...]
>  
>  		if (recalculate_partial_csum) {
>  			struct tcphdr *tcph = tcp_hdr(skb);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct tcphdr) +
> +				MAX_TCP_OPTION_SPACE;
> +			maybe_pull_tail(skb, header_size);
> +

I presume this function is checksum_setup stripped down to handle IPv4
packet. What's the purpose of changing its behaviour? Why the pull_tail
here?

> +static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> +			       int recalculate_partial_csum)
> +{
> +	int err = -EPROTO;
> +	struct ipv6hdr *ipv6h = (void *)skb->data;
> +	u8 nexthdr;
> +	unsigned int header_size;
> +	unsigned int off;
> +	bool done;
> +
> +	done = false;
> +	off = sizeof(struct ipv6hdr);
> +
> +	header_size = skb->network_header + off;
> +	maybe_pull_tail(skb, header_size);
> +
> +	nexthdr = ipv6h->nexthdr;
> +
> +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> +	       !done) {
> +		/* We only access at most the first 2 bytes of any option header
> +		 * to determine the next one.
> +		 */
> +		header_size = skb->network_header + off + 2;
> +		maybe_pull_tail(skb, header_size);
> +

Will this cause performance problem? Is it possible that you pull too
many times?

> +		switch (nexthdr) {
> +		case IPPROTO_FRAGMENT: {
> +			struct frag_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += 8;
> +			break;
> +		}
> +		case IPPROTO_DSTOPTS:
> +		case IPPROTO_HOPOPTS:
> +		case IPPROTO_ROUTING: {
> +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_optlen(hp);
> +			break;
> +		}
> +		case IPPROTO_AH: {
> +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += (hp->hdrlen+2)<<2;
> +			break;
> +		}
> +		default:
> +			done = true;
> +			break;

Can you make the logic explicit here?

   case IPPROTO_TCP:
   case IPPROTO_UDP:
        done = true;
	break;
   default:
        break;

Another minor suggestion is that change "done" to "found" because you're
trying to find the two type of headers.

> +	switch (nexthdr) {
> +	case IPPROTO_TCP:
> +		if (!skb_partial_csum_set(skb, off,
> +					  offsetof(struct tcphdr, check)))
> +			goto out;
> +
> +		if (recalculate_partial_csum) {
> +			struct tcphdr *tcph = tcp_hdr(skb);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct tcphdr) +
> +				MAX_TCP_OPTION_SPACE;
> +			maybe_pull_tail(skb, header_size);
> +

Same question as IPv4 counterpart, why do you need to pull here?

Wei.

^ permalink raw reply

* Re: [PATCH 2/2] cgroup: cls: remove unnecessary task_cls_classid
From: Neil Horman @ 2013-10-08 13:17 UTC (permalink / raw)
  To: Gao feng
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	jhs-jkUAjuhPggJWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w
In-Reply-To: <1381201520-25938-2-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

On Tue, Oct 08, 2013 at 11:05:20AM +0800, Gao feng wrote:
> We can get classid through cgroup_subsys_state,
> this is directviewing and effective.
> 
> Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> ---
>  net/sched/cls_cgroup.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index 867b4a3..16006c9 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -72,11 +72,11 @@ static void cgrp_attach(struct cgroup_subsys_state *css,
>  			struct cgroup_taskset *tset)
>  {
>  	struct task_struct *p;
> -	void *v;
> +	struct cgroup_cls_state *cs = css_cls_state(css);
> +	void *v = (void *)(unsigned long)cs->classid;
>  
>  	cgroup_taskset_for_each(p, css, tset) {
>  		task_lock(p);
> -		v = (void *)(unsigned long)task_cls_classid(p);
>  		iterate_fd(p->files, 0, update_classid, v);
>  		task_unlock(p);
>  	}
> -- 
> 1.8.3.1
> 
> 

Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Ard Biesheuvel @ 2013-10-08 13:16 UTC (permalink / raw)
  To: David Laight
  Cc: Johannes Berg, linux-wireless, netdev, Patch Tracking,
	John Linville
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B737B@saturn3.aculab.com>

On 8 October 2013 15:01, David Laight <David.Laight@aculab.com> wrote:
>> Hmm, thanks I guess. I'll need to review this in more detail, but I have
>> a question first:
>>
>> > +   /* allocate the variable sized aead_request on the stack */
>> > +   int l = DIV_ROUND_UP(crypto_aead_reqsize(tfm),
>> > +                        sizeof(struct aead_request));
>> > +   struct aead_request req[1 + l];
>>
>> This looks a bit odd, why round up first and then add one? Why even
>> bother using a struct array rather than some local struct like
>
> Is it even a good idea to be allocating variable sized items
> on the kernel stack?
>
> There has to be enough stack available for the maximum number
> of entries - so there is little point in dynamically sizing it.
>

The result of crypto_aead_reqsize() has nothing to do with the input
or output data, it is a property of the particular implementation of
ccm(aes) that is being used. In the generic case (ccm.c),
it always returns the size of this struct

struct crypto_ccm_req_priv_ctx {
        u8 odata[16];
        u8 idata[16];
        u8 auth_tag[16];
        u32 ilen;
        u32 flags;
        struct scatterlist src[2];
        struct scatterlist dst[2];
        struct ablkcipher_request abreq;
};

while the particular implementation that I am working on for ARM64
always has size 0. Note that this is data that would otherwise be
allocated on the stack, but in the case of aead, which supports an
asynchronous interface (which this code does not use btw), the data is
attached to the end of the aead_request struct instead.

The alternative is to allocate it dynamically using GFP_ATOMIC and
free it at the end of the function, but in this particular case I
don't think that makes much sense tbh

-- 
Ard.

^ permalink raw reply

* Re: [PATCH 1/2] cgroup: netprio: remove unnecessary task_netprioidx
From: Neil Horman @ 2013-10-08 13:15 UTC (permalink / raw)
  To: Gao feng
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	jhs-jkUAjuhPggJWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w
In-Reply-To: <1381201520-25938-1-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

On Tue, Oct 08, 2013 at 11:05:19AM +0800, Gao feng wrote:
> Since the tasks have been migrated to the cgroup,
> there is no need to call task_netprioidx to get
> task's cgroup id.
> 
> Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> ---
>  net/core/netprio_cgroup.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index d9cd627..9b7cf6c 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -222,11 +222,10 @@ static void net_prio_attach(struct cgroup_subsys_state *css,
>  			    struct cgroup_taskset *tset)
>  {
>  	struct task_struct *p;
> -	void *v;
> +	void *v = (void *)(unsigned long)css->cgroup->id;
>  
>  	cgroup_taskset_for_each(p, css, tset) {
>  		task_lock(p);
> -		v = (void *)(unsigned long)task_netprioidx(p);
>  		iterate_fd(p->files, 0, update_netprio, v);
>  		task_unlock(p);
>  	}
> -- 
> 1.8.3.1
> 
> 

Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

^ permalink raw reply

* RE: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest
From: Paul Durrant @ 2013-10-08 13:14 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008131000.GH28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:10
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum
> offload to guest
> 
> On Tue, Oct 08, 2013 at 11:58:12AM +0100, Paul Durrant wrote:
> > Check xenstore flag feature-ipv6-csum-offload to determine if a
> > guest is happy to accept IPv6 packets with only partial checksum.
> > Also check analogous feature-ip-csum-offload to determone if a
> 
> determone -> determine
> 

Ok.

> > guest is happy to accept IPv4 packets with only partial checksum
> > as a replacement for a negated feature-no-csum-offload value and
> > add a comment to deprecate use of feature-no-csum-offload.
> >
> [...]
> > @@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent,
> domid_t domid,
> >  	vif->domid  = domid;
> >  	vif->handle = handle;
> >  	vif->can_sg = 1;
> > -	vif->csum = 1;
> > +	vif->ip_csum = 1;
> 
> Not setting a default value for vif->ipv6_csum?
> 

The default is 0 so no need.

> >  	vif->dev = dev;
> >
> [...]
> > +	/* Before feature-ipv6-csum-offload was introduced, checksum
> offload
> > +	 * was turned on by a zero value in (or lack of)
> > +	 * feature-no-csum-offload. Therefore, for compatibility, assume
> > +	 * that if feature-ip-csum-offload is missing that IP (v4) offload
> > +	 * is turned on. If this is not the case then the flag will be
> > +	 * cleared when we subsequently sample feature-no-csum-offload.
> > +	 */
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-
> offload",
> > +			 "%d", &val) < 0)
> > +		val = 1;
> > +	vif->ip_csum = !!val;
> > +
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-
> offload",
> > +			 "%d", &val) < 0)
> > +		val = 0;
> > +	vif->ipv6_csum = !!val;
> > +
> > +	/* This is deprecated. Frontends should set IP v4 and v6 checksum
> > +	 * offload using feature-ip-csum-offload and
> > +	 * feature-ipv6-csum-offload respectively.
> > +	 */
> 
> These comments on feature flags should also be synchronized to master
> netif.h in Xen and Linux's header. We've been trying to document thing
> along the way for quite some time and the long term benifit is big.
> 

Ok. That sounds like a good idea. I'll add a patch to do that.

> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-
> offload",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->csum = !val;
> > +	if (val)
> > +		vif->ip_csum = 0;
> 
> Do you also need to set vif->ipv6_csum to 0? I don't think a frontend
> which cannot deal with V4 csum offload has the ability to deal with V6
> csum offload.
> 

I was just trying to preserve the existing semantic of feature-no-csum-offload, which only affects v4. Since it's deprecated, should we add a new semantic?

  Paul

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox