Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-08 11:52 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-wireless, netdev, patches, linville
In-Reply-To: <1381231915-24232-1-git-send-email-ard.biesheuvel@linaro.org>

On Tue, 2013-10-08 at 13:31 +0200, Ard Biesheuvel wrote:

Hmm, thanks I guess. I'll need to review this in more detail, but I have
a question first:

> +	/* allocate the variable sized aead_request on the stack */
> +	int l = DIV_ROUND_UP(crypto_aead_reqsize(tfm),
> +			     sizeof(struct aead_request));
> +	struct aead_request req[1 + l];

This looks a bit odd, why round up first and then add one? Why even
bother using a struct array rather than some local struct like

struct {
  struct aead_request req;
  u8 data[crypto_aed_reqsize(tfm)];
} req_data;

or so?

johannes

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Ard Biesheuvel @ 2013-10-08 12:00 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless, netdev, Patch Tracking, linville
In-Reply-To: <1381233152.13359.10.camel@jlt4.sipsolutions.net>

On 8 October 2013 13:52, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Tue, 2013-10-08 at 13:31 +0200, Ard Biesheuvel wrote:
>
> Hmm, thanks I guess. I'll need to review this in more detail, but I have
> a question first:
>
>> +     /* allocate the variable sized aead_request on the stack */
>> +     int l = DIV_ROUND_UP(crypto_aead_reqsize(tfm),
>> +                          sizeof(struct aead_request));
>> +     struct aead_request req[1 + l];
>
> This looks a bit odd, why round up first and then add one? Why even
> bother using a struct array rather than some local struct like
>
> struct {
>   struct aead_request req;
>   u8 data[crypto_aed_reqsize(tfm)];
> } req_data;
>
> or so?
>

Yes, that looks much better. I will put that in my v2, let me know if
you have more questions/comments.

BTW I should probably have mentioned that this is fully tested code:
my zd1211 works happily with it using pairwise CCMP with no
regressions in performance.


Cheers,
Ard.

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-08 12:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Patch Tracking,
	linville-2XuSBdqkA4R54TAoqtyWWQ
In-Reply-To: <CAKv+Gu9-ddqyuGg=NDFoWzNko+0uMG_po6WnzTc7g3DAzx=_dQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, 2013-10-08 at 14:00 +0200, Ard Biesheuvel wrote:

> BTW I should probably have mentioned that this is fully tested code:
> my zd1211 works happily with it using pairwise CCMP with no
> regressions in performance.

Good to know. Not that I think zd1211 is a good device for performance
tests (you'd have to measure CPU utilisation in some way), but
ultimately it doesn't really matter as all the high-performance devices
need hardware crypto anyway.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Ard Biesheuvel @ 2013-10-08 12:20 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless, netdev, Patch Tracking, John Linville
In-Reply-To: <1381234564.13359.11.camel@jlt4.sipsolutions.net>

On 8 October 2013 14:16, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Tue, 2013-10-08 at 14:00 +0200, Ard Biesheuvel wrote:
>
>> BTW I should probably have mentioned that this is fully tested code:
>> my zd1211 works happily with it using pairwise CCMP with no
>> regressions in performance.
>
> Good to know. Not that I think zd1211 is a good device for performance
> tests (you'd have to measure CPU utilisation in some way), but
> ultimately it doesn't really matter as all the high-performance devices
> need hardware crypto anyway.
>

I agree. I am not saying the typical performance of a zd1211 is a
meaningful quantity, just that this particular device performs
identically before and after applying this patch, which suggests that
it is not creating lots of corrupted frames or messing up the
authentication.

Regards,
-- 
Ard.

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Alexander Gordeev @ 2013-10-08 12:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Benjamin Herrenschmidt, Ben Hutchings, linux-kernel,
	Bjorn Helgaas, Ralf Baechle, Michael Ellerman, Martin Schwidefsky,
	Ingo Molnar, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel
In-Reply-To: <20131007180111.GC2481@htj.dyndns.org>

On Mon, Oct 07, 2013 at 02:01:11PM -0400, Tejun Heo wrote:
> > What about introducing pci_lock_msi() and pci_unlock_msi() and let device
> > drivers care about their ranges and specifics in race-safe manner?
> > I do not call to introduce it right now (since it appears pSeries has not
> > been hitting the race for years) just as a possible alternative to Ben's
> > proposal.
> 
> I don't think the same race condition would happen with the loop.

We need to distinguish the contexts we're discussing.

If we talk about pSeries quota, then the current pSeries pci_enable_msix()
implementation is racy internally and could fail if the quota went down
*while* pci_enable_msix() is executing. In this case the loop will have to
exit rather than retry with a lower number (what number?).

In this regard the new scheme does not bring anything new and relies on
the fact this race does not hit and therefore does not worry.

If we talk about quota as it has to be, then yes - the loop scheme seems
more preferable.

Overall, looks like we just need to fix the pSeries implementation,
if the guys want it, he-he :)

> The problem case is where multiple msi(x) allocation fails completely
> because the global limit went down before inquiry and allocation.  In
> the loop based interface, it'd retry with the lower number.

I am probably missing something here. If the global limit went down before
inquiry then the inquiry will get what is available and try to allocate with
than number.

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* Re: [PATCH net RFC 1/2] netdevice: add might_sleep() call to napi_disable
From: Amir Vadai @ 2013-10-08 12:39 UTC (permalink / raw)
  To: Jacob Keller
  Cc: netdev, Alex Duyck, Hyong-Youb Kim, Dmitry Kravkov, Eliezer Tamir
In-Reply-To: <20130918201949.8697.50700.stgit@jekeller-desk1.amr.corp.intel.com>

On 18/09/2013 23:20, Jacob Keller wrote:
> napi_disable potentially calls msleep(1) if the NAPI_STATE_SCHED bit is
> previously set by something else. Because it does not always call msleep, it
> was difficult to track down a bug related to calling napi_disable within
> local_bh_disable()d context. This patch adds a might_sleep() call to the
> napi_disable routine in order to aid in the future debugging of similar issues.
> This will cause a BUG in drivers which have implemented busy polling in a
> similar fashion to ixgbe, and which call napi_disable inside the
> local_bh_disable()d section where the vector napi lock is taken.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> Cc: Alex Duyck <alexander.h.duyck@intel.com>
> Cc: Hyong-Youb Kim <hykim@myri.com>
> Cc: Amir Vadai <amirv@mellanox.com>
> Cc: Dmitry Kravkov <dmitry@broadcom.com>
> ---
>  include/linux/netdevice.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3de49ac..81dab00 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -483,6 +483,8 @@ extern void napi_hash_del(struct napi_struct *napi);
>   */
>  static inline void napi_disable(struct napi_struct *n)
>  {
> +	might_sleep();
> +
>  	set_bit(NAPI_STATE_DISABLE, &n->state);
>  	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
>  		msleep(1);
> 

Works ok with mlx4_en driver.

Acked-By: Amir Vadai <amirv@mellanox.com>

^ permalink raw reply

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
From: Eric Dumazet @ 2013-10-08 13:00 UTC (permalink / raw)
  To: Jon Mason
  Cc: Jiri Pirko, netdev, yoshfuji, David Miller, kuznet, jmorris,
	kaber, herbert
In-Reply-To: <CAPoiz9wxxvHfmmMOCvPstzT51BafGfU1u5umxBQ8kqqCE=OzbQ@mail.gmail.com>

On Tue, 2013-10-08 at 01:07 -0700, Jon Mason wrote:
> On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> >
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> >
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> 
> From my reading of the HW Spec and a quick look at the driver, it
> appears that the driver is using one entry in the TX ring for the
> header and another for the body of the packet to be fragmented (which
> is what the hardware wants).  I don't understand what you are saying,
> but if you are asking if simply appending a new header & data to the
> end of skb->data will get it out on the wire correct, I don't believe
> it will.
> 
> I do have hardware that I can try the patch on, if you can walk me
> through the use case (unless it is as easy as setup an IPv6 connection
> and ping).

I think this behavior is quite common. Driver should certainly already
do the right thing, as TCP frames can have the same property.

bnx2x for example splits skb->head if it contains payload after headers.

drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

                if (unlikely(skb_headlen(skb) > hlen)) {
                        nbd++;
                        bd_prod = bnx2x_tx_split(bp, txdata, tx_buf,
                                                 &tx_start_bd, hlen,
                                                 bd_prod);
                }

^ permalink raw reply

* RE: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: David Laight @ 2013-10-08 13:01 UTC (permalink / raw)
  To: Johannes Berg, Ard Biesheuvel
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, patches-QSEj5FYQhm4dnm+yROfE0A,
	linville-2XuSBdqkA4R54TAoqtyWWQ
In-Reply-To: <1381233152.13359.10.camel-8Nb76shvtaUJvtFkdXX2HixXY32XiHfO@public.gmane.org>

> Hmm, thanks I guess. I'll need to review this in more detail, but I have
> a question first:
> 
> > +	/* allocate the variable sized aead_request on the stack */
> > +	int l = DIV_ROUND_UP(crypto_aead_reqsize(tfm),
> > +			     sizeof(struct aead_request));
> > +	struct aead_request req[1 + l];
> 
> This looks a bit odd, why round up first and then add one? Why even
> bother using a struct array rather than some local struct like

Is it even a good idea to be allocating variable sized items
on the kernel stack?

There has to be enough stack available for the maximum number
of entries - so there is little point in dynamically sizing it.

	David


^ permalink raw reply

* [PATCH iproute2] xfrm: enable to set non-wildcard mark 0 on SAs and SPs
From: Christophe Gouault @ 2013-10-08 12:56 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Christophe Gouault

ip xfrm considers that the user-defined mark is "any" as soon as
(mark.v & mark.m == 0), which prevents from specifying non-wildcard
marks that include the value 0 (typically 0/0xffffffff).

Yet, matching exactly mark 0 is useful for instance to separate
vti policies from global policies.

Always configure the user mark if mark.m != 0.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
---
 ip/xfrm_policy.c |    2 +-
 ip/xfrm_state.c  |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index 36e33c9..a8d8b98 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -373,7 +373,7 @@ static int xfrm_policy_modify(int cmd, unsigned flags, int argc, char **argv)
 			  (void *)tmpls_buf, tmpls_len);
 	}
 
-	if (mark.m & mark.v) {
+	if (mark.m) {
 		int r = addattr_l(&req.n, sizeof(req.buf), XFRMA_MARK,
 				  (void *)&mark, sizeof(mark));
 		if (r < 0) {
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index f4ad4cb..c4d2bf6 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -528,7 +528,7 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 		exit(1);
 	}
 
-	if (mark.m & mark.v) {
+	if (mark.m) {
 		int r = addattr_l(&req.n, sizeof(req.buf), XFRMA_MARK,
 				  (void *)&mark, sizeof(mark));
 		if (r < 0) {
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest
From: Wei Liu @ 2013-10-08 13:10 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381229896-18657-2-git-send-email-paul.durrant@citrix.com>

On Tue, Oct 08, 2013 at 11:58:12AM +0100, Paul Durrant wrote:
> Check xenstore flag feature-ipv6-csum-offload to determine if a
> guest is happy to accept IPv6 packets with only partial checksum.
> Also check analogous feature-ip-csum-offload to determone if a

determone -> determine

> guest is happy to accept IPv4 packets with only partial checksum
> as a replacement for a negated feature-no-csum-offload value and
> add a comment to deprecate use of feature-no-csum-offload.
> 
[...]
> @@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  	vif->domid  = domid;
>  	vif->handle = handle;
>  	vif->can_sg = 1;
> -	vif->csum = 1;
> +	vif->ip_csum = 1;

Not setting a default value for vif->ipv6_csum?

>  	vif->dev = dev;
>  
[...]
> +	/* Before feature-ipv6-csum-offload was introduced, checksum offload
> +	 * was turned on by a zero value in (or lack of)
> +	 * feature-no-csum-offload. Therefore, for compatibility, assume
> +	 * that if feature-ip-csum-offload is missing that IP (v4) offload
> +	 * is turned on. If this is not the case then the flag will be
> +	 * cleared when we subsequently sample feature-no-csum-offload.
> +	 */
> +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-offload",
> +			 "%d", &val) < 0)
> +		val = 1;
> +	vif->ip_csum = !!val;
> +
> +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-offload",
> +			 "%d", &val) < 0)
> +		val = 0;
> +	vif->ipv6_csum = !!val;
> +
> +	/* This is deprecated. Frontends should set IP v4 and v6 checksum
> +	 * offload using feature-ip-csum-offload and
> +	 * feature-ipv6-csum-offload respectively.
> +	 */

These comments on feature flags should also be synchronized to master
netif.h in Xen and Linux's header. We've been trying to document thing
along the way for quite some time and the long term benifit is big.

>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-offload",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->csum = !val;
> +	if (val)
> +		vif->ip_csum = 0;

Do you also need to set vif->ipv6_csum to 0? I don't think a frontend
which cannot deal with V4 csum offload has the ability to deal with V6
csum offload.

Wei.

^ permalink raw reply

* RE: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest
From: Paul Durrant @ 2013-10-08 13:14 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008131000.GH28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:10
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum
> offload to guest
> 
> On Tue, Oct 08, 2013 at 11:58:12AM +0100, Paul Durrant wrote:
> > Check xenstore flag feature-ipv6-csum-offload to determine if a
> > guest is happy to accept IPv6 packets with only partial checksum.
> > Also check analogous feature-ip-csum-offload to determone if a
> 
> determone -> determine
> 

Ok.

> > guest is happy to accept IPv4 packets with only partial checksum
> > as a replacement for a negated feature-no-csum-offload value and
> > add a comment to deprecate use of feature-no-csum-offload.
> >
> [...]
> > @@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent,
> domid_t domid,
> >  	vif->domid  = domid;
> >  	vif->handle = handle;
> >  	vif->can_sg = 1;
> > -	vif->csum = 1;
> > +	vif->ip_csum = 1;
> 
> Not setting a default value for vif->ipv6_csum?
> 

The default is 0 so no need.

> >  	vif->dev = dev;
> >
> [...]
> > +	/* Before feature-ipv6-csum-offload was introduced, checksum
> offload
> > +	 * was turned on by a zero value in (or lack of)
> > +	 * feature-no-csum-offload. Therefore, for compatibility, assume
> > +	 * that if feature-ip-csum-offload is missing that IP (v4) offload
> > +	 * is turned on. If this is not the case then the flag will be
> > +	 * cleared when we subsequently sample feature-no-csum-offload.
> > +	 */
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-
> offload",
> > +			 "%d", &val) < 0)
> > +		val = 1;
> > +	vif->ip_csum = !!val;
> > +
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-
> offload",
> > +			 "%d", &val) < 0)
> > +		val = 0;
> > +	vif->ipv6_csum = !!val;
> > +
> > +	/* This is deprecated. Frontends should set IP v4 and v6 checksum
> > +	 * offload using feature-ip-csum-offload and
> > +	 * feature-ipv6-csum-offload respectively.
> > +	 */
> 
> These comments on feature flags should also be synchronized to master
> netif.h in Xen and Linux's header. We've been trying to document thing
> along the way for quite some time and the long term benifit is big.
> 

Ok. That sounds like a good idea. I'll add a patch to do that.

> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-
> offload",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->csum = !val;
> > +	if (val)
> > +		vif->ip_csum = 0;
> 
> Do you also need to set vif->ipv6_csum to 0? I don't think a frontend
> which cannot deal with V4 csum offload has the ability to deal with V6
> csum offload.
> 

I was just trying to preserve the existing semantic of feature-no-csum-offload, which only affects v4. Since it's deprecated, should we add a new semantic?

  Paul

^ permalink raw reply

* Re: [PATCH 1/2] cgroup: netprio: remove unnecessary task_netprioidx
From: Neil Horman @ 2013-10-08 13:15 UTC (permalink / raw)
  To: Gao feng
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	jhs-jkUAjuhPggJWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w
In-Reply-To: <1381201520-25938-1-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

On Tue, Oct 08, 2013 at 11:05:19AM +0800, Gao feng wrote:
> Since the tasks have been migrated to the cgroup,
> there is no need to call task_netprioidx to get
> task's cgroup id.
> 
> Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> ---
>  net/core/netprio_cgroup.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index d9cd627..9b7cf6c 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -222,11 +222,10 @@ static void net_prio_attach(struct cgroup_subsys_state *css,
>  			    struct cgroup_taskset *tset)
>  {
>  	struct task_struct *p;
> -	void *v;
> +	void *v = (void *)(unsigned long)css->cgroup->id;
>  
>  	cgroup_taskset_for_each(p, css, tset) {
>  		task_lock(p);
> -		v = (void *)(unsigned long)task_netprioidx(p);
>  		iterate_fd(p->files, 0, update_netprio, v);
>  		task_unlock(p);
>  	}
> -- 
> 1.8.3.1
> 
> 

Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Ard Biesheuvel @ 2013-10-08 13:16 UTC (permalink / raw)
  To: David Laight
  Cc: Johannes Berg, linux-wireless, netdev, Patch Tracking,
	John Linville
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B737B@saturn3.aculab.com>

On 8 October 2013 15:01, David Laight <David.Laight@aculab.com> wrote:
>> Hmm, thanks I guess. I'll need to review this in more detail, but I have
>> a question first:
>>
>> > +   /* allocate the variable sized aead_request on the stack */
>> > +   int l = DIV_ROUND_UP(crypto_aead_reqsize(tfm),
>> > +                        sizeof(struct aead_request));
>> > +   struct aead_request req[1 + l];
>>
>> This looks a bit odd, why round up first and then add one? Why even
>> bother using a struct array rather than some local struct like
>
> Is it even a good idea to be allocating variable sized items
> on the kernel stack?
>
> There has to be enough stack available for the maximum number
> of entries - so there is little point in dynamically sizing it.
>

The result of crypto_aead_reqsize() has nothing to do with the input
or output data, it is a property of the particular implementation of
ccm(aes) that is being used. In the generic case (ccm.c),
it always returns the size of this struct

struct crypto_ccm_req_priv_ctx {
        u8 odata[16];
        u8 idata[16];
        u8 auth_tag[16];
        u32 ilen;
        u32 flags;
        struct scatterlist src[2];
        struct scatterlist dst[2];
        struct ablkcipher_request abreq;
};

while the particular implementation that I am working on for ARM64
always has size 0. Note that this is data that would otherwise be
allocated on the stack, but in the case of aead, which supports an
asynchronous interface (which this code does not use btw), the data is
attached to the end of the aead_request struct instead.

The alternative is to allocate it dynamically using GFP_ATOMIC and
free it at the end of the function, but in this particular case I
don't think that makes much sense tbh

-- 
Ard.

^ permalink raw reply

* Re: [PATCH 2/2] cgroup: cls: remove unnecessary task_cls_classid
From: Neil Horman @ 2013-10-08 13:17 UTC (permalink / raw)
  To: Gao feng
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA,
	jhs-jkUAjuhPggJWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w
In-Reply-To: <1381201520-25938-2-git-send-email-gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>

On Tue, Oct 08, 2013 at 11:05:20AM +0800, Gao feng wrote:
> We can get classid through cgroup_subsys_state,
> this is directviewing and effective.
> 
> Signed-off-by: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
> ---
>  net/sched/cls_cgroup.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
> index 867b4a3..16006c9 100644
> --- a/net/sched/cls_cgroup.c
> +++ b/net/sched/cls_cgroup.c
> @@ -72,11 +72,11 @@ static void cgrp_attach(struct cgroup_subsys_state *css,
>  			struct cgroup_taskset *tset)
>  {
>  	struct task_struct *p;
> -	void *v;
> +	struct cgroup_cls_state *cs = css_cls_state(css);
> +	void *v = (void *)(unsigned long)cs->classid;
>  
>  	cgroup_taskset_for_each(p, css, tset) {
>  		task_lock(p);
> -		v = (void *)(unsigned long)task_cls_classid(p);
>  		iterate_fd(p->files, 0, update_classid, v);
>  		task_unlock(p);
>  	}
> -- 
> 1.8.3.1
> 
> 

Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

^ permalink raw reply

* Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
From: Wei Liu @ 2013-10-08 13:30 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381229896-18657-3-git-send-email-paul.durrant@citrix.com>

On Tue, Oct 08, 2013 at 11:58:13AM +0100, Paul Durrant wrote:
[...]
> -/*
> - * This is the amount of packet we copy rather than map, so that the
> - * guest can't fiddle with the contents of the headers while we do
> - * packet processing on them (netfilter, routing, etc).
> +/* This is a miniumum size for the linear area to avoid lots of
> + * calls to __pskb_pull_tail() as we set up checksum offsets.
>   */
> -#define PKT_PROT_LEN    (ETH_HLEN + \
> -			 VLAN_HLEN + \
> -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> +#define PKT_PROT_LEN 128
>  

Where does 128 come from?

>  static u16 frag_get_pending_idx(skb_frag_t *frag)
>  {
> @@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
>  	return 0;
>  }
>  
> -static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
>  {
> -	struct iphdr *iph;
> +	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> +		int target = min_t(int, skb->len, len);
> +		__pskb_pull_tail(skb, target - skb_headlen(skb));
> +	}
> +}
> +
> +static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> +			     int recalculate_partial_csum)
> +{
[...]
>  
>  		if (recalculate_partial_csum) {
>  			struct tcphdr *tcph = tcp_hdr(skb);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct tcphdr) +
> +				MAX_TCP_OPTION_SPACE;
> +			maybe_pull_tail(skb, header_size);
> +

I presume this function is checksum_setup stripped down to handle IPv4
packet. What's the purpose of changing its behaviour? Why the pull_tail
here?

> +static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> +			       int recalculate_partial_csum)
> +{
> +	int err = -EPROTO;
> +	struct ipv6hdr *ipv6h = (void *)skb->data;
> +	u8 nexthdr;
> +	unsigned int header_size;
> +	unsigned int off;
> +	bool done;
> +
> +	done = false;
> +	off = sizeof(struct ipv6hdr);
> +
> +	header_size = skb->network_header + off;
> +	maybe_pull_tail(skb, header_size);
> +
> +	nexthdr = ipv6h->nexthdr;
> +
> +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> +	       !done) {
> +		/* We only access at most the first 2 bytes of any option header
> +		 * to determine the next one.
> +		 */
> +		header_size = skb->network_header + off + 2;
> +		maybe_pull_tail(skb, header_size);
> +

Will this cause performance problem? Is it possible that you pull too
many times?

> +		switch (nexthdr) {
> +		case IPPROTO_FRAGMENT: {
> +			struct frag_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += 8;
> +			break;
> +		}
> +		case IPPROTO_DSTOPTS:
> +		case IPPROTO_HOPOPTS:
> +		case IPPROTO_ROUTING: {
> +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_optlen(hp);
> +			break;
> +		}
> +		case IPPROTO_AH: {
> +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += (hp->hdrlen+2)<<2;
> +			break;
> +		}
> +		default:
> +			done = true;
> +			break;

Can you make the logic explicit here?

   case IPPROTO_TCP:
   case IPPROTO_UDP:
        done = true;
	break;
   default:
        break;

Another minor suggestion is that change "done" to "found" because you're
trying to find the two type of headers.

> +	switch (nexthdr) {
> +	case IPPROTO_TCP:
> +		if (!skb_partial_csum_set(skb, off,
> +					  offsetof(struct tcphdr, check)))
> +			goto out;
> +
> +		if (recalculate_partial_csum) {
> +			struct tcphdr *tcph = tcp_hdr(skb);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct tcphdr) +
> +				MAX_TCP_OPTION_SPACE;
> +			maybe_pull_tail(skb, header_size);
> +

Same question as IPv4 counterpart, why do you need to pull here?

Wei.

^ permalink raw reply

* Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Wei Liu @ 2013-10-08 13:31 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381229896-18657-5-git-send-email-paul.durrant@citrix.com>

On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
> This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
            ^ adds            feature               advertise

> handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the frontend
> passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6 added to
> netif.h.

Note the new type should be synced to Xen's netif.h tree with separate
patch for Xen.

Wei.

^ permalink raw reply

* Re: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Wei Liu @ 2013-10-08 13:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381229896-18657-6-git-send-email-paul.durrant@citrix.com>

On Tue, Oct 08, 2013 at 11:58:16AM +0100, Paul Durrant wrote:
[...]
>  	/* Data must not cross a page boundary. */
>  	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
> @@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>  		}
>  
>  		/* Leave a gap for the GSO descriptor. */
> -		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		else
> +			gso_type = 0;

Should probably #define XEN_NETIF_GSO_TYPE_NONE 0 instead of using plain
0?

> +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else {
> +		gso_type = 0;

Ditto.

> -	if (!vif->gso_prefix)
> -		meta->gso_size = skb_shinfo(skb)->gso_size;
> -	else
> +	if ((1 << gso_type) & vif->gso_mask) {
> +		meta->gso_type = gso_type;
> +		meta->gso_size = gso_size;
> +	} else {
> +		meta->gso_type = 0;

Ditto.

>  		meta->gso_size = 0;
[...]
> @@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
>  		val = 0;
>  	vif->can_sg = !!val;
>  
> +	vif->gso_mask = 0;
> +	vif->gso_prefix_mask = 0;
> +
>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->gso = !!val;
> +	if (val)
> +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;

Not using "|=" ?

>  
>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->gso_prefix = !!val;
> +	if (val)
> +		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> +
> +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
> +			 "%d", &val) < 0)
> +		val = 0;
> +	if (val)
> +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV6;
> +

Not using "|="? Are feature-gso_tcpv{4,6} mutually exclusive?

Same question goes to the setting of gso_prefix_mask.

Wei.

^ permalink raw reply

* RE: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-08 13:40 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008133421.GK28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:34
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to
> the guest
> 
> On Tue, Oct 08, 2013 at 11:58:16AM +0100, Paul Durrant wrote:
> [...]
> >  	/* Data must not cross a page boundary. */
> >  	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
> > @@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif
> *vif, struct sk_buff *skb,
> >  		}
> >
> >  		/* Leave a gap for the GSO descriptor. */
> > -		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> > +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> > +			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> > +			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		else
> > +			gso_type = 0;
> 
> Should probably #define XEN_NETIF_GSO_TYPE_NONE 0 instead of using
> plain
> 0?

All I need is a bit shift that's not going to hit in the mask. Probably worth reserving the value in the header.

> 
> > +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else {
> > +		gso_type = 0;
> 
> Ditto.
> 
> > -	if (!vif->gso_prefix)
> > -		meta->gso_size = skb_shinfo(skb)->gso_size;
> > -	else
> > +	if ((1 << gso_type) & vif->gso_mask) {
> > +		meta->gso_type = gso_type;
> > +		meta->gso_size = gso_size;
> > +	} else {
> > +		meta->gso_type = 0;
> 
> Ditto.
> 
> >  		meta->gso_size = 0;
> [...]
> > @@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
> >  		val = 0;
> >  	vif->can_sg = !!val;
> >
> > +	vif->gso_mask = 0;
> > +	vif->gso_prefix_mask = 0;
> > +
> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->gso = !!val;
> > +	if (val)
> > +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> 
> Not using "|=" ?
> 
> >
> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-
> prefix",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->gso_prefix = !!val;
> > +	if (val)
> > +		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> > +
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
> > +			 "%d", &val) < 0)
> > +		val = 0;
> > +	if (val)
> > +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV6;
> > +
> 
> Not using "|="? Are feature-gso_tcpv{4,6} mutually exclusive?
> 
> Same question goes to the setting of gso_prefix_mask.
> 

That's very odd. You're correct and I could have sworn the code did have |= in these places; thanks for catching that.

  Paul

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Ard Biesheuvel @ 2013-10-08 13:41 UTC (permalink / raw)
  To: David Laight
  Cc: Johannes Berg,
	<linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	<patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	<linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B737B-CgBM+Bx2aUAnGFn1LkZF6NBPR1lH4CV8@public.gmane.org>



On 8 okt. 2013, at 15:01, "David Laight" <David.Laight-ZS65k/vG3HxXrIkS9f7CXA@public.gmane.org> wrote:

>> Hmm, thanks I guess. I'll need to review this in more detail, but I have
>> a question first:
>> 
>>> +    /* allocate the variable sized aead_request on the stack */
>>> +    int l = DIV_ROUND_UP(crypto_aead_reqsize(tfm),
>>> +                 sizeof(struct aead_request));
>>> +    struct aead_request req[1 + l];
>> 
>> This looks a bit odd, why round up first and then add one? Why even
>> bother using a struct array rather than some local struct like
> 
> Is it even a good idea to be allocating variable sized items
> on the kernel stack?
> 
> There has to be enough stack available for the maximum number
> of entries - so there is little point in dynamically sizing it.

Actually, as the size is always the same, it should be feasible to alloc a couple of request structs at init time. would one for rx and one for tx be sufficient? or is this code more reentrant than that?

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Paul Durrant @ 2013-10-08 13:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008133142.GJ28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:32
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO
> packets from the guest
> 
> On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
> > This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
>             ^ adds            feature               advertise
> 
> > handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the
> frontend
> > passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6
> added to
> > netif.h.
> 
> Note the new type should be synced to Xen's netif.h tree with separate
> patch for Xen.
> 

Ok. Do you think it is reasonable to make incremental changes to netif.h in this series and then submit a patch to xen-devel with the complete set once this series is accepted?

  Paul

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-08 13:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: David Laight, <linux-wireless@vger.kernel.org>,
	<netdev@vger.kernel.org>, <patches@linaro.org>,
	<linville@tuxdriver.com>
In-Reply-To: <23D74823-0DE3-4A87-9E89-310F437A328D@linaro.org>

On Tue, 2013-10-08 at 15:41 +0200, Ard Biesheuvel wrote:

> Actually, as the size is always the same, it should be feasible to
> alloc a couple of request structs at init time. would one for rx and
> one for tx be sufficient? or is this code more reentrant than that?

TX can run concurrently on multiple (four) queues using the same key.

johannes

^ permalink raw reply

* Re: [PATCH] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-08 13:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: David Laight, <linux-wireless@vger.kernel.org>,
	<netdev@vger.kernel.org>, <patches@linaro.org>,
	<linville@tuxdriver.com>
In-Reply-To: <1381239908.13359.12.camel@jlt4.sipsolutions.net>

On Tue, 2013-10-08 at 15:45 +0200, Johannes Berg wrote:
> On Tue, 2013-10-08 at 15:41 +0200, Ard Biesheuvel wrote:
> 
> > Actually, as the size is always the same, it should be feasible to
> > alloc a couple of request structs at init time. would one for rx and
> > one for tx be sufficient? or is this code more reentrant than that?
> 
> TX can run concurrently on multiple (four) queues using the same key.

And maybe even more with injection ... I wouldn't go there.

johannes

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Jan Beulich @ 2013-10-08 13:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: David Vrabel, Ian Campbell, Wei Liu, xen-devel@lists.xen.org,
	netdev@vger.kernel.org
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD012F306@AMSPEX01CL01.citrite.net>

>>> On 08.10.13 at 15:42, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Wei Liu [mailto:wei.liu2@citrix.com]
>> Sent: 08 October 2013 14:32
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
>> Ian Campbell
>> Subject: Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO
>> packets from the guest
>> 
>> On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
>> > This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
>>             ^ adds            feature               advertise
>> 
>> > handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the
>> frontend
>> > passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6
>> added to
>> > netif.h.
>> 
>> Note the new type should be synced to Xen's netif.h tree with separate
>> patch for Xen.
>> 
> 
> Ok. Do you think it is reasonable to make incremental changes to netif.h in 
> this series and then submit a patch to xen-devel with the complete set once 
> this series is accepted?

I would think so.

Jan

^ permalink raw reply

* RE: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-08 13:50 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131008133045.GI28411@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:31
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6
> checksum offload from guest
> 
> On Tue, Oct 08, 2013 at 11:58:13AM +0100, Paul Durrant wrote:
> [...]
> > -/*
> > - * This is the amount of packet we copy rather than map, so that the
> > - * guest can't fiddle with the contents of the headers while we do
> > - * packet processing on them (netfilter, routing, etc).
> > +/* This is a miniumum size for the linear area to avoid lots of
> > + * calls to __pskb_pull_tail() as we set up checksum offsets.
> >   */
> > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > -			 VLAN_HLEN + \
> > -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > +#define PKT_PROT_LEN 128
> >
> 
> Where does 128 come from?
> 

It's just an arbitrary power of 2 that was chosen because it seems to cover most likely v6 headers and all v4 headers.

> >  static u16 frag_get_pending_idx(skb_frag_t *frag)
> >  {
> > @@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
> >  	return 0;
> >  }
> >
> > -static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> > +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
> >  {
> > -	struct iphdr *iph;
> > +	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> > +		int target = min_t(int, skb->len, len);
> > +		__pskb_pull_tail(skb, target - skb_headlen(skb));
> > +	}
> > +}
> > +
> > +static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> > +			     int recalculate_partial_csum)
> > +{
> [...]
> >
> >  		if (recalculate_partial_csum) {
> >  			struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct tcphdr) +
> > +				MAX_TCP_OPTION_SPACE;
> > +			maybe_pull_tail(skb, header_size);
> > +
> 
> I presume this function is checksum_setup stripped down to handle IPv4
> packet. What's the purpose of changing its behaviour? Why the pull_tail
> here?
> 

We have to make sure that the TCP header is in the linear area as we are about to write to the checksum field. In practice, the 128 byte pull should guarantee this but in case that is varied later I wanted to make sure things did not start to fail in an add way.

> > +static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> > +			       int recalculate_partial_csum)
> > +{
> > +	int err = -EPROTO;
> > +	struct ipv6hdr *ipv6h = (void *)skb->data;
> > +	u8 nexthdr;
> > +	unsigned int header_size;
> > +	unsigned int off;
> > +	bool done;
> > +
> > +	done = false;
> > +	off = sizeof(struct ipv6hdr);
> > +
> > +	header_size = skb->network_header + off;
> > +	maybe_pull_tail(skb, header_size);
> > +
> > +	nexthdr = ipv6h->nexthdr;
> > +
> > +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> &&
> > +	       !done) {
> > +		/* We only access at most the first 2 bytes of any option
> header
> > +		 * to determine the next one.
> > +		 */
> > +		header_size = skb->network_header + off + 2;
> > +		maybe_pull_tail(skb, header_size);
> > +
> 
> Will this cause performance problem? Is it possible that you pull too
> many times?
> 

I guess it means we may get two pulls for the TCP/UDP headers rather than one so could push the pulls into the individual cases if you think it will affect performance that badly.

> > +		switch (nexthdr) {
> > +		case IPPROTO_FRAGMENT: {
> > +			struct frag_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += 8;
> > +			break;
> > +		}
> > +		case IPPROTO_DSTOPTS:
> > +		case IPPROTO_HOPOPTS:
> > +		case IPPROTO_ROUTING: {
> > +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += ipv6_optlen(hp);
> > +			break;
> > +		}
> > +		case IPPROTO_AH: {
> > +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += (hp->hdrlen+2)<<2;
> > +			break;
> > +		}
> > +		default:
> > +			done = true;
> > +			break;
> 
> Can you make the logic explicit here?
> 
>    case IPPROTO_TCP:
>    case IPPROTO_UDP:
>         done = true;
> 	break;
>    default:
>         break;
> 
> Another minor suggestion is that change "done" to "found" because you're
> trying to find the two type of headers.
> 

Yes, I could code it that way if you prefer.

> > +	switch (nexthdr) {
> > +	case IPPROTO_TCP:
> > +		if (!skb_partial_csum_set(skb, off,
> > +					  offsetof(struct tcphdr, check)))
> > +			goto out;
> > +
> > +		if (recalculate_partial_csum) {
> > +			struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct tcphdr) +
> > +				MAX_TCP_OPTION_SPACE;
> > +			maybe_pull_tail(skb, header_size);
> > +
> 
> Same question as IPv4 counterpart, why do you need to pull here?
> 

Same answer as before ;-)

  Paul

^ permalink raw reply

* Re: IPv6 kernel warning
From: Neal Cardwell @ 2013-10-08 14:05 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: dormando, Michele Baldessari, Russell King - ARM Linux, netdev,
	Nandita Dukkipati
In-Reply-To: <CAK6E8=czoej81t=-J=gjjyQiGVbZ0qiNKBbeRVSWYtweXfSRNQ@mail.gmail.com>

On Mon, Oct 7, 2013 at 3:51 PM, Yuchung Cheng <ycheng@google.com> wrote:
> I suspect tcp_process_tlp_ack() should not revert state to Open
> directly, but calling tcp_try_keep_open() instead, similar to all the
> undo processing in the tcp_fastretrans_alert(): after
> tcp_end_cwnd_reduction(), the process (E) falls back to check other
> stats before moving to CA_Open.
>
>
> index 9c62257..9012b42 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3314,7 +3314,7 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack,
>                         tcp_init_cwnd_reduction(sk, true);
>                         tcp_set_ca_state(sk, TCP_CA_CWR);
>                         tcp_end_cwnd_reduction(sk);
> -                       tcp_set_ca_state(sk, TCP_CA_Open);
> +                       tcp_try_keep_open(sk);
>                         NET_INC_STATS_BH(sock_net(sk),
>                                          LINUX_MIB_TCPLOSSPROBERECOVERY);
>                 }

Yes, nice catch! This looks good to me. My testing confirms that this
definitely fixes a bug when this code fires and there are segments
SACKed out. Since it will stay in CA_Disorder if there are outstanding
retransmissions, I bet it will also fix the WARN_ON(tp->retrans_out !=
0) in state TCP_CA_Open that people are seeing.

neal

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox