Netdev List
* [PATCH net-next] net: fix HAVE_EFFICIENT_UNALIGNED_ACCESS typos
From: Eric Dumazet @ 2016-04-20 14:31 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, roopa, tgraf, jhs
In-Reply-To: <1461146278.10638.253.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

HAVE_EFFICIENT_UNALIGNED_ACCESS needs a CONFIG_ prefix.

Also add a comment in nla_align_64bit() explaining why we have
to add padding if the current skb->data is aligned, as this can
certainly be confusing.

Fixes: 35c5845957c7 ("net: Add helpers for 64-bit aligning netlink attributes.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/netlink.h |   19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index e644b3489acf..cf95df1fa14b 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1238,18 +1238,21 @@ static inline int nla_validate_nested(const struct nlattr *start, int maxtype,
  * Conditionally emit a padding netlink attribute in order to make
  * the next attribute we emit have a 64-bit aligned nla_data() area.
  * This will only be done in architectures which do not have
- * HAVE_EFFICIENT_UNALIGNED_ACCESS defined.
+ * CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS defined.
  *
  * Returns zero on success or a negative error code.
  */
 static inline int nla_align_64bit(struct sk_buff *skb, int padattr)
 {
-#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
-	if (IS_ALIGNED((unsigned long)skb->data, 8)) {
-		struct nlattr *attr = nla_reserve(skb, padattr, 0);
-		if (!attr)
-			return -EMSGSIZE;
-	}
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	/* The nlattr header is 4 bytes in size, that's why we test
+	 * if the skb->data _is_ aligned.  This NOP attribute, plus
+	 * nlattr header for next attribute, will make nla_data()
+	 * 8-byte aligned.
+	 */
+	if (IS_ALIGNED((unsigned long)skb->data, 8) &&
+	    !nla_reserve(skb, padattr, 0))
+		return -EMSGSIZE;
 #endif
 	return 0;
 }
@@ -1261,7 +1264,7 @@ static inline int nla_align_64bit(struct sk_buff *skb, int padattr)
 static inline int nla_total_size_64bit(int payload)
 {
 	return NLA_ALIGN(nla_attr_size(payload))
-#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 		+ NLA_ALIGN(nla_attr_size(0))
 #endif
 		;

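The alignment arithmetic described in the new comment can be checked with a small user-space model. This is a sketch, not kernel code: it treats the address tested by nla_align_64bit() as a plain byte offset, assumes the next attribute is written at that offset, and redefines NLA_ALIGN/NLA_HDRLEN locally with their usual values (4-byte header, 4-byte alignment). The model_* helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the netlink constants: the attribute header
 * (nla_len + nla_type) is 4 bytes and attributes are 4-byte aligned. */
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(size_t)(NLA_ALIGNTO - 1))
#define NLA_HDRLEN ((size_t)NLA_ALIGN(4))

/* Hypothetical model of nla_align_64bit(): if the write position is
 * already 8-byte aligned, emit a zero-length pad attribute (header only).
 * Returns the offset where the next attribute's header starts. */
static size_t model_align_64bit(size_t pos)
{
	if (pos % 8 == 0)
		pos += NLA_ALIGN(NLA_HDRLEN + 0);
	return pos;
}

/* Offset of nla_data() for an attribute whose header starts at pos. */
static size_t model_nla_data(size_t pos)
{
	return pos + NLA_HDRLEN;
}

/* Mirrors nla_total_size_64bit() when unaligned access is not efficient:
 * the attribute itself plus room for one zero-length pad attribute. */
static size_t model_total_size_64bit(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload) + NLA_ALIGN(NLA_HDRLEN + 0);
}
```

Starting at offset 16, the pad attribute moves the next header to 20, so its payload begins at 24; starting at offset 20, no pad is needed and the payload again begins at 24. Without the pad, a header at 16 would put the payload at 20, which is only 4-byte aligned.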
* Re: [PATCH] MAINTAINERS: net: add entry for TI Ethernet Switch drivers
From: Tony Lindgren @ 2016-04-20 14:23 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: netdev, linux-kernel, Sekhar Nori, linux-omap, David S. Miller,
	Mugunthan V N, Richard Cochran
In-Reply-To: <1461151509-2368-1-git-send-email-grygorii.strashko@ti.com>

* Grygorii Strashko <grygorii.strashko@ti.com> [160420 04:26]:
> Add record for TI Ethernet Switch Driver CPSW/CPDMA/MDIO HW
> (am33/am43/am57/dr7/davinci) to ensure that related patches
> will go through the dedicated linux-omap list.
> 
> Also add Mugunthan as maintainer and myself as the reviewer.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mugunthan V N <mugunthanvnm@ti.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  MAINTAINERS | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1d5b4be..aca864d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11071,6 +11071,14 @@ S:	Maintained
>  F:	drivers/clk/ti/
>  F:	include/linux/clk/ti.h
>  
> +TI ETHERNET SWITCH DRIVER (CPSW)
> +M:	Mugunthan V N <mugunthanvnm@ti.com>
> +R:	Grygorii Strashko <grygorii.strashko@ti.com>
> +L:	linux-omap@vger.kernel.org
> +S:	Maintained
> +F:	drivers/net/ethernet/ti/cpsw*
> +F:	drivers/net/ethernet/ti/davinci*
> +
>  TI FLASH MEDIA INTERFACE DRIVER
>  M:	Alex Dubov <oakad@yahoo.com>
>  S:	Maintained
> -- 

Please also add the netdev list there as the primary list:

L:	netdev@vger.kernel.org
L:	linux-omap@vger.kernel.org

Then we can easily review and ack the patches for Dave to apply.

Regards,

Tony

* [PATCH net-next] macvtap: add namespace support to the sysfs device class
From: Marc Angel @ 2016-04-20 14:11 UTC (permalink / raw)
  To: netdev; +Cc: ebiederm

When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning because they try
to create the following sysfs link (where 'NN' is the ifindex of macvtapX):

/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN

This is reproducible by running the following commands:

ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap

The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.

The 'net' device class is isolated between network namespaces, so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.

This patch adds namespace support to the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.

However, doing this has the side effect of changing
/sys/devices/virtual/net/macvtapX/tapNN  into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN

This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces.")

Signed-off-by: Marc Angel <marc@arista.com>
---
I'm not sure that the problems described in that commit message
apply to macvtaps so maybe it is possible to keep the 'tapNN'
device directly under 'macvtapX' and not disrupt userland.

Should it even be possible to add a device of a class that doesn't
support namespaces under one that does?
This could lead to dead symlinks in the new device class directory or
duplicate warnings because a device of the same name already exists in
another namespace.
---
 drivers/net/macvtap.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 95394ed..a76d72d 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -129,7 +129,18 @@ static DEFINE_MUTEX(minor_lock);
 static DEFINE_IDR(minor_idr);
 
 #define GOODCOPY_LEN 128
-static struct class *macvtap_class;
+static const void *macvtap_net_namespace(struct device *d)
+{
+	struct net_device *dev = to_net_dev(d->parent);
+	return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+	.name = "macvtap",
+	.owner = THIS_MODULE,
+	.ns_type = &net_ns_type_operations,
+	.namespace = macvtap_net_namespace,
+};
 static struct cdev macvtap_cdev;
 
 static const struct proto_ops macvtap_socket_ops;
@@ -1295,7 +1306,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 			return notifier_from_errno(err);
 
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		classdev = device_create(macvtap_class, &dev->dev, devt,
+		classdev = device_create(&macvtap_class, &dev->dev, devt,
 					 dev, "tap%d", dev->ifindex);
 		if (IS_ERR(classdev)) {
 			macvtap_free_minor(vlan);
@@ -1304,7 +1315,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 		break;
 	case NETDEV_UNREGISTER:
 		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
-		device_destroy(macvtap_class, devt);
+		device_destroy(&macvtap_class, devt);
 		macvtap_free_minor(vlan);
 		break;
 	}
@@ -1330,11 +1341,9 @@ static int macvtap_init(void)
 	if (err)
 		goto out2;
 
-	macvtap_class = class_create(THIS_MODULE, "macvtap");
-	if (IS_ERR(macvtap_class)) {
-		err = PTR_ERR(macvtap_class);
+	err = class_register(&macvtap_class);
+	if (err)
 		goto out3;
-	}
 
 	err = register_netdevice_notifier(&macvtap_notifier_block);
 	if (err)
@@ -1349,7 +1358,7 @@ static int macvtap_init(void)
 out5:
 	unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 out3:
 	cdev_del(&macvtap_cdev);
 out2:
@@ -1363,7 +1372,7 @@ static void macvtap_exit(void)
 {
 	rtnl_link_unregister(&macvtap_link_ops);
 	unregister_netdevice_notifier(&macvtap_notifier_block);
-	class_unregister(macvtap_class);
+	class_unregister(&macvtap_class);
 	cdev_del(&macvtap_cdev);
 	unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
 	idr_destroy(&minor_idr);
-- 
2.8.0

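Why the namespace tag removes the collision can be shown with a toy model of a class directory. This is a hedged sketch, not the driver core: a class without namespace support keys its entries by name only, so "tapNN" created from two namespaces clashes, while a namespace-aware class (one with .ns_type/.namespace, as the patch adds) keys them by (namespace, name). The toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Toy class directory. ns_aware == 0 models the old shared
 * /sys/class/macvtap (namespace tag ignored); ns_aware == 1 models a
 * class whose devices are tagged with their network namespace. */
struct toy_class {
	int ns_aware;
	int count;
	struct { const void *ns; char name[16]; } entries[MAX_ENTRIES];
};

static int toy_device_create(struct toy_class *cls, const void *ns,
			     const char *name)
{
	const void *tag = cls->ns_aware ? ns : NULL;

	for (int i = 0; i < cls->count; i++)
		if (cls->entries[i].ns == tag &&
		    strcmp(cls->entries[i].name, name) == 0)
			return -EEXIST; /* "RTNETLINK answers: File exists" */
	if (cls->count == MAX_ENTRIES)
		return -ENOSPC;
	cls->entries[cls->count].ns = tag;
	snprintf(cls->entries[cls->count].name,
		 sizeof(cls->entries[0].name), "%s", name);
	cls->count++;
	return 0;
}
```

With the shared class, the second "tap5" fails even though it lives in another namespace; with the tagged class, only a duplicate within the same namespace fails.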
* RE: [Intel-wired-lan] [PATCH net-next V5 2/2] intel: ixgbevf: Support Windows hosts (Hyper-V)
From: KY Srinivasan @ 2016-04-20 13:44 UTC (permalink / raw)
  To: Jeff Kirsher, davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com,
	eli@mellanox.com, jackm@mellanox.com, yevgenyp@mellanox.com,
	john.ronciak@intel.com, intel-wired-lan@lists.osuosl.org,
	alexander.duyck@gmail.com
In-Reply-To: <1461133482.2923.28.camel@intel.com>



> -----Original Message-----
> From: Jeff Kirsher [mailto:jeffrey.t.kirsher@intel.com]
> Sent: Tuesday, April 19, 2016 11:25 PM
> To: KY Srinivasan <kys@microsoft.com>; davem@davemloft.net;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com;
> jasowang@redhat.com; eli@mellanox.com; jackm@mellanox.com;
> yevgenyp@mellanox.com; john.ronciak@intel.com; intel-wired-
> lan@lists.osuosl.org; alexander.duyck@gmail.com
> Subject: Re: [Intel-wired-lan] [PATCH net-next V5 2/2] intel: ixgbevf: Support
> Windows hosts (Hyper-V)
> 
> On Tue, 2016-04-19 at 19:17 -0700, K. Y. Srinivasan wrote:
> > On Hyper-V, the VF/PF communication is via a software-mediated path
> > as opposed to the hardware mailbox. Make the necessary
> > adjustments to support Hyper-V.
> >
> > Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
> > ---
> >         V2: Addressed most of the comments from
> >             Alexander Duyck <alexander.duyck@gmail.com>
> >             and Rustad, Mark D <mark.d.rustad@intel.com>.
> >
> >         V3: Addressed additional comments from
> >             Alexander Duyck <alexander.duyck@gmail.com>
> >
> >         V4: Addressed kbuild errors reported by:
> >             kbuild test robot <lkp@intel.com>
> >
> >         V5: Addressed additional comments from
> >             Alexander Duyck <alexander.duyck@gmail.com>
> 
> First I commend you on actually making a proper changelog for a patch.
>  The only issue I have is that the changelog provides no actual
> changes.  Saying you reacted to comments does not summarize what the
> actual changes were.  I purposely did not review the earlier versions
> because by the time I went to do a review, there was another version
> already submitted.  So your changelog requires that users look at all
> the previous emails to actually see what Alex and Mark requested in
> changes.
> 
> I am telling you this for future patches, it will not impact these
> patches, just really disappointing.  I will take the extra time to
> track down all the previous emails and the changes requested to ensure
> you made all the changes that were requested of you. :-(  This will
> just delay when I apply the patches for testing.

Thank you for taking the time. Here is a brief summary of all the changes
(relative to the initial version, ignoring cosmetic changes):

1. Changed the check for running on Hyper-V to be instruction set 
architecture independent. This is now based on device IDs (since we have
unique device IDs when running on Hyper-V).

2. Introduced a number of stub functions to eliminate several checks I initially
had in ixgbevf_main.c. I now fully populate the MAC operations, with many of the
entry points being stubs (ixgbevf_hv_mac_ops).

3. Added a check to ensure that we return an appropriate error if PCI_MMCONFIG
is not defined; on Hyper-V we get the MAC address by reading the PCI config
space at a special offset.

4. Made ixgbevf_hv_rlpml_set_vf() and ixgbevf_negotiate_api_version()
standalone functions. These are the Hyper-V variants; the check is now
done at the call sites in ixgbevf_main.c.

Hope this helps and again I am sorry for not including the summary.

Regards,

K. Y
> 
> >
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |   12 ++
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   31 +++-
> >  drivers/net/ethernet/intel/ixgbevf/mbx.c          |   12 ++
> >  drivers/net/ethernet/intel/ixgbevf/vf.c           |  216 +++++++++++++++++++++
> >  drivers/net/ethernet/intel/ixgbevf/vf.h           |    2 +
> >  5 files changed, 266 insertions(+), 7 deletions(-)
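Points 1 and 2 above describe a common driver pattern: choose an ops table once, based on the PCI device ID, and give the Hyper-V table stub entries so call sites need no special cases. The following is a generic user-space sketch of that pattern only; every type, function and device ID in it is hypothetical and it does not reproduce the actual ixgbevf code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_hw;

/* Hypothetical, heavily simplified MAC operations table. */
struct toy_mac_ops {
	int (*reset_hw)(struct toy_hw *hw);
	int (*set_uc_addr)(struct toy_hw *hw, const uint8_t *addr);
};

struct toy_hw {
	uint16_t device_id;
	const struct toy_mac_ops *mac_ops;
};

/* Placeholder IDs for illustration only; not real PCI device IDs. */
#define TOY_DEV_ID_VF     0x0001
#define TOY_DEV_ID_HV_VF  0x0002

static int common_reset_hw(struct toy_hw *hw) { (void)hw; return 0; }

static int vf_set_uc_addr(struct toy_hw *hw, const uint8_t *addr)
{ (void)hw; (void)addr; return 0; }

/* Stub: this operation has no equivalent on the software-mediated path,
 * so report "unsupported" instead of checking for Hyper-V at call sites. */
static int hv_set_uc_addr(struct toy_hw *hw, const uint8_t *addr)
{ (void)hw; (void)addr; return -EOPNOTSUPP; }

static const struct toy_mac_ops vf_mac_ops = {
	.reset_hw = common_reset_hw,
	.set_uc_addr = vf_set_uc_addr,
};

/* Fully populated: every entry is either a real handler or a stub. */
static const struct toy_mac_ops hv_mac_ops = {
	.reset_hw = common_reset_hw,
	.set_uc_addr = hv_set_uc_addr,
};

/* Architecture-independent detection based purely on the device ID. */
static bool toy_on_hyperv(const struct toy_hw *hw)
{
	return hw->device_id == TOY_DEV_ID_HV_VF;
}

static void toy_init_ops(struct toy_hw *hw)
{
	hw->mac_ops = toy_on_hyperv(hw) ? &hv_mac_ops : &vf_mac_ops;
}
```

Because the table is always complete, callers can invoke hw->mac_ops->... unconditionally and only handle the error code.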

* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Timur Tabi @ 2016-04-20 13:41 UTC (permalink / raw)
  To: Sinan Kaya, eranlinuxmellanox-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	cov-sgV2jX0FEOL9JmXXK+q4OQ, Yishai Hadas,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <57178663.4050503-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

Sinan Kaya wrote:
> I'd like to see a version of the solution
> to get merged until Mellanox comes up with a better solution with another
> patch.

Yes, I agree 100%.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Sinan Kaya @ 2016-04-20 13:38 UTC (permalink / raw)
  To: eranlinuxmellanox-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	timur-sgV2jX0FEOL9JmXXK+q4OQ, cov-sgV2jX0FEOL9JmXXK+q4OQ,
	Yishai Hadas, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <571785A5.5040306-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

Apologies,

Replied to an older post by mistake. I was trying to reply to Eran.

>Hi Sinan,
>
>We are working at Mellanox on a solution which
>removes the vmap call and allocates contiguous memory (using dma_alloc_coherent).
>
>Thanks,
>Eran
>
>
>On 4/20/2016 9:35 AM, Sinan Kaya wrote:
> On 4/19/2016 2:22 PM, Christoph Hellwig wrote:
>> What I think we need is something like the patch below.  In the long
>> run we should also kill the mlx4_buf structure which now is pretty
>> pointless.
>>
> 

It has been 1.5 years since I reported the problem. We came up with three
different solutions this week. I'd like to see a version of the solution
to get merged until Mellanox comes up with a better solution with another
patch. My proposal is to use this one.

> 
> --- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
> @@ -588,7 +588,7 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
>  {
>         dma_addr_t t;
> 
> -       if (size <= max_direct) {
> +       if ((size <= max_direct) || (BITS_PER_LONG == 64)){
>                 buf->nbufs        = 1;
>                 buf->npages       = 1;
>                 buf->page_shift   = get_order(size) + PAGE_SHIFT;
> 
> Of course, this is assuming that you are not ready to submit your patch yet. If you 
> are, feel free to post.
> 


-- 
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Sinan Kaya @ 2016-04-20 13:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	cov-sgV2jX0FEOL9JmXXK+q4OQ, Yishai Hadas,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160419182212.GA8441-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>

On 4/19/2016 2:22 PM, Christoph Hellwig wrote:
> What I think we need is something like the patch below.  In the long
> run we should also kill the mlx4_buf structure which now is pretty
> pointless.
> 

It has been 1.5 years since I reported the problem. We came up with three
different solutions this week. I'd like to see a version of the solution
to get merged until Mellanox comes up with a better solution with another
patch. My proposal is to use this one.

--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -588,7 +588,7 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
 {
        dma_addr_t t;

-       if (size <= max_direct) {
+       if ((size <= max_direct) || (BITS_PER_LONG == 64)){
                buf->nbufs        = 1;
                buf->npages       = 1;
                buf->page_shift   = get_order(size) + PAGE_SHIFT;

Of course, this is assuming that you are not ready to submit your patch yet. If you 
are, feel free to post.
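The effect of the proposed one-line change can be modeled in user space. In this sketch (4 KiB pages assumed; all toy_* names hypothetical; the multi-page branch is simplified to one page per fragment), a 64-bit build always takes the single-buffer branch regardless of max_direct:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Smallest order such that (page size << order) >= size, modeled on the
 * kernel's get_order() for size > 0. */
static int toy_get_order(unsigned long size)
{
	int order = 0;

	while ((1UL << (TOY_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

struct toy_buf_layout {
	int nbufs;
	int npages;
	int page_shift;
};

/* Model of the proposed condition in mlx4_buf_alloc(): take the single
 * "direct" buffer path if size <= max_direct OR the build is 64-bit. */
static struct toy_buf_layout toy_buf_layout(unsigned long size,
					    unsigned long max_direct,
					    bool is_64bit)
{
	struct toy_buf_layout l;

	if (size <= max_direct || is_64bit) {
		l.nbufs = 1;
		l.npages = 1;
		l.page_shift = toy_get_order(size) + TOY_PAGE_SHIFT;
	} else {
		unsigned long page = 1UL << TOY_PAGE_SHIFT;

		l.nbufs = (int)((size + page - 1) / page); /* one page each */
		l.npages = l.nbufs;
		l.page_shift = TOY_PAGE_SHIFT;
	}
	return l;
}
```

One consequence visible in the model: for a 12 KiB request the single-buffer branch computes an order-2 page_shift, i.e. a 16 KiB region, whereas the multi-page branch would use three 4 KiB pages.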

-- 
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

* Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Jiri Benc @ 2016-04-20 13:34 UTC (permalink / raw)
  To: Johannes Berg
  Cc: David Ahern, David Miller, eric.dumazet, roopa, netdev, jhs,
	tgraf, nicolas.dichtel, egrumbach
In-Reply-To: <1461158228.2176.18.camel@sipsolutions.net>

On Wed, 20 Apr 2016 15:17:08 +0200, Johannes Berg wrote:
> Looks like you have this on a per-message basis. I thought it was
> better on an attribute basis because that's really where the issue is.

No problem. I'm not that happy with my patchset myself. Just wanted to
point it out in case it's useful.

 Jiri

* Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Johannes Berg @ 2016-04-20 13:17 UTC (permalink / raw)
  To: Jiri Benc
  Cc: David Ahern, David Miller, eric.dumazet, roopa, netdev, jhs,
	tgraf, nicolas.dichtel, egrumbach
In-Reply-To: <20160420144828.5537dce7@griffin>

On Wed, 2016-04-20 at 14:48 +0200, Jiri Benc wrote:
> On Wed, 20 Apr 2016 09:32:20 +0200, Johannes Berg wrote:
> > 
> > 2) Use the new attribute flag with some required attribute for
> >    existing commands, so that older kernel will not find the
> >    required attribute and will reject the operation entirely.
> >    May or may not fall back to trying the operation again without
> >    the flag.
> This is basically what I submitted half a year ago. See:
> http://thread.gmane.org/gmane.linux.network/382850
> 

That looks like a *huge* patchset though - whereas my proposal really
required only what Emmanuel sent in this thread. It did make some
assumptions, for example that any attribute lower than the "maxtype"
argument to nla_parse() was understood. [1]

Looks like you have this on a per-message basis. I thought it was
better on an attribute basis because that's really where the issue is.

You can still detect it with the per-attribute flag approach as I
described in (2): for your lwtunnel example, you could specify the
flag on the RTA_ENCAP attribute, without which no lwtunnel can be
created (if I understand the code correctly).

johannes



[1] For example, if I have three attributes:
enum attrs { __unused, A, B, C };

and the policy

static const struct nla_policy policy[C + 1] = {
	[A] = { .type = NLA_U32 },
	[C] = { .type = NLA_U8 },
};

and then do

nla_parse(tb, 3, msg, msg_len, policy);

it would assume that "B" is valid. Since this policy is equivalent to
the policy with
	[B] = { .type = NLA_BINARY }

(minimum length 0) we could also reject anything that has type=len=0 in
the policy, if the NLA_F_NET_MUST_PARSE flag is set in the nla_type.

This would likely be the right approach for most netlink families,
since they usually don't have holes that they actually care about -
I've yet to see any attribute that's not specified at all in the policy
but used anyway; normally you want some level of checking, and indicate
that by using { .type = NLA_BINARY }, but other things are possible.
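The check proposed in the footnote can be sketched in plain C. The policy types and the must_parse parameter below are simplified, hypothetical stand-ins (neither the real struct nla_policy nor the proposed NLA_F_NET_MUST_PARSE flag is reproduced); the point is only that an all-zero policy entry marks a hole that a must-parse request should reject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's policy types (hypothetical). */
enum toy_nla_type { TOY_NLA_UNSPEC = 0, TOY_NLA_U8, TOY_NLA_U32 };

struct toy_nla_policy {
	uint16_t type;
	uint16_t len;
};

enum toy_attrs { ATTR_UNSPEC, ATTR_A, ATTR_B, ATTR_C };

/* Same shape as the footnote: B is a hole in the policy. */
static const struct toy_nla_policy policy[ATTR_C + 1] = {
	[ATTR_A] = { .type = TOY_NLA_U32 },
	[ATTR_C] = { .type = TOY_NLA_U8 },
};

/* Is attribute 'type' understood, with maxtype == ATTR_C?
 * Current behavior: any type in 1..maxtype passes (types outside that
 * range are silently ignored by nla_parse(), so not understood).
 * Proposed behavior for must-parse attributes: also reject holes, i.e.
 * policy entries with type == 0 and len == 0. */
static bool toy_attr_understood(int type, bool must_parse)
{
	if (type < 1 || type > ATTR_C)
		return false;
	if (must_parse && policy[type].type == 0 && policy[type].len == 0)
		return false;
	return true;
}
```

In this model, B is accepted by the current rule but rejected once the sender marks it as must-parse, while A and C are accepted either way.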

johannes

* [PATCH net 4/4] net/mlx4_en: Split SW RX dropped counter per RX ring
From: Or Gerlitz @ 2016-04-20 13:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Yishai Hadas, Saeed Mahameed, Or Gerlitz
In-Reply-To: <1461157278-18528-1-git-send-email-ogerlitz@mellanox.com>

From: Eran Ben Elisha <eranbe@mellanox.com>

Count SW packet drops per RX ring instead of a global counter. This
will allow monitoring the number of rx drops per ring.

In addition, the SW rx_dropped counter was overwritten by the HW
rx_dropped counter; sum both of them instead to show the accurate value.

Fixes: a3333b35da16 ('net/mlx4_en: Moderate ethtool callback to [...] ')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 5 ++++-
 drivers/net/ethernet/mellanox/mlx4/en_port.c    | 5 ++++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c      | 2 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    | 1 +
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index f69584a..c761194 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -337,7 +337,7 @@ static int mlx4_en_get_sset_count(struct net_device *dev, int sset)
 	case ETH_SS_STATS:
 		return bitmap_iterator_count(&it) +
 			(priv->tx_ring_num * 2) +
-			(priv->rx_ring_num * 2);
+			(priv->rx_ring_num * 3);
 	case ETH_SS_TEST:
 		return MLX4_EN_NUM_SELF_TEST - !(priv->mdev->dev->caps.flags
 					& MLX4_DEV_CAP_FLAG_UC_LOOPBACK) * 2;
@@ -404,6 +404,7 @@ static void mlx4_en_get_ethtool_stats(struct net_device *dev,
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		data[index++] = priv->rx_ring[i]->packets;
 		data[index++] = priv->rx_ring[i]->bytes;
+		data[index++] = priv->rx_ring[i]->dropped;
 	}
 	spin_unlock_bh(&priv->stats_lock);
 
@@ -477,6 +478,8 @@ static void mlx4_en_get_strings(struct net_device *dev,
 				"rx%d_packets", i);
 			sprintf(data + (index++) * ETH_GSTRING_LEN,
 				"rx%d_bytes", i);
+			sprintf(data + (index++) * ETH_GSTRING_LEN,
+				"rx%d_dropped", i);
 		}
 		break;
 	case ETH_SS_PRIV_FLAGS:
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 3904b5f..20b6c2e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -158,6 +158,7 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 	u64 in_mod = reset << 8 | port;
 	int err;
 	int i, counter_index;
+	unsigned long sw_rx_dropped = 0;
 
 	mailbox = mlx4_alloc_cmd_mailbox(mdev->dev);
 	if (IS_ERR(mailbox))
@@ -180,6 +181,7 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		stats->rx_packets += priv->rx_ring[i]->packets;
 		stats->rx_bytes += priv->rx_ring[i]->bytes;
+		sw_rx_dropped += priv->rx_ring[i]->dropped;
 		priv->port_stats.rx_chksum_good += priv->rx_ring[i]->csum_ok;
 		priv->port_stats.rx_chksum_none += priv->rx_ring[i]->csum_none;
 		priv->port_stats.rx_chksum_complete += priv->rx_ring[i]->csum_complete;
@@ -236,7 +238,8 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 					  &mlx4_en_stats->MCAST_prio_1,
 					  NUM_PRIORITIES);
 	stats->collisions = 0;
-	stats->rx_dropped = be32_to_cpu(mlx4_en_stats->RDROP);
+	stats->rx_dropped = be32_to_cpu(mlx4_en_stats->RDROP) +
+			    sw_rx_dropped;
 	stats->rx_length_errors = be32_to_cpu(mlx4_en_stats->RdropLength);
 	stats->rx_over_errors = 0;
 	stats->rx_crc_errors = be32_to_cpu(mlx4_en_stats->RCRC);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 86bcfe5..91abc13 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -939,7 +939,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 		/* GRO not possible, complete processing here */
 		skb = mlx4_en_rx_skb(priv, rx_desc, frags, length);
 		if (!skb) {
-			priv->stats.rx_dropped++;
+			ring->dropped++;
 			goto next;
 		}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d12ab6a..63b1aea 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -323,6 +323,7 @@ struct mlx4_en_rx_ring {
 	unsigned long csum_ok;
 	unsigned long csum_none;
 	unsigned long csum_complete;
+	unsigned long dropped;
 	int hwtstamp_rx_filter;
 	cpumask_var_t affinity_mask;
 };
-- 
2.3.7

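The accounting rules in this patch (three per-ring RX ethtool entries instead of two, and rx_dropped reported as the big-endian HW RDROP counter plus the sum of per-ring SW drops) can be illustrated with a user-space sketch. The toy_* names are hypothetical; the string formats come from the patch, and ETH_GSTRING_LEN mirrors the value in <linux/ethtool.h>.

```c
#include <arpa/inet.h> /* ntohl() for the big-endian HW counter */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ETH_GSTRING_LEN 32

struct toy_rx_ring {
	unsigned long packets;
	unsigned long bytes;
	unsigned long dropped; /* SW drops, e.g. skb allocation failure */
};

/* Per-ring RX ethtool stats: packets, bytes and dropped. */
static int toy_rx_stat_count(int rx_ring_num)
{
	return rx_ring_num * 3;
}

static void toy_rx_strings(char *data, int rx_ring_num)
{
	int index = 0;

	for (int i = 0; i < rx_ring_num; i++) {
		snprintf(data + (index++) * ETH_GSTRING_LEN, ETH_GSTRING_LEN,
			 "rx%d_packets", i);
		snprintf(data + (index++) * ETH_GSTRING_LEN, ETH_GSTRING_LEN,
			 "rx%d_bytes", i);
		snprintf(data + (index++) * ETH_GSTRING_LEN, ETH_GSTRING_LEN,
			 "rx%d_dropped", i);
	}
}

/* rx_dropped = HW drops + sum of SW drops, instead of letting the HW
 * value overwrite the SW one. */
static unsigned long toy_rx_dropped(uint32_t rdrop_be,
				    const struct toy_rx_ring *rings, int n)
{
	unsigned long sw_rx_dropped = 0;

	for (int i = 0; i < n; i++)
		sw_rx_dropped += rings[i].dropped;
	return ntohl(rdrop_be) + sw_rx_dropped;
}
```

The string table and the value array must stay in the same order and have the same count, which is why the patch updates mlx4_en_get_sset_count(), mlx4_en_get_ethtool_stats() and mlx4_en_get_strings() together.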
* [PATCH net 1/4] net/mlx4_core: Implement pci_resume callback
From: Or Gerlitz @ 2016-04-20 13:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Yishai Hadas, Daniel Jurgens, Or Gerlitz
In-Reply-To: <1461157278-18528-1-git-send-email-ogerlitz@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Move resume related activities to a new pci_resume function instead of
performing them in mlx4_pci_slot_reset.  This change is needed to avoid
a hotplug during EEH recovery due to commit f2da4ccf8bd4 ("powerpc/eeh:
More relaxed hotplug criterion").

Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 39 +++++++++++++++++++------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 358f723..5d45aa3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -4050,45 +4050,53 @@ static pci_ers_result_t mlx4_pci_slot_reset(struct pci_dev *pdev)
 {
 	struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
 	struct mlx4_dev	 *dev  = persist->dev;
-	struct mlx4_priv *priv = mlx4_priv(dev);
-	int               ret;
-	int nvfs[MLX4_MAX_PORTS + 1] = {0, 0, 0};
-	int total_vfs;
+	int err;
 
 	mlx4_err(dev, "mlx4_pci_slot_reset was called\n");
-	ret = pci_enable_device(pdev);
-	if (ret) {
-		mlx4_err(dev, "Can not re-enable device, ret=%d\n", ret);
+	err = pci_enable_device(pdev);
+	if (err) {
+		mlx4_err(dev, "Can not re-enable device, err=%d\n", err);
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
 	pci_set_master(pdev);
 	pci_restore_state(pdev);
 	pci_save_state(pdev);
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void mlx4_pci_resume(struct pci_dev *pdev)
+{
+	struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
+	struct mlx4_dev	 *dev  = persist->dev;
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	int nvfs[MLX4_MAX_PORTS + 1] = {0, 0, 0};
+	int total_vfs;
+	int err;
 
+	mlx4_err(dev, "%s was called\n", __func__);
 	total_vfs = dev->persist->num_vfs;
 	memcpy(nvfs, dev->persist->nvfs, sizeof(dev->persist->nvfs));
 
 	mutex_lock(&persist->interface_state_mutex);
 	if (!(persist->interface_state & MLX4_INTERFACE_STATE_UP)) {
-		ret = mlx4_load_one(pdev, priv->pci_dev_data, total_vfs, nvfs,
+		err = mlx4_load_one(pdev, priv->pci_dev_data, total_vfs, nvfs,
 				    priv, 1);
-		if (ret) {
-			mlx4_err(dev, "%s: mlx4_load_one failed, ret=%d\n",
-				 __func__,  ret);
+		if (err) {
+			mlx4_err(dev, "%s: mlx4_load_one failed, err=%d\n",
+				 __func__,  err);
 			goto end;
 		}
 
-		ret = restore_current_port_types(dev, dev->persist->
+		err = restore_current_port_types(dev, dev->persist->
 						 curr_port_type, dev->persist->
 						 curr_port_poss_type);
-		if (ret)
-			mlx4_err(dev, "could not restore original port types (%d)\n", ret);
+		if (err)
+			mlx4_err(dev, "could not restore original port types (%d)\n", err);
 	}
 end:
 	mutex_unlock(&persist->interface_state_mutex);
 
-	return ret ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
 }
 
 static void mlx4_shutdown(struct pci_dev *pdev)
@@ -4105,6 +4113,7 @@ static void mlx4_shutdown(struct pci_dev *pdev)
 static const struct pci_error_handlers mlx4_err_handler = {
 	.error_detected = mlx4_pci_err_detected,
 	.slot_reset     = mlx4_pci_slot_reset,
+	.resume		= mlx4_pci_resume,
 };
 
 static struct pci_driver mlx4_driver = {
-- 
2.3.7

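The split follows the PCI error-recovery callback order: error_detected(), then slot_reset(), then resume() once the slot is reported recovered. A minimal user-space model of that ordering shows where the reload now happens. It assumes a simplified recovery core that handles only the NEED_RESET path; the enum merely echoes pci_ers_result_t names, and all toy_*/drv_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum toy_ers_result {
	TOY_ERS_RESULT_NEED_RESET,
	TOY_ERS_RESULT_DISCONNECT,
	TOY_ERS_RESULT_RECOVERED,
};

struct toy_err_handlers {
	enum toy_ers_result (*error_detected)(void *dev);
	enum toy_ers_result (*slot_reset)(void *dev);
	void (*resume)(void *dev);
};

static char trace[64]; /* records which callbacks ran, in order */

static enum toy_ers_result drv_error_detected(void *dev)
{ (void)dev; strcat(trace, "detected,"); return TOY_ERS_RESULT_NEED_RESET; }

/* After the patch, slot_reset only re-enables and restores the device... */
static enum toy_ers_result drv_slot_reset(void *dev)
{ (void)dev; strcat(trace, "slot_reset,"); return TOY_ERS_RESULT_RECOVERED; }

/* ...and the heavy reload (mlx4_load_one() in the driver) runs here. */
static void drv_resume(void *dev)
{ (void)dev; strcat(trace, "resume"); }

/* Simplified recovery core: resume() runs only after a recovered reset. */
static void toy_recover(const struct toy_err_handlers *h, void *dev)
{
	if (h->error_detected(dev) != TOY_ERS_RESULT_NEED_RESET)
		return;
	if (h->slot_reset(dev) != TOY_ERS_RESULT_RECOVERED)
		return;
	if (h->resume)
		h->resume(dev);
}
```

Keeping slot_reset() cheap means it reports RECOVERED based only on re-enabling the PCI device, and the driver reload is deferred to resume().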
* [PATCH net 3/4] net/mlx4_core: Don't allow to VF change global pause settings
From: Or Gerlitz @ 2016-04-20 13:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Yishai Hadas, Eugenia Emantayev,
	Saeed Mahameed, Or Gerlitz
In-Reply-To: <1461157278-18528-1-git-send-email-ogerlitz@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.com>

Currently changing global pause settings is done via SET_PORT
command with input modifier GENERAL. This command is allowed
for each VF since MTU setting is done via the same command.

Change the above to the following scheme: before passing the
request to the FW, the PF will check whether it was issued
by a slave. If so, don't change the global pause settings and
warn; otherwise, apply the requested value and store it for
further reference.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |  2 ++
 drivers/net/ethernet/mellanox/mlx4/port.c | 13 +++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index ef96831..c9d7fc51 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -586,6 +586,8 @@ struct mlx4_mfunc_master_ctx {
 	struct mlx4_master_qp0_state qp0_state[MLX4_MAX_PORTS + 1];
 	int			init_port_ref[MLX4_MAX_PORTS + 1];
 	u16			max_mtu[MLX4_MAX_PORTS + 1];
+	u8			pptx;
+	u8			pprx;
 	int			disable_mcast_ref[MLX4_MAX_PORTS + 1];
 	struct mlx4_resource_tracker res_tracker;
 	struct workqueue_struct *comm_wq;
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 211c650..087b23b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -1317,6 +1317,19 @@ static int mlx4_common_set_port(struct mlx4_dev *dev, int slave, u32 in_mod,
 			}
 
 			gen_context->mtu = cpu_to_be16(master->max_mtu[port]);
+			/* Slave cannot change Global Pause configuration */
+			if (slave != mlx4_master_func_num(dev) &&
+			    ((gen_context->pptx != master->pptx) ||
+			     (gen_context->pprx != master->pprx))) {
+				gen_context->pptx = master->pptx;
+				gen_context->pprx = master->pprx;
+				mlx4_warn(dev,
+					  "denying Global Pause change for slave:%d\n",
+					  slave);
+			} else {
+				master->pptx = gen_context->pptx;
+				master->pprx = gen_context->pprx;
+			}
 			break;
 		case MLX4_SET_PORT_GID_TABLE:
 			/* change to MULTIPLE entries: number of guest's gids
-- 
2.3.7

^ permalink raw reply related

* [PATCH net 2/4] net/mlx4_core: Avoid repeated calls to pci enable/disable
From: Or Gerlitz @ 2016-04-20 13:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Yishai Hadas, Daniel Jurgens, Or Gerlitz
In-Reply-To: <1461157278-18528-1-git-send-email-ogerlitz@mellanox.com>

From: Daniel Jurgens <danielj@mellanox.com>

Maintain the PCI status and provide wrappers for enabling and disabling
the PCI device.  Performing either action more than once without
performing its opposite in between results in warning logs.

This occurred when EEH hotplugged the device, causing a warning for
disabling an already disabled device.
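The wrapper pattern can be sketched in user space as follows, assuming
a pthread mutex in place of the kernel mutex and hypothetical
fake_pci_enable()/fake_pci_disable() stubs that only count calls. Each
wrapper checks the recorded status under the lock, so a second enable
or disable in a row becomes a no-op instead of reaching the PCI core
twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for pci_enable_device()/pci_disable_device():
 * they only count calls so the effect of the wrappers is visible. */
static int hw_enable_calls, hw_disable_calls;
static int fake_pci_enable(void) { hw_enable_calls++; return 0; }
static void fake_pci_disable(void) { hw_disable_calls++; }

enum pci_status { PCI_STATUS_DISABLED, PCI_STATUS_ENABLED };

struct dev_state {
	pthread_mutex_t lock;	/* serializes status checks and transitions */
	enum pci_status status;
};

static int dev_enable(struct dev_state *d)
{
	int err = 0;

	assert(d != NULL);
	pthread_mutex_lock(&d->lock);
	if (d->status == PCI_STATUS_DISABLED) {
		err = fake_pci_enable();
		if (!err)	/* record the new state only on success */
			d->status = PCI_STATUS_ENABLED;
	}
	pthread_mutex_unlock(&d->lock);
	return err;
}

static void dev_disable(struct dev_state *d)
{
	pthread_mutex_lock(&d->lock);
	if (d->status == PCI_STATUS_ENABLED) {
		fake_pci_disable();
		d->status = PCI_STATUS_DISABLED;
	}
	pthread_mutex_unlock(&d->lock);
}
```

Calling dev_enable() twice and then dev_disable() twice on a device
that starts out disabled reaches each stub exactly once.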

Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 39 +++++++++++++++++++++++++++----
 include/linux/mlx4/device.h               |  7 ++++++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 5d45aa3..12c77a7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3172,6 +3172,34 @@ static int mlx4_check_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap
 	return 0;
 }
 
+static int mlx4_pci_enable_device(struct mlx4_dev *dev)
+{
+	struct pci_dev *pdev = dev->persist->pdev;
+	int err = 0;
+
+	mutex_lock(&dev->persist->pci_status_mutex);
+	if (dev->persist->pci_status == MLX4_PCI_STATUS_DISABLED) {
+		err = pci_enable_device(pdev);
+		if (!err)
+			dev->persist->pci_status = MLX4_PCI_STATUS_ENABLED;
+	}
+	mutex_unlock(&dev->persist->pci_status_mutex);
+
+	return err;
+}
+
+static void mlx4_pci_disable_device(struct mlx4_dev *dev)
+{
+	struct pci_dev *pdev = dev->persist->pdev;
+
+	mutex_lock(&dev->persist->pci_status_mutex);
+	if (dev->persist->pci_status == MLX4_PCI_STATUS_ENABLED) {
+		pci_disable_device(pdev);
+		dev->persist->pci_status = MLX4_PCI_STATUS_DISABLED;
+	}
+	mutex_unlock(&dev->persist->pci_status_mutex);
+}
+
 static int mlx4_load_one(struct pci_dev *pdev, int pci_dev_data,
 			 int total_vfs, int *nvfs, struct mlx4_priv *priv,
 			 int reset_flow)
@@ -3582,7 +3610,7 @@ static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data,
 
 	pr_info(DRV_NAME ": Initializing %s\n", pci_name(pdev));
 
-	err = pci_enable_device(pdev);
+	err = mlx4_pci_enable_device(&priv->dev);
 	if (err) {
 		dev_err(&pdev->dev, "Cannot enable PCI device, aborting\n");
 		return err;
@@ -3715,7 +3743,7 @@ err_release_regions:
 	pci_release_regions(pdev);
 
 err_disable_pdev:
-	pci_disable_device(pdev);
+	mlx4_pci_disable_device(&priv->dev);
 	pci_set_drvdata(pdev, NULL);
 	return err;
 }
@@ -3775,6 +3803,7 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	priv->pci_dev_data = id->driver_data;
 	mutex_init(&dev->persist->device_state_mutex);
 	mutex_init(&dev->persist->interface_state_mutex);
+	mutex_init(&dev->persist->pci_status_mutex);
 
 	ret = devlink_register(devlink, &pdev->dev);
 	if (ret)
@@ -3923,7 +3952,7 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 	}
 
 	pci_release_regions(pdev);
-	pci_disable_device(pdev);
+	mlx4_pci_disable_device(dev);
 	devlink_unregister(devlink);
 	kfree(dev->persist);
 	devlink_free(devlink);
@@ -4042,7 +4071,7 @@ static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev,
 	if (state == pci_channel_io_perm_failure)
 		return PCI_ERS_RESULT_DISCONNECT;
 
-	pci_disable_device(pdev);
+	mlx4_pci_disable_device(persist->dev);
 	return PCI_ERS_RESULT_NEED_RESET;
 }
 
@@ -4053,7 +4082,7 @@ static pci_ers_result_t mlx4_pci_slot_reset(struct pci_dev *pdev)
 	int err;
 
 	mlx4_err(dev, "mlx4_pci_slot_reset was called\n");
-	err = pci_enable_device(pdev);
+	err = mlx4_pci_enable_device(dev);
 	if (err) {
 		mlx4_err(dev, "Can not re-enable device, err=%d\n", err);
 		return PCI_ERS_RESULT_DISCONNECT;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 8541a91..d1f904c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -828,6 +828,11 @@ struct mlx4_vf_dev {
 	u8			n_ports;
 };
 
+enum mlx4_pci_status {
+	MLX4_PCI_STATUS_DISABLED,
+	MLX4_PCI_STATUS_ENABLED,
+};
+
 struct mlx4_dev_persistent {
 	struct pci_dev	       *pdev;
 	struct mlx4_dev	       *dev;
@@ -841,6 +846,8 @@ struct mlx4_dev_persistent {
 	u8		state;
 	struct mutex	interface_state_mutex; /* protect SW state */
 	u8	interface_state;
+	struct mutex		pci_status_mutex; /* sync pci state */
+	enum mlx4_pci_status	pci_status;
 };
 
 struct mlx4_dev {
-- 
2.3.7

^ permalink raw reply related

* [PATCH net 0/4] Mellanox 40G driver fixes for 4.6-rc
From: Or Gerlitz @ 2016-04-20 13:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Yishai Hadas, Or Gerlitz

Hi Dave,

With the fix for the ARM bug still in the works, here are a
few other mlx4 fixes we have ready to go.

Eran addressed the wrong reporting of dropped packets, Daniel
fixed some issues related to PPC EEH, and Jenny's patch makes sure
VFs can't change the port's pause settings.

Or.

Daniel Jurgens (2):
  net/mlx4_core: Implement pci_resume callback
  net/mlx4_core: Avoid repeated calls to pci enable/disable

Eran Ben Elisha (1):
  net/mlx4_en: Split SW RX dropped counter per RX ring

Eugenia Emantayev (1):
  net/mlx4_core: Don't allow to VF change global pause settings

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  5 +-
 drivers/net/ethernet/mellanox/mlx4/en_port.c    |  5 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c      |  2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c       | 76 ++++++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx4/mlx4.h       |  2 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |  1 +
 drivers/net/ethernet/mellanox/mlx4/port.c       | 13 +++++
 include/linux/mlx4/device.h                     |  7 +++
 8 files changed, 89 insertions(+), 22 deletions(-)

-- 
2.3.7

^ permalink raw reply

* Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Jiri Benc @ 2016-04-20 12:48 UTC (permalink / raw)
  To: Johannes Berg
  Cc: David Ahern, David Miller, eric.dumazet, roopa, netdev, jhs,
	tgraf, nicolas.dichtel, egrumbach
In-Reply-To: <1461137540.2176.5.camel@sipsolutions.net>

On Wed, 20 Apr 2016 09:32:20 +0200, Johannes Berg wrote:
> 2) Use the new attribute flag with some required attribute for
>    existing commands, so that older kernel will not find the required
>    attribute and will reject the operation entirely.
>    May or may not fall back to trying the operation again without the
>    flag.

This is basically what I submitted half a year ago. See:
http://thread.gmane.org/gmane.linux.network/382850

 Jiri

^ permalink raw reply

* [PATCH] net: phy: spi_ks8895: Don't leak references to SPI devices
From: Mark Brown @ 2016-04-20 11:54 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, Mark Brown

The ks8995 driver uses spi_dev_get(), apparently just to take a copy
of the SPI device used to instantiate it, but never calls spi_dev_put()
to drop that reference.  Since the device is guaranteed to exist between
probe() and remove(), there is no need for the driver to take an extra
reference to it, so fix the leak by using a straight assignment.
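To illustrate the leak, here is a hypothetical refcount model: struct
obj, obj_get(), probe_leaky(), probe_fixed() and remove_dev() are
illustrative stand-ins for struct spi_device, spi_dev_get() and the
driver's probe()/remove(), not the SPI core API. Taking a reference in
probe() without a matching put in remove() leaves the count one higher
than before; the plain assignment leaves it untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a struct spi_device. */
struct obj {
	int refcount;
};

/* Take an extra reference, like a *_get() helper would. */
static struct obj *obj_get(struct obj *o)
{
	assert(o->refcount > 0);	/* must already be alive */
	o->refcount++;
	return o;
}

/* Hypothetical driver private data. */
struct priv {
	struct obj *dev;
};

/* Before the fix: the extra reference is never dropped later. */
static void probe_leaky(struct priv *p, struct obj *dev)
{
	p->dev = obj_get(dev);
}

/* After the fix: the bus core keeps dev alive from probe() to remove(). */
static void probe_fixed(struct priv *p, struct obj *dev)
{
	p->dev = dev;
}

/* remove() has no matching put; it just forgets the pointer. */
static void remove_dev(struct priv *p)
{
	p->dev = NULL;
}
```

After probe_leaky() and remove_dev() the count stays at 2 (the leaked
reference); with probe_fixed() it is back to 1.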

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 drivers/net/phy/spi_ks8995.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/spi_ks8995.c b/drivers/net/phy/spi_ks8995.c
index b5d50d458728..93ffedfa2994 100644
--- a/drivers/net/phy/spi_ks8995.c
+++ b/drivers/net/phy/spi_ks8995.c
@@ -441,7 +441,7 @@ static int ks8995_probe(struct spi_device *spi)
 		return -ENOMEM;
 
 	mutex_init(&ks->lock);
-	ks->spi = spi_dev_get(spi);
+	ks->spi = spi;
 	ks->chip = &ks8995_chip[variant];
 
 	if (ks->spi->dev.of_node) {
-- 
2.8.0.rc3

^ permalink raw reply related

* [PATCH] MAINTAINERS: net: add entry for TI Ethernet Switch drivers
From: Grygorii Strashko @ 2016-04-20 11:25 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Sekhar Nori, Tony Lindgren, linux-omap, Grygorii Strashko,
	David S. Miller, Mugunthan V N, Richard Cochran

Add a record for the TI Ethernet Switch Driver CPSW/CPDMA/MDIO HW
(am33/am43/am57/dra7/davinci) to ensure that related patches
will go through the dedicated linux-omap list.

Also add Mugunthan as maintainer and myself as reviewer.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1d5b4be..aca864d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11071,6 +11071,14 @@ S:	Maintained
 F:	drivers/clk/ti/
 F:	include/linux/clk/ti.h
 
+TI ETHERNET SWITCH DRIVER (CPSW)
+M:	Mugunthan V N <mugunthanvnm@ti.com>
+R:	Grygorii Strashko <grygorii.strashko@ti.com>
+L:	linux-omap@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ethernet/ti/cpsw*
+F:	drivers/net/ethernet/ti/davinci*
+
 TI FLASH MEDIA INTERFACE DRIVER
 M:	Alex Dubov <oakad@yahoo.com>
 S:	Maintained
-- 
2.8.1

^ permalink raw reply related

* Re: [Intel-gfx] [PATCH 4/4] drm/i915: Move ioremap_wc tracking onto VMA
From: Daniel Vetter @ 2016-04-20 11:17 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: David Hildenbrand, netdev, intel-gfx, linux-kernel, dri-devel,
	Peter Zijlstra (Intel), Daniel Vetter, Dan Williams, Yishai Hadas,
	Ingo Molnar, linux-rdma
In-Reply-To: <20160420091054.GL1990@wotan.suse.de>

On Wed, Apr 20, 2016 at 11:10:54AM +0200, Luis R. Rodriguez wrote:
> Reason I ask is since I noticed a while ago a lot of drivers
> were using info->fix.smem_start and info->fix.smem_len consistently
> for their ioremap'd areas it might make sense instead to let the
> internal framebuffer (register_framebuffer()) optionally manage the
> ioremap_wc() for drivers, given that this is pretty generic stuff.

All that legacy fbdev stuff is just for legacy support, and I prefer to
have that as dumb as possible. There's been some discussion even around
lifting the "kick out firmware fb driver" out of fbdev, since we'd need it
to have a simple drm driver for e.g. uefi.

But I definitely don't want a legacy horror show like fbdev to
automagically take care of device mappings for drivers.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Eran Ben Elisha @ 2016-04-20 11:08 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Christoph Hellwig, linux-rdma, timur, cov, Yishai Hadas,
	Linux Netdev List, linux-kernel
In-Reply-To: <57167AF6.9090507@codeaurora.org>

Hi Sinan,

We at Mellanox are working on a solution which removes
the vmap call and allocates contiguous memory (using dma_alloc_coherent).

Thanks,
Eran


On Tue, Apr 19, 2016 at 9:37 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 4/19/2016 2:22 PM, Christoph Hellwig wrote:
>> What I think we need is something like the patch below.  In the long
>> run we should also kill the mlx4_buf structure which now is pretty
>> pointless.
>
> Maybe; this could be the correct approach if we can guarantee that the
> architecture can allocate the requested amount of memory with
> dma_alloc_coherent.
>
> When I brought up this issue a year ago, the objection was that my code
> doesn't compile on intel (dma_to_phys) and also some arches run out of
> DMA memory with existing customer base.
>
> If there is a real need to maintain compatibility with the existing
> architectures due to limited amount of DMA memory, we need to correct this
> code to make proper use of vmap via alloc_pages and also insert the
> dma_sync in proper places for DMA API conformance.
>
> Also, the tx descriptors always have to be allocated from a single DMA region
> or the tx code needs to be corrected to support page_list.
>
> If we can live with just using dma_alloc_coherent, your solution is
> better. I was trying to put this support for 64bit arches only while
> maintaining support for the existing code base.
>
>>
>> ---
>> From a493881d2a6c90152d3daabb7b6b3afd1d254d78 Mon Sep 17 00:00:00 2001
>> From: Christoph Hellwig <hch@lst.de>
>> Date: Tue, 19 Apr 2016 14:12:14 -0400
>> Subject: mlx4_en: don't try to split and vmap dma coherent allocations
>>
>> The memory returned by dma_alloc_coherent is not suitable for calling
>> virt_to_page on it, as it might for example come from vmap allocator.
>>
>> Remove the code that calls virt_to_page and vmap on dma coherent
>> allocations from the mlx4 drivers, and replace them with a single
>> high-order dma_alloc_coherent call.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Reported-by: Sinan Kaya <okaya@codeaurora.org>
>
>
> --
> Sinan Kaya
> Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

^ permalink raw reply

* Re: skb_at_tc_ingress helper breaks compilation of oot modules
From: Ingo Saitz @ 2016-04-20 10:53 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Ingo Saitz, netdev
In-Reply-To: <57175C13.8080109@iogearbox.net>

On Wed, Apr 20, 2016 at 12:38:11PM +0200, Daniel Borkmann wrote:
> Hmm, your fail.c test case only contains '#include <net/ipv6.h>'?

No, I only hit it when building real oot modules (virtualbox, and I found
batman-adv has the same issue), so I reduced it to the simplest test case.
This test case actually builds a fail.ko without errors on 4.4.7 with
CONFIG_NET_CLS_ACT=y.

> Note, upstream kernel never cared about out-of-tree modules, only
> in-tree code. ;) Did you run into an issue with any in-tree code?

A current fix for oot modules would be to add:

#include <uapi/linux/pkt_cls.h>

in front of that include for kernel >= 4.5…

    Ingo
-- 
Kennedy's Lemma:
    If you can parse Perl, you can solve the Halting Problem.

http://www.perlmonks.org/?node_id=663393

^ permalink raw reply

* Re: skb_at_tc_ingress helper breaks compilation of oot modules
From: Daniel Borkmann @ 2016-04-20 10:38 UTC (permalink / raw)
  To: Ingo Saitz; +Cc: netdev
In-Reply-To: <20160420102148.GA18615@echse.zoo>

On 04/20/2016 12:21 PM, Ingo Saitz wrote:
> In Linux 4.5, when CONFIG_NET_CLS_ACT is defined, compilation of out of
> tree modules breaks with undeclared functions/constants. The culprit is:
>
> commit fdc5432a7b44ab7de17141beec19d946b9344e91
> Author: Daniel Borkmann <daniel@iogearbox.net>
> Date:   Thu Jan 7 15:50:22 2016 +0100
>
>      net, sched: add skb_at_tc_ingress helper
>
> which uses G_TC_AT and AT_INGRESS but only includes linux/pkt_cls.h,
> which does not include these #defines for oot builds. Unfortunately I'm
> not sure what the correct fix is; maybe the uapi folks could help, but I
> attached a simple testcase and build log (Makefile is straight from
> kernelnewbies).

Hmm, your fail.c test case only contains '#include <net/ipv6.h>'?

Note, upstream kernel never cared about out-of-tree modules, only
in-tree code. ;) Did you run into an issue with any in-tree code?

Thanks,
Daniel

^ permalink raw reply

* skb_at_tc_ingress helper breaks compilation of oot modules
From: Ingo Saitz @ 2016-04-20 10:21 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 819 bytes --]

In Linux 4.5, when CONFIG_NET_CLS_ACT is defined, compilation of out of
tree modules breaks with undeclared functions/constants. The culprit is:

commit fdc5432a7b44ab7de17141beec19d946b9344e91
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Thu Jan 7 15:50:22 2016 +0100

    net, sched: add skb_at_tc_ingress helper

which uses G_TC_AT and AT_INGRESS but only includes linux/pkt_cls.h,
which does not include these #defines for oot builds. Unfortunately I'm
not sure what the correct fix is; maybe the uapi folks could help, but I
attached a simple testcase and build log (Makefile is straight from
kernelnewbies).

    Ingo
-- 
 ╭─╮  Kennedy's Lemma:
╭│───╮  If you can parse Perl, you can solve the Halting Problem.
│╰─│─╯
╰──╯  http://www.perlmonks.org/?node_id=663393

[-- Attachment #2: fail.c --]
[-- Type: text/x-csrc, Size: 22 bytes --]

#include <net/ipv6.h>

[-- Attachment #3: Makefile --]
[-- Type: text/plain, Size: 154 bytes --]

obj-m := fail.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

[-- Attachment #4: make.log --]
[-- Type: text/plain, Size: 1551 bytes --]

make -C /lib/modules/4.5.0-pinguin20160314/build M=/home/ingo/src/linux/pkt_cls-bug modules
make[1]: Entering directory '/usr/src/linux-headers-4.5.0-pinguin20160314'
  CC [M]  /home/ingo/src/linux/pkt_cls-bug/fail.o
In file included from include/linux/filter.h:16:0,
                 from include/net/sock.h:64,
                 from include/linux/tcp.h:22,
                 from include/linux/ipv6.h:72,
                 from include/net/ipv6.h:16,
                 from /home/ingo/src/linux/pkt_cls-bug/fail.c:1:
include/net/sch_generic.h: In function ‘skb_at_tc_ingress’:
include/net/sch_generic.h:413:9: error: implicit declaration of function ‘G_TC_AT’ [-Werror=implicit-function-declaration]
  return G_TC_AT(skb->tc_verd) & AT_INGRESS;
         ^
include/net/sch_generic.h:413:33: error: ‘AT_INGRESS’ undeclared (first use in this function)
  return G_TC_AT(skb->tc_verd) & AT_INGRESS;
                                 ^
include/net/sch_generic.h:413:33: note: each undeclared identifier is reported only once for each function it appears in
cc1: some warnings being treated as errors
scripts/Makefile.build:264: recipe for target '/home/ingo/src/linux/pkt_cls-bug/fail.o' failed
make[2]: *** [/home/ingo/src/linux/pkt_cls-bug/fail.o] Error 1
Makefile:1391: recipe for target '_module_/home/ingo/src/linux/pkt_cls-bug' failed
make[1]: *** [_module_/home/ingo/src/linux/pkt_cls-bug] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.5.0-pinguin20160314'
Makefile:4: recipe for target 'all' failed
make: *** [all] Error 2

^ permalink raw reply

* [PATCH net-next 2/2] RDS: TCP:  Call pskb_extract() helper function
From: Sowmini Varadhan @ 2016-04-20 10:17 UTC (permalink / raw)
  To: netdev, rds-devel, santosh.shilimkar, davem
  Cc: sowmini.varadhan, eric.dumazet, marcelo.leitner
In-Reply-To: <cover.1461086306.git.sowmini.varadhan@oracle.com>

rds-stress experiments with a request size of 256 bytes, 8K acks,
and 16 threads show a 40% improvement when pskb_extract()
replaces the {skb_clone(..); pskb_pull(..); pskb_trim(..);}
pattern in the Rx path, so this commit switches the Rx path
to pskb_extract() to get that gain.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/rds/tcp_recv.c |   14 +++-----------
 1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index 27a9921..d75d8b5 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -207,22 +207,14 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
 		}
 
 		if (left && tc->t_tinc_data_rem) {
-			clone = skb_clone(skb, arg->gfp);
+			to_copy = min(tc->t_tinc_data_rem, left);
+
+			clone = pskb_extract(skb, offset, to_copy, arg->gfp);
 			if (!clone) {
 				desc->error = -ENOMEM;
 				goto out;
 			}
 
-			to_copy = min(tc->t_tinc_data_rem, left);
-			if (!pskb_pull(clone, offset) ||
-			    pskb_trim(clone, to_copy)) {
-				pr_warn("rds_tcp_data_recv: pull/trim failed "
-					"left %zu data_rem %zu skb_len %d\n",
-					left, tc->t_tinc_data_rem, skb->len);
-				kfree_skb(clone);
-				desc->error = -ENOMEM;
-				goto out;
-			}
 			skb_queue_tail(&tinc->ti_skb_list, clone);
 
 			rdsdebug("skb %p data %p len %d off %u to_copy %zu -> "
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 1/2] skbuff: Add pskb_extract() helper function
From: Sowmini Varadhan @ 2016-04-20 10:17 UTC (permalink / raw)
  To: netdev, rds-devel, santosh.shilimkar, davem
  Cc: sowmini.varadhan, eric.dumazet, marcelo.leitner
In-Reply-To: <cover.1461086306.git.sowmini.varadhan@oracle.com>

A pattern of skb usage seen in modules such as RDS-TCP is to
extract `to_copy' bytes from the received TCP segment, starting
at some offset `off' into a new skb `clone'. This is done in
the ->data_ready callback, where the clone skb is queued up for rx on
the PF_RDS socket, while the parent TCP segment is returned unchanged
back to the TCP engine.

The existing code uses the sequence
	clone = skb_clone(..);
	pskb_pull(clone, off);
	pskb_trim(clone, to_copy);
with the intention of discarding the first `off' bytes. However,
skb_clone() + pskb_pull() implies pskb_expand_head(), which ends
up doing a redundant memcpy of bytes that will then get discarded
in __pskb_pull_tail().

To avoid this inefficiency, this commit adds pskb_extract() that
creates the clone, and memcpy's only the relevant header/frag/frag_list
to the start of `clone'. pskb_trim() is then invoked to trim clone
down to the requested to_copy bytes.
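The zero-copy idea can be sketched on a simplified model, assuming a
buffer made only of (base, offset, size) segments rather than a real
sk_buff with a linear head, frags[] and frag_list; struct seg, struct
sbuf and extract_range() are hypothetical. Segments before `off' are
dropped, the first overlapping one has its offset advanced (as
pskb_carve_inside_nonlinear() does with page_offset), and the last one
is shortened, mirroring the final pskb_trim(). Unlike this model, the
real helper still copies the remaining linear header bytes into a new
head.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGS 8

/* Hypothetical segment descriptor: bytes base[offset .. offset+size). */
struct seg {
	const char *base;
	size_t offset;
	size_t size;
};

/* Hypothetical nonlinear buffer: an ordered list of segments. */
struct sbuf {
	struct seg segs[MAX_SEGS];
	int nsegs;
};

/* Describe bytes [off, off + len) of 'in' in 'out' by adjusting
 * descriptors only; payload bytes are shared, never copied.
 * Returns 0 on success, -1 if the range runs past the end. */
static int extract_range(const struct sbuf *in, size_t off, size_t len,
			 struct sbuf *out)
{
	size_t pos = 0;	/* byte offset where segment i starts */
	int i;

	assert(in->nsegs <= MAX_SEGS);
	out->nsegs = 0;
	for (i = 0; i < in->nsegs && len > 0; i++) {
		struct seg s = in->segs[i];

		if (pos + s.size <= off) {	/* entirely before the range */
			pos += s.size;
			continue;
		}
		if (pos < off) {		/* split the first overlapping segment */
			s.offset += off - pos;
			s.size -= off - pos;
		}
		if (s.size > len)		/* trim the last one */
			s.size = len;
		len -= s.size;
		out->segs[out->nsegs++] = s;
		pos += in->segs[i].size;
	}
	return len == 0 ? 0 : -1;
}

/* Copy the described bytes into dst (for checking only). */
static size_t flatten(const struct sbuf *b, char *dst)
{
	size_t n = 0;
	int i;

	for (i = 0; i < b->nsegs; i++) {
		memcpy(dst + n, b->segs[i].base + b->segs[i].offset,
		       b->segs[i].size);
		n += b->segs[i].size;
	}
	return n;
}
```

With segments "hello ", "world" (offset 4 into "big world") and "!!",
extracting 7 bytes at offset 3 yields descriptors for "lo worl" whose
first entry still points into the original "hello " storage.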

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 include/linux/skbuff.h |    2 +
 net/core/skbuff.c      |  248 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 250 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da0ace3..a1ce639 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2986,6 +2986,8 @@ struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
 int skb_vlan_pop(struct sk_buff *skb);
 int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
+struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
+			     gfp_t gfp);
 
 static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4cc594c..e8b6d20 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4619,3 +4619,251 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
 	return NULL;
 }
 EXPORT_SYMBOL(alloc_skb_with_frags);
+
+/* carve out the first off bytes from skb when off < headlen */
+static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
+				    const int headlen, gfp_t gfp_mask)
+{
+	int i;
+	int size = skb_end_offset(skb);
+	int new_hlen = headlen - off;
+	u8 *data;
+	int doff = 0;
+
+	size = SKB_DATA_ALIGN(size);
+
+	if (skb_pfmemalloc(skb))
+		gfp_mask |= __GFP_MEMALLOC;
+	data = kmalloc_reserve(size +
+			       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+			       gfp_mask, NUMA_NO_NODE, NULL);
+	if (!data)
+		return -ENOMEM;
+
+	size = SKB_WITH_OVERHEAD(ksize(data));
+
+	/* Copy real data, and all frags */
+	skb_copy_from_linear_data_offset(skb, off, data, new_hlen);
+	skb->len -= off;
+
+	memcpy((struct skb_shared_info *)(data + size),
+	       skb_shinfo(skb),
+	       offsetof(struct skb_shared_info,
+			frags[skb_shinfo(skb)->nr_frags]));
+	if (skb_cloned(skb)) {
+		/* drop the old head gracefully */
+		if (skb_orphan_frags(skb, gfp_mask)) {
+			kfree(data);
+			return -ENOMEM;
+		}
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+			skb_frag_ref(skb, i);
+		if (skb_has_frag_list(skb))
+			skb_clone_fraglist(skb);
+		skb_release_data(skb);
+	} else {
+		/* we can reuse existing recount- all we did was
+		 * relocate values
+		 */
+		skb_free_head(skb);
+	}
+
+	doff = (data - skb->head);
+	skb->head = data;
+	skb->data = data;
+	skb->head_frag = 0;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+	skb->end = size;
+	doff = 0;
+#else
+	skb->end = skb->head + size;
+#endif
+	skb_set_tail_pointer(skb, skb_headlen(skb));
+	skb_headers_offset_update(skb, 0);
+	skb->cloned = 0;
+	skb->hdr_len = 0;
+	skb->nohdr = 0;
+	atomic_set(&skb_shinfo(skb)->dataref, 1);
+
+	return 0;
+}
+
+static int pskb_carve(struct sk_buff *skb, const u32 off, gfp_t gfp);
+
+/* carve out the first eat bytes from skb's frag_list. May recurse into
+ * pskb_carve()
+ */
+static int pskb_carve_frag_list(struct sk_buff *skb,
+				struct skb_shared_info *shinfo, int eat,
+				gfp_t gfp_mask)
+{
+	struct sk_buff *list = shinfo->frag_list;
+	struct sk_buff *clone = NULL;
+	struct sk_buff *insp = NULL;
+
+	do {
+		if (!list) {
+			pr_err("Not enough bytes to eat. Want %d\n", eat);
+			return -EFAULT;
+		}
+		if (list->len <= eat) {
+			/* Eaten as whole. */
+			eat -= list->len;
+			list = list->next;
+			insp = list;
+		} else {
+			/* Eaten partially. */
+			if (skb_shared(list)) {
+				clone = skb_clone(list, gfp_mask);
+				if (!clone)
+					return -ENOMEM;
+				insp = list->next;
+				list = clone;
+			} else {
+				/* This may be pulled without problems. */
+				insp = list;
+			}
+			if (pskb_carve(list, eat, gfp_mask) < 0) {
+				kfree_skb(clone);
+				return -ENOMEM;
+			}
+			break;
+		}
+	} while (eat);
+
+	/* Free pulled out fragments. */
+	while ((list = shinfo->frag_list) != insp) {
+		shinfo->frag_list = list->next;
+		kfree_skb(list);
+	}
+	/* And insert new clone at head. */
+	if (clone) {
+		clone->next = list;
+		shinfo->frag_list = clone;
+	}
+	return 0;
+}
+
+/* carve off first len bytes from skb. Split line (off) is in the
+ * non-linear part of skb
+ */
+static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
+				       int pos, gfp_t gfp_mask)
+{
+	int i, k = 0;
+	int size = skb_end_offset(skb);
+	u8 *data;
+	const int nfrags = skb_shinfo(skb)->nr_frags;
+	struct skb_shared_info *shinfo;
+	int doff = 0;
+
+	size = SKB_DATA_ALIGN(size);
+
+	if (skb_pfmemalloc(skb))
+		gfp_mask |= __GFP_MEMALLOC;
+	data = kmalloc_reserve(size +
+			       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+			       gfp_mask, NUMA_NO_NODE, NULL);
+	if (!data)
+		return -ENOMEM;
+
+	size = SKB_WITH_OVERHEAD(ksize(data));
+
+	memcpy((struct skb_shared_info *)(data + size),
+	       skb_shinfo(skb), offsetof(struct skb_shared_info,
+					 frags[skb_shinfo(skb)->nr_frags]));
+	if (skb_orphan_frags(skb, gfp_mask)) {
+		kfree(data);
+		return -ENOMEM;
+	}
+	shinfo = (struct skb_shared_info *)(data + size);
+	for (i = 0; i < nfrags; i++) {
+		int fsize = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+
+		if (pos + fsize > off) {
+			shinfo->frags[k] = skb_shinfo(skb)->frags[i];
+
+			if (pos < off) {
+				/* Split frag.
+				 * We have two variants in this case:
+				 * 1. Move all the frag to the second
+				 *    part, if it is possible. F.e.
+				 *    this approach is mandatory for TUX,
+				 *    where splitting is expensive.
+				 * 2. Split is accurately. We make this.
+				 */
+				shinfo->frags[0].page_offset += off - pos;
+				skb_frag_size_sub(&shinfo->frags[0], off - pos);
+			}
+			skb_frag_ref(skb, i);
+			k++;
+		}
+		pos += fsize;
+	}
+	shinfo->nr_frags = k;
+	if (skb_has_frag_list(skb))
+		skb_clone_fraglist(skb);
+
+	if (k == 0) {
+		/* split line is in frag list */
+		pskb_carve_frag_list(skb, shinfo, off - pos, gfp_mask);
+	}
+	skb_release_data(skb);
+
+	doff = (data - skb->head);
+	skb->head = data;
+	skb->head_frag = 0;
+	skb->data = data;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+	skb->end = size;
+	doff = 0;
+#else
+	skb->end = skb->head + size;
+#endif
+	skb_reset_tail_pointer(skb);
+	skb_headers_offset_update(skb, 0);
+	skb->cloned   = 0;
+	skb->hdr_len  = 0;
+	skb->nohdr    = 0;
+	skb->len -= off;
+	skb->data_len = skb->len;
+	atomic_set(&skb_shinfo(skb)->dataref, 1);
+	return 0;
+}
+
+/* remove len bytes from the beginning of the skb */
+static int pskb_carve(struct sk_buff *skb, const u32 len, gfp_t gfp)
+{
+	int headlen = skb_headlen(skb);
+
+	if (len < headlen)
+		return pskb_carve_inside_header(skb, len, headlen, gfp);
+	else
+		return pskb_carve_inside_nonlinear(skb, len, headlen, gfp);
+}
+
+/* Extract to_copy bytes starting at off from skb, and return this in
+ * a new skb
+ */
+struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
+			     int to_copy, gfp_t gfp)
+{
+	struct sk_buff  *clone = skb_clone(skb, gfp);
+
+	if (!clone)
+		return NULL;
+
+	if (pskb_carve(clone, off, gfp) < 0) {
+		pr_warn("pskb_carve failed\n");
+		kfree_skb(clone);
+		return NULL;
+	}
+
+	if (pskb_trim(clone, to_copy)) {
+		pr_warn("pskb_trim failed\n");
+		kfree_skb(clone);
+		return NULL;
+	}
+	return clone;
+}
+EXPORT_SYMBOL(pskb_extract);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 0/2] pskb_extract() helper function.
From: Sowmini Varadhan @ 2016-04-20 10:17 UTC (permalink / raw)
  To: netdev, rds-devel, santosh.shilimkar, davem
  Cc: sowmini.varadhan, eric.dumazet, marcelo.leitner

This patchset follows up on the discussion in
 https://www.mail-archive.com/netdev@vger.kernel.org/msg105090.html

For RDS-TCP, we have to deal with the full gamut of
nonlinear sk_buffs, including all the frag_list variants.
Also, the parent skb has to remain unchanged, while the clone
is queued for Rx on the PF_RDS socket. 

Patch 1 of this patchset adds a pskb_extract() function that 
does all this without the redundant memcpy's in pskb_expand_head() 
and __pskb_pull_tail().

Sowmini Varadhan (2):
  Add pskb_extract() helper function
  Call pskb_extract() helper function

 include/linux/skbuff.h |    2 +
 net/core/skbuff.c      |  248 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/rds/tcp_recv.c     |   14 +--
 3 files changed, 253 insertions(+), 11 deletions(-)

^ permalink raw reply

