Netdev List
 help / color / mirror / Atom feed
* [PATCH] vxlan: using pskb_may_pull as early as possible
From: roy.qing.li @ 2014-10-16  1:17 UTC (permalink / raw)
  To: netdev; +Cc: xiyou.wangcong

From: Li RongQing <roy.qing.li@gmail.com>

pskb_may_pull should be used to check if skb->data has enough space,
skb->len can not ensure that.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 drivers/net/vxlan.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index faf1bd1..77ab844 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1437,9 +1437,6 @@ static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
 	if (!in6_dev)
 		goto out;
 
-	if (!pskb_may_pull(skb, skb->len))
-		goto out;
-
 	iphdr = ipv6_hdr(skb);
 	saddr = &iphdr->saddr;
 	daddr = &iphdr->daddr;
@@ -1880,7 +1877,8 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 			return arp_reduce(dev, skb);
 #if IS_ENABLED(CONFIG_IPV6)
 		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
-			 skb->len >= sizeof(struct ipv6hdr) + sizeof(struct nd_msg) &&
+			 pskb_may_pull(skb, sizeof(struct ipv6hdr)
+				       + sizeof(struct nd_msg)) &&
 			 ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
 				struct nd_msg *msg;
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next] cxgb4 : Improve handling of DCB negotiation or loss thereof
From: Anish Bhatt @ 2014-10-16  2:08 UTC (permalink / raw)
  To: netdev; +Cc: davem, hariprasad, leedom, Anish Bhatt

Clear out any DCB apps we might have added to kernel table when we lose DCB
sync (or IEEE equivalent event). IEEE allows individual components to work
independently, so improve check for IEEE completion by specifying individual
components.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c | 48 ++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
index 8edf0f5bd679..ee819fd12bd2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_dcb.c
@@ -60,6 +60,42 @@ void cxgb4_dcb_version_init(struct net_device *dev)
 	dcb->dcb_version = FW_PORT_DCB_VER_AUTO;
 }
 
+static void cxgb4_dcb_cleanup_apps(struct net_device *dev)
+{
+	struct port_info *pi = netdev2pinfo(dev);
+	struct adapter *adap = pi->adapter;
+	struct port_dcb_info *dcb = &pi->dcb;
+	struct dcb_app app;
+	int i, err;
+
+	/* zero priority implies remove */
+	app.priority = 0;
+
+	for (i = 0; i < CXGB4_MAX_DCBX_APP_SUPPORTED; i++) {
+		/* Check if app list is exhausted */
+		if (!dcb->app_priority[i].protocolid)
+			break;
+
+		app.protocol = dcb->app_priority[i].protocolid;
+
+		if (dcb->dcb_version == FW_PORT_DCB_VER_IEEE) {
+			app.selector = dcb->app_priority[i].sel_field + 1;
+			err = dcb_ieee_setapp(dev, &app);
+		} else {
+			app.selector = !!(dcb->app_priority[i].sel_field);
+			err = dcb_setapp(dev, &app);
+		}
+
+		if (err) {
+			dev_err(adap->pdev_dev,
+				"Failed DCB Clear %s Application Priority: sel=%d, prot=%d, , err=%d\n",
+				dcb_ver_array[dcb->dcb_version], app.selector,
+				app.protocol, -err);
+			break;
+		}
+	}
+}
+
 /* Finite State machine for Data Center Bridging.
  */
 void cxgb4_dcb_state_fsm(struct net_device *dev,
@@ -145,6 +181,7 @@ void cxgb4_dcb_state_fsm(struct net_device *dev,
 			 * state.  We need to reset back to a ground state
 			 * of incomplete.
 			 */
+			cxgb4_dcb_cleanup_apps(dev);
 			cxgb4_dcb_state_init(dev);
 			dcb->state = CXGB4_DCB_STATE_FW_INCOMPLETE;
 			dcb->supported = CXGB4_DCBX_FW_SUPPORT;
@@ -833,11 +870,16 @@ static int cxgb4_setapp(struct net_device *dev, u8 app_idtype, u16 app_id,
 
 /* Return whether IEEE Data Center Bridging has been negotiated.
  */
-static inline int cxgb4_ieee_negotiation_complete(struct net_device *dev)
+static inline int
+cxgb4_ieee_negotiation_complete(struct net_device *dev,
+				enum cxgb4_dcb_fw_msgs dcb_subtype)
 {
 	struct port_info *pi = netdev2pinfo(dev);
 	struct port_dcb_info *dcb = &pi->dcb;
 
+	if (dcb_subtype && !(dcb->msgs & dcb_subtype))
+		return 0;
+
 	return (dcb->state == CXGB4_DCB_STATE_FW_ALLSYNCED &&
 		(dcb->supported & DCB_CAP_DCBX_VER_IEEE));
 }
@@ -850,7 +892,7 @@ static int cxgb4_ieee_getapp(struct net_device *dev, struct dcb_app *app)
 {
 	int prio;
 
-	if (!cxgb4_ieee_negotiation_complete(dev))
+	if (!cxgb4_ieee_negotiation_complete(dev, CXGB4_DCB_FW_APP_ID))
 		return -EINVAL;
 	if (!(app->selector && app->protocol))
 		return -EINVAL;
@@ -872,7 +914,7 @@ static int cxgb4_ieee_setapp(struct net_device *dev, struct dcb_app *app)
 {
 	int ret;
 
-	if (!cxgb4_ieee_negotiation_complete(dev))
+	if (!cxgb4_ieee_negotiation_complete(dev, CXGB4_DCB_FW_APP_ID))
 		return -EINVAL;
 	if (!(app->selector && app->protocol))
 		return -EINVAL;
-- 
2.1.2

^ permalink raw reply related

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-16  2:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org,
	Jiafei.Pan@freescale.com
In-Reply-To: <1413364533.12304.44.camel@edumazet-glaptop2.roam.corp.google.com>


> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Wednesday, October 15, 2014 5:16 PM
> To: Pan Jiafei-B37022
> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;
> linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> On Wed, 2014-10-15 at 05:34 +0000, Jiafei.Pan@freescale.com wrote:
> 
> > Yes, for this matter, in order to do this we should modify the Ethernet
> > drivers. For example, driver A want to driver B, C, D.. to support driver
> > A's Hardware block access functions, so we have to modify driver B, C, D...
> > It will be so complex for this matter.
> >
> > But by using my patch, I just modify a Ethernet device (I don't care
> > Which driver it is used) flag in driver A in order to implement this
> > Ethernet device using hardware block access functions provided by
> > Driver A.
> 
> We care a lot of all the bugs added by your patches. You have little
> idea of how many of them were added. We do not want to spend days of
> work explaining everything or fixing all the details for you.
> 
> Carefully read net/core/skbuff.c, net/core/dev.c, GRO layer, you'll see
> how many spots you missed.
> 
> You cannot control how skbs are cooked before reaching your driver
> ndo_start_xmit(). You are not going to add hooks in UDP , TCP, or other
> drivers RX path. This would be absolutely insane.
> 

Thanks for your comments and suggestion. In my case, I want to build skb
from hardware block specified memory, I only can see two ways, one is modified
net card driver replace common skb allocation function with my specially
functions, another way is to hack common skb allocation function in which
redirect to my specially functions. My patch is just for the second way.
Except these two ways, would you please give me some advice for some other
ways for my case? Thanks.

> Trying to control how skb are cooked in RX path is absolutely something
> drivers do, using page frags that are read-only by all the stack.
> 
> Fix your driver to use existing infra, your suggestion is not going to
> be accepted.
> 


^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-16  2:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org,
	Jiafei.Pan@freescale.com
In-Reply-To: <1413364533.12304.44.camel@edumazet-glaptop2.roam.corp.google.com>


> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Wednesday, October 15, 2014 5:16 PM
> To: Pan Jiafei-B37022
> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;
> linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> On Wed, 2014-10-15 at 05:34 +0000, Jiafei.Pan@freescale.com wrote:
> 
> > Yes, for this matter, in order to do this we should modify the Ethernet
> > drivers. For example, driver A want to driver B, C, D.. to support driver
> > A's Hardware block access functions, so we have to modify driver B, C, D...
> > It will be so complex for this matter.
> >
> > But by using my patch, I just modify a Ethernet device (I don't care
> > Which driver it is used) flag in driver A in order to implement this
> > Ethernet device using hardware block access functions provided by
> > Driver A.
> 
> We care a lot of all the bugs added by your patches. You have little
> idea of how many of them were added. We do not want to spend days of
> work explaining everything or fixing all the details for you.
> 
> Carefully read net/core/skbuff.c, net/core/dev.c, GRO layer, you'll see
> how many spots you missed.
> 
> You cannot control how skbs are cooked before reaching your driver
> ndo_start_xmit(). You are not going to add hooks in UDP , TCP, or other
> drivers RX path. This would be absolutely insane.
> 
> Trying to control how skb are cooked in RX path is absolutely something
> drivers do, using page frags that are read-only by all the stack.
> 
> Fix your driver to use existing infra, your suggestion is not going to
> be accepted.
> 
I think my patch can connect some hardware buffer block with the third party
net card drivers. this should be a general requirement in order to get
a better performance. Yes, maybe some defect in my patch, but any comments
and suggestions for this target is welcome and thanks.


^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-16  2:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: davem@davemloft.net, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org,
	Jiafei.Pan@freescale.com
In-Reply-To: <20141015113323.5321b2f7@uryu.home.lan>



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, October 15, 2014 5:33 PM
> To: Pan Jiafei-B37022
> Cc: davem@davemloft.net; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-
> R58472; linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> Since an skb can sit forever in an application queue, you have created
> an easy way to livelock the system when enough skb's are waiting to be
> read.

I think there is no possible to livelock the system, because in my patch
The function __netdev_alloc_skb will try to allocate hardware block buffer
Firstly if dev->alloc_hw_skb is set, but it will continue allocate normal
skb buffer if the hardware block buffer allocation fails.

^ permalink raw reply

* Re: [PATCH net-next 0/4] Add support for few debugfs entries
From: David Miller @ 2014-10-16  3:24 UTC (permalink / raw)
  To: hariprasad; +Cc: netdev, leedom, kumaras, nirranjan, santosh, anish
In-Reply-To: <1413422524-14054-1-git-send-email-hariprasad@chelsio.com>

From: Hariprasad Shenai <hariprasad@chelsio.com>
Date: Thu, 16 Oct 2014 06:52:00 +0530

> This patch series adds a new file for debugfs and support for devlog, cim_la,
> cim_la_qcfg and mps_tcam debugfs entries.
> 
> The patches series is created against 'net-next' tree.
> And includes patches on cxgb4 driver.
> 
> We have included all the maintainers of respective drivers. Kindly review the
> change and let us know in case of any review comments.

net-next is not open at this time, please resubmit this series
when it is open again.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] cxgb4 : Improve handling of DCB negotiation or loss thereof
From: David Miller @ 2014-10-16  3:24 UTC (permalink / raw)
  To: anish; +Cc: netdev, hariprasad, leedom
In-Reply-To: <1413425295-22824-1-git-send-email-anish@chelsio.com>

From: Anish Bhatt <anish@chelsio.com>
Date: Wed, 15 Oct 2014 19:08:15 -0700

> Clear out any DCB apps we might have added to kernel table when we lose DCB
> sync (or IEEE equivalent event). IEEE allows individual components to work
> independently, so improve check for IEEE completion by specifying individual
> components.
> 
> Signed-off-by: Anish Bhatt <anish@chelsio.com>

net-next is not open at this time, please resubmit this patch
when it is open again.

Thank you.

^ permalink raw reply

* Re: [PATCH 1/1 net-next] openvswitch: kerneldoc warning fix
From: David Miller @ 2014-10-16  3:25 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, pshelar, dev, netdev
In-Reply-To: <1413399798-8514-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Wed, 15 Oct 2014 21:03:18 +0200

> s/sock/gs
> 
> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied.

^ permalink raw reply

* Re: [PATCH 1/1 net-next] openvswitch: use vport instead of p
From: David Miller @ 2014-10-16  3:26 UTC (permalink / raw)
  To: fabf; +Cc: linux-kernel, pshelar, dev, netdev
In-Reply-To: <1413399821-8558-1-git-send-email-fabf@skynet.be>

From: Fabian Frederick <fabf@skynet.be>
Date: Wed, 15 Oct 2014 21:03:41 +0200

> All functions used struct vport *vport except
> ovs_vport_find_upcall_portid.
> 
> This fixes 1 kerneldoc warning
> 
> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied.

^ permalink raw reply

* Re: [PATCH] vxlan: fix a use after free in vxlan_encap_bypass
From: David Miller @ 2014-10-16  3:31 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev
In-Reply-To: <1413420581-12989-1-git-send-email-roy.qing.li@gmail.com>

From: roy.qing.li@gmail.com
Date: Thu, 16 Oct 2014 08:49:41 +0800

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> when netif_rx() is done, the netif_rx handled skb maybe be freed,
> and should not be used.
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

Looks good, applied, and queued up for -stable thanks!

^ permalink raw reply

* Re: [PATCH] vxlan: using pskb_may_pull as early as possible
From: David Miller @ 2014-10-16  3:33 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev, xiyou.wangcong
In-Reply-To: <1413422238-2562-1-git-send-email-roy.qing.li@gmail.com>

From: roy.qing.li@gmail.com
Date: Thu, 16 Oct 2014 09:17:18 +0800

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> pskb_may_pull should be used to check if skb->data has enough space,
> skb->len can not ensure that.
> 
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

Also applied and queued up for -stable,thanks.

^ permalink raw reply

* Re: Netlink mmap tx security?
From: David Miller @ 2014-10-16  3:34 UTC (permalink / raw)
  To: luto; +Cc: dborkman, torvalds, kaber, netdev, tgraf
In-Reply-To: <CALCETrUW6yX8B=ANyxYqhwPbKkosbuBGXO8omTD6rGA1A1XhvA@mail.gmail.com>

From: Andy Lutomirski <luto@amacapital.net>
Date: Wed, 15 Oct 2014 16:58:39 -0700

> On Wed, Oct 15, 2014 at 4:57 PM, David Miller <davem@davemloft.net> wrote:
>> From: Daniel Borkmann <dborkman@redhat.com>
>> Date: Thu, 16 Oct 2014 01:45:22 +0200
>>
>>> On 10/15/2014 04:01 AM, David Miller wrote:
>>>> From: Andy Lutomirski <luto@amacapital.net>
>>>> Date: Tue, 14 Oct 2014 15:16:46 -0700
>>>>
>>>>> It's at least remotely possible that there's something that assumes
>>>>> that assumes that the availability of NETLINK_RX_RING implies
>>>>> NETLINK_TX_RING, which would be unfortunate.
>>>>
>>>> I already found one such case, nlmon :-/
>>>
>>> Hmm, can you elaborate? I currently don't think that nlmon cares
>>> actually.
>>
>> nlmon cares, openvswitch cares, etc:
> 
> It'll still work, just slower, right?
> 
> +    if (setsockopt(sock->fd, SOL_NETLINK, NETLINK_RX_RING, &req,
> sizeof(req)) < 0
> +        || setsockopt(sock->fd, SOL_NETLINK, NETLINK_TX_RING, &req,
> sizeof(req)) < 0) {
> +        VLOG_INFO("mmap netlink is not supported");
> +        return 0;
> +    }

So NETLINK_RX_RING succeeds but NETLINK_TX_RING fails, so now the
recvmsg() path will take the RX ring code path because this code
doesn't clean up and undo the setsockopt().

This is the common coding paradigm I've seen in all NETLINK_*_RING
uses.

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Eric Dumazet @ 2014-10-16  4:15 UTC (permalink / raw)
  To: Jiafei.Pan@freescale.com
  Cc: David Miller, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org
In-Reply-To: <524626e093684abeba65839d26e94262@BLUPR03MB517.namprd03.prod.outlook.com>

On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote:

> Thanks for your comments and suggestion. In my case, I want to build skb
> from hardware block specified memory, I only can see two ways, one is modified
> net card driver replace common skb allocation function with my specially
> functions, another way is to hack common skb allocation function in which
> redirect to my specially functions. My patch is just for the second way.
> Except these two ways, would you please give me some advice for some other
> ways for my case? Thanks

I suggest you read drivers/net/ethernet numerous examples.

No need to change anything  in net/* or include/*, really.

For a start, look at drivers/net/ethernet/intel/igb/igb_main.c

Mentioning 'hack' in your mails simply should hint you are doing
something very wrong.

What makes you think your hardware is so special ?



^ permalink raw reply

* Re: [Patch net 1/3] ipv4: call __ip_options_echo() in cookie_v4_check()
From: Eric Dumazet @ 2014-10-16  4:15 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, davem, Krzysztof Kolasa, Eric Dumazet, Cong Wang
In-Reply-To: <1413408802-21052-1-git-send-email-xiyou.wangcong@gmail.com>

On Wed, 2014-10-15 at 14:33 -0700, Cong Wang wrote:
> From: Cong Wang <cwang@twopensource.com>
> 
> commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
> missed that cookie_v4_check() still calls ip_options_echo() which uses
> IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo()
> instead.
> 
> Fixes: commit 971f10eca186cab238c49da ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
> Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
> Cc: Eric Dumazet <edumazet@google.com>
> Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
> Signed-off-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/ipv4/syncookies.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 0431a8f..7e7401c 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -321,7 +321,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  		int opt_size = sizeof(struct ip_options_rcu) + opt->optlen;
>  
>  		ireq->opt = kmalloc(opt_size, GFP_ATOMIC);
> -		if (ireq->opt != NULL && ip_options_echo(&ireq->opt->opt, skb)) {
> +		if (ireq->opt != NULL && __ip_options_echo(&ireq->opt->opt, skb, opt)) {
>  			kfree(ireq->opt);
>  			ireq->opt = NULL;
>  		}

Signed-off-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [Patch net 2/3] ipv4: share tcp_v4_save_options() with cookie_v4_check()
From: Eric Dumazet @ 2014-10-16  4:16 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, davem, Krzysztof Kolasa, Eric Dumazet, Cong Wang
In-Reply-To: <1413408802-21052-2-git-send-email-xiyou.wangcong@gmail.com>

On Wed, 2014-10-15 at 14:33 -0700, Cong Wang wrote:
> From: Cong Wang <cwang@twopensource.com>
> 
> cookie_v4_check() allocates ip_options_rcu in the same way
> with tcp_v4_save_options(), we can just make it a helper function.
> 
> Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [Patch net 3/3] ipv4: clean up cookie_v4_check()
From: Eric Dumazet @ 2014-10-16  4:17 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, davem, Krzysztof Kolasa, Eric Dumazet, Cong Wang
In-Reply-To: <1413408802-21052-3-git-send-email-xiyou.wangcong@gmail.com>

On Wed, 2014-10-15 at 14:33 -0700, Cong Wang wrote:
> From: Cong Wang <cwang@twopensource.com>
> 
> We can retrieve opt from skb, no need to pass it as a parameter.
> And opt should always be non-NULL, no need to check.
> 
> Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
> Cc: Eric Dumazet <edumazet@google.com>
> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
> Signed-off-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* RE: [PATCH] net: use hardware buffer pool to allocate skb
From: Jiafei.Pan @ 2014-10-16  5:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org,
	Jiafei.Pan@freescale.com
In-Reply-To: <1413432912.28798.7.camel@edumazet-glaptop2.roam.corp.google.com>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Thursday, October 16, 2014 12:15 PM
> To: Pan Jiafei-B37022
> Cc: David Miller; jkosina@suse.cz; netdev@vger.kernel.org; Li Yang-Leo-R58472;
> linux-doc@vger.kernel.org
> Subject: Re: [PATCH] net: use hardware buffer pool to allocate skb
> 
> On Thu, 2014-10-16 at 02:17 +0000, Jiafei.Pan@freescale.com wrote:
> 
> > Thanks for your comments and suggestion. In my case, I want to build skb
> > from hardware block specified memory, I only can see two ways, one is modified
> > net card driver replace common skb allocation function with my specially
> > functions, another way is to hack common skb allocation function in which
> > redirect to my specially functions. My patch is just for the second way.
> > Except these two ways, would you please give me some advice for some other
> > ways for my case? Thanks
> 
> I suggest you read drivers/net/ethernet numerous examples.
> 
> No need to change anything  in net/* or include/*, really.
> 
> For a start, look at drivers/net/ethernet/intel/igb/igb_main.c
> 
> Mentioning 'hack' in your mails simply should hint you are doing
> something very wrong.
> 
> What makes you think your hardware is so special ?
> 
In fact, I am developing a bridge driver, it can bridge between any other the
third party net card and my own net card. My target is to let any other the
third party net card can directly use my own net card specified buffer, then
there will be no memory copy in the whole bridge process.
By the way, I don’t see any similar between igb_main.c and my case. And also
My bridge also can’t implemented with "skb frag" in order to aim at zero memory
copy.

^ permalink raw reply

* Re: Netlink mmap tx security?
From: Thomas Graf @ 2014-10-16  5:52 UTC (permalink / raw)
  To: David Miller; +Cc: dborkman, luto, torvalds, kaber, netdev
In-Reply-To: <20141015.195737.1429281929513331763.davem@davemloft.net>

On 10/15/14 at 07:57pm, David Miller wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Thu, 16 Oct 2014 01:45:22 +0200
> 
> > On 10/15/2014 04:01 AM, David Miller wrote:
> >> From: Andy Lutomirski <luto@amacapital.net>
> >> Date: Tue, 14 Oct 2014 15:16:46 -0700
> >>
> >>> It's at least remotely possible that there's something that assumes
> >>> that assumes that the availability of NETLINK_RX_RING implies
> >>> NETLINK_TX_RING, which would be unfortunate.
> >>
> >> I already found one such case, nlmon :-/
> > 
> > Hmm, can you elaborate? I currently don't think that nlmon cares
> > actually.
> 
> nlmon cares, openvswitch cares, etc:
> 
> http://openvswitch.org/pipermail/dev/2013-December/034496.html

(Fortunately) the OVS patch has not been merged yet because the number
of Netlink sockets created per vport in the current architecture
currently make it a non scalable approach.

I think introdcing a NETLINK_RX_RING2 and having NETLINK_RX_RING fail
is not a bad way out of this.

^ permalink raw reply

* Re: tcpdump's capture filter: "vlan" doesn't match
From: Daniel Borkmann @ 2014-10-16  6:10 UTC (permalink / raw)
  To: Lukas Tribus
  Cc: netdev@vger.kernel.org, John Fastabend, Michał Mirosław,
	Jiri Pirko, Ben Hutchings, Atzm Watanabe, Patrick McHardy,
	Jesse Gross
In-Reply-To: <DUB123-W23D5ECAD4869F9A799E1F6EDAA0@phx.gbl>



On 10/16/2014 12:58 AM, Lukas Tribus wrote:
> Hi,
>
>
>
> since 2.6.39 (including -rc1), tcpdump "vlan" capture filters don't match
> anymore. All 2.6.38 and older kernels are fine.
>
>
> I reproduced this specifically on a r8169 NIC on 2.6.39-rc1, but I found
> this problem initially on bnx2 and e1000e nics.
>
>
> Howto reproduce: just tcpdump with a "not vlan", "vlan" or "vlan <vlanid>"
> capture filter on a passive eth interface (dot1q/vlan/ip config not necessary).
>
> Actual behavior is that a "vlan [vlanid]" capture filter doesn't match the
> (tagged) packet, and a "not vlan" capture filter matches everything.
>
>
> Disabling rx-vlan-offloading via
> ethtool -K eth0 rxvlan off
>
> doesn't change anything.
>
>
> Here we are filtering for "not vlan" and we can see that the matched frame
> is vlan tagged:
>
> # tcpdump -Uenc1 not vlan
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 22:03:39.077584 70:ca:9b:01:23:34> 00:18:f8:01:23:34, \
> *ethertype 802.1Q (0x8100), length 70: vlan 7, p 0*, ethertype IPv4, \
> 192.168.47.9.443> 192.168.32.30.39436: Flags [.], ack 255248912, \
> [...]
> 1 packet captured
> 169 packets received by filter
> 0 packets dropped by kernel
> 59 packets dropped by interface
> #
>
>
>
>
> As suggested here [1], we can pipe everything through another tcpdump
> instance:
> tcpdump -Uw - | tcpdump -en -r - vlan <vlanid>
>
>
> But that is not something that works for my specific use-case (dedicated
> sniffer box, dedicated interface connected to a Cisco SPAN/mirror port,
> un/single/double-tagged packets, remotely accessible via remote-pcap [2]).
>
>
> The sniffer should also be able to:
> - maintain the frame as-is, including dot1q, dot1p (preferably
>    without artificial recreation of header fields/values and including CFI/DEI)
> - "direct" capture filter based on vlan (not through multiple userspace
>    instances)
>
> Kernel <= 2.6.38 perfectly satisfies those requirements.
>
>
> Isn't disabling rx-vlan-offloading supposed to remedy those problems?

There were some discussions on this in the past e.g. [1]. We have
SKF_AD_VLAN_TAG and SKF_AD_VLAN_TAG_PRESENT for the BPF filter on
this, but libpcap is currently not making use of any of them.

  [1] http://thread.gmane.org/gmane.linux.network/247947

> Thanks,
>
> Lukas
>
>
>
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=498981
> [2] https://github.com/frgtn/rpcapd-linux
>
>   		 	   		  --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: Netlink mmap tx security?
From: Daniel Borkmann @ 2014-10-16  6:45 UTC (permalink / raw)
  To: David Miller; +Cc: luto, torvalds, kaber, netdev, tgraf
In-Reply-To: <20141014.220908.123550384430402000.davem@davemloft.net>

On 10/15/2014 04:09 AM, David Miller wrote:
> From: Andy Lutomirski <luto@amacapital.net>
> Date: Tue, 14 Oct 2014 19:03:11 -0700
>
>> On Tue, Oct 14, 2014 at 7:01 PM, David Miller <davem@davemloft.net> wrote:
>>> I really think this means I'll have to remove all of the netlink
>>> mmap() support in order to prevent from breaking applications. :(
>>>
>>> The other option is to keep NETLINK_TX_RING, but copy the data into
>>> a kernel side buffer before acting upon it.
>>
>> Option 3, which sucks but maybe not that badly: change the value of
>> NETLINK_RX_RING.  (Practically: add NETLINK_RX_RING2 or something like
>> that.)
>
> That would work as well.
>
> There are pros and cons to all of these approaches.
>
> I was thinking that if we do the "TX mmap --> copy to kernel buffer"
> approach, then if in the future we find a way to make it work
> reliably, we can avoid the copy.  And frankly performance wise it's no
> worse than what happens via normal sendmsg() calls.
>
> And all applications using NETLINK_RX_RING keep working and keep
> getting the performance boost.

That would be better, yes. This would avoid having such a TPACKET_V*
API chaos we have in packet sockets if this could be fixed for netlink
eventually.

^ permalink raw reply

* Re: Netlink mmap tx security?
From: Thomas Graf @ 2014-10-16  7:07 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, luto, torvalds, kaber, netdev
In-Reply-To: <543F6998.5090000@redhat.com>

On 10/16/14 at 08:45am, Daniel Borkmann wrote:
> On 10/15/2014 04:09 AM, David Miller wrote:
> >That would work as well.
> >
> >There are pros and cons to all of these approaches.
> >
> >I was thinking that if we do the "TX mmap --> copy to kernel buffer"
> >approach, then if in the future we find a way to make it work
> >reliably, we can avoid the copy.  And frankly performance wise it's no
> >worse than what happens via normal sendmsg() calls.
> >
> >And all applications using NETLINK_RX_RING keep working and keep
> >getting the performance boost.
> 
> That would be better, yes. This would avoid having such a TPACKET_V*
> API chaos we have in packet sockets if this could be fixed for netlink
> eventually.

Only saw the second part of Dave's message now. I agree that this
is even a better option.

^ permalink raw reply

* Re: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
From: Thierry Herbelot @ 2014-10-16  7:23 UTC (permalink / raw)
  To: Tantilov, Emil S, Kirsher, Jeffrey T, Brandeburg, Jesse,
	Allan, Bruce W, netdev@vger.kernel.org
In-Reply-To: <87618083B2453E4A8714035B62D67992500E2629@FMSMSX105.amr.corp.intel.com>

On 10/16/2014 12:50 AM, Tantilov, Emil S wrote:
>> -----Original Message-----
>> From: Thierry Herbelot [mailto:thierry.herbelot@6wind.com]
>> Sent: Wednesday, October 15, 2014 2:58 AM
>> To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W;
>> netdev@vger.kernel.org; Tantilov, Emil S
>> Cc: Thierry Herbelot
>> Subject: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
>>
>> this protects against the following panic:
>> (before a VF was actually created on p96p1 PF Ethernet port)
>>
>> ip link set p96p1 vf 0 spoofchk off
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000052
>> IP: [<ffffffffa044a1c1>]
>> ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
>>
>> Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
>> ---
>>
>> v2:
>>   compilation fixes
>>
>> v3:
>>   remove checks in functions where vfinfo is known not to be NULL
>>   return -EINVAL as error code
>>
>> drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   42
>> ++++++++++++++++++++++--
>> 1 file changed, 40 insertions(+), 2 deletions(-)
>
> Actually looking into this a bit more, the check for vfinfo is not sufficient
> because it does not protect against specifying vf that is outside of sriov_num_vfs range.
>
> All of the ndo functions have a check for it except for ixgbevf_ndo_set_spoofcheck().
>
> The following patch should be all we need to protect against this panic:
>
> This patch adds a check to return -EINVAL when setting spoofcheck on
> VF that is not configured.
>
> Reported-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> ---
>   drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |    3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> index 706fc69..97c85b8 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> @@ -1261,6 +1261,9 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
>   	struct ixgbe_hw *hw = &adapter->hw;
>   	u32 regval;
>
> +	if (vf >= adapter->num_vfs)
> +		return -EINVAL;
> +
>   	adapter->vfinfo[vf].spoofchk_enabled = setting;
>
>   	regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
>
>

Hello,

Indeed your patch solves the initial issue.

And indeed I also double-checked that all other instances are protected 
by the (vf >= adapter->num_vfs) condition.

	Best regards

	Thierry

-- 
Thierry Herbelot
6WIND
Software Engineer

^ permalink raw reply

* Re: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
From: Jeff Kirsher @ 2014-10-16  7:32 UTC (permalink / raw)
  To: Thierry Herbelot
  Cc: Tantilov, Emil S, Brandeburg, Jesse, Allan, Bruce W,
	netdev@vger.kernel.org
In-Reply-To: <543F7255.7070606@6wind.com>

[-- Attachment #1: Type: text/plain, Size: 2836 bytes --]

On Thu, 2014-10-16 at 09:23 +0200, Thierry Herbelot wrote:
> On 10/16/2014 12:50 AM, Tantilov, Emil S wrote:
> >> -----Original Message-----
> >> From: Thierry Herbelot [mailto:thierry.herbelot@6wind.com]
> >> Sent: Wednesday, October 15, 2014 2:58 AM
> >> To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W;
> >> netdev@vger.kernel.org; Tantilov, Emil S
> >> Cc: Thierry Herbelot
> >> Subject: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
> >>
> >> this protects against the following panic:
> >> (before a VF was actually created on p96p1 PF Ethernet port)
> >>
> >> ip link set p96p1 vf 0 spoofchk off
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000052
> >> IP: [<ffffffffa044a1c1>]
> >> ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
> >>
> >> Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> >> ---
> >>
> >> v2:
> >>   compilation fixes
> >>
> >> v3:
> >>   remove checks in functions where vfinfo is known not to be NULL
> >>   return -EINVAL as error code
> >>
> >> drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   42
> >> ++++++++++++++++++++++--
> >> 1 file changed, 40 insertions(+), 2 deletions(-)
> >
> > Actually looking into this a bit more, the check for vfinfo is not sufficient
> > because it does not protect against specifying vf that is outside of sriov_num_vfs range.
> >
> > All of the ndo functions have a check for it except for ixgbevf_ndo_set_spoofcheck().
> >
> > The following patch should be all we need to protect against this panic:
> >
> > This patch adds a check to return -EINVAL when setting spoofcheck on
> > VF that is not configured.
> >
> > Reported-by: Thierry Herbelot <thierry.herbelot@6wind.com>
> > Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> > ---
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |    3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> > index 706fc69..97c85b8 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> > @@ -1261,6 +1261,9 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
> >   	struct ixgbe_hw *hw = &adapter->hw;
> >   	u32 regval;
> >
> > +	if (vf >= adapter->num_vfs)
> > +		return -EINVAL;
> > +
> >   	adapter->vfinfo[vf].spoofchk_enabled = setting;
> >
> >   	regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
> >
> >
> 
> Hello,
> 
> Indeed your patch solves the initial issue.
> 
> And indeed I also double-checked that all other instances are protected 
> by the (vf >= adapter->num_vfs) condition.

So Thierry, can I add your Acked-by or Tested-by to Emil's patch?

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
From: Thierry Herbelot @ 2014-10-16  7:34 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Tantilov, Emil S, Brandeburg, Jesse, Allan, Bruce W,
	netdev@vger.kernel.org
In-Reply-To: <1413444750.2412.43.camel@jtkirshe-mobl>

On 10/16/2014 09:32 AM, Jeff Kirsher wrote:
> On Thu, 2014-10-16 at 09:23 +0200, Thierry Herbelot wrote:
>> On 10/16/2014 12:50 AM, Tantilov, Emil S wrote:
>>>> -----Original Message-----
>>>> From: Thierry Herbelot [mailto:thierry.herbelot@6wind.com]
>>>> Sent: Wednesday, October 15, 2014 2:58 AM
>>>> To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W;
>>>> netdev@vger.kernel.org; Tantilov, Emil S
>>>> Cc: Thierry Herbelot
>>>> Subject: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
>>>>
>>>> this protects against the following panic:
>>>> (before a VF was actually created on p96p1 PF Ethernet port)
>>>>
>>>> ip link set p96p1 vf 0 spoofchk off
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000052
>>>> IP: [<ffffffffa044a1c1>]
>>>> ixgbe_ndo_set_vf_spoofchk+0x51/0x150 [ixgbe]
>>>>
>>>> Signed-off-by: Thierry Herbelot <thierry.herbelot@6wind.com>
>>>> ---
>>>>
>>>> v2:
>>>>    compilation fixes
>>>>
>>>> v3:
>>>>    remove checks in functions where vfinfo is known not to be NULL
>>>>    return -EINVAL as error code
>>>>
>>>> drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   42
>>>> ++++++++++++++++++++++--
>>>> 1 file changed, 40 insertions(+), 2 deletions(-)
>>>
>>> Actually looking into this a bit more, the check for vfinfo is not sufficient
>>> because it does not protect against specifying vf that is outside of sriov_num_vfs range.
>>>
>>> All of the ndo functions have a check for it except for ixgbevf_ndo_set_spoofcheck().
>>>
>>> The following patch should be all we need to protect against this panic:
>>>
>>> This patch adds a check to return -EINVAL when setting spoofcheck on
>>> VF that is not configured.
>>>
>>> Reported-by: Thierry Herbelot <thierry.herbelot@6wind.com>
>>> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
>>> ---
>>>    drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |    3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>>> index 706fc69..97c85b8 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>>> @@ -1261,6 +1261,9 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
>>>    	struct ixgbe_hw *hw = &adapter->hw;
>>>    	u32 regval;
>>>
>>> +	if (vf >= adapter->num_vfs)
>>> +		return -EINVAL;
>>> +
>>>    	adapter->vfinfo[vf].spoofchk_enabled = setting;
>>>
>>>    	regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
>>>
>>>
>>
>> Hello,
>>
>> Indeed your patch solves the initial issue.
>>
>> And indeed I also double-checked that all other instances are protected
>> by the (vf >= adapter->num_vfs) condition.
>
> So Thierry, can I add your Acked-by or Tested-by to Emil's patch?
>

Hello,

I agree with the patch.

Acked-by Thierry Herbelot <thierry.herbelot@6wind.com>

	Best regards

	Th

-- 
Thierry Herbelot
6WIND
Software Engineer

^ permalink raw reply

* Re: [PATCH v3 net] ixgbe: check adapter->vfinfo before dereference
From: Jeff Kirsher @ 2014-10-16  7:36 UTC (permalink / raw)
  To: Thierry Herbelot
  Cc: Tantilov, Emil S, Brandeburg, Jesse, Allan, Bruce W,
	netdev@vger.kernel.org
In-Reply-To: <543F74F5.1040706@6wind.com>

[-- Attachment #1: Type: text/plain, Size: 160 bytes --]

On Thu, 2014-10-16 at 09:34 +0200, Thierry Herbelot wrote:
> I agree with the patch.
> 
> Acked-by Thierry Herbelot <thierry.herbelot@6wind.com>

Thanks!

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox