Netdev List


* [PATCH net-next 09/10] can: use __dev_get_by_index instead of dev_get_by_index to find interface
From: Ying Xue @ 2014-01-14  7:41 UTC
  To: davem
  Cc: vfalico, john.r.fastabend, stephen, antonio, dmitry.tarnyagin,
	socketcan, johannes, netdev, linux-kernel
In-Reply-To: <1389685269-18600-1-git-send-email-ying.xue@windriver.com>

As cgw_create_job() is always called under rtnl_lock protection,
__dev_get_by_index() should be used instead of dev_get_by_index() to
find the interface handler in it, which lets us avoid changing the
interface reference counter.
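The following is a minimal userspace model of the difference. It assumes only two things: the counted lookup takes a reference that must be dropped on every exit path, and the uncounted lookup relies on the caller's lock (RTNL here) to keep the device alive. All `model_*` names, the device table and `MODEL_TYPE_CAN` are hypothetical stand-ins, not kernel APIs. The two create functions mirror the goto ladder before and after this patch.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TYPE_CAN 1   /* hypothetical device type code */
#define MODEL_NDEVS    4

/* Hypothetical stand-in for struct net_device. */
struct model_dev {
	int ifindex;
	int type;
	int refcnt;
};

static struct model_dev model_devs[MODEL_NDEVS];

/* Uncounted lookup (like __dev_get_by_index()): the caller must hold the
 * lock that keeps the device registered. */
static struct model_dev *model_lookup_noref(int ifindex)
{
	for (size_t i = 0; i < MODEL_NDEVS; i++)
		if (model_devs[i].ifindex == ifindex)
			return &model_devs[i];
	return NULL;
}

/* Counted lookup (like dev_get_by_index()): takes a reference. */
static struct model_dev *model_lookup_ref(int ifindex)
{
	struct model_dev *dev = model_lookup_noref(ifindex);

	if (dev)
		dev->refcnt++;
	return dev;
}

/* Drops a reference (like dev_put()). */
static void model_put(struct model_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* Before the patch: each failure needs its own label, and the success
 * path falls through the same puts. */
static int model_create_job_ref(int src_idx, int dst_idx)
{
	struct model_dev *src = NULL, *dst = NULL;
	int err = -1;

	src = model_lookup_ref(src_idx);
	if (!src)
		goto out;
	if (src->type != MODEL_TYPE_CAN)
		goto put_src_out;
	dst = model_lookup_ref(dst_idx);
	if (!dst)
		goto put_src_out;
	if (dst->type != MODEL_TYPE_CAN)
		goto put_src_dst_out;
	err = 0;                 /* the job would be registered here */
put_src_dst_out:
	model_put(dst);
put_src_out:
	model_put(src);
out:
	return err;
}

/* After the patch: nothing to release, so every failure just bails out. */
static int model_create_job_noref(int src_idx, int dst_idx)
{
	struct model_dev *src, *dst;

	src = model_lookup_noref(src_idx);
	if (!src || src->type != MODEL_TYPE_CAN)
		return -1;
	dst = model_lookup_noref(dst_idx);
	if (!dst || dst->type != MODEL_TYPE_CAN)
		return -1;
	return 0;
}
```

Both versions return the same results. The uncounted one simply has no reference bookkeeping to keep in sync with its error paths, which is why the put_src_out/put_src_dst_out labels can go.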

Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 net/can/gw.c |   15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/can/gw.c b/net/can/gw.c
index 88c8a39..ac31891 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -839,21 +839,21 @@ static int cgw_create_job(struct sk_buff *skb,  struct nlmsghdr *nlh)
 	if (!gwj->ccgw.src_idx || !gwj->ccgw.dst_idx)
 		goto out;
 
-	gwj->src.dev = dev_get_by_index(&init_net, gwj->ccgw.src_idx);
+	gwj->src.dev = __dev_get_by_index(&init_net, gwj->ccgw.src_idx);
 
 	if (!gwj->src.dev)
 		goto out;
 
 	if (gwj->src.dev->type != ARPHRD_CAN)
-		goto put_src_out;
+		goto out;
 
-	gwj->dst.dev = dev_get_by_index(&init_net, gwj->ccgw.dst_idx);
+	gwj->dst.dev = __dev_get_by_index(&init_net, gwj->ccgw.dst_idx);
 
 	if (!gwj->dst.dev)
-		goto put_src_out;
+		goto out;
 
 	if (gwj->dst.dev->type != ARPHRD_CAN)
-		goto put_src_dst_out;
+		goto out;
 
 	gwj->limit_hops = limhops;
 
@@ -862,11 +862,6 @@ static int cgw_create_job(struct sk_buff *skb,  struct nlmsghdr *nlh)
 	err = cgw_register_filter(gwj);
 	if (!err)
 		hlist_add_head_rcu(&gwj->list, &cgw_list);
-
-put_src_dst_out:
-	dev_put(gwj->dst.dev);
-put_src_out:
-	dev_put(gwj->src.dev);
 out:
 	if (err)
 		kmem_cache_free(cgw_cache, gwj);
-- 
1.7.9.5


* [PATCH net-next 08/10] caif: __dev_get_by_index instead of dev_get_by_index to find interface
From: Ying Xue @ 2014-01-14  7:41 UTC
  To: davem
  Cc: vfalico, john.r.fastabend, stephen, antonio, dmitry.tarnyagin,
	socketcan, johannes, netdev, linux-kernel
In-Reply-To: <1389685269-18600-1-git-send-email-ying.xue@windriver.com>

The following call chain shows that chnl_net_open() runs under
rtnl_lock protection, because __dev_open() is protected by rtnl_lock.
So if __dev_get_by_index() is used instead of dev_get_by_index() to
find the interface handler in it, we avoid having to change the
interface reference counter.
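The argument depends entirely on the caller already holding rtnl_lock. Below is a userspace sketch of that contract. A pthread mutex plus a `held` flag serve as a hypothetical stand-in for RTNL (the kernel's own check is ASSERT_RTNL()), and the `model_*` functions are illustrative only. The lookup asserts that the lock is held, and the open callback, like chnl_net_open(), never takes the lock itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RTNL stand-in: a mutex plus a flag the assertions can read. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;
static bool model_rtnl_held;

static void model_rtnl_lock(void)
{
	pthread_mutex_lock(&model_rtnl);
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	model_rtnl_held = false;
	pthread_mutex_unlock(&model_rtnl);
}

struct model_dev {
	int ifindex;
	int mtu;
};

/* A single lower-layer device, to keep the sketch short. */
static struct model_dev model_lower = { .ifindex = 7, .mtu = 1500 };

/* Uncounted lookup: only valid under the lock, in the spirit of ASSERT_RTNL(). */
static struct model_dev *model_lookup_noref(int ifindex)
{
	assert(model_rtnl_held);
	return model_lower.ifindex == ifindex ? &model_lower : NULL;
}

/* Open callback: relies on its caller for the lock and takes no reference. */
static int model_chnl_open(int llifindex, int *lower_mtu)
{
	struct model_dev *lldev = model_lookup_noref(llifindex);

	if (!lldev)
		return -1;
	*lower_mtu = lldev->mtu;   /* nothing to put afterwards */
	return 0;
}

/* Caller side: the open path runs with the lock held. */
static int model_dev_open(int llifindex, int *lower_mtu)
{
	int ret;

	model_rtnl_lock();
	ret = model_chnl_open(llifindex, lower_mtu);
	model_rtnl_unlock();
	return ret;
}
```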

__dev_open()
  chnl_net_open()

Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 net/caif/chnl_net.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 7344a8f..4589ff67 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -285,7 +285,7 @@ static int chnl_net_open(struct net_device *dev)
 				goto error;
 		}
 
-		lldev = dev_get_by_index(dev_net(dev), llifindex);
+		lldev = __dev_get_by_index(dev_net(dev), llifindex);
 
 		if (lldev == NULL) {
 			pr_debug("no interface?\n");
@@ -307,7 +307,6 @@ static int chnl_net_open(struct net_device *dev)
 		mtu = min_t(int, dev->mtu, lldev->mtu - (headroom + tailroom));
 		mtu = min_t(int, GPRS_PDP_MTU, mtu);
 		dev_set_mtu(dev, mtu);
-		dev_put(lldev);
 
 		if (mtu < 100) {
 			pr_warn("CAIF Interface MTU too small (%d)\n", mtu);
-- 
1.7.9.5


* [PATCH net-next 07/10] batman-adv: use __dev_get_by_index instead of dev_get_by_index to find interface
From: Ying Xue @ 2014-01-14  7:41 UTC
  To: davem
  Cc: vfalico, john.r.fastabend, stephen, antonio, dmitry.tarnyagin,
	socketcan, johannes, netdev, linux-kernel
In-Reply-To: <1389685269-18600-1-git-send-email-ying.xue@windriver.com>

The following call chain shows that batadv_is_on_batman_iface()
is always called under rtnl_lock protection, because
call_netdevice_notifier() is protected by rtnl_lock. So if
__dev_get_by_index() is used instead of dev_get_by_index() to find the
interface handler in it, we avoid having to change the interface
reference counter.
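batadv_is_on_batman_iface() recurses from a device through its iflink parents until it finds a batman-adv soft interface or runs out of parents. The userspace sketch below performs that walk over a hypothetical fixed device table. `model_dev`, `is_mesh` and the stop condition on `iflink` are assumptions for illustration. Because the whole walk runs under one lock, no lookup in the chain needs a reference, so nothing has to be released after the recursive call returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record: iflink names the parent ifindex. */
struct model_dev {
	int ifindex;
	int iflink;      /* 0 or own ifindex means "no parent" */
	bool is_mesh;    /* stands in for "is a batman-adv soft interface" */
};

static const struct model_dev model_table[] = {
	{ .ifindex = 1, .iflink = 1, .is_mesh = false }, /* physical NIC */
	{ .ifindex = 2, .iflink = 2, .is_mesh = true  }, /* mesh interface */
	{ .ifindex = 3, .iflink = 2, .is_mesh = false }, /* stacked on 2 */
	{ .ifindex = 4, .iflink = 3, .is_mesh = false }, /* stacked on 3 */
	{ .ifindex = 5, .iflink = 1, .is_mesh = false }, /* stacked on 1 */
};

/* Uncounted lookup; the caller is assumed to hold the lock. */
static const struct model_dev *model_lookup(int ifindex)
{
	for (size_t i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++)
		if (model_table[i].ifindex == ifindex)
			return &model_table[i];
	return NULL;
}

static bool model_is_on_mesh_iface(const struct model_dev *dev)
{
	const struct model_dev *parent;

	assert(dev != NULL);
	if (dev->is_mesh)
		return true;
	/* no more parents: stop recursing */
	if (dev->iflink == 0 || dev->iflink == dev->ifindex)
		return false;
	parent = model_lookup(dev->iflink);
	if (!parent)             /* broken chain; batman-adv WARNs here */
		return false;
	/* no reference was taken, so nothing to release after recursing */
	return model_is_on_mesh_iface(parent);
}
```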

call_netdevice_notifier()
  batadv_hard_if_event()
    batadv_hardif_add_interface()
      batadv_is_valid_iface()
        batadv_is_on_batman_iface()

Cc: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 net/batman-adv/hard-interface.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index bebd46c..115d14e 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -86,15 +86,13 @@ static bool batadv_is_on_batman_iface(const struct net_device *net_dev)
 		return false;
 
 	/* recurse over the parent device */
-	parent_dev = dev_get_by_index(&init_net, net_dev->iflink);
+	parent_dev = __dev_get_by_index(&init_net, net_dev->iflink);
 	/* if we got a NULL parent_dev there is something broken.. */
 	if (WARN(!parent_dev, "Cannot find parent device"))
 		return false;
 
 	ret = batadv_is_on_batman_iface(parent_dev);
 
-	if (parent_dev)
-		dev_put(parent_dev);
 	return ret;
 }
 
-- 
1.7.9.5


* [PATCH net-next 05/10] decnet: use __dev_get_by_index instead of dev_get_by_index to find interface
From: Ying Xue @ 2014-01-14  7:41 UTC
  To: davem
  Cc: vfalico, john.r.fastabend, stephen, antonio, dmitry.tarnyagin,
	socketcan, johannes, netdev, linux-kernel
In-Reply-To: <1389685269-18600-1-git-send-email-ying.xue@windriver.com>

From the following call chain we can see that dn_cache_getroute() is
protected by rtnl_lock. So if we use __dev_get_by_index() instead
of dev_get_by_index() to find interface handlers in it, we avoid
having to change the interface reference counter.

rtnetlink_rcv()
  rtnl_lock()
    netlink_rcv_skb()
      dn_cache_getroute()
  rtnl_unlock()

Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 net/decnet/dn_route.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index ad2efa5..22390e4 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1666,12 +1666,12 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 
 	if (fld.flowidn_iif) {
 		struct net_device *dev;
-		if ((dev = dev_get_by_index(&init_net, fld.flowidn_iif)) == NULL) {
+		dev = __dev_get_by_index(&init_net, fld.flowidn_iif);
+		if (!dev) {
 			kfree_skb(skb);
 			return -ENODEV;
 		}
 		if (!dev->dn_ptr) {
-			dev_put(dev);
 			kfree_skb(skb);
 			return -ENODEV;
 		}
@@ -1693,8 +1693,6 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 		err = dn_route_output_key((struct dst_entry **)&rt, &fld, 0);
 	}
 
-	if (skb->dev)
-		dev_put(skb->dev);
 	skb->dev = NULL;
 	if (err)
 		goto out_free;
-- 
1.7.9.5


* [PATCH net-next 04/10] dcb: use __dev_get_by_name instead of dev_get_by_name to find interface
From: Ying Xue @ 2014-01-14  7:41 UTC
  To: davem
  Cc: vfalico, john.r.fastabend, stephen, antonio, dmitry.tarnyagin,
	socketcan, johannes, netdev, linux-kernel
In-Reply-To: <1389685269-18600-1-git-send-email-ying.xue@windriver.com>

The following call chain shows that dcb_doit() is protected
by rtnl_lock. So if we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handlers in it, we avoid
having to change the interface reference counter.

rtnetlink_rcv()
  rtnl_lock()
  netlink_rcv_skb()
    dcb_doit()
  rtnl_unlock()

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 net/dcb/dcbnl.c |   15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 66fbe19..5536444 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1688,21 +1688,17 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
 	if (!tb[DCB_ATTR_IFNAME])
 		return -EINVAL;
 
-	netdev = dev_get_by_name(net, nla_data(tb[DCB_ATTR_IFNAME]));
+	netdev = __dev_get_by_name(net, nla_data(tb[DCB_ATTR_IFNAME]));
 	if (!netdev)
 		return -ENODEV;
 
-	if (!netdev->dcbnl_ops) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
+	if (!netdev->dcbnl_ops)
+		return -EOPNOTSUPP;
 
 	reply_skb = dcbnl_newmsg(fn->type, dcb->cmd, portid, nlh->nlmsg_seq,
 				 nlh->nlmsg_flags, &reply_nlh);
-	if (!reply_skb) {
-		ret = -ENOBUFS;
-		goto out;
-	}
+	if (!reply_skb)
+		return -ENOBUFS;
 
 	ret = fn->cb(netdev, nlh, nlh->nlmsg_seq, tb, reply_skb);
 	if (ret < 0) {
@@ -1714,7 +1710,6 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 	ret = rtnl_unicast(reply_skb, net, portid);
 out:
-	dev_put(netdev);
 	return ret;
 }
 
-- 
1.7.9.5


* [PATCH net-next 02/10] bonding: use __dev_get_by_name instead of dev_get_by_name to find interface
From: Ying Xue @ 2014-01-14  7:41 UTC
  To: davem
  Cc: vfalico, john.r.fastabend, stephen, antonio, dmitry.tarnyagin,
	socketcan, johannes, netdev, linux-kernel
In-Reply-To: <1389685269-18600-1-git-send-email-ying.xue@windriver.com>

The following call chain shows that bond_do_ioctl() is protected
by rtnl_lock. If we use __dev_get_by_name() instead of
dev_get_by_name() to find the interface handler in it, we avoid
having to take and then drop a reference on the interface.

dev_ioctl()
  rtnl_lock()
  dev_ifsioc()
    bond_do_ioctl()
  rtnl_unlock()

Additionally, we restructure bond_do_ioctl() with an early return
to make it more readable.
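The restructuring replaces `if (!slave_dev) res = -ENODEV; else { ... }` with a guard clause. That change is only straightforward once no dev_put() is needed on the way out. Below is a compact userspace illustration of that shape. The command enum, the `model_*` helpers and the positive return values used as markers for bond_enslave()/bond_release() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command codes standing in for SIOCBONDENSLAVE etc. */
enum model_cmd { MODEL_ENSLAVE, MODEL_RELEASE, MODEL_UNKNOWN };

struct model_dev { const char *name; };

static struct model_dev model_eth1 = { .name = "eth1" };

/* Uncounted lookup by name; the caller is assumed to hold the lock. */
static struct model_dev *model_lookup_by_name(const char *name)
{
	return strcmp(name, model_eth1.name) == 0 ? &model_eth1 : NULL;
}

static int model_bond_ioctl(const char *slave_name, enum model_cmd cmd)
{
	struct model_dev *slave_dev;
	int res;

	assert(slave_name != NULL);
	slave_dev = model_lookup_by_name(slave_name);

	/* Guard clause: with no reference to drop, fail immediately. */
	if (!slave_dev)
		return -ENODEV;

	switch (cmd) {
	case MODEL_ENSLAVE:
		res = 1;      /* marker standing in for bond_enslave() */
		break;
	case MODEL_RELEASE:
		res = 2;      /* marker standing in for bond_release() */
		break;
	default:
		res = -EOPNOTSUPP;
	}
	return res;       /* no dev_put() on any path */
}
```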

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 drivers/net/bonding/bond_main.c |   49 ++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e06c445..a69afbf 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3213,37 +3213,34 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
-	slave_dev = dev_get_by_name(net, ifr->ifr_slave);
+	slave_dev = __dev_get_by_name(net, ifr->ifr_slave);
 
 	pr_debug("slave_dev=%p:\n", slave_dev);
 
 	if (!slave_dev)
-		res = -ENODEV;
-	else {
-		pr_debug("slave_dev->name=%s:\n", slave_dev->name);
-		switch (cmd) {
-		case BOND_ENSLAVE_OLD:
-		case SIOCBONDENSLAVE:
-			res = bond_enslave(bond_dev, slave_dev);
-			break;
-		case BOND_RELEASE_OLD:
-		case SIOCBONDRELEASE:
-			res = bond_release(bond_dev, slave_dev);
-			break;
-		case BOND_SETHWADDR_OLD:
-		case SIOCBONDSETHWADDR:
-			bond_set_dev_addr(bond_dev, slave_dev);
-			res = 0;
-			break;
-		case BOND_CHANGE_ACTIVE_OLD:
-		case SIOCBONDCHANGEACTIVE:
-			res = bond_option_active_slave_set(bond, slave_dev);
-			break;
-		default:
-			res = -EOPNOTSUPP;
-		}
+		return -ENODEV;
 
-		dev_put(slave_dev);
+	pr_debug("slave_dev->name=%s:\n", slave_dev->name);
+	switch (cmd) {
+	case BOND_ENSLAVE_OLD:
+	case SIOCBONDENSLAVE:
+		res = bond_enslave(bond_dev, slave_dev);
+		break;
+	case BOND_RELEASE_OLD:
+	case SIOCBONDRELEASE:
+		res = bond_release(bond_dev, slave_dev);
+		break;
+	case BOND_SETHWADDR_OLD:
+	case SIOCBONDSETHWADDR:
+		bond_set_dev_addr(bond_dev, slave_dev);
+		res = 0;
+		break;
+	case BOND_CHANGE_ACTIVE_OLD:
+	case SIOCBONDCHANGEACTIVE:
+		res = bond_option_active_slave_set(bond, slave_dev);
+		break;
+	default:
+		res = -EOPNOTSUPP;
 	}
 
 	return res;
-- 
1.7.9.5


* [PATCH net-next 00/10] use appropriate APIs to get interfaces
From: Ying Xue @ 2014-01-14  7:40 UTC
  To: davem
  Cc: vfalico, john.r.fastabend, stephen, antonio, dmitry.tarnyagin,
	socketcan, johannes, netdev, linux-kernel

Under rtnl_lock protection, we should use __dev_get_by_name()/__dev_get_by_index()
rather than dev_get_by_name()/dev_get_by_index() to find interface handlers,
because the former do not change the interface reference counter.

Ying Xue (10):
  Drivers: Staging: cxt1e1: use __dev_get_name instead of dev_get_name
    to find interfaces
  bonding: use __dev_get_by_name instead of dev_get_by_name to find
    interface
  eql: use __dev_get_by_name instead of dev_get_by_name to find
    interface
  dcb: use __dev_get_by_name instead of dev_get_by_name to find
    interface
  decnet: use __dev_get_by_index instead of dev_get_by_index to find
    interface
  vxlan: use __dev_get_by_index instead of dev_get_by_index to find
    interface
  batman-adv: use __dev_get_by_index instead of dev_get_by_index to
    find interface
  caif: __dev_get_by_index instead of dev_get_by_index to find
    interface
  can: use __dev_get_by_index instead of dev_get_by_index to find
    interface
  net: nl80211: __dev_get_by_index instead of dev_get_by_index to find
    interface

 drivers/net/bonding/bond_main.c |   49 +++++++++----------
 drivers/net/eql.c               |   95 ++++++++++++++++---------------------
 drivers/net/vxlan.c             |    3 +-
 drivers/staging/cxt1e1/linux.c  |   15 +++---
 net/batman-adv/hard-interface.c |    4 +-
 net/caif/chnl_net.c             |    3 +-
 net/can/gw.c                    |   15 ++----
 net/dcb/dcbnl.c                 |   15 ++----
 net/decnet/dn_route.c           |    6 +--
 net/wireless/nl80211.c          |  100 ++++++++++++++-------------------------
 10 files changed, 123 insertions(+), 182 deletions(-)

-- 
1.7.9.5


* Re: [PATCH net-next 2/2] net: 3com: fix warning for incorrect type in argument
From: David Miller @ 2014-01-14  7:31 UTC
  To: dingtianhong; +Cc: netdev, joe, julia.lawall
In-Reply-To: <52D0FF89.2020206@huawei.com>

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Sat, 11 Jan 2014 16:23:37 +0800

> The commit c466a9b2b329f7d9982c14eedc83a923d3bc711c
> (net: 3com: slight optimization of addr compare)
> causes a warning: "passing argument 1 of 'ether_addr_equal'
> from incompatible pointer type", so fix it.
> 
> I think Julia will convert ether_addr_equal to ether_addr_equal_64bits later.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

Applied.


* Re: [PATCH net-next 1/2] net: qlcnic: fix warning for incorrect type in argument
From: David Miller @ 2014-01-14  7:31 UTC
  To: dingtianhong; +Cc: himanshu.madhani, rajesh.borundia, netdev, joe, julia.lawall
In-Reply-To: <52D0FF87.4090806@huawei.com>

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Sat, 11 Jan 2014 16:23:35 +0800

> The commit 6878f79a8b71e8c7b0587a1185584f54fd31f185
> (net: qlcnic: slight optimization of addr compare)
> causes a warning "sparse: incorrect type in argument 2
> (different type sizes)", so fix it.
> 
> I think Julia will convert ether_addr_equal to ether_addr_equal_64bits later.
> 
> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
> Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>

Applied.


* Re: [PATCH 5/5] net: mvneta: replace Tx timer with a real interrupt
From: Willy Tarreau @ 2014-01-14  7:30 UTC
  To: Arnaud Ebalard
  Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT, Eric Dumazet
In-Reply-To: <87y52jeack.fsf@natisbad.org>

On Tue, Jan 14, 2014 at 12:22:03AM +0100, Arnaud Ebalard wrote:
> Hi Willy,
> 
> Willy Tarreau <w@1wt.eu> writes:
> 
> > @@ -1935,14 +1907,22 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
> >  
> >  	/* Read cause register */
> >  	cause_rx_tx = mvreg_read(pp, MVNETA_INTR_NEW_CAUSE) &
> > -		MVNETA_RX_INTR_MASK(rxq_number);
> > +		(MVNETA_RX_INTR_MASK(rxq_number) | MVNETA_TX_INTR_MASK(txq_number));
> > +
> > +	/* Release Tx descriptors */
> > +	if (cause_rx_tx & MVNETA_TX_INTR_MASK_ALL) {
> > +		int tx_todo = 0;
> > +
> > +		mvneta_tx_done_gbe(pp, (cause_rx_tx & MVNETA_TX_INTR_MASK_ALL), &tx_todo);
> > +		cause_rx_tx &= ~MVNETA_TX_INTR_MASK_ALL;
> > +	}
> 
> Unless I missed something, tx_todo above is just here to make the
> compiler happy w/ current prototype of mvneta_tx_done_gbe() but is
> otherwise unused: you could simply remove the third parameter of the
> function (it is only used here) and remove tx_todo.

A number of such changes could be done but should be merged separately,
along with the cleanup and improvement series.

> Additionally, as you do not use the return value of the function, you
> could probably make it void and spare some additional cycles by removing
> the computation of the return value. While at it, mvneta_txq_done()
> could also be made void.
> 
> The patch below gives the idea, it's compile-tested only and applies on
> your whole set (fixes + perf).

You should really propose your patches for net-next on top of my series;
it's not too late.

Please see my comments below.

> Index: linux/drivers/net/ethernet/marvell/mvneta.c
> ===================================================================
> --- linux.orig/drivers/net/ethernet/marvell/mvneta.c	2014-01-14 00:07:18.728729578 +0100
> +++ linux/drivers/net/ethernet/marvell/mvneta.c	2014-01-14 00:11:57.740949448 +0100
> @@ -1314,25 +1314,23 @@
>  }
>  
>  /* Handle end of transmission */
> -static int mvneta_txq_done(struct mvneta_port *pp,
> +static void mvneta_txq_done(struct mvneta_port *pp,
>  			   struct mvneta_tx_queue *txq)
>  {
>  	struct netdev_queue *nq = netdev_get_tx_queue(pp->dev, txq->id);
>  	int tx_done;
>  
>  	tx_done = mvneta_txq_sent_desc_proc(pp, txq);
> -	if (tx_done == 0)
> -		return tx_done;
> -	mvneta_txq_bufs_free(pp, txq, tx_done);
> +	if (tx_done) {
> +		mvneta_txq_bufs_free(pp, txq, tx_done);

It would be better to keep "if (tx_done == 0) return" above and avoid the
extra indent level that inverting the if adds; that makes the code more readable.
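The suggested shape keeps the zero case as an early return rather than wrapping the rest of mvneta_txq_done() in `if (tx_done) { ... }`. Below is a userspace sketch of that structure, including the wake-up threshold from the patch. `model_txq`, its `sent` and `stopped` fields, and `MODEL_MAX_SKB_FRAGS` (assumed to be 17 here) are hypothetical stand-ins for the driver's queue state and netif_tx_queue_stopped()/netif_tx_wake_queue().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_SKB_FRAGS 17  /* assumed value for this sketch */

struct model_txq {
	int size;      /* descriptors in the ring */
	int count;     /* descriptors still in flight */
	int sent;      /* descriptors the hardware reports as sent */
	bool stopped;  /* stands in for netif_tx_queue_stopped() */
};

static void model_txq_done(struct model_txq *txq)
{
	int tx_done = txq->sent;   /* stands in for mvneta_txq_sent_desc_proc() */

	if (tx_done == 0)
		return;                /* early return: no extra indent level */

	txq->sent = 0;             /* buffers would be freed here */
	txq->count -= tx_done;
	assert(txq->count >= 0);

	/* Wake only once there is room for a maximally fragmented skb. */
	if (txq->stopped && txq->size - txq->count >= MODEL_MAX_SKB_FRAGS + 1)
		txq->stopped = false;
}
```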

> -	txq->count -= tx_done;
> +		txq->count -= tx_done;
>  
> -	if (netif_tx_queue_stopped(nq)) {
> -		if (txq->size - txq->count >= MAX_SKB_FRAGS + 1)
> -			netif_tx_wake_queue(nq);
> +		if (netif_tx_queue_stopped(nq)) {
> +			if (txq->size - txq->count >= MAX_SKB_FRAGS + 1)
> +				netif_tx_wake_queue(nq);
> +		}
>  	}
> -
> -	return tx_done;
>  }
>  
>  static void *mvneta_frag_alloc(const struct mvneta_port *pp)
> @@ -1704,30 +1702,23 @@
>  /* Handle tx done - called in softirq context. The <cause_tx_done> argument
>   * must be a valid cause according to MVNETA_TXQ_INTR_MASK_ALL.
>   */
> -static u32 mvneta_tx_done_gbe(struct mvneta_port *pp, u32 cause_tx_done,
> -			      int *tx_todo)
> +static void mvneta_tx_done_gbe(struct mvneta_port *pp, u32 cause_tx_done)
>  {
>  	struct mvneta_tx_queue *txq;
> -	u32 tx_done = 0;
>  	struct netdev_queue *nq;
>  
> -	*tx_todo = 0;
>  	while (cause_tx_done) {
>  		txq = mvneta_tx_done_policy(pp, cause_tx_done);
>  
>  		nq = netdev_get_tx_queue(pp->dev, txq->id);
>  		__netif_tx_lock(nq, smp_processor_id());
>  
> -		if (txq->count) {
> -			tx_done += mvneta_txq_done(pp, txq);
> -			*tx_todo += txq->count;
> -		}
> +		if (txq->count)
> +			mvneta_txq_done(pp, txq);
>  
>  		__netif_tx_unlock(nq);
>  		cause_tx_done &= ~((1 << txq->id));
>  	}
> -
> -	return tx_done;
>  }

Seems fine.
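mvneta_tx_done_gbe() drains the cause mask one queue at a time, clearing each handled bit until the mask is zero. Below is a standalone sketch of that loop. Which queue mvneta_tx_done_policy() picks first is an assumption here (the highest set bit, via the GCC/Clang builtin `__builtin_clz`). The per-queue work is replaced by recording the visit order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy: service the highest-numbered pending queue first. */
static int model_tx_done_policy(uint32_t cause)
{
	assert(cause != 0);
	return 31 - __builtin_clz(cause);
}

/* Visit every queue whose bit is set in cause; return how many were
 * visited and record their ids, in order, in order[]. */
static int model_tx_done(uint32_t cause, int order[32])
{
	int n = 0;

	while (cause) {
		int id = model_tx_done_policy(cause);

		order[n++] = id;          /* per-queue cleanup would go here */
		cause &= ~(1u << id);     /* mark this queue as handled */
	}
	return n;
}
```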

>  /* Compute crc8 of the specified address, using a unique algorithm ,
> @@ -1961,9 +1952,7 @@
>  
>  	/* Release Tx descriptors */
>  	if (cause_rx_tx & MVNETA_TX_INTR_MASK_ALL) {
> -		int tx_todo = 0;
> -
> -		mvneta_tx_done_gbe(pp, (cause_rx_tx & MVNETA_TX_INTR_MASK_ALL), &tx_todo);
> +		mvneta_tx_done_gbe(pp, (cause_rx_tx & MVNETA_TX_INTR_MASK_ALL));
>  		cause_rx_tx &= ~MVNETA_TX_INTR_MASK_ALL;
>  	}

Seems fine as well.

Thanks!
Willy


* Re: [PATCH] sh_eth: fix garbled TX error message
From: David Miller @ 2014-01-14  7:29 UTC
  To: sergei.shtylyov; +Cc: netdev, linux-sh
In-Reply-To: <201401110241.49471.sergei.shtylyov@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Sat, 11 Jan 2014 02:41:49 +0300

> sh_eth_error() in case of a TX error tries to print a message using 2 dev_err()
> calls with the first string not finished by '\n', so that the resulting message
> would inevitably come out garbled, with something like "3net eth0: " inserted
> in the middle.  Avoid that by merging 2 calls into one.
> 
> While at it, insert an empty line after the nearby declaration.
> 
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Applied, thanks.

I don't think this is really -stable material, sorry.


* Re: [PATCH 0/5] Assorted mvneta fixes
From: Willy Tarreau @ 2014-01-14  7:24 UTC
  To: Arnaud Ebalard
  Cc: davem, netdev, Thomas Petazzoni, Gregory CLEMENT, Eric Dumazet
In-Reply-To: <87d2jvfr1m.fsf@natisbad.org>

Hi Arnaud,

On Mon, Jan 13, 2014 at 11:36:05PM +0100, Arnaud Ebalard wrote:
> Hi,
> 
> Willy Tarreau <w@1wt.eu> writes:
> 
> >> Funnily enough, I spent some time this weekend trying to find the root
> >> cause of some kernel freezes and panics appearing randomly after a few GB
> >> were read from a ReadyNAS 102 configured as an NFS server. 
> >> 
> >> I tested your fixes and performance series together on top of current
> >> 3.13.0-rc7 and I am now unable to reproduce the freezes and panics after
> >> having read more than 300GB of traffic from the NAS: monitoring the
> >> bandwidth with bwm-ng shows the rate is also far more stable than with the
> >> previous driver logic (55MB/sec). So, FWIW:
> >> 
> >> Tested-by: Arnaud Ebalard <arno@natisbad.org>
> >
> > Thanks for this.
> >
> > BTW, the "performance" series is not supposed to fix anything, 
> 
> I was lazy and wanted to give the whole set a try in a single pass.
> 
> 
> > and still it seems difficult to me to find what patch might have fixed
> > your problem. Maybe the timer used in place of an IRQ has an even
> > worse effect than what we could imagine ?
> 
> I guess so.
> 
> 
> >> Willy, I can extend the test to RN2120 if you think it is useful to also
> >> do additional tests on a dual-core armada XP.
> >
> > It's up to you. These patches have run extensively on my Mirabox (Armada370),
> > OpenBlocks AX3 (ArmadaXP dual core) and the XP-GP board (ArmadaXP quad core),
> > and fixed the stability issues and performance issues I was facing there. But
> > you may be interested in testing them with your workloads (none of my boxes
> > is used as an NFS server, NAS or whatever, they mainly see HTTP and very small
> > packets used in stress tests).
> 
> Well, I spent the evening on my RN104 (Armada370 w/ 2 GbE ifaces) and my
> RN2120 (Dual core ArmadaXP w/ 2GbE ifaces) using one as a router and
> serving NFS traffic from the other (and then changing roles). I passed
> hundreds of GB of TCP/NFS traffic and did not see any issue.
> 
> Additionally, FWIW, testing both with netperf shows they easily support
> routing traffic at line rate.
> 
> Regarding the patches, the problem they solve impacts all Armada boards
> (370 and XP) which are used for network tasks. I think it would be nice
> to have those backported to stable. I can commit to testing the
> backports on both XP and 370 hardware down to the 3.12 or 3.11 kernel if it
> can help. 

I think so. I've been successfully using them from 3.10 and upwards.

Cheers,
Willy


* Re: pull request (net-next): ipsec-next 2014-01-14
From: David Miller @ 2014-01-14  7:14 UTC
  To: steffen.klassert; +Cc: herbert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 14 Jan 2014 07:49:04 +0100

> This pull request has a merge conflict between commits be7928d20bab
> ("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
> da7c224b1baa ("net: xfrm: xfrm_policy: silence compiler warning") from
> the net-next tree and commit 2f3ea9a95c58 ("xfrm: checkpatch erros with
> inline keyword position") from the ipsec-next tree.
> 
> The version from net-next can be used, like it is done in linux-next.
> 
> 1) Checkpatch cleanups, from Weilong Chen.
> 
> 2) Fix lockdep complaints when pktgen is used with IPsec,
>    from Fan Du.
> 
> 3) Update pktgen to allow any combination of IPsec transport/tunnel mode
>    and AH/ESP/IPcomp type, from Fan Du.
> 
> 4) Make pktgen_dst_metrics static, from Fengguang Wu.
> 
> 5) Compile fix for pktgen when CONFIG_XFRM is not set,
>    from Fan Du.
> 
> Please pull or let me know if there are problems.

Pulled, thanks for the heads up about the merge conflicts.


* Re: [PATCH V2 0/4] misc: xgene: Add support for APM X-Gene SoC Queue Manager/Traffic Manager
From: Arnd Bergmann @ 2014-01-14  6:58 UTC
  To: Ravi Patel
  Cc: Greg KH, Loc Ho, davem, netdev, linux-kernel,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Jon Masters, patches@apm.com, Keyur Chudgar
In-Reply-To: <CAN1v_Pt9rOGn8zmf+u7MR6-GJPDPswBbuhXu6i=MJVa2mwqNtw@mail.gmail.com>

On Monday 13 January 2014, Ravi Patel wrote:
> > For inbound messages, the QMTM serves a similar purpose as an MSI
> > controller, ensuring that inbound DMA data has arrived in RAM
> > before an interrupt is delivered to the CPU and thereby avoiding
> > the need for an expensive MMIO read to serialize the DMA.
> 
> For inbound messages, the slave device generates a message on completion
> of an inbound DMA operation or any relevant operation targeted at the
> CPU. The QMTM's role is just to trigger an interrupt to the CPU when a
> new message arrives from a slave device. The QMTM doesn't know what
> the message is for. It is up to the upper-layer drivers to decide how
> to process this message.

That doesn't seem to contradict what I wrote above. The DMA ordering
would be an implicit side-effect of the message generated by the
slave device if the QMTM is on the same bus as the external memory
controller and the message has the "strict ordering" bit set on the
bus transaction.

	Arnd


* Re: [PATCH net-next 1/3] bonding: update the primary slave when slave's name changed
From: Ding Tianhong @ 2014-01-14  6:51 UTC
  To: Veaceslav Falico; +Cc: Jay Vosburgh, David S. Miller, Netdev
In-Reply-To: <20140114063847.GB7798@redhat.com>

On 2014/1/14 14:38, Veaceslav Falico wrote:
> On Tue, Jan 14, 2014 at 10:36:56AM +0800, Ding Tianhong wrote:
>> If a slave's name changes and the bond's primary parameter is set,
>> the bond should handle the situation in one of two ways:
>>
>> 1) If the slave is currently the primary slave, clear the primary slave
>>   and reselect the active slave.
>> 2) If the slave's new name matches the bond's primary parameter, set the slave
>>   as the primary slave and reselect the active slave.
>>
>> Thanks for Veaceslav's suggestion.
>>
>> Suggested-by: Veaceslav Falico <vfalico@redhat.com>
> 
> As in my previous email - please, don't use my name until I say so.
> 
> I'll add my signed-off-by to any patch that I've worked enough on.
> 
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>> drivers/net/bonding/bond_main.c | 30 ++++++++++++++++++++++++++++--
>> 1 file changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index e06c445..63d6533 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -2860,9 +2860,35 @@ static int bond_slave_netdev_event(unsigned long event,
>>          */
>>         break;
>>     case NETDEV_CHANGENAME:
>> -        /*
>> -         * TODO: handle changing the primary's name
>> +        /* Handle changing the slave's name:
>> +         * 1) If the slave is primary save yet,
>> +         * clean the primary slave and reselect
>> +         * active slave.
> 
> I usually don't mind bad English (as I myself speak it quite horribly),
> but I can't really understand what you meant here. And given that
> it's a comment in code, please proof-read it first.
> 
>> +         * 2) If the slave's new name is bond
>> +         * primary, set the slave as primary
>> +         * slave and reselect active slave.
>>          */
>> +        if (USES_PRIMARY(bond->params.mode) &&
>> +            bond->params.primary[0]) {
> 
> Too many indentation levels. Check whether we're not using primary or the
> primary name string is empty, and break; otherwise go further.
> 
>> +            if (bond->primary_slave &&
>> +                slave == bond->primary_slave) {
> 
> Useless verification, slave can't be NULL.
> 
>> +                pr_info("%s: Setting primary slave to None.\n",
>> +                    bond->dev->name);
>> +                bond->primary_slave = NULL;
>> +                write_lock_bh(&bond->curr_slave_lock);
>> +                bond_select_active_slave(bond);
> 
> Get bond_select_active_slave() out of if()s, you use it twice here.
> 
>> +                write_unlock_bh(&bond->curr_slave_lock);
>> +            } else if (!bond->primary_slave &&
> 
> Useless verification, if the name of a slave changed to our params.primary
> - then it means that bond->primary_slave was NULL, as it can only be
> not-null when we have a matching interface, and that would mean that we
> have two interfaces with the same name.
> 
>> +                   !strcmp(bond->params.primary,
>> +                      slave_dev->name)) {
>> +                pr_info("%s: Setting %s as primary slave.\n",
>> +                    bond->dev->name, slave_dev->name);
>> +                bond->primary_slave = slave;
>> +                write_lock_bh(&bond->curr_slave_lock);
>> +                bond_select_active_slave(bond);
>> +                write_unlock_bh(&bond->curr_slave_lock);
>> +            }
>> +        }
>>         break;
>>     case NETDEV_FEAT_CHANGE:
>>         bond_compute_features(bond);
>> -- 
>> 1.8.0
>>
>>
> 

Will fix in v2, thanks.

Regards
Ding 

> .
> 
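Taken together, the review points above amount to a flatter structure: return early when no primary is configured, drop the redundant checks, and reselect the active slave in one place. Below is a userspace sketch of that decision logic only. `model_bond`, `model_slave` and the `reselect` counter are hypothetical stand-ins for struct bonding, struct slave and bond_select_active_slave(). Locking is omitted, and this is not the v2 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_slave { char name[16]; };

struct model_bond {
	bool uses_primary;                 /* stands in for USES_PRIMARY(mode) */
	char primary[16];                  /* stands in for params.primary */
	struct model_slave *primary_slave;
	int reselect;                      /* counts active-slave reselections */
};

/* NETDEV_CHANGENAME handling for one renamed slave. */
static void model_changename(struct model_bond *bond, struct model_slave *slave)
{
	assert(bond != NULL && slave != NULL);

	/* Nothing to do unless a primary is configured. */
	if (!bond->uses_primary || !bond->primary[0])
		return;

	if (slave == bond->primary_slave)
		bond->primary_slave = NULL;        /* renamed away from primary */
	else if (!strcmp(bond->primary, slave->name))
		bond->primary_slave = slave;       /* renamed into primary */
	else
		return;

	/* Single reselection point instead of one per branch. */
	bond->reselect++;
}
```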


* Re: [PATCH net-next] bonding: don't permit slaves to change their mtu
From: Ding Tianhong @ 2014-01-14  6:51 UTC
  To: Veaceslav Falico; +Cc: Jay Vosburgh, David S. Miller, Netdev
In-Reply-To: <20140114061556.GA7798@redhat.com>

On 2014/1/14 14:15, Veaceslav Falico wrote:
> On Tue, Jan 14, 2014 at 11:01:59AM +0800, Ding Tianhong wrote:
>> The commit 2315dc91a5059d7da9a8b9b9daf78d695c11383e
>> (net: make dev_set_mtu() honor notification return code)
>> will deal with the return value for NETDEV_CHANGEMTU notification,
>> and the slaves should not change their mtu, so add return value
>> to prevent doing it.
> 
> In another email you said you've tested the mtu changes and some of the
> bonds have packet loss when mtu is changed, and some of them don't.
> 
> Maybe it'd be good to understand which modes can tolerate the mtu change
> (if it can be tolerated at all/if it should really matter) and allow it for
> specific bond modes only/for any bond modes?
> 
Ok, need more analysis.


>>
>> Suggested-by: Veaceslav Falico <vfalico@redhat.com>
> 
> Don't add my name unless I specifically ask you to, please.
> 
> Thank you.
> 

Ok

>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>> drivers/net/bonding/bond_main.c | 16 ++++------------
>> 1 file changed, 4 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index e06c445..af4e678 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -2846,19 +2846,11 @@ static int bond_slave_netdev_event(unsigned long event,
>>          */
>>         break;
>>     case NETDEV_CHANGEMTU:
>> -        /*
>> -         * TODO: Should slaves be allowed to
>> -         * independently alter their MTU?  For
>> -         * an active-backup bond, slaves need
>> -         * not be the same type of device, so
>> -         * MTUs may vary.  For other modes,
>> -         * slaves arguably should have the
>> -         * same MTUs. To do this, we'd need to
>> -         * take over the slave's change_mtu
>> -         * function for the duration of their
>> -         * servitude.
>> +        /* The master and slaves should have the
>> +         * the same mtu, so do't permit slaves
>> +         * to change their mtu independently.
>>          */
>> -        break;
>> +        return NOTIFY_BAD;
>>     case NETDEV_CHANGENAME:
>>         /*
>>          * TODO: handle changing the primary's name
>> -- 
>> 1.8.0
>>
>>

* [PATCH net-next] tun/macvtap: limit the packets queued through rcvbuf
From: Jason Wang @ 2014-01-14  6:53 UTC
  To: davem, netdev, linux-kernel
  Cc: Jason Wang, Vlad Yasevich, Michael S. Tsirkin, John Fastabend,
	Stephen Hemminger, Herbert Xu

We used to limit the number of packets queued through tx_queue_length. This
has several issues:

- tx_queue_length controls the qdisc queue length; simply reusing it
  to limit the packets queued by the device may cause confusion.
- After commit 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 ("macvtap: Add
  support of packet capture on macvtap device."), an unexpected qdisc
  created because of a non-zero tx_queue_length leads to qdisc lock
  contention on multiqueue devices.
- What we really want is to limit the total amount of memory occupied,
  not the number of packets.

This patch solves the above issues by using the socket rcvbuf to limit
the packets that can be queued for tun/macvtap. This is done by calling
sock_queue_rcv_skb() instead of calling skb_queue_tail() directly.
Because sock_queue_rcv_skb() also runs the socket filter, the explicit
sk_filter() call in tun_net_xmit() is dropped. Two new ioctls are also
introduced so that userspace can change rcvbuf, as was already done
for sndbuf.
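To illustrate why a byte limit behaves differently from a packet-count limit, here is a simplified userspace model (not kernel code). It assumes the admission rule of sock_queue_rcv_skb(): refuse a packet only when the memory already charged (sk_rmem_alloc) is at or above sk_rcvbuf. The names rxq_model, rxq_enqueue() and rxq_capacity() are hypothetical, socket filters and memory scheduling are ignored, and every packet is assumed to have the same truesize.

```c
#include <stddef.h>

/* Hypothetical model of a byte-bounded receive queue. The fields loosely
 * mirror sk->sk_rmem_alloc and sk->sk_rcvbuf; filters and memory
 * accounting done by the real sock_queue_rcv_skb() are ignored. */
struct rxq_model {
	long rmem_alloc;	/* bytes currently charged to the queue */
	long rcvbuf;		/* byte limit, e.g. TUN_RCVBUF */
};

/* Admission check modelled on sock_queue_rcv_skb(): refuse only when the
 * queue is already at or above the limit, so the last accepted packet may
 * overshoot rcvbuf. Returns 0 if queued, -1 if the caller should drop. */
static int rxq_enqueue(struct rxq_model *q, long truesize)
{
	if (q->rmem_alloc >= q->rcvbuf)
		return -1;
	q->rmem_alloc += truesize;
	return 0;
}

/* Count how many packets of a fixed truesize (must be > 0) are accepted
 * before the first drop. */
static long rxq_capacity(long rcvbuf, long truesize)
{
	struct rxq_model q = { 0, rcvbuf };
	long n = 0;

	while (rxq_enqueue(&q, truesize) == 0)
		n++;
	return n;
}
```

Assuming PAGE_SIZE is 4096, TUN_RCVBUF is 2 MiB, so rxq_capacity(512L * 4096, 65536) yields 32 queued 64 KiB packets, while 2 KiB packets yield 1024. A fixed limit of 500 packets (TUN_READQ_SIZE), by contrast, would let 64 KiB packets pin over 31 MiB.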

With this fix, we can safely change the tx_queue_len of macvtap to
zero. This makes multiqueue work without extra lock contention.

Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/macvtap.c       | 31 ++++++++++++++++++++---------
 drivers/net/tun.c           | 48 +++++++++++++++++++++++++++++++++------------
 include/uapi/linux/if_tun.h |  3 +++
 3 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index a2c3a89..c429c56 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -292,9 +292,6 @@ static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb)
 	if (!q)
 		return RX_HANDLER_PASS;
 
-	if (skb_queue_len(&q->sk.sk_receive_queue) >= dev->tx_queue_len)
-		goto drop;
-
 	skb_push(skb, ETH_HLEN);
 
 	/* Apply the forward feature mask so that we perform segmentation
@@ -310,8 +307,10 @@ static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb)
 			goto drop;
 
 		if (!segs) {
-			skb_queue_tail(&q->sk.sk_receive_queue, skb);
-			goto wake_up;
+			if (sock_queue_rcv_skb(&q->sk, skb))
+				goto drop;
+			else
+				goto wake_up;
 		}
 
 		kfree_skb(skb);
@@ -319,11 +318,17 @@ static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb)
 			struct sk_buff *nskb = segs->next;
 
 			segs->next = NULL;
-			skb_queue_tail(&q->sk.sk_receive_queue, segs);
+			if (sock_queue_rcv_skb(&q->sk, segs)) {
+				skb = segs;
+				skb->next = nskb;
+				goto drop;
+			}
+
 			segs = nskb;
 		}
 	} else {
-		skb_queue_tail(&q->sk.sk_receive_queue, skb);
+		if (sock_queue_rcv_skb(&q->sk, skb))
+			goto drop;
 	}
 
 wake_up:
@@ -333,7 +338,7 @@ wake_up:
 drop:
 	/* Count errors/drops only here, thus don't care about args. */
 	macvlan_count_rx(vlan, 0, 0, 0);
-	kfree_skb(skb);
+	kfree_skb_list(skb);
 	return RX_HANDLER_CONSUMED;
 }
 
@@ -414,7 +419,7 @@ static void macvtap_dellink(struct net_device *dev,
 static void macvtap_setup(struct net_device *dev)
 {
 	macvlan_common_setup(dev);
-	dev->tx_queue_len = TUN_READQ_SIZE;
+	dev->tx_queue_len = 0;
 }
 
 static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
@@ -469,6 +474,7 @@ static int macvtap_open(struct inode *inode, struct file *file)
 	sock_init_data(&q->sock, &q->sk);
 	q->sk.sk_write_space = macvtap_sock_write_space;
 	q->sk.sk_destruct = macvtap_sock_destruct;
+	q->sk.sk_rcvbuf = TUN_RCVBUF;
 	q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
 	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
@@ -1040,6 +1046,13 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
 		q->sk.sk_sndbuf = u;
 		return 0;
 
+	case TUNSETRCVBUF:
+		if (get_user(u, up))
+			return -EFAULT;
+
+		q->sk.sk_rcvbuf = u;
+		return 0;
+
 	case TUNGETVNETHDRSZ:
 		s = q->vnet_hdr_sz;
 		if (put_user(s, sp))
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 09f6662..7a08fa3 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -177,6 +177,7 @@ struct tun_struct {
 
 	int			vnet_hdr_sz;
 	int			sndbuf;
+	int			rcvbuf;
 	struct tap_filter	txflt;
 	struct sock_fprog	fprog;
 	/* protected by rtnl lock */
@@ -771,17 +772,6 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (!check_filter(&tun->txflt, skb))
 		goto drop;
 
-	if (tfile->socket.sk->sk_filter &&
-	    sk_filter(tfile->socket.sk, skb))
-		goto drop;
-
-	/* Limit the number of packets queued by dividing txq length with the
-	 * number of queues.
-	 */
-	if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
-			  >= dev->tx_queue_len / tun->numqueues)
-		goto drop;
-
 	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
 		goto drop;
 
@@ -798,7 +788,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	nf_reset(skb);
 
 	/* Enqueue packet */
-	skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
+	if (sock_queue_rcv_skb(tfile->socket.sk, skb))
+		goto drop;
 
 	/* Notify and wake up reader process */
 	if (tfile->flags & TUN_FASYNC)
@@ -1668,6 +1659,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		tun->filter_attached = false;
 		tun->sndbuf = tfile->socket.sk->sk_sndbuf;
+		tun->rcvbuf = tfile->socket.sk->sk_rcvbuf;
 
 		spin_lock_init(&tun->lock);
 
@@ -1837,6 +1829,17 @@ static void tun_set_sndbuf(struct tun_struct *tun)
 	}
 }
 
+static void tun_set_rcvbuf(struct tun_struct *tun)
+{
+	struct tun_file *tfile;
+	int i;
+
+	for (i = 0; i < tun->numqueues; i++) {
+		tfile = rtnl_dereference(tun->tfiles[i]);
+		tfile->socket.sk->sk_rcvbuf = tun->rcvbuf;
+	}
+}
+
 static int tun_set_queue(struct file *file, struct ifreq *ifr)
 {
 	struct tun_file *tfile = file->private_data;
@@ -1878,7 +1881,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 	struct ifreq ifr;
 	kuid_t owner;
 	kgid_t group;
-	int sndbuf;
+	int sndbuf, rcvbuf;
 	int vnet_hdr_sz;
 	unsigned int ifindex;
 	int ret;
@@ -2061,6 +2064,22 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 		tun_set_sndbuf(tun);
 		break;
 
+	case TUNGETRCVBUF:
+		rcvbuf = tfile->socket.sk->sk_rcvbuf;
+		if (copy_to_user(argp, &rcvbuf, sizeof(rcvbuf)))
+			ret = -EFAULT;
+		break;
+
+	case TUNSETRCVBUF:
+		if (copy_from_user(&rcvbuf, argp, sizeof(rcvbuf))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		tun->rcvbuf = rcvbuf;
+		tun_set_rcvbuf(tun);
+		break;
+
 	case TUNGETVNETHDRSZ:
 		vnet_hdr_sz = tun->vnet_hdr_sz;
 		if (copy_to_user(argp, &vnet_hdr_sz, sizeof(vnet_hdr_sz)))
@@ -2139,6 +2158,8 @@ static long tun_chr_compat_ioctl(struct file *file,
 	case TUNSETTXFILTER:
 	case TUNGETSNDBUF:
 	case TUNSETSNDBUF:
+	case TUNGETRCVBUF:
+	case TUNSETRCVBUF:
 	case SIOCGIFHWADDR:
 	case SIOCSIFHWADDR:
 		arg = (unsigned long)compat_ptr(arg);
@@ -2204,6 +2225,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 
 	tfile->sk.sk_write_space = tun_sock_write_space;
 	tfile->sk.sk_sndbuf = INT_MAX;
+	tfile->sk.sk_rcvbuf = TUN_RCVBUF;
 
 	file->private_data = tfile;
 	set_bit(SOCK_EXTERNALLY_ALLOCATED, &tfile->socket.flags);
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index e9502dd..8e04657 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -22,6 +22,7 @@
 
 /* Read queue size */
 #define TUN_READQ_SIZE	500
+#define TUN_RCVBUF	(512 * PAGE_SIZE)
 
 /* TUN device flags */
 #define TUN_TUN_DEV 	0x0001	
@@ -58,6 +59,8 @@
 #define TUNSETQUEUE  _IOW('T', 217, int)
 #define TUNSETIFINDEX	_IOW('T', 218, unsigned int)
 #define TUNGETFILTER _IOR('T', 219, struct sock_fprog)
+#define TUNGETRCVBUF   _IOR('T', 220, int)
+#define TUNSETRCVBUF   _IOW('T', 221, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN		0x0001
-- 
1.8.3.2

* [PATCH 14/15] pktgen_dst_metrics[] can be static
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fengguang Wu <fengguang.wu@intel.com>

CC: Fan Du <fan.du@windriver.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/pktgen.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 45ba476..a37ec53 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2500,7 +2500,7 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
 
 
 #ifdef CONFIG_XFRM
-u32 pktgen_dst_metrics[RTAX_MAX + 1] = {
+static u32 pktgen_dst_metrics[RTAX_MAX + 1] = {
 
 	[RTAX_HOPLIMIT] = 0x5, /* Set a static hoplimit */
 };
-- 
1.7.9.5

* [PATCH 15/15] {xfrm,pktgen} Fix compiling error when CONFIG_XFRM is not set
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

The 0-DAY kernel build testing backend reported the errors below:
All error/warnings:

   net/core/pktgen.c: In function 'pktgen_if_write':
>> >> net/core/pktgen.c:1487:10: error: 'struct pktgen_dev' has no member named 'spi'
>> >> net/core/pktgen.c:1488:43: error: 'struct pktgen_dev' has no member named 'spi'

Fix this by guarding the code with #ifdef CONFIG_XFRM.
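The general pattern behind this fix can be sketched as follows: a struct member that only exists under a config option may only be referenced by code compiled under the same #ifdef. The struct cfg_dev and the function cfg_set() below are hypothetical, cut-down stand-ins for struct pktgen_dev and the parameter parsing in pktgen_if_write().

```c
#include <string.h>

/* Hypothetical stand-in for struct pktgen_dev: spi only exists when
 * CONFIG_XFRM is defined, exactly like the real member. */
struct cfg_dev {
	unsigned int cflows;
#ifdef CONFIG_XFRM
	unsigned int spi;
#endif
};

/* Hypothetical stand-in for the parameter parsing in pktgen_if_write().
 * Returns 0 if the parameter was recognised, -1 otherwise. */
static int cfg_set(struct cfg_dev *d, const char *name, unsigned int value)
{
	if (!strcmp(name, "flows")) {
		d->cflows = value;
		return 0;
	}
#ifdef CONFIG_XFRM
	/* Without this guard, d->spi fails to compile when
	 * CONFIG_XFRM is not set, which is the reported error. */
	if (!strcmp(name, "spi")) {
		d->spi = value;
		return 0;
	}
#endif
	return -1;
}
```

Built without CONFIG_XFRM, the "spi" parameter is simply not recognised instead of breaking the build.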

Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/pktgen.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index a37ec53..fa3e128 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1482,7 +1482,7 @@ static ssize_t pktgen_if_write(struct file *file,
 		sprintf(pg_result, "OK: flows=%u", pkt_dev->cflows);
 		return count;
 	}
-
+#ifdef CONFIG_XFRM
 	if (!strcmp(name, "spi")) {
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
@@ -1493,7 +1493,7 @@ static ssize_t pktgen_if_write(struct file *file,
 		sprintf(pg_result, "OK: spi=%u", pkt_dev->spi);
 		return count;
 	}
-
+#endif
 	if (!strcmp(name, "flowlen")) {
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
-- 
1.7.9.5

* [PATCH 13/15] {pktgen, xfrm} Document IPsec usage in pktgen.txt
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

Update pktgen.txt for reference when using IPsec.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 Documentation/networking/pktgen.txt |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 75e4fd7..5a61a240 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -108,7 +108,9 @@ Examples:
                               MPLS_RND, VID_RND, SVID_RND
                               QUEUE_MAP_RND # queue map random
                               QUEUE_MAP_CPU # queue map mirrors smp_processor_id()
+                              IPSEC # Make IPsec encapsulation for packet
 
+ pgset spi SPI_VALUE     Set the specific SA used to transform packets.
 
  pgset "udp_src_min 9"   set UDP source port min, If < udp_src_max, then
                          cycle through the port range.
@@ -177,6 +179,18 @@ Note when adding devices to a specific CPU there good idea to also assign
 /proc/irq/XX/smp_affinity so the TX-interrupts gets bound to the same CPU.
 as this reduces cache bouncing when freeing skb's.
 
+Enable IPsec
+============
+The default IPsec transformation (ESP encapsulation in transport mode)
+can be enabled by simply setting:
+
+pgset "flag IPSEC"
+pgset "flows 1"
+
+To avoid breaking existing testbed scripts, AH type and tunnel mode are
+selected by using "pgset spi SPI_VALUE" to specify which form of
+transformation to employ.
+
 
 Current commands and configuration options
 ==========================================
@@ -225,6 +239,7 @@ flag
   UDPDST_RND
   MACSRC_RND
   MACDST_RND
+  IPSEC
 
 dst_min
 dst_max
-- 
1.7.9.5

* [PATCH 10/15] {pktgen, xfrm} Construct skb dst for tunnel mode transformation
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

IPsec tunnel mode encapsulation needs to set the outer IP header
with the right protocol/ttl/id values with regard to skb->dst->child.

Looking up a route in the standard way for every transmitted packet
is the wrong approach, as it hurts performance. Instead, construct a
simple dst carrying just the information needed to make tunnel mode
encapsulation work.
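The patch attaches the embedded pkt_dev->dst to the skb as (unsigned long)&pkt_dev->dst | SKB_DST_NOREF, a tagged pointer whose low bit says that the skb holds no reference, so the stack will not dst_release() a dst that was never refcounted. The encoding can be sketched in userspace as below; DST_NOREF, DST_PTRMASK and struct fake_dst are hypothetical stand-ins for SKB_DST_NOREF, SKB_DST_PTRMASK and struct dst_entry, and the sketch assumes the pointed-to object is at least 2-byte aligned so bit 0 is free.

```c
#include <stdint.h>

/* Hypothetical stand-ins for SKB_DST_NOREF / SKB_DST_PTRMASK. */
#define DST_NOREF	1UL
#define DST_PTRMASK	(~DST_NOREF)

/* Hypothetical stand-in for struct dst_entry; an int member gives it
 * at least 4-byte alignment, so bit 0 of its address is always 0. */
struct fake_dst {
	int hoplimit;
};

/* Tag the pointer: "this skb does not own a reference to dst". */
static unsigned long encode_noref(struct fake_dst *dst)
{
	return (unsigned long)dst | DST_NOREF;
}

/* Recover the pointer by masking off the flag bit, as skb_dst() does. */
static struct fake_dst *decode_dst(unsigned long refdst)
{
	return (struct fake_dst *)(refdst & DST_PTRMASK);
}

/* Nonzero if no reference is held, i.e. nothing to release on drop. */
static int is_noref(unsigned long refdst)
{
	return (refdst & DST_NOREF) != 0;
}
```

Packing the flag into the pointer keeps the skb field a single word while still letting the release path decide whether a reference must be dropped.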

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/pktgen.c |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 8bc4ddd..628f7c5 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -390,6 +390,8 @@ struct pktgen_dev {
 	__u8	ipsmode;		/* IPSEC mode (config) */
 	__u8	ipsproto;		/* IPSEC type (config) */
 	__u32	spi;
+	struct dst_entry dst;
+	struct dst_ops dstops;
 #endif
 	char result[512];
 };
@@ -2487,6 +2489,11 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
 
 
 #ifdef CONFIG_XFRM
+u32 pktgen_dst_metrics[RTAX_MAX + 1] = {
+
+	[RTAX_HOPLIMIT] = 0x5, /* Set a static hoplimit */
+};
+
 static int pktgen_output_ipsec(struct sk_buff *skb, struct pktgen_dev *pkt_dev)
 {
 	struct xfrm_state *x = pkt_dev->flows[pkt_dev->curfl].x;
@@ -2497,10 +2504,18 @@ static int pktgen_output_ipsec(struct sk_buff *skb, struct pktgen_dev *pkt_dev)
 		return 0;
 	/* XXX: we dont support tunnel mode for now until
 	 * we resolve the dst issue */
-	if (x->props.mode != XFRM_MODE_TRANSPORT)
+	if ((x->props.mode != XFRM_MODE_TRANSPORT) && (pkt_dev->spi == 0))
 		return 0;
 
+	/* But when the user specifies a valid SPI, the transformation
+	 * supports both transport/tunnel mode and ESP/AH type.
+	 */
+	if ((x->props.mode == XFRM_MODE_TUNNEL) && (pkt_dev->spi != 0))
+		skb->_skb_refdst = (unsigned long)&pkt_dev->dst | SKB_DST_NOREF;
+
+	rcu_read_lock_bh();
 	err = x->outer_mode->output(x, skb);
+	rcu_read_unlock_bh();
 	if (err) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEMODEERROR);
 		goto error;
@@ -3557,6 +3572,17 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 #ifdef CONFIG_XFRM
 	pkt_dev->ipsmode = XFRM_MODE_TRANSPORT;
 	pkt_dev->ipsproto = IPPROTO_ESP;
+
+	/* xfrm tunnel mode needs an additional dst to extract the outer
+	 * IP header protocol/ttl/id fields. Create a phony one here
+	 * instead of looking up a valid rt, which would hurt
+	 * performance in this situation.
+	 */
+	pkt_dev->dstops.family = AF_INET;
+	pkt_dev->dst.dev = pkt_dev->odev;
+	dst_init_metrics(&pkt_dev->dst, pktgen_dst_metrics, false);
+	pkt_dev->dst.child = &pkt_dev->dst;
+	pkt_dev->dst.ops = &pkt_dev->dstops;
 #endif
 
 	return add_dev_to_thread(t, pkt_dev);
-- 
1.7.9.5

* [PATCH 11/15] {pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

Introduce xfrm_state_lookup_byspi() to find the SA specified by the
user with "pgset spi xxx". With this scheme, any flow, regardless of
its saddr/daddr, can be transformed by the SA selected with a
configurable spi.
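The lookup pattern can be sketched in userspace as follows: a linear walk over every SA that matches only on address family and SPI, taking a reference while the lock is still held so the entry cannot be released between finding it and pinning it. All names here (struct sa_entry, sa_lock, sa_lookup_byspi(), sa_lookup_host_spi()) are hypothetical; an atomic_flag spinlock stands in for xfrm_state_lock, and a plain counter incremented under that lock stands in for xfrm_state_hold().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>		/* htonl() */
#include <sys/socket.h>		/* AF_INET / AF_INET6 for callers */

/* Hypothetical, heavily simplified stand-in for struct xfrm_state. */
struct sa_entry {
	unsigned short family;
	uint32_t spi;		/* network byte order, like x->id.spi */
	int refcnt;		/* only modified under sa_lock in this model */
	struct sa_entry *next;
};

static atomic_flag sa_lock = ATOMIC_FLAG_INIT;	/* models xfrm_state_lock */
static struct sa_entry *sa_list;		/* models net->xfrm.state_all */

static void sa_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&sa_lock))
		;	/* spin */
}

static void sa_lock_release(void)
{
	atomic_flag_clear(&sa_lock);
}

/* Linear walk matching only family and SPI; the reference is taken
 * before the lock is dropped. */
static struct sa_entry *sa_lookup_byspi(uint32_t spi_be, unsigned short family)
{
	struct sa_entry *x;

	sa_lock_acquire();
	for (x = sa_list; x; x = x->next) {
		if (x->family != family || x->spi != spi_be)
			continue;
		x->refcnt++;	/* models xfrm_state_hold() */
		sa_lock_release();
		return x;
	}
	sa_lock_release();
	return NULL;
}

/* Caller-side helper: pktgen keeps the SPI in host order and converts it. */
static struct sa_entry *sa_lookup_host_spi(uint32_t spi, unsigned short family)
{
	return sa_lookup_byspi(htonl(spi), family);
}
```

The walk is O(n) in the number of SAs, but since pktgen caches the result in pkt_dev->flows[flow].x, it only runs once per flow.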

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h    |    2 ++
 net/core/pktgen.c     |   22 +++++++++++++++-------
 net/xfrm/xfrm_state.c |   22 ++++++++++++++++++++++
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b7635ef..cd7c46f 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1421,6 +1421,8 @@ struct xfrm_state *xfrm_stateonly_find(struct net *net, u32 mark,
 				       xfrm_address_t *saddr,
 				       unsigned short family,
 				       u8 mode, u8 proto, u32 reqid);
+struct xfrm_state *xfrm_state_lookup_byspi(struct net *net, __be32 spi,
+					      unsigned short family);
 int xfrm_state_check_expire(struct xfrm_state *x);
 void xfrm_state_insert(struct xfrm_state *x);
 int xfrm_state_add(struct xfrm_state *x);
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 628f7c5..b553c36 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2247,13 +2247,21 @@ static void get_ipsec_sa(struct pktgen_dev *pkt_dev, int flow)
 	struct xfrm_state *x = pkt_dev->flows[flow].x;
 	struct pktgen_net *pn = net_generic(dev_net(pkt_dev->odev), pg_net_id);
 	if (!x) {
-		/*slow path: we dont already have xfrm_state*/
-		x = xfrm_stateonly_find(pn->net, DUMMY_MARK,
-					(xfrm_address_t *)&pkt_dev->cur_daddr,
-					(xfrm_address_t *)&pkt_dev->cur_saddr,
-					AF_INET,
-					pkt_dev->ipsmode,
-					pkt_dev->ipsproto, 0);
+
+		if (pkt_dev->spi) {
+			/* We need to find the right SA as quickly as possible,
+			 * so search with the minimum criteria to achieve this.
+			 */
+			x = xfrm_state_lookup_byspi(pn->net, htonl(pkt_dev->spi), AF_INET);
+		} else {
+			/* slow path: we dont already have xfrm_state */
+			x = xfrm_stateonly_find(pn->net, DUMMY_MARK,
+						(xfrm_address_t *)&pkt_dev->cur_daddr,
+						(xfrm_address_t *)&pkt_dev->cur_saddr,
+						AF_INET,
+						pkt_dev->ipsmode,
+						pkt_dev->ipsproto, 0);
+		}
 		if (x) {
 			pkt_dev->flows[flow].x = x;
 			set_pkt_overhead(pkt_dev);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 3007440..6218148 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -915,6 +915,28 @@ xfrm_stateonly_find(struct net *net, u32 mark,
 }
 EXPORT_SYMBOL(xfrm_stateonly_find);
 
+struct xfrm_state *xfrm_state_lookup_byspi(struct net *net, __be32 spi,
+					      unsigned short family)
+{
+	struct xfrm_state *x;
+	struct xfrm_state_walk *w;
+
+	spin_lock_bh(&net->xfrm.xfrm_state_lock);
+	list_for_each_entry(w, &net->xfrm.state_all, all) {
+		x = container_of(w, struct xfrm_state, km);
+		if (x->props.family != family ||
+			x->id.spi != spi)
+			continue;
+
+		xfrm_state_hold(x);
+		spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+		return x;
+	}
+	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+	return NULL;
+}
+EXPORT_SYMBOL(xfrm_state_lookup_byspi);
+
 static void __xfrm_state_insert(struct xfrm_state *x)
 {
 	struct net *net = xs_net(x);
-- 
1.7.9.5

* [PATCH 12/15] {pktgen, xfrm} Show spi value properly when ipsec turned on
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

If the user runs pktgen with IPsec using an spi, show the spi value
properly when reading /proc/net/pktgen/ethX.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/pktgen.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index b553c36..45ba476 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -657,8 +657,11 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
 	}
 
 #ifdef CONFIG_XFRM
-	if (pkt_dev->flags & F_IPSEC_ON)
+	if (pkt_dev->flags & F_IPSEC_ON) {
 		seq_printf(seq,  "IPSEC  ");
+		if (pkt_dev->spi)
+			seq_printf(seq, "spi:%u", pkt_dev->spi);
+	}
 #endif
 
 	if (pkt_dev->flags & F_MACSRC_RND)
-- 
1.7.9.5

* [PATCH 08/15] {pktgen, xfrm} Correct xfrm_state_lock usage in xfrm_stateonly_find
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

Acquiring xfrm_state_lock in process context is expected to turn BH off,
as this lock is also used in BH context, namely the xfrm state timer
handler. Otherwise LOCKDEP complains with the messages below.

[   81.422781] pktgen: Packet Generator for packet performance testing. Version: 2.74
[   81.725194]
[   81.725211] =========================================================
[   81.725212] [ INFO: possible irq lock inversion dependency detected ]
[   81.725215] 3.13.0-rc2+ #92 Not tainted
[   81.725216] ---------------------------------------------------------
[   81.725218] kpktgend_0/2780 just changed the state of lock:
[   81.725220]  (xfrm_state_lock){+.+...}, at: [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725231] but this lock was taken by another, SOFTIRQ-safe lock in the past:
[   81.725232]  (&(&x->lock)->rlock){+.-...}
[   81.725232]
[   81.725232] and interrupts could create inverse lock ordering between them.
[   81.725232]
[   81.725235]
[   81.725235] other info that might help us debug this:
[   81.725237]  Possible interrupt unsafe locking scenario:
[   81.725237]
[   81.725238]        CPU0                    CPU1
[   81.725240]        ----                    ----
[   81.725241]   lock(xfrm_state_lock);
[   81.725243]                                local_irq_disable();
[   81.725244]                                lock(&(&x->lock)->rlock);
[   81.725246]                                lock(xfrm_state_lock);
[   81.725248]   <Interrupt>
[   81.725249]     lock(&(&x->lock)->rlock);
[   81.725251]
[   81.725251]  *** DEADLOCK ***
[   81.725251]
[   81.725254] no locks held by kpktgend_0/2780.
[   81.725255]
[   81.725255] the shortest dependencies between 2nd lock and 1st lock:
[   81.725269]  -> (&(&x->lock)->rlock){+.-...} ops: 8 {
[   81.725274]     HARDIRQ-ON-W at:
[   81.725276]                       [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[   81.725282]                       [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725284]                       [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725289]                       [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[   81.725292]                       [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[   81.725300]                       [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[   81.725303]                       [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[   81.725305]                       [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   81.725308]                       [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[   81.725313]                       [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[   81.725316]                       [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[   81.725329]                       [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[   81.725333]                       [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[   81.725338]     IN-SOFTIRQ-W at:
[   81.725340]                       [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70
[   81.725342]                       [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725344]                       [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725347]                       [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[   81.725349]                       [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[   81.725352]                       [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[   81.725355]                       [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[   81.725358]                       [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   81.725360]                       [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[   81.725363]                       [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[   81.725365]                       [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[   81.725368]                       [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[   81.725370]                       [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[   81.725373]     INITIAL USE at:
[   81.725375]                      [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[   81.725385]                      [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725388]                      [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725390]                      [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[   81.725394]                      [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[   81.725398]                      [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[   81.725401]                      [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[   81.725404]                      [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   81.725407]                      [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[   81.725409]                      [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[   81.725412]                      [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[   81.725415]                      [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[   81.725417]                      [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[   81.725420]   }
[   81.725421]   ... key      at: [<ffffffff8295b9c8>] __key.46349+0x0/0x8
[   81.725445]   ... acquired at:
[   81.725446]    [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725449]    [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725452]    [<ffffffff816dc057>] __xfrm_state_delete+0x37/0x140
[   81.725454]    [<ffffffff816dc18c>] xfrm_state_delete+0x2c/0x50
[   81.725456]    [<ffffffff816dc277>] xfrm_state_flush+0xc7/0x1b0
[   81.725458]    [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[   81.725465]    [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[   81.725468]    [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[   81.725471]    [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[   81.725476]    [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[   81.725479]    [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[   81.725482]    [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[   81.725484]
[   81.725486] -> (xfrm_state_lock){+.+...} ops: 11 {
[   81.725490]    HARDIRQ-ON-W at:
[   81.725493]                     [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[   81.725504]                     [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725507]                     [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[   81.725510]                     [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[   81.725513]                     [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[   81.725516]                     [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[   81.725519]                     [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[   81.725522]                     [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[   81.725525]                     [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[   81.725527]                     [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[   81.725530]                     [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[   81.725533]    SOFTIRQ-ON-W at:
[   81.725534]                     [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[   81.725537]                     [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725539]                     [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725541]                     [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725544]                     [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[   81.725547]                     [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[   81.725550]                     [<ffffffff81078f84>] kthread+0xe4/0x100
[   81.725555]                     [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[   81.725565]    INITIAL USE at:
[   81.725567]                    [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[   81.725569]                    [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725572]                    [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[   81.725574]                    [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[   81.725576]                    [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[   81.725580]                    [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[   81.725583]                    [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[   81.725586]                    [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[   81.725589]                    [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[   81.725594]                    [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[   81.725597]                    [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[   81.725599]  }
[   81.725600]  ... key      at: [<ffffffff81cadef8>] xfrm_state_lock+0x18/0x50
[   81.725606]  ... acquired at:
[   81.725607]    [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[   81.725609]    [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[   81.725611]    [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[   81.725614]    [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725616]    [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725627]    [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725629]    [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[   81.725632]    [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[   81.725635]    [<ffffffff81078f84>] kthread+0xe4/0x100
[   81.725637]    [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[   81.725640]
[   81.725641]
[   81.725641] stack backtrace:
[   81.725645] CPU: 0 PID: 2780 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #92
[   81.725647] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[   81.725649]  ffffffff82537b80 ffff880018199988 ffffffff8176af37 0000000000000007
[   81.725652]  ffff8800181999f0 ffff8800181999d8 ffffffff81099358 ffffffff82537b80
[   81.725655]  ffffffff81a32def ffff8800181999f4 0000000000000000 ffff880002cbeaa8
[   81.725659] Call Trace:
[   81.725664]  [<ffffffff8176af37>] dump_stack+0x46/0x58
[   81.725667]  [<ffffffff81099358>] print_irq_inversion_bug.part.42+0x1e8/0x1f0
[   81.725670]  [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[   81.725672]  [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[   81.725675]  [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150
[   81.725685]  [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[   81.725691]  [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[   81.725694]  [<ffffffff81089b38>] ? sched_clock_cpu+0xa8/0x120
[   81.725697]  [<ffffffff8109a31a>] ? __lock_acquire+0x32a/0x1d70
[   81.725699]  [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[   81.725702]  [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725704]  [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[   81.725707]  [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[   81.725710]  [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725712]  [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[   81.725715]  [<ffffffff810971ec>] ? lock_release_holdtime.part.26+0x1c/0x1a0
[   81.725717]  [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725721]  [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[   81.725724]  [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[   81.725727]  [<ffffffffa008ba71>] ? pktgen_thread_worker+0xb11/0x1880 [pktgen]
[   81.725729]  [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10
[   81.725733]  [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40
[   81.725745]  [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0
[   81.725751]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[   81.725753]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[   81.725757]  [<ffffffffa008af60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
[   81.725759]  [<ffffffff81078f84>] kthread+0xe4/0x100
[   81.725762]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
[   81.725765]  [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[   81.725768]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 9e6a4d6..3007440 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -890,7 +890,7 @@ xfrm_stateonly_find(struct net *net, u32 mark,
 	unsigned int h;
 	struct xfrm_state *rx = NULL, *x = NULL;
 
-	spin_lock(&net->xfrm.xfrm_state_lock);
+	spin_lock_bh(&net->xfrm.xfrm_state_lock);
 	h = xfrm_dst_hash(net, daddr, saddr, reqid, family);
 	hlist_for_each_entry(x, net->xfrm.state_bydst+h, bydst) {
 		if (x->props.family == family &&
@@ -908,7 +908,7 @@ xfrm_stateonly_find(struct net *net, u32 mark,
 
 	if (rx)
 		xfrm_state_hold(rx);
-	spin_unlock(&net->xfrm.xfrm_state_lock);
+	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
 
 
 	return rx;
-- 
1.7.9.5
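The lockdep report above comes from `kpktgend_0` taking `xfrm_state_lock` through `_raw_spin_lock` in `xfrm_stateonly_find()`, that is, with softirqs still enabled. Lockdep flags this as an irq-lock inversion (`print_irq_inversion_bug` in the call trace). The toy model below is a minimal sketch of why the patch switches to `spin_lock_bh()`/`spin_unlock_bh()`. It assumes a single CPU and a single pending softirq whose handler needs the same lock. `struct cpu_model`, `run_pending_softirq()` and `critical_section()` are hypothetical userspace names, not kernel APIs.

```c
#include <stdbool.h>

/* Toy single-CPU model, userspace only; all names are hypothetical.
 * One lock, one pending softirq whose handler needs that same lock,
 * and a flag standing in for "bottom halves disabled". */
struct cpu_model {
	bool lock_held;       /* stands in for net->xfrm.xfrm_state_lock */
	bool bh_disabled;     /* bottom halves disabled on this CPU */
	bool softirq_pending; /* softirq that will take the same lock */
	bool deadlocked;      /* handler spun on a lock its CPU holds */
};

/* Run the pending softirq only if bottom halves are enabled, as the
 * kernel does e.g. on interrupt exit or when BHs are re-enabled. */
static void run_pending_softirq(struct cpu_model *c)
{
	if (!c->softirq_pending || c->bh_disabled)
		return;
	c->softirq_pending = false;
	/* The handler would spin forever: the lock owner is the task it
	 * interrupted, which cannot run again until the handler returns. */
	if (c->lock_held)
		c->deadlocked = true;
}

/* One pass through a critical section like xfrm_stateonly_find(),
 * with an interrupt arriving while the lock is held. Returns true
 * if the section completes without deadlock. */
static bool critical_section(bool use_bh)
{
	struct cpu_model c = { .softirq_pending = true };

	if (use_bh)
		c.bh_disabled = true;   /* spin_lock_bh(): BHs off first... */
	c.lock_held = true;             /* ...then take the lock */

	run_pending_softirq(&c);        /* interrupt arrives mid-section */

	c.lock_held = false;            /* spin_unlock*() drops the lock */
	if (use_bh) {
		c.bh_disabled = false;  /* ...then re-enables BHs, so the */
		run_pending_softirq(&c); /* deferred softirq runs safely */
	}
	return !c.deadlocked;
}
```

Under this model, `critical_section(false)` reports a deadlock and `critical_section(true)` completes, because the softirq is deferred until after the lock has been released. The model ignores SMP and ksoftirqd; it only captures the same-CPU case that the `_bh` variants guard against.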


* [PATCH 09/15] {pktgen, xfrm} Using "pgset spi xxx" to specify SA for a given flow
From: Steffen Klassert @ 2014-01-14  6:49 UTC
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1389682159-3260-1-git-send-email-steffen.klassert@secunet.com>

From: Fan Du <fan.du@windriver.com>

Users can set a specific SPI value to arm a pktgen flow with an IPsec
transformation, instead of looking up the SA by saddr/daddr. The reason
for doing so is that the current state lookup scheme is slow and, most
importantly, pktgen does not actually need to match any SA state
address information; all it needs is the SA transformation shell to
do the encapsulation.

This option also gives users an alternative way to test existing SAs
with pktgen without creating new ones.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/pktgen.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 156d57b..8bc4ddd 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -389,6 +389,7 @@ struct pktgen_dev {
 #ifdef CONFIG_XFRM
 	__u8	ipsmode;		/* IPSEC mode (config) */
 	__u8	ipsproto;		/* IPSEC type (config) */
+	__u32	spi;
 #endif
 	char result[512];
 };
@@ -1477,6 +1478,17 @@ static ssize_t pktgen_if_write(struct file *file,
 		return count;
 	}
 
+	if (!strcmp(name, "spi")) {
+		len = num_arg(&user_buffer[i], 10, &value);
+		if (len < 0)
+			return len;
+
+		i += len;
+		pkt_dev->spi = value;
+		sprintf(pg_result, "OK: spi=%u", pkt_dev->spi);
+		return count;
+	}
+
 	if (!strcmp(name, "flowlen")) {
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
-- 
1.7.9.5
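The new `spi` branch in `pktgen_if_write()` reads its argument with `num_arg(&user_buffer[i], 10, &value)` and stores it in the 32-bit `spi` field. As a minimal userspace sketch, assuming `num_arg()` consumes up to `maxlen` decimal digits and stops at the first non-digit, the parsing can be modeled as follows. `num_arg_sketch()` and `handle_spi()` are hypothetical names, and the kernel helper's copy from the user buffer is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for pktgen's num_arg(): consume up
 * to maxlen decimal digits, stop at the first non-digit, and return
 * how many characters were consumed. (The kernel helper reads from a
 * user buffer and can also return a negative error code.) */
static long num_arg_sketch(const char *buf, unsigned long maxlen,
			   unsigned long *num)
{
	unsigned long i;

	*num = 0;
	for (i = 0; i < maxlen && buf[i] >= '0' && buf[i] <= '9'; i++)
		*num = *num * 10 + (unsigned long)(buf[i] - '0');
	return (long)i;
}

/* Hypothetical model of the "spi" branch. arg is the text after the
 * command name, with leading whitespace already skipped. */
static int handle_spi(const char *arg, uint32_t *spi, char *result,
		      size_t result_len)
{
	unsigned long value;

	num_arg_sketch(arg, 10, &value);
	/* Same narrowing as "pkt_dev->spi = value;" in the patch. */
	*spi = (uint32_t)value;
	snprintf(result, result_len, "OK: spi=%u", (unsigned int)*spi);
	return 0;
}
```

Under this model, `spi 12345` yields `OK: spi=12345`, while `spi 0x10` sets the SPI to 0 because parsing stops at the `x`; the argument is read as a decimal number. Since up to ten digits are accepted, values above 4294967295 would be truncated when stored in the 32-bit field.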

