Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 11/16] netvm: Propagate page->pfmemalloc from skb_alloc_page to skb
From: Mel Gorman @ 2012-06-27  8:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, LKML, David Miller,
	Neil Brown, Peter Zijlstra, Mike Christie, Eric B Munson,
	Eric Dumazet
In-Reply-To: <20120626201328.GI6509@breakpoint.cc>

On Tue, Jun 26, 2012 at 10:13:28PM +0200, Sebastian Andrzej Siewior wrote:
> On Fri, Jun 22, 2012 at 03:30:38PM +0100, Mel Gorman wrote:
> >  drivers/net/ethernet/chelsio/cxgb4/sge.c          |    2 +-
> >  drivers/net/ethernet/chelsio/cxgb4vf/sge.c        |    2 +-
> >  drivers/net/ethernet/intel/igb/igb_main.c         |    2 +-
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |    4 +-
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    3 +-
> >  drivers/net/usb/cdc-phonet.c                      |    2 +-
> >  drivers/usb/gadget/f_phonet.c                     |    2 +-
> 
> You did not touch all drivers which use alloc_page(s)() like e1000(e). Was
> this on purpose?
> 

Yes. The ones I changed were the semi-obvious ones and carried over from
when the patches were completely out of tree.  As the changelog notes
it is not critical that these annotation happens and can be fixed on a
per-driver basis if there are complains about network swapping being slow.

In the e1000 case, alloc_page is called from e1000_alloc_jumbo_rx_buffers
and I would not have paid quite as close attention to jumbo configurations
even though e1000 does not depend on high-order allocations like some
other drivers do. I can update e1000 if you like but it's not critical
to do so and in fact getting a bug reporting saying that network swap
was slow on e1000 would be useful to me in its own way :)

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [net-next RFC V3 PATCH 4/6] tuntap: multiqueue support
From: Michael S. Tsirkin @ 2012-06-27  8:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: habanero, netdev, linux-kernel, krkumar2, tahm, akong, davem,
	shemminger, mashirle
In-Reply-To: <4FEA972E.9010104@redhat.com>

On Wed, Jun 27, 2012 at 01:16:30PM +0800, Jason Wang wrote:
> On 06/26/2012 06:42 PM, Michael S. Tsirkin wrote:
> >On Tue, Jun 26, 2012 at 11:42:17AM +0800, Jason Wang wrote:
> >>On 06/25/2012 04:25 PM, Michael S. Tsirkin wrote:
> >>>On Mon, Jun 25, 2012 at 02:10:18PM +0800, Jason Wang wrote:
> >>>>This patch adds multiqueue support for tap device. This is done by abstracting
> >>>>each queue as a file/socket and allowing multiple sockets to be attached to the
> >>>>tuntap device (an array of tun_file were stored in the tun_struct). Userspace
> >>>>could write and read from those files to do the parallel packet
> >>>>sending/receiving.
> >>>>
> >>>>Unlike the previous single queue implementation, the socket and device were
> >>>>loosely coupled, each of them were allowed to go away first. In order to let the
> >>>>tx path lockless, netif_tx_loch_bh() is replaced by RCU/NETIF_F_LLTX to
> >>>>synchronize between data path and system call.
> >>>Don't use LLTX/RCU. It's not worth it.
> >>>Use something like netif_set_real_num_tx_queues.
> >>>
> >>>>The tx queue selecting is first based on the recorded rxq index of an skb, it
> >>>>there's no such one, then choosing based on rx hashing (skb_get_rxhash()).
> >>>>
> >>>>Signed-off-by: Jason Wang<jasowang@redhat.com>
> >>>Interestingly macvtap switched to hashing first:
> >>>ef0002b577b52941fb147128f30bd1ecfdd3ff6d
> >>>(the commit log is corrupted but see what it
> >>>does in the patch).
> >>>Any idea why?
> >>Yes, so tap should be changed to behave same as macvtap. I remember
> >>the reason we do that is to make sure the packet of a single flow to
> >>be queued to a fixed socket/virtqueues. As 10g cards like ixgbe
> >>choose the rx queue for a flow based on the last tx queue where the
> >>packets of that flow comes. So if we are using recored rx queue in
> >>macvtap, the queue index of a flow would change as vhost thread
> >>moves amongs processors.
> >Hmm. OTOH if you override this, if TX is sent from VCPU0, RX might land
> >on VCPU1 in the guest, which is not good, right?
> 
> Yes, but better than making the rx moves between vcpus when we use
> recorded rx queue.

Why isn't this a problem with native TCP?
I think what happens is one of the following:
- moving between CPUs is more expensive with tun
  because it can queue so much data on xmit
- scheduler makes very bad decisions about VCPUs
  bouncing them around all the time

Could we isolate which it is? Does the problem
still happen if you pin VCPUs to host cpus?
If not it's the queue depth.

> Flow steering is needed to make sure the tx and
> rx on the same vcpu.

That involves IPI between processes, so it might be
very expensive for kvm.

> >>But during test tun/tap, one interesting thing I find is that even
> >>ixgbe has recorded the queue index during rx, it seems be lost when
> >>tap tries to transmit skbs to userspace.
> >dev_pick_tx does this I think but ndo_select_queue
> >should be able to get it without trouble.
> >
> >
> >>>>---
> >>>>  drivers/net/tun.c |  371 +++++++++++++++++++++++++++++++++--------------------
> >>>>  1 files changed, 232 insertions(+), 139 deletions(-)
> >>>>
> >>>>diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>>>index 8233b0a..5c26757 100644
> >>>>--- a/drivers/net/tun.c
> >>>>+++ b/drivers/net/tun.c
> >>>>@@ -107,6 +107,8 @@ struct tap_filter {
> >>>>  	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
> >>>>  };
> >>>>
> >>>>+#define MAX_TAP_QUEUES (NR_CPUS<   16 ? NR_CPUS : 16)
> >>>Why the limit? I am guessing you copied this from macvtap?
> >>>This is problematic for a number of reasons:
> >>>	- will not play well with migration
> >>>	- will not work well for a large guest
> >>>
> >>>Yes, macvtap needs to be fixed too.
> >>>
> >>>I am guessing what it is trying to prevent is queueing
> >>>up a huge number of packets?
> >>>So just divide the default tx queue limit by the # of queues.
> >>>
> >>>And by the way, for MQ applications maybe we can finally
> >>>ignore tx queue altogether and limit the total number
> >>>of bytes queued?
> >>>To avoid regressions we can make it large like 64M/# queues.
> >>>Could be a separate patch I think, and for a single queue
> >>>might need a compatible mode though I am not sure.
> >>>
> >>>>+
> >>>>  struct tun_file {
> >>>>  	struct sock sk;
> >>>>  	struct socket socket;
> >>>>@@ -114,16 +116,18 @@ struct tun_file {
> >>>>  	int vnet_hdr_sz;
> >>>>  	struct tap_filter txflt;
> >>>>  	atomic_t count;
> >>>>-	struct tun_struct *tun;
> >>>>+	struct tun_struct __rcu *tun;
> >>>>  	struct net *net;
> >>>>  	struct fasync_struct *fasync;
> >>>>  	unsigned int flags;
> >>>>+	u16 queue_index;
> >>>>  };
> >>>>
> >>>>  struct tun_sock;
> >>>>
> >>>>  struct tun_struct {
> >>>>-	struct tun_file		*tfile;
> >>>>+	struct tun_file		*tfiles[MAX_TAP_QUEUES];
> >>>>+	unsigned int            numqueues;
> >>>>  	unsigned int 		flags;
> >>>>  	uid_t			owner;
> >>>>  	gid_t			group;
> >>>>@@ -138,80 +142,159 @@ struct tun_struct {
> >>>>  #endif
> >>>>  };
> >>>>
> >>>>-static int tun_attach(struct tun_struct *tun, struct file *file)
> >>>>+static DEFINE_SPINLOCK(tun_lock);
> >>>>+
> >>>>+/*
> >>>>+ * tun_get_queue(): calculate the queue index
> >>>>+ *     - if skbs comes from mq nics, we can just borrow
> >>>>+ *     - if not, calculate from the hash
> >>>>+ */
> >>>>+static struct tun_file *tun_get_queue(struct net_device *dev,
> >>>>+				      struct sk_buff *skb)
> >>>>  {
> >>>>-	struct tun_file *tfile = file->private_data;
> >>>>-	int err;
> >>>>+	struct tun_struct *tun = netdev_priv(dev);
> >>>>+	struct tun_file *tfile = NULL;
> >>>>+	int numqueues = tun->numqueues;
> >>>>+	__u32 rxq;
> >>>>
> >>>>-	ASSERT_RTNL();
> >>>>+	BUG_ON(!rcu_read_lock_held());
> >>>>
> >>>>-	netif_tx_lock_bh(tun->dev);
> >>>>+	if (!numqueues)
> >>>>+		goto out;
> >>>>
> >>>>-	err = -EINVAL;
> >>>>-	if (tfile->tun)
> >>>>+	if (numqueues == 1) {
> >>>>+		tfile = rcu_dereference(tun->tfiles[0]);
> >>>Instead of hacks like this, you can ask for an MQ
> >>>flag to be set in SETIFF. Then you won't need to
> >>>handle attach/detach at random times.
> >>>And most of the scary num_queues checks can go away.
> >>>You can then also ask userspace about the max # of queues
> >>>to expect if you want to save some memory.
> >>>
> >>>
> >>>>  		goto out;
> >>>>+	}
> >>>>
> >>>>-	err = -EBUSY;
> >>>>-	if (tun->tfile)
> >>>>+	if (likely(skb_rx_queue_recorded(skb))) {
> >>>>+		rxq = skb_get_rx_queue(skb);
> >>>>+
> >>>>+		while (unlikely(rxq>= numqueues))
> >>>>+			rxq -= numqueues;
> >>>>+
> >>>>+		tfile = rcu_dereference(tun->tfiles[rxq]);
> >>>>  		goto out;
> >>>>+	}
> >>>>
> >>>>-	err = 0;
> >>>>-	tfile->tun = tun;
> >>>>-	tun->tfile = tfile;
> >>>>-	netif_carrier_on(tun->dev);
> >>>>-	dev_hold(tun->dev);
> >>>>-	sock_hold(&tfile->sk);
> >>>>-	atomic_inc(&tfile->count);
> >>>>+	/* Check if we can use flow to select a queue */
> >>>>+	rxq = skb_get_rxhash(skb);
> >>>>+	if (rxq) {
> >>>>+		u32 idx = ((u64)rxq * numqueues)>>   32;
> >>>This completely confuses me. What's the logic here?
> >>>How do we even know it's in range?
> >>>
> >>>>+		tfile = rcu_dereference(tun->tfiles[idx]);
> >>>>+		goto out;
> >>>>+	}
> >>>>
> >>>>+	tfile = rcu_dereference(tun->tfiles[0]);
> >>>>  out:
> >>>>-	netif_tx_unlock_bh(tun->dev);
> >>>>-	return err;
> >>>>+	return tfile;
> >>>>  }
> >>>>
> >>>>-static void __tun_detach(struct tun_struct *tun)
> >>>>+static int tun_detach(struct tun_file *tfile, bool clean)
> >>>>  {
> >>>>-	struct tun_file *tfile = tun->tfile;
> >>>>-	/* Detach from net device */
> >>>>-	netif_tx_lock_bh(tun->dev);
> >>>>-	netif_carrier_off(tun->dev);
> >>>>-	tun->tfile = NULL;
> >>>>-	netif_tx_unlock_bh(tun->dev);
> >>>>-
> >>>>-	/* Drop read queue */
> >>>>-	skb_queue_purge(&tfile->socket.sk->sk_receive_queue);
> >>>>-
> >>>>-	/* Drop the extra count on the net device */
> >>>>-	dev_put(tun->dev);
> >>>>-}
> >>>>+	struct tun_struct *tun;
> >>>>+	struct net_device *dev = NULL;
> >>>>+	bool destroy = false;
> >>>>
> >>>>-static void tun_detach(struct tun_struct *tun)
> >>>>-{
> >>>>-	rtnl_lock();
> >>>>-	__tun_detach(tun);
> >>>>-	rtnl_unlock();
> >>>>-}
> >>>>+	spin_lock(&tun_lock);
> >>>>
> >>>>-static struct tun_struct *__tun_get(struct tun_file *tfile)
> >>>>-{
> >>>>-	struct tun_struct *tun = NULL;
> >>>>+	tun = rcu_dereference_protected(tfile->tun,
> >>>>+					lockdep_is_held(&tun_lock));
> >>>>+	if (tun) {
> >>>>+		u16 index = tfile->queue_index;
> >>>>+		BUG_ON(index>= tun->numqueues);
> >>>>+		dev = tun->dev;
> >>>>+
> >>>>+		rcu_assign_pointer(tun->tfiles[index],
> >>>>+				   tun->tfiles[tun->numqueues - 1]);
> >>>>+		tun->tfiles[index]->queue_index = index;
> >>>>+		rcu_assign_pointer(tfile->tun, NULL);
> >>>>+		--tun->numqueues;
> >>>>+		sock_put(&tfile->sk);
> >>>>
> >>>>-	if (atomic_inc_not_zero(&tfile->count))
> >>>>-		tun = tfile->tun;
> >>>>+		if (tun->numqueues == 0&&   !(tun->flags&   TUN_PERSIST))
> >>>>+			destroy = true;
> >>>Please don't use flags like that. Use dedicated labels and goto there on error.
> >>>
> >>>
> >>>>+	}
> >>>>
> >>>>-	return tun;
> >>>>+	spin_unlock(&tun_lock);
> >>>>+
> >>>>+	synchronize_rcu();
> >>>>+	if (clean)
> >>>>+		sock_put(&tfile->sk);
> >>>>+
> >>>>+	if (destroy) {
> >>>>+		rtnl_lock();
> >>>>+		if (dev->reg_state == NETREG_REGISTERED)
> >>>>+			unregister_netdevice(dev);
> >>>>+		rtnl_unlock();
> >>>>+	}
> >>>>+
> >>>>+	return 0;
> >>>>  }
> >>>>
> >>>>-static struct tun_struct *tun_get(struct file *file)
> >>>>+static void tun_detach_all(struct net_device *dev)
> >>>>  {
> >>>>-	return __tun_get(file->private_data);
> >>>>+	struct tun_struct *tun = netdev_priv(dev);
> >>>>+	struct tun_file *tfile, *tfile_list[MAX_TAP_QUEUES];
> >>>>+	int i, j = 0;
> >>>>+
> >>>>+	spin_lock(&tun_lock);
> >>>>+
> >>>>+	for (i = 0; i<   MAX_TAP_QUEUES&&   tun->numqueues; i++) {
> >>>>+		tfile = rcu_dereference_protected(tun->tfiles[i],
> >>>>+						lockdep_is_held(&tun_lock));
> >>>>+		BUG_ON(!tfile);
> >>>>+		wake_up_all(&tfile->wq.wait);
> >>>>+		tfile_list[j++] = tfile;
> >>>>+		rcu_assign_pointer(tfile->tun, NULL);
> >>>>+		--tun->numqueues;
> >>>>+	}
> >>>>+	BUG_ON(tun->numqueues != 0);
> >>>>+	/* guarantee that any future tun_attach will fail */
> >>>>+	tun->numqueues = MAX_TAP_QUEUES;
> >>>>+	spin_unlock(&tun_lock);
> >>>>+
> >>>>+	synchronize_rcu();
> >>>>+	for (--j; j>= 0; j--)
> >>>>+		sock_put(&tfile_list[j]->sk);
> >>>>  }
> >>>>
> >>>>-static void tun_put(struct tun_struct *tun)
> >>>>+static int tun_attach(struct tun_struct *tun, struct file *file)
> >>>>  {
> >>>>-	struct tun_file *tfile = tun->tfile;
> >>>>+	struct tun_file *tfile = file->private_data;
> >>>>+	int err;
> >>>>+
> >>>>+	ASSERT_RTNL();
> >>>>+
> >>>>+	spin_lock(&tun_lock);
> >>>>
> >>>>-	if (atomic_dec_and_test(&tfile->count))
> >>>>-		tun_detach(tfile->tun);
> >>>>+	err = -EINVAL;
> >>>>+	if (rcu_dereference_protected(tfile->tun, lockdep_is_held(&tun_lock)))
> >>>>+		goto out;
> >>>>+
> >>>>+	err = -EBUSY;
> >>>>+	if (!(tun->flags&   TUN_TAP_MQ)&&   tun->numqueues == 1)
> >>>>+		goto out;
> >>>>+
> >>>>+	if (tun->numqueues == MAX_TAP_QUEUES)
> >>>>+		goto out;
> >>>>+
> >>>>+	err = 0;
> >>>>+	tfile->queue_index = tun->numqueues;
> >>>>+	rcu_assign_pointer(tfile->tun, tun);
> >>>>+	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> >>>>+	sock_hold(&tfile->sk);
> >>>>+	tun->numqueues++;
> >>>>+
> >>>>+	if (tun->numqueues == 1)
> >>>>+		netif_carrier_on(tun->dev);
> >>>>+
> >>>>+	/* device is allowed to go away first, so no need to hold extra
> >>>>+	 * refcnt. */
> >>>>+
> >>>>+out:
> >>>>+	spin_unlock(&tun_lock);
> >>>>+	return err;
> >>>>  }
> >>>>
> >>>>  /* TAP filtering */
> >>>>@@ -331,16 +414,7 @@ static const struct ethtool_ops tun_ethtool_ops;
> >>>>  /* Net device detach from fd. */
> >>>>  static void tun_net_uninit(struct net_device *dev)
> >>>>  {
> >>>>-	struct tun_struct *tun = netdev_priv(dev);
> >>>>-	struct tun_file *tfile = tun->tfile;
> >>>>-
> >>>>-	/* Inform the methods they need to stop using the dev.
> >>>>-	 */
> >>>>-	if (tfile) {
> >>>>-		wake_up_all(&tfile->wq.wait);
> >>>>-		if (atomic_dec_and_test(&tfile->count))
> >>>>-			__tun_detach(tun);
> >>>>-	}
> >>>>+	tun_detach_all(dev);
> >>>>  }
> >>>>
> >>>>  /* Net device open. */
> >>>>@@ -360,10 +434,10 @@ static int tun_net_close(struct net_device *dev)
> >>>>  /* Net device start xmit */
> >>>>  static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>  {
> >>>>-	struct tun_struct *tun = netdev_priv(dev);
> >>>>-	struct tun_file *tfile = tun->tfile;
> >>>>+	struct tun_file *tfile = NULL;
> >>>>
> >>>>-	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
> >>>>+	rcu_read_lock();
> >>>>+	tfile = tun_get_queue(dev, skb);
> >>>>
> >>>>  	/* Drop packet if interface is not attached */
> >>>>  	if (!tfile)
> >>>>@@ -381,7 +455,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>
> >>>>  	if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
> >>>>  	>= dev->tx_queue_len) {
> >>>>-		if (!(tun->flags&   TUN_ONE_QUEUE)) {
> >>>>+		if (!(tfile->flags&   TUN_ONE_QUEUE)&&
> >>>Which patch moved flags from tun to tfile?
> >>>
> >>>>+		    !(tfile->flags&   TUN_TAP_MQ)) {
> >>>>  			/* Normal queueing mode. */
> >>>>  			/* Packet scheduler handles dropping of further packets. */
> >>>>  			netif_stop_queue(dev);
> >>>>@@ -390,7 +465,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>  			 * error is more appropriate. */
> >>>>  			dev->stats.tx_fifo_errors++;
> >>>>  		} else {
> >>>>-			/* Single queue mode.
> >>>>+			/* Single queue mode or multi queue mode.
> >>>>  			 * Driver handles dropping of all packets itself. */
> >>>Please don't do this. Stop the queue on overrun as appropriate.
> >>>ONE_QUEUE is a legacy hack.
> >>>
> >>>BTW we really should stop queue before we start dropping packets,
> >>>but that can be a separate patch.
> >>>
> >>>>  			goto drop;
> >>>>  		}
> >>>>@@ -408,9 +483,11 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>  		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
> >>>>  	wake_up_interruptible_poll(&tfile->wq.wait, POLLIN |
> >>>>  				   POLLRDNORM | POLLRDBAND);
> >>>>+	rcu_read_unlock();
> >>>>  	return NETDEV_TX_OK;
> >>>>
> >>>>  drop:
> >>>>+	rcu_read_unlock();
> >>>>  	dev->stats.tx_dropped++;
> >>>>  	kfree_skb(skb);
> >>>>  	return NETDEV_TX_OK;
> >>>>@@ -527,16 +604,22 @@ static void tun_net_init(struct net_device *dev)
> >>>>  static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
> >>>>  {
> >>>>  	struct tun_file *tfile = file->private_data;
> >>>>-	struct tun_struct *tun = __tun_get(tfile);
> >>>>+	struct tun_struct *tun = NULL;
> >>>>  	struct sock *sk;
> >>>>  	unsigned int mask = 0;
> >>>>
> >>>>-	if (!tun)
> >>>>+	if (!tfile)
> >>>>  		return POLLERR;
> >>>>
> >>>>-	sk = tfile->socket.sk;
> >>>>+	rcu_read_lock();
> >>>>+	tun = rcu_dereference(tfile->tun);
> >>>>+	if (!tun) {
> >>>>+		rcu_read_unlock();
> >>>>+		return POLLERR;
> >>>>+	}
> >>>>+	rcu_read_unlock();
> >>>>
> >>>>-	tun_debug(KERN_INFO, tun, "tun_chr_poll\n");
> >>>>+	sk =&tfile->sk;
> >>>>
> >>>>  	poll_wait(file,&tfile->wq.wait, wait);
> >>>>
> >>>>@@ -548,10 +631,12 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
> >>>>  	     sock_writeable(sk)))
> >>>>  		mask |= POLLOUT | POLLWRNORM;
> >>>>
> >>>>-	if (tun->dev->reg_state != NETREG_REGISTERED)
> >>>>+	rcu_read_lock();
> >>>>+	tun = rcu_dereference(tfile->tun);
> >>>>+	if (!tun || tun->dev->reg_state != NETREG_REGISTERED)
> >>>>  		mask = POLLERR;
> >>>>+	rcu_read_unlock();
> >>>>
> >>>>-	tun_put(tun);
> >>>>  	return mask;
> >>>>  }
> >>>>
> >>>>@@ -708,9 +793,12 @@ static ssize_t tun_get_user(struct tun_file *tfile,
> >>>>  		skb_shinfo(skb)->gso_segs = 0;
> >>>>  	}
> >>>>
> >>>>-	tun = __tun_get(tfile);
> >>>>-	if (!tun)
> >>>>+	rcu_read_lock();
> >>>>+	tun = rcu_dereference(tfile->tun);
> >>>>+	if (!tun) {
> >>>>+		rcu_read_unlock();
> >>>>  		return -EBADFD;
> >>>>+	}
> >>>>
> >>>>  	switch (tfile->flags&   TUN_TYPE_MASK) {
> >>>>  	case TUN_TUN_DEV:
> >>>>@@ -720,26 +808,30 @@ static ssize_t tun_get_user(struct tun_file *tfile,
> >>>>  		skb->protocol = eth_type_trans(skb, tun->dev);
> >>>>  		break;
> >>>>  	}
> >>>>-
> >>>>-	netif_rx_ni(skb);
> >>>>  	tun->dev->stats.rx_packets++;
> >>>>  	tun->dev->stats.rx_bytes += len;
> >>>>-	tun_put(tun);
> >>>>+	rcu_read_unlock();
> >>>>+
> >>>>+	netif_rx_ni(skb);
> >>>>+
> >>>>  	return count;
> >>>>
> >>>>  err_free:
> >>>>  	count = -EINVAL;
> >>>>  	kfree_skb(skb);
> >>>>  err:
> >>>>-	tun = __tun_get(tfile);
> >>>>-	if (!tun)
> >>>>+	rcu_read_lock();
> >>>>+	tun = rcu_dereference(tfile->tun);
> >>>>+	if (!tun) {
> >>>>+		rcu_read_unlock();
> >>>>  		return -EBADFD;
> >>>>+	}
> >>>>
> >>>>  	if (drop)
> >>>>  		tun->dev->stats.rx_dropped++;
> >>>>  	if (error)
> >>>>  		tun->dev->stats.rx_frame_errors++;
> >>>>-	tun_put(tun);
> >>>>+	rcu_read_unlock();
> >>>>  	return count;
> >>>>  }
> >>>>
> >>>>@@ -833,12 +925,13 @@ static ssize_t tun_put_user(struct tun_file *tfile,
> >>>>  	skb_copy_datagram_const_iovec(skb, 0, iv, total, len);
> >>>>  	total += skb->len;
> >>>>
> >>>>-	tun = __tun_get(tfile);
> >>>>+	rcu_read_lock();
> >>>>+	tun = rcu_dereference(tfile->tun);
> >>>>  	if (tun) {
> >>>>  		tun->dev->stats.tx_packets++;
> >>>>  		tun->dev->stats.tx_bytes += len;
> >>>>-		tun_put(tun);
> >>>>  	}
> >>>>+	rcu_read_unlock();
> >>>>
> >>>>  	return total;
> >>>>  }
> >>>>@@ -869,28 +962,31 @@ static ssize_t tun_do_read(struct tun_file *tfile,
> >>>>  				break;
> >>>>  			}
> >>>>
> >>>>-			tun = __tun_get(tfile);
> >>>>+			rcu_read_lock();
> >>>>+			tun = rcu_dereference(tfile->tun);
> >>>>  			if (!tun) {
> >>>>-				ret = -EIO;
> >>>>+				ret = -EBADFD;
> >>>BADFD is for when you get passed something like -1 fd.
> >>>Here fd is OK, it's just in a bad state so you can not do IO.
> >>>
> >>>
> >>>>+				rcu_read_unlock();
> >>>>  				break;
> >>>>  			}
> >>>>  			if (tun->dev->reg_state != NETREG_REGISTERED) {
> >>>>  				ret = -EIO;
> >>>>-				tun_put(tun);
> >>>>+				rcu_read_unlock();
> >>>>  				break;
> >>>>  			}
> >>>>-			tun_put(tun);
> >>>>+			rcu_read_unlock();
> >>>>
> >>>>  			/* Nothing to read, let's sleep */
> >>>>  			schedule();
> >>>>  			continue;
> >>>>  		}
> >>>>
> >>>>-		tun = __tun_get(tfile);
> >>>>+		rcu_read_lock();
> >>>>+		tun = rcu_dereference(tfile->tun);
> >>>>  		if (tun) {
> >>>>  			netif_wake_queue(tun->dev);
> >>>>-			tun_put(tun);
> >>>>  		}
> >>>>+		rcu_read_unlock();
> >>>>
> >>>>  		ret = tun_put_user(tfile, skb, iv, len);
> >>>>  		kfree_skb(skb);
> >>>>@@ -1038,6 +1134,9 @@ static int tun_flags(struct tun_struct *tun)
> >>>>  	if (tun->flags&   TUN_VNET_HDR)
> >>>>  		flags |= IFF_VNET_HDR;
> >>>>
> >>>>+	if (tun->flags&   TUN_TAP_MQ)
> >>>>+		flags |= IFF_MULTI_QUEUE;
> >>>>+
> >>>>  	return flags;
> >>>>  }
> >>>>
> >>>>@@ -1097,8 +1196,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >>>>  		err = tun_attach(tun, file);
> >>>>  		if (err<   0)
> >>>>  			return err;
> >>>>-	}
> >>>>-	else {
> >>>>+	} else {
> >>>>  		char *name;
> >>>>  		unsigned long flags = 0;
> >>>>
> >>>>@@ -1142,6 +1240,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >>>>  		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> >>>>  			TUN_USER_FEATURES;
> >>>>  		dev->features = dev->hw_features;
> >>>>+		if (ifr->ifr_flags&   IFF_MULTI_QUEUE)
> >>>>+			dev->features |= NETIF_F_LLTX;
> >>>>
> >>>>  		err = register_netdevice(tun->dev);
> >>>>  		if (err<   0)
> >>>>@@ -1154,7 +1254,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >>>>
> >>>>  		err = tun_attach(tun, file);
> >>>>  		if (err<   0)
> >>>>-			goto failed;
> >>>>+			goto err_free_dev;
> >>>>  	}
> >>>>
> >>>>  	tun_debug(KERN_INFO, tun, "tun_set_iff\n");
> >>>>@@ -1174,6 +1274,11 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >>>>  	else
> >>>>  		tun->flags&= ~TUN_VNET_HDR;
> >>>>
> >>>>+	if (ifr->ifr_flags&   IFF_MULTI_QUEUE)
> >>>>+		tun->flags |= TUN_TAP_MQ;
> >>>>+	else
> >>>>+		tun->flags&= ~TUN_TAP_MQ;
> >>>>+
> >>>>  	/* Cache flags from tun device */
> >>>>  	tfile->flags = tun->flags;
> >>>>  	/* Make sure persistent devices do not get stuck in
> >>>>@@ -1187,7 +1292,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >>>>
> >>>>  err_free_dev:
> >>>>  	free_netdev(dev);
> >>>>-failed:
> >>>>  	return err;
> >>>>  }
> >>>>
> >>>>@@ -1264,38 +1368,40 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
> >>>>  				(unsigned int __user*)argp);
> >>>>  	}
> >>>>
> >>>>-	rtnl_lock();
> >>>>-
> >>>>-	tun = __tun_get(tfile);
> >>>>-	if (cmd == TUNSETIFF&&   !tun) {
> >>>>+	ret = 0;
> >>>>+	if (cmd == TUNSETIFF) {
> >>>>+		rtnl_lock();
> >>>>  		ifr.ifr_name[IFNAMSIZ-1] = '\0';
> >>>>-
> >>>>  		ret = tun_set_iff(tfile->net, file,&ifr);
> >>>>-
> >>>>+		rtnl_unlock();
> >>>>  		if (ret)
> >>>>-			goto unlock;
> >>>>-
> >>>>+			return ret;
> >>>>  		if (copy_to_user(argp,&ifr, ifreq_len))
> >>>>-			ret = -EFAULT;
> >>>>-		goto unlock;
> >>>>+			return -EFAULT;
> >>>>+		return ret;
> >>>>  	}
> >>>>
> >>>>+	rtnl_lock();
> >>>>+
> >>>>+	rcu_read_lock();
> >>>>+
> >>>>  	ret = -EBADFD;
> >>>>+	tun = rcu_dereference(tfile->tun);
> >>>>  	if (!tun)
> >>>>  		goto unlock;
> >>>>+	else
> >>>>+		ret = 0;
> >>>>
> >>>>-	tun_debug(KERN_INFO, tun, "tun_chr_ioctl cmd %d\n", cmd);
> >>>>-
> >>>>-	ret = 0;
> >>>>  	switch (cmd) {
> >>>>  	case TUNGETIFF:
> >>>>  		ret = tun_get_iff(current->nsproxy->net_ns, tun,&ifr);
> >>>>+		rcu_read_unlock();
> >>>>  		if (ret)
> >>>>-			break;
> >>>>+			goto out;
> >>>>
> >>>>  		if (copy_to_user(argp,&ifr, ifreq_len))
> >>>>  			ret = -EFAULT;
> >>>>-		break;
> >>>>+		goto out;
> >>>>
> >>>>  	case TUNSETNOCSUM:
> >>>>  		/* Disable/Enable checksum */
> >>>>@@ -1357,9 +1463,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
> >>>>  		/* Get hw address */
> >>>>  		memcpy(ifr.ifr_hwaddr.sa_data, tun->dev->dev_addr, ETH_ALEN);
> >>>>  		ifr.ifr_hwaddr.sa_family = tun->dev->type;
> >>>>+		rcu_read_unlock();
> >>>>  		if (copy_to_user(argp,&ifr, ifreq_len))
> >>>>  			ret = -EFAULT;
> >>>>-		break;
> >>>>+		goto out;
> >>>>
> >>>>  	case SIOCSIFHWADDR:
> >>>>  		/* Set hw address */
> >>>>@@ -1375,9 +1482,9 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
> >>>>  	}
> >>>>
> >>>>  unlock:
> >>>>+	rcu_read_unlock();
> >>>>+out:
> >>>>  	rtnl_unlock();
> >>>>-	if (tun)
> >>>>-		tun_put(tun);
> >>>>  	return ret;
> >>>>  }
> >>>>
> >>>>@@ -1517,6 +1624,11 @@ out:
> >>>>  	return ret;
> >>>>  }
> >>>>
> >>>>+static void tun_sock_destruct(struct sock *sk)
> >>>>+{
> >>>>+	skb_queue_purge(&sk->sk_receive_queue);
> >>>>+}
> >>>>+
> >>>>  static int tun_chr_open(struct inode *inode, struct file * file)
> >>>>  {
> >>>>  	struct net *net = current->nsproxy->net_ns;
> >>>>@@ -1540,6 +1652,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
> >>>>  	sock_init_data(&tfile->socket,&tfile->sk);
> >>>>
> >>>>  	tfile->sk.sk_write_space = tun_sock_write_space;
> >>>>+	tfile->sk.sk_destruct = tun_sock_destruct;
> >>>>  	tfile->sk.sk_sndbuf = INT_MAX;
> >>>>  	file->private_data = tfile;
> >>>>
> >>>>@@ -1549,31 +1662,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
> >>>>  static int tun_chr_close(struct inode *inode, struct file *file)
> >>>>  {
> >>>>  	struct tun_file *tfile = file->private_data;
> >>>>-	struct tun_struct *tun;
> >>>>-
> >>>>-	tun = __tun_get(tfile);
> >>>>-	if (tun) {
> >>>>-		struct net_device *dev = tun->dev;
> >>>>-
> >>>>-		tun_debug(KERN_INFO, tun, "tun_chr_close\n");
> >>>>-
> >>>>-		__tun_detach(tun);
> >>>>-
> >>>>-		/* If desirable, unregister the netdevice. */
> >>>>-		if (!(tun->flags&   TUN_PERSIST)) {
> >>>>-			rtnl_lock();
> >>>>-			if (dev->reg_state == NETREG_REGISTERED)
> >>>>-				unregister_netdevice(dev);
> >>>>-			rtnl_unlock();
> >>>>-		}
> >>>>
> >>>>-		/* drop the reference that netdevice holds */
> >>>>-		sock_put(&tfile->sk);
> >>>>-
> >>>>-	}
> >>>>-
> >>>>-	/* drop the reference that file holds */
> >>>>-	sock_put(&tfile->sk);
> >>>>+	tun_detach(tfile, true);
> >>>>
> >>>>  	return 0;
> >>>>  }
> >>>>@@ -1700,14 +1790,17 @@ static void tun_cleanup(void)
> >>>>   * holding a reference to the file for as long as the socket is in use. */
> >>>>  struct socket *tun_get_socket(struct file *file)
> >>>>  {
> >>>>-	struct tun_struct *tun;
> >>>>+	struct tun_struct *tun = NULL;
> >>>>  	struct tun_file *tfile = file->private_data;
> >>>>  	if (file->f_op !=&tun_fops)
> >>>>  		return ERR_PTR(-EINVAL);
> >>>>-	tun = tun_get(file);
> >>>>-	if (!tun)
> >>>>+	rcu_read_lock();
> >>>>+	tun = rcu_dereference(tfile->tun);
> >>>>+	if (!tun) {
> >>>>+		rcu_read_unlock();
> >>>>  		return ERR_PTR(-EBADFD);
> >>>>-	tun_put(tun);
> >>>>+	}
> >>>>+	rcu_read_unlock();
> >>>>  	return&tfile->socket;
> >>>>  }
> >>>>  EXPORT_SYMBOL_GPL(tun_get_socket);
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: avoid tx starvation by SYNACK packets
From: Eric Dumazet @ 2012-06-27  8:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Miller, hans.schillstrom, subramanian.vijay, dave.taht,
	netdev, ncardwell, therbert, mph
In-Reply-To: <1340785275.2028.151.camel@localhost>

On Wed, 2012-06-27 at 10:21 +0200, Jesper Dangaard Brouer wrote:

> It works because we have a spinlock problem in the code... Perhaps, they
> did it, because we have have locking/contention problem, not the other
> way around ;-)  How about fixing the code instead? ;-)))

The socket lock is currently mandatory.

It's really _hard_ to remove it, your attempts added a lot of races.

I want to do it properly, adding needed RCU and array of spinlocks were
appropriate.

Hopefully, its easier than the RCU conversion I did for the lookups of
ESTABLISHED/TIMEWAIT sockets.

^ permalink raw reply

* Re: [PATCH 1/2] net: flexcan: clock-frequency is optional for device tree probe
From: Marc Kleine-Budde @ 2012-06-27  8:47 UTC (permalink / raw)
  To: Shawn Guo; +Cc: David S. Miller, netdev, linux-arm-kernel
In-Reply-To: <1340700563-8386-2-git-send-email-shawn.guo@linaro.org>

[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]

On 06/26/2012 10:49 AM, Shawn Guo wrote:
> The property clock-frequency is optional for device tree probe.  When
> it's absent, the flexcan driver will try to get the frequency from clk
> system by calling clk_get_rate.
> 
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>

As Oliver pointed out, this doesn't go through the net tree.

Marc

> ---
>  .../devicetree/bindings/net/can/fsl-flexcan.txt    |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt
> index f31b686..8ff324e 100644
> --- a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt
> +++ b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt
> @@ -11,6 +11,9 @@ Required properties:
>  
>  - reg : Offset and length of the register set for this device
>  - interrupts : Interrupt tuple for this device
> +
> +Optional properties:
> +
>  - clock-frequency : The oscillator frequency driving the flexcan device
>  
>  Example:


-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: avoid tx starvation by SYNACK packets
From: David Miller @ 2012-06-27  8:48 UTC (permalink / raw)
  To: eric.dumazet
  Cc: hans.schillstrom, subramanian.vijay, dave.taht, netdev, ncardwell,
	therbert, brouer
In-Reply-To: <1340786400.26242.27.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Jun 2012 10:40:00 +0200

> On Wed, 2012-06-27 at 01:22 -0700, David Miller wrote:
>> From: Hans Schillstrom <hans.schillstrom@ericsson.com>
>> Date: Wed, 27 Jun 2012 07:23:03 +0200
>> 
>> > On Tuesday 26 June 2012 19:02:36 Eric Dumazet wrote:
>> >> With David patch using jhash instead of SHA, I reach ~315.000 SYN per
>> >> second.
>> > 
>> > I have similar results from ~170k to ~199k synack/sec.
>> 
>> Eric and Hans, I'm going to add Tested-by: tags for both of you
>> when I commit this patch if you don't mind. :-)
> 
> Well, please send your complete patch before (with IPv6 part) ?

Indeed, I'll do that soon, thanks Eric.

^ permalink raw reply

* sua caixa de correio
From: System Administrator @ 2012-06-27  8:18 UTC (permalink / raw)




ATENÇÃO;

Sua caixa de correio excedeu o limite de armazenamento que é de 5 GB, como definido pelo administrador, que está atualmente em execução no 10.9GB, você pode não ser capaz de enviar ou receber novas mensagens até que você re-validar a sua caixa de correio. Para revalidar sua caixa postal, envie os seguintes dados abaixo:

nome:
Nome de usuário:
senha:
Confirme a Senha:
E-mail:

Se você não conseguir revalidar sua caixa de correio, o correio será desactivado!

obrigado
Administrador do Sistema

^ permalink raw reply

* Re: [PATCH 01/13] netfilter: fix problem with proto register
From: Pablo Neira Ayuso @ 2012-06-27  8:53 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, netfilter-devel
In-Reply-To: <4FEA6424.90605@cn.fujitsu.com>

On Wed, Jun 27, 2012 at 09:38:44AM +0800, Gao feng wrote:
> 于 2012年06月26日 22:36, Pablo Neira Ayuso 写道:
> > On Tue, Jun 26, 2012 at 11:40:14AM +0800, Gao feng wrote:
> >> Hi Pablo:
> >>
> >> 于 2012年06月25日 19:12, Pablo Neira Ayuso 写道:
> >>> On Thu, Jun 21, 2012 at 10:36:38PM +0800, Gao feng wrote:
> >>>> before commit 2c352f444ccfa966a1aa4fd8e9ee29381c467448
> >>>> (netfilter: nf_conntrack: prepare namespace support for
> >>>> l4 protocol trackers), we register sysctl before register
> >>>> protos, so if sysctl is registered faild, the protos will
> >>>> not be registered.
> >>>>
> >>>> but now, we register protos first, and when register
> >>>> sysctl failed, we can use protos too, it's different
> >>>> from before.
> >>>
> >>> No, this has to be an all-or-nothing game. If one fails, everything
> >>> else that you've registered has to be unregistered.
> >>
> >> indeed,this is an all-or-nothing game right now,please look at the ipv4_net_init,
> >> when we register nf_conntrack_l3proto_ipv4 failed,we will unregister the already
> >> registered l4protoes, and in nf_conntrack_l4proto_unregister,we will call
> >> nf_ct_l4proto_unregister_sysctl to free the sysctl table.
> > 
> > I see proto->init_net allocates in->ctl_table, then
> > nf_ct_l3proto_register_sysctl release it if it fails. I got confused
> > because I did not see where that memory was being freed. Then, it's
> > good.
> > 
> > Still one more thing:
> > 
> >>>> so change to register sysctl before register protos.
> >>>>
> >>>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> >>>> ---
> >>>>  net/netfilter/nf_conntrack_proto.c |   36 +++++++++++++++++++++++-------------
> >>>>  1 files changed, 23 insertions(+), 13 deletions(-)
> >>>>
> >>>> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
> >>>> index 1ea9194..9bd88aa 100644
> >>>> --- a/net/netfilter/nf_conntrack_proto.c
> >>>> +++ b/net/netfilter/nf_conntrack_proto.c
> >>>> @@ -253,18 +253,23 @@ int nf_conntrack_l3proto_register(struct net *net,
> >>>>  {
> >>>>  	int ret = 0;
> >>>>  
> >>>> -	if (net == &init_net)
> >>>> -		ret = nf_conntrack_l3proto_register_net(proto);
> >>>> +	if (proto->init_net) {
> > 
> > I think proto->init_net has to be mandatory since all protocol support
> > pernet already. We can add BUG_ON at the beginning of the function if
> > proto->init_net is not defined.
> > 
> 
> we can add BUG_ON at nf_conntrack_l4proto_register,because all of the l4protoes
> have the init_net function.
> 
> BUT nf_conntrack_l3proto_ipv6 doesn't have init_net function,because this proto
> doesn't have pernet data, and nf_conntrack_l3proto_ipv4 has pernet data only when
> CONFIG_NF_CONNTRACK_PROC_COMPAT is configured.

OK, thanks for the clarification.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/2] net: flexcan: clock-frequency is optional for device tree probe
From: Shawn Guo @ 2012-06-27  8:56 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: netdev, David S. Miller, linux-arm-kernel
In-Reply-To: <4FEAC89C.3000405@pengutronix.de>

On Wed, Jun 27, 2012 at 10:47:24AM +0200, Marc Kleine-Budde wrote:
> On 06/26/2012 10:49 AM, Shawn Guo wrote:
> > The property clock-frequency is optional for device tree probe.  When
> > it's absent, the flexcan driver will try to get the frequency from clk
> > system by calling clk_get_rate.
> > 
> > Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>

Thanks.

> 
> As Oliver pointed out, this doesn't go through the net tree.
> 
>From what I have seen, device tree maintainers are generally fine with
having binding document updates go through subsystem tree.

-- 
Regards,
Shawn

^ permalink raw reply

* Re: [PATCH 1/2] net: flexcan: clock-frequency is optional for device tree probe
From: Marc Kleine-Budde @ 2012-06-27  8:59 UTC (permalink / raw)
  To: Shawn Guo; +Cc: David S. Miller, netdev, linux-arm-kernel
In-Reply-To: <20120627085600.GB9787@S2101-09.ap.freescale.net>

[-- Attachment #1: Type: text/plain, Size: 1005 bytes --]

On 06/27/2012 10:56 AM, Shawn Guo wrote:
> On Wed, Jun 27, 2012 at 10:47:24AM +0200, Marc Kleine-Budde wrote:
>> On 06/26/2012 10:49 AM, Shawn Guo wrote:
>>> The property clock-frequency is optional for device tree probe.  When
>>> it's absent, the flexcan driver will try to get the frequency from clk
>>> system by calling clk_get_rate.
>>>
>>> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
>> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
> 
> Thanks.
> 
>>
>> As Oliver pointed out, this doesn't go through the net tree.
>>
> From what I have seen, device tree maintainers are generally fine with
> having binding document updates go through subsystem tree.

Okay. Then I'll take that patch.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* [patch -resend] 9p: fix min_t() casting in p9pdu_vwritef()
From: Dan Carpenter @ 2012-06-27  9:01 UTC (permalink / raw)
  To: Eric Van Hensbergen
  Cc: David S. Miller, Aneesh Kumar K.V, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <20120627085800.GA3007@mwanda>

I don't think we're actually likely to hit this limit but if we do
then the comparison should be done as size_t.  The original code
is equivalent to:
        len = strlen(sptr) % USHRT_MAX;

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
I was told this patch "has already made it upstream via the v9fs pull."
but it must have been dropped accidentally.  Originally sent on Sat,
Jan 15, 2011.

diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 9ee48cb..3d33ecf 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -368,7 +368,7 @@ p9pdu_vwritef(struct p9_fcall *pdu, int proto_version, const char *fmt,
 				const char *sptr = va_arg(ap, const char *);
 				uint16_t len = 0;
 				if (sptr)
-					len = min_t(uint16_t, strlen(sptr),
+					len = min_t(size_t, strlen(sptr),
 								USHRT_MAX);
 
 				errcode = p9pdu_writef(pdu, proto_version,

^ permalink raw reply related

* Re: [PATCH 04/13] netfilter: regard users as refcount for l4proto's per-net data
From: Pablo Neira Ayuso @ 2012-06-27  9:05 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, netfilter-devel
In-Reply-To: <4FEA6309.5060305@cn.fujitsu.com>

On Wed, Jun 27, 2012 at 09:34:01AM +0800, Gao feng wrote:
[...]
> >>> I think that this change is similar to patch 1/1, I think you should
> >>> send it as a separated patch.
> >>>
> >>
> >> Yes, It looks better.
> >> should I change this and rebase whole patchset or
> >> maybe you just apply this patchset and then I send a cleanup patch to do this?
> > 
> > This patch includes changes that are not included in the description,
> > so you have two choices:
> > 
> > 1) You resend me this patch with appropriate description (including
> > the fact that you're fixing the same thing that patch 1/1 does). This
> > option still I don't like too much, since making two different things
> > in one single patch is nasty, but well if you push me...
> 
> Sorry, I don't know which the same thing I fixed in this patch and 1/13 patch.
> the 1/13 patch only change the proto's registration order. and this patch doesn't
> change the registration order.
> 
> This patch is used to try to make variable "users" clear.

Never mind, I'll take the patchset. Thanks Gao.

^ permalink raw reply

* RE: [PATCH] net: added support for 40GbE link.
From: Parav.Pandit @ 2012-06-27  9:08 UTC (permalink / raw)
  To: bhutchings, Parav.Pandit; +Cc: davem, netdev
In-Reply-To: <1340115094.2692.10.camel@bwh-desktop.uk.solarflarecom.com>

o.k. I am sending PATCH v1 with suggested fixes in short while for ethtool and kernel both.

Parav

> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Tuesday, June 19, 2012 7:42 PM
> To: Pandit, Parav
> Cc: davem@davemloft.net; netdev@vger.kernel.org
> Subject: RE: [PATCH] net: added support for 40GbE link.
> 
> On Tue, 2012-06-19 at 07:42 +0000, Parav.Pandit@Emulex.Com wrote:
> > > -----Original Message-----
> > > From: David Miller [mailto:davem@davemloft.net]
> > > Sent: Tuesday, June 19, 2012 1:05 PM
> > > To: Pandit, Parav
> > > Cc: bhutchings@solarflare.com; netdev@vger.kernel.org
> > > Subject: Re: [PATCH] net: added support for 40GbE link.
> > >
> > > From: <Parav.Pandit@Emulex.Com>
> > > Date: Tue, 19 Jun 2012 07:33:12 +0000
> > >
> > > > Should eventually all net driver should remove using SPEED_xxxxxx
> > > > and
> > > start using hard coded value of 10, 100, 1000, 20000?
> > >
> > > No, the ones that exist can stay, just no new ones.
> > >
> > So driver which supports 40Gpbs, 100Gbps should hardcode to 40000,
> 100000 respectively?
> 
> Right.
> 
> > > > That means ethtool_cmd_speed() should not be called in this function?
> > >
> > > Ben said that it must be called, what are you talking about?
> >
> > Sorry, I wanted to ask - Do you need switch case for speed like below new
> code or its should be speed independent code?
> >                 switch (ethtool_cmd_speed()) {
> >                 case SPEED_100:
> >                 case SPEED_10:
> >                         return DEFAULT_PRB_RETIRE_TOV;
> >                 default:
> >                         msec = 1;
> >                         div = ethtool_cmd_speed() / 1000;
> >                         break;
> >                 /*
> >                 }
> 
> I was thinking of something like:
> 
> 		u64 speed = ethtool_cmd_speed(&ecmd);
> 		if (speed < 1000 || speed == SPEED_UNKNOWN)
> 			return DEFAULT_PRB_RETIRE_TOV;
> 		msec = 1;
> 		div = speed / 1000;
> 
> Ben.
> 
> --
> Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer;
> that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH net-next] ipv4: tcp: dont cache unconfirmed intput dst
From: Eric Dumazet @ 2012-06-27  9:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Hans Schillstrom

From: Eric Dumazet <edumazet@google.com>

DDOS synflood attacks hit badly IP route cache.

On typical machines, this cache is allowed to hold up to 8 Millions dst
entries, 256 bytes for each, for a total of 2GB of memory.

rt_garbage_collect() triggers and tries to cleanup things.

Eventually route cache is disabled but machine is under fire and might
OOM and crash.

This patch exploits the new TCP early demux, to set a nocache
boolean in case incoming TCP frame is for a not yet ESTABLISHED or
TIMEWAIT socket.

This 'nocache' boolean is then used in case dst entry is not found in
route cache, to create an unhashed dst entry (DST_NOCACHE)

SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache
output dst for syncookies), so after this patch, a machine is able to
absorb a DDOS synflood attack without polluting its IP route cache.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/protocol.h |    2 +-
 include/net/route.h    |    8 ++++----
 include/net/tcp.h      |    2 +-
 net/ipv4/arp.c         |    2 +-
 net/ipv4/ip_fragment.c |    2 +-
 net/ipv4/ip_input.c    |    5 +++--
 net/ipv4/route.c       |    8 +++++---
 net/ipv4/tcp_ipv4.c    |    4 +++-
 net/ipv4/xfrm4_input.c |    2 +-
 9 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index 967b926..7cfc8f7 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,7 +37,7 @@
 
 /* This is used to register protocols. */
 struct net_protocol {
-	int			(*early_demux)(struct sk_buff *skb);
+	int			(*early_demux)(struct sk_buff *skb, bool *nocache);
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			(*gso_send_check)(struct sk_buff *skb);
diff --git a/include/net/route.h b/include/net/route.h
index 47eb25a..6361f93 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -201,18 +201,18 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
 }
 
 extern int ip_route_input_common(struct sk_buff *skb, __be32 dst, __be32 src,
-				 u8 tos, struct net_device *devin, bool noref);
+				 u8 tos, struct net_device *devin, bool noref, bool nocache);
 
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
 {
-	return ip_route_input_common(skb, dst, src, tos, devin, false);
+	return ip_route_input_common(skb, dst, src, tos, devin, false, false);
 }
 
 static inline int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
-				       u8 tos, struct net_device *devin)
+				       u8 tos, struct net_device *devin, bool nocache)
 {
-	return ip_route_input_common(skb, dst, src, tos, devin, true);
+	return ip_route_input_common(skb, dst, src, tos, devin, true, nocache);
 }
 
 extern void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6660ffc..917ed2e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -325,7 +325,7 @@ extern void tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void tcp_shutdown (struct sock *sk, int how);
 
-extern int tcp_v4_early_demux(struct sk_buff *skb);
+extern int tcp_v4_early_demux(struct sk_buff *skb, bool *nocache);
 extern int tcp_v4_rcv(struct sk_buff *skb);
 
 extern struct inet_peer *tcp_v4_get_peer(struct sock *sk);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 2e560f0..6a97959 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -828,7 +828,7 @@ static int arp_process(struct sk_buff *skb)
 	}
 
 	if (arp->ar_op == htons(ARPOP_REQUEST) &&
-	    ip_route_input_noref(skb, tip, sip, 0, dev) == 0) {
+	    ip_route_input_noref(skb, tip, sip, 0, dev, false) == 0) {
 
 		rt = skb_rtable(skb);
 		addr_type = rt->rt_type;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 8d07c97..978d55f 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -259,7 +259,7 @@ static void ip_expire(unsigned long arg)
 		skb_dst_drop(head);
 		iph = ip_hdr(head);
 		err = ip_route_input_noref(head, iph->daddr, iph->saddr,
-					   iph->tos, head->dev);
+					   iph->tos, head->dev, false);
 		if (err)
 			goto out_rcu_unlock;
 
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 2a39204..7be54c8 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -326,6 +326,7 @@ static int ip_rcv_finish(struct sk_buff *skb)
 	 */
 	if (skb_dst(skb) == NULL) {
 		int err = -ENOENT;
+		bool nocache = false;
 
 		if (sysctl_ip_early_demux) {
 			const struct net_protocol *ipprot;
@@ -334,13 +335,13 @@ static int ip_rcv_finish(struct sk_buff *skb)
 			rcu_read_lock();
 			ipprot = rcu_dereference(inet_protos[protocol]);
 			if (ipprot && ipprot->early_demux)
-				err = ipprot->early_demux(skb);
+				err = ipprot->early_demux(skb, &nocache);
 			rcu_read_unlock();
 		}
 
 		if (err) {
 			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-						   iph->tos, skb->dev);
+						   iph->tos, skb->dev, nocache);
 			if (unlikely(err)) {
 				if (err == -EXDEV)
 					NET_INC_STATS_BH(dev_net(skb->dev),
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 81533e3..fdc7900 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2214,7 +2214,7 @@ static int ip_mkroute_input(struct sk_buff *skb,
  */
 
 static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-			       u8 tos, struct net_device *dev)
+			       u8 tos, struct net_device *dev, bool nocache)
 {
 	struct fib_result res;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
@@ -2353,6 +2353,8 @@ local_input:
 		rth->dst.error= -err;
 		rth->rt_flags 	&= ~RTCF_LOCAL;
 	}
+	if (nocache)
+		rth->dst.flags |= DST_NOCACHE;
 	hash = rt_hash(daddr, saddr, fl4.flowi4_iif, rt_genid(net));
 	rth = rt_intern_hash(hash, rth, skb, fl4.flowi4_iif);
 	err = 0;
@@ -2395,7 +2397,7 @@ martian_source_keep_err:
 }
 
 int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-			   u8 tos, struct net_device *dev, bool noref)
+			   u8 tos, struct net_device *dev, bool noref, bool nocache)
 {
 	struct rtable	*rth;
 	unsigned int	hash;
@@ -2471,7 +2473,7 @@ skip_cache:
 		rcu_read_unlock();
 		return -EINVAL;
 	}
-	res = ip_route_input_slow(skb, daddr, saddr, tos, dev);
+	res = ip_route_input_slow(skb, daddr, saddr, tos, dev, nocache);
 	rcu_read_unlock();
 	return res;
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1781dc6..33aabd4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1673,7 +1673,7 @@ csum_err:
 }
 EXPORT_SYMBOL(tcp_v4_do_rcv);
 
-int tcp_v4_early_demux(struct sk_buff *skb)
+int tcp_v4_early_demux(struct sk_buff *skb, bool *no_dst_cache)
 {
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph;
@@ -1719,6 +1719,8 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 				}
 			}
 		}
+	} else {
+		*no_dst_cache = true;
 	}
 
 out_err:
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 06814b6..eee636b 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -28,7 +28,7 @@ static inline int xfrm4_rcv_encap_finish(struct sk_buff *skb)
 		const struct iphdr *iph = ip_hdr(skb);
 
 		if (ip_route_input_noref(skb, iph->daddr, iph->saddr,
-					 iph->tos, skb->dev))
+					 iph->tos, skb->dev, false))
 			goto drop;
 	}
 	return dst_input(skb);

^ permalink raw reply related

* Re: [PATCH v2 net-next] tcp: avoid tx starvation by SYNACK packets
From: Jesper Dangaard Brouer @ 2012-06-27  9:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, hans.schillstrom, subramanian.vijay, dave.taht,
	netdev, ncardwell, therbert, mph
In-Reply-To: <1340786710.26242.37.camel@edumazet-glaptop>

On Wed, 2012-06-27 at 10:45 +0200, Eric Dumazet wrote:
> On Wed, 2012-06-27 at 10:21 +0200, Jesper Dangaard Brouer wrote:
> 
> > It works because we have a spinlock problem in the code... Perhaps, they
> > did it, because we have have locking/contention problem, not the other
> > way around ;-)  How about fixing the code instead? ;-)))
> 
> The socket lock is currently mandatory.
>
> It's really _hard_ to remove it, your attempts added a lot of races.

Yes, its really hard to remove completely.  That's why I choose _only_
to handle the SYN cookie "overload" case, and leave the rest locked, and
I also introduced extra locking in the latest patches.  I know it was
not perfect, hence the RFC tag, but I hope I didn't add that many races.


> I want to do it properly, adding needed RCU and array of spinlocks were
> appropriate.

I really appreciate that you will attempt to fix this properly.  Like a
real network ninja ;-).

^ permalink raw reply

* pull-request: can 2012-06-27
From: Marc Kleine-Budde @ 2012-06-27  9:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-can

Hello David,

here's a patch intended for v3.5, targeting net/master. Hui Wang has
found and fixed an endianness problem in the device tree handling in
the flexcan driver.

regards, Marc

---

The following changes since commit 6bc96d047fe32d76ef79f3195c52a542edf7c705:

  xen/netfront: teardown the device before unregistering it. (2012-06-27 01:25:41 -0700)

are available in the git repository at:

  git://gitorious.org/linux-can/linux-can.git for-davem

for you to fetch changes up to 85f2f834e85517307f13e30e630a5fc86f757cb5:

  can: flexcan: use be32_to_cpup to handle the value of dt entry (2012-06-27 11:12:07 +0200)

----------------------------------------------------------------
Hui Wang (1):
      can: flexcan: use be32_to_cpup to handle the value of dt entry

 drivers/net/can/flexcan.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

^ permalink raw reply

* [PATCH] can: flexcan: use be32_to_cpup to handle the value of dt entry
From: Marc Kleine-Budde @ 2012-06-27  9:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-can, Hui Wang, Shawn Guo, Marc Kleine-Budde
In-Reply-To: <1340789236-28266-1-git-send-email-mkl@pengutronix.de>

From: Hui Wang <jason77.wang@gmail.com>

The freescale arm i.MX series platform can support this driver, and
usually the arm cpu works in the little endian mode by default, while
device tree entry value is stored in big endian format, we should use
be32_to_cpup() to handle them, after modification, it can work well
both on the le cpu and be cpu.

Cc: stable <stable@vger.kernel.org> # v3.2+
Cc: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/flexcan.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 38c0690..81d4741 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -939,12 +939,12 @@ static int __devinit flexcan_probe(struct platform_device *pdev)
 		return PTR_ERR(pinctrl);
 
 	if (pdev->dev.of_node) {
-		const u32 *clock_freq_p;
+		const __be32 *clock_freq_p;
 
 		clock_freq_p = of_get_property(pdev->dev.of_node,
 						"clock-frequency", NULL);
 		if (clock_freq_p)
-			clock_freq = *clock_freq_p;
+			clock_freq = be32_to_cpup(clock_freq_p);
 	}
 
 	if (!clock_freq) {
-- 
1.7.10

^ permalink raw reply related

* Re: [PATCH net-next] em_canid: Ematch rule to match CAN frames according to their CAN IDs
From: Kurt Van Dijck @ 2012-06-27  9:33 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Rostislav Lisovy, Eric Dumazet, netdev, linux-can, lartc, pisa,
	sojkam1
In-Reply-To: <4FEA169C.1070709@hartkopp.net>

Oliver, Rostislav,

I was just looking into this. I think the matching itself
may be simplified by eliminating checks 'that have already been made'.

On Tue, Jun 26, 2012 at 10:07:56PM +0200, Oliver Hartkopp wrote:
> 
> > +static int em_canid_match(struct sk_buff *skb, struct tcf_ematch *m,
> > +			 struct tcf_pkt_info *info)
> > +{
> > +	struct canid_match *cm = em_canid_priv(m);
> > +	canid_t can_id;
> > +	unsigned int match = false;
> > +	int i;
> > +
> > +	can_id = em_canid_get_id(skb);
> > +
> > +	if (can_id & CAN_EFF_FLAG) {
> > +		can_id &= CAN_EFF_MASK;
Why clear the CAN_EFF_FLAG?
It needs an extra read-modify-write, and the CAN_EFF_FLAG is set in
the match rule too.
IMHO, you could leave can_id as it is.
> > +
> > +		for (i = 0; i < cm->eff_rules_count; i++) {

to speed things up manually, I tend to use a 'const struct can_filter *lp'
and do:
		for (i = 0, lp = cm->rules_raw; i < cm->eff_rules_count;
				++i, ++lp) {
The advantage depends on the compiler's optimizations, which I'm not really
aware of.
The next statement would then be:

			if (!((lp->can_id ^ can_id) & lp->can_mask)) {

for stripping & CAN_EFF_MASK, see below :-)


> > +			if (!(((cm->rules_raw[i].can_id ^ can_id) &
> > +			    cm->rules_raw[i].can_mask) & CAN_EFF_MASK)) {
> 
> 
> Looks really tricky. I'm still pondering if it does what it should do ...

I think it does, since:
	cm->rules_raw[i].can_id ^ can_id
gives an value where the bits that differ are set.
then:
	& cm->rules_raw[i].can_mask
will strip bits that you don't care.

But '& CAN_EFF_MASK' is not needed, since:
	* can_id will have CAN_EFF_FLAG (see comment above)
	* cm->rules_raw[i].can_id has CAN_EFF_FLAG, otherwise it would
	  not be in the list
	* can_id will not have CAN_ERR_FLAG
	* cm->rules_raw should not have CAN_ERR_FLAG
	  you can always clear CAN_ERR_FLAG from the rules during
	  em_canid_change below
	* maybe distinguishing on CAN_RTR_FLAG makes sense during
	  classification.
> 
> > +				match = true;
> > +				break;
> > +			}
> > +		}
> > +	} else { /* SFF */
> > +		can_id &= CAN_SFF_MASK;
> > +		match = test_bit(can_id, cm->match_sff);
> > +	}
> > +
> 
> 
> return match;
> 
> 
> > +}
> > +

^ permalink raw reply

* [PATCH 0/7] Get rid of RTA*() macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev

This patchset gets rid of all the RTA_PUT() and RTA_GET()
macros and makes use of the type-safe netlink API variants
where applicable.

I did my best to test these patches but I do not own any
decnet hardware so the decnet part is compile tested only.

Thomas Graf (7):
  unix_diag: Do not use RTA_PUT() macros
  sock_diag: Do not use RTA_PUT() macros
  inet_diag: Do not use RTA_PUT() macros
  ipmr: Do not use RTA_PUT() macros
  ip6mr: Do not use RTA_PUT() macros
  decnet: Do not use RTA_PUT() macros
  netlink: Get rid of obsolete rtnetlink macros

 include/linux/rtnetlink.h |  129 ---------------------------------------------
 net/core/rtnetlink.c      |   13 -----
 net/core/sock_diag.c      |   12 +---
 net/decnet/dn_fib.c       |    8 +++
 net/decnet/dn_route.c     |   61 +++++++++++++--------
 net/decnet/dn_table.c     |   69 ++++++++++++++----------
 net/ipv4/inet_diag.c      |  112 ++++++++++++++++++--------------------
 net/ipv4/ipmr.c           |   28 +++++-----
 net/ipv6/ip6mr.c          |    5 +-
 net/unix/diag.c           |   80 ++++++++++++----------------
 10 files changed, 193 insertions(+), 324 deletions(-)

-- 
1.7.7.6

^ permalink raw reply

* [PATCH 1/7] unix_diag: Do not use RTA_PUT() macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1340788373.git.tgraf@suug.ch>

Also, no need to trim on nlmsg_put() failure, nothing has been added
yet.  We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/unix/diag.c |   80 ++++++++++++++++++++++--------------------------------
 1 files changed, 33 insertions(+), 47 deletions(-)

diff --git a/net/unix/diag.c b/net/unix/diag.c
index 977ca31..a74864e 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -8,40 +8,31 @@
 #include <net/af_unix.h>
 #include <net/tcp_states.h>
 
-#define UNIX_DIAG_PUT(skb, attrtype, attrlen) \
-	RTA_DATA(__RTA_PUT(skb, attrtype, attrlen))
-
 static int sk_diag_dump_name(struct sock *sk, struct sk_buff *nlskb)
 {
 	struct unix_address *addr = unix_sk(sk)->addr;
-	char *s;
-
-	if (addr) {
-		s = UNIX_DIAG_PUT(nlskb, UNIX_DIAG_NAME, addr->len - sizeof(short));
-		memcpy(s, addr->name->sun_path, addr->len - sizeof(short));
-	}
 
-	return 0;
+	if (!addr)
+		return 0;
 
-rtattr_failure:
-	return -EMSGSIZE;
+	return nla_put(nlskb, UNIX_DIAG_NAME, addr->len - sizeof(short),
+		       addr->name->sun_path);
 }
 
 static int sk_diag_dump_vfs(struct sock *sk, struct sk_buff *nlskb)
 {
 	struct dentry *dentry = unix_sk(sk)->path.dentry;
-	struct unix_diag_vfs *uv;
 
 	if (dentry) {
-		uv = UNIX_DIAG_PUT(nlskb, UNIX_DIAG_VFS, sizeof(*uv));
-		uv->udiag_vfs_ino = dentry->d_inode->i_ino;
-		uv->udiag_vfs_dev = dentry->d_sb->s_dev;
+		struct unix_diag_vfs uv = {
+			.udiag_vfs_ino = dentry->d_inode->i_ino,
+			.udiag_vfs_dev = dentry->d_sb->s_dev,
+		};
+
+		return nla_put(nlskb, UNIX_DIAG_VFS, sizeof(uv), &uv);
 	}
 
 	return 0;
-
-rtattr_failure:
-	return -EMSGSIZE;
 }
 
 static int sk_diag_dump_peer(struct sock *sk, struct sk_buff *nlskb)
@@ -56,24 +47,28 @@ static int sk_diag_dump_peer(struct sock *sk, struct sk_buff *nlskb)
 		unix_state_unlock(peer);
 		sock_put(peer);
 
-		RTA_PUT_U32(nlskb, UNIX_DIAG_PEER, ino);
+		return nla_put_u32(nlskb, UNIX_DIAG_PEER, ino);
 	}
 
 	return 0;
-rtattr_failure:
-	return -EMSGSIZE;
 }
 
 static int sk_diag_dump_icons(struct sock *sk, struct sk_buff *nlskb)
 {
 	struct sk_buff *skb;
+	struct nlattr *attr;
 	u32 *buf;
 	int i;
 
 	if (sk->sk_state == TCP_LISTEN) {
 		spin_lock(&sk->sk_receive_queue.lock);
-		buf = UNIX_DIAG_PUT(nlskb, UNIX_DIAG_ICONS,
-				sk->sk_receive_queue.qlen * sizeof(u32));
+
+		attr = nla_reserve(nlskb, UNIX_DIAG_ICONS,
+				   sk->sk_receive_queue.qlen * sizeof(u32));
+		if (!attr)
+			goto errout;
+
+		buf = nla_data(attr);
 		i = 0;
 		skb_queue_walk(&sk->sk_receive_queue, skb) {
 			struct sock *req, *peer;
@@ -94,45 +89,38 @@ static int sk_diag_dump_icons(struct sock *sk, struct sk_buff *nlskb)
 
 	return 0;
 
-rtattr_failure:
+errout:
 	spin_unlock(&sk->sk_receive_queue.lock);
 	return -EMSGSIZE;
 }
 
 static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb)
 {
-	struct unix_diag_rqlen *rql;
-
-	rql = UNIX_DIAG_PUT(nlskb, UNIX_DIAG_RQLEN, sizeof(*rql));
+	struct unix_diag_rqlen rql;
 
 	if (sk->sk_state == TCP_LISTEN) {
-		rql->udiag_rqueue = sk->sk_receive_queue.qlen;
-		rql->udiag_wqueue = sk->sk_max_ack_backlog;
+		rql.udiag_rqueue = sk->sk_receive_queue.qlen;
+		rql.udiag_wqueue = sk->sk_max_ack_backlog;
 	} else {
-		rql->udiag_rqueue = (__u32)unix_inq_len(sk);
-		rql->udiag_wqueue = (__u32)unix_outq_len(sk);
+		rql.udiag_rqueue = (u32) unix_inq_len(sk);
+		rql.udiag_wqueue = (u32) unix_outq_len(sk);
 	}
 
-	return 0;
-
-rtattr_failure:
-	return -EMSGSIZE;
+	return nla_put(nlskb, UNIX_DIAG_RQLEN, sizeof(rql), &rql);
 }
 
 static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_req *req,
 		u32 pid, u32 seq, u32 flags, int sk_ino)
 {
-	unsigned char *b = skb_tail_pointer(skb);
 	struct nlmsghdr *nlh;
 	struct unix_diag_msg *rep;
 
-	nlh = nlmsg_put(skb, pid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*rep), 0);
+	nlh = nlmsg_put(skb, pid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*rep),
+			flags);
 	if (!nlh)
-		goto out_nlmsg_trim;
-	nlh->nlmsg_flags = flags;
+		return -EMSGSIZE;
 
 	rep = nlmsg_data(nlh);
-
 	rep->udiag_family = AF_UNIX;
 	rep->udiag_type = sk->sk_type;
 	rep->udiag_state = sk->sk_state;
@@ -163,11 +151,10 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r
 	    sock_diag_put_meminfo(sk, skb, UNIX_DIAG_MEMINFO))
 		goto out_nlmsg_trim;
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-	return skb->len;
+	return nlmsg_end(skb, nlh);
 
 out_nlmsg_trim:
-	nlmsg_trim(skb, b);
+	nlmsg_cancel(skb, nlh);
 	return -EMSGSIZE;
 }
 
@@ -272,15 +259,14 @@ static int unix_diag_get_exact(struct sk_buff *in_skb,
 	extra_len = 256;
 again:
 	err = -ENOMEM;
-	rep = alloc_skb(NLMSG_SPACE((sizeof(struct unix_diag_msg) + extra_len)),
-			GFP_KERNEL);
+	rep = nlmsg_new(sizeof(struct unix_diag_msg) + extra_len, GFP_KERNEL);
 	if (!rep)
 		goto out;
 
 	err = sk_diag_fill(sk, rep, req, NETLINK_CB(in_skb).pid,
 			   nlh->nlmsg_seq, 0, req->udiag_ino);
 	if (err < 0) {
-		kfree_skb(rep);
+		nlmsg_free(rep);
 		extra_len += 256;
 		if (extra_len >= PAGE_SIZE)
 			goto out;
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 2/7] sock_diag: Do not use RTA_PUT() macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1340788373.git.tgraf@suug.ch>

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/core/sock_diag.c |   12 +++---------
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 0d934ce..ff2967a 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -4,7 +4,6 @@
 #include <net/netlink.h>
 #include <net/net_namespace.h>
 #include <linux/module.h>
-#include <linux/rtnetlink.h>
 #include <net/sock.h>
 
 #include <linux/inet_diag.h>
@@ -35,9 +34,7 @@ EXPORT_SYMBOL_GPL(sock_diag_save_cookie);
 
 int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attrtype)
 {
-	__u32 *mem;
-
-	mem = RTA_DATA(__RTA_PUT(skb, attrtype, SK_MEMINFO_VARS * sizeof(__u32)));
+	u32 mem[SK_MEMINFO_VARS];
 
 	mem[SK_MEMINFO_RMEM_ALLOC] = sk_rmem_alloc_get(sk);
 	mem[SK_MEMINFO_RCVBUF] = sk->sk_rcvbuf;
@@ -48,10 +45,7 @@ int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attrtype)
 	mem[SK_MEMINFO_OPTMEM] = atomic_read(&sk->sk_omem_alloc);
 	mem[SK_MEMINFO_BACKLOG] = sk->sk_backlog.len;
 
-	return 0;
-
-rtattr_failure:
-	return -EMSGSIZE;
+	return nla_put(skb, attrtype, sizeof(mem), &mem);
 }
 EXPORT_SYMBOL_GPL(sock_diag_put_meminfo);
 
@@ -121,7 +115,7 @@ static inline void sock_diag_unlock_handler(const struct sock_diag_handler *h)
 static int __sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	int err;
-	struct sock_diag_req *req = NLMSG_DATA(nlh);
+	struct sock_diag_req *req = nlmsg_data(nlh);
 	const struct sock_diag_handler *hndl;
 
 	if (nlmsg_len(nlh) < sizeof(*req))
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 3/7] inet_diag: Do not use RTA_PUT() macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1340788373.git.tgraf@suug.ch>

Also, no need to trim on nlmsg_put() failure, nothing has been added
yet.  We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/ipv4/inet_diag.c |  112 +++++++++++++++++++++++--------------------------
 1 files changed, 53 insertions(+), 59 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 27640e7..38064a2 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -46,9 +46,6 @@ struct inet_diag_entry {
 	u16 userlocks;
 };
 
-#define INET_DIAG_PUT(skb, attrtype, attrlen) \
-	RTA_DATA(__RTA_PUT(skb, attrtype, attrlen))
-
 static DEFINE_MUTEX(inet_diag_table_mutex);
 
 static const struct inet_diag_handler *inet_diag_lock_handler(int proto)
@@ -78,28 +75,22 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 	const struct inet_sock *inet = inet_sk(sk);
 	struct inet_diag_msg *r;
 	struct nlmsghdr  *nlh;
+	struct nlattr *attr;
 	void *info = NULL;
-	struct inet_diag_meminfo  *minfo = NULL;
-	unsigned char	 *b = skb_tail_pointer(skb);
 	const struct inet_diag_handler *handler;
 	int ext = req->idiag_ext;
 
 	handler = inet_diag_table[req->sdiag_protocol];
 	BUG_ON(handler == NULL);
 
-	nlh = nlmsg_put(skb, pid, seq, unlh->nlmsg_type, sizeof(*r), 0);
-	if (!nlh) {
-		nlmsg_trim(skb, b);
+	nlh = nlmsg_put(skb, pid, seq, unlh->nlmsg_type, sizeof(*r),
+			nlmsg_flags);
+	if (!nlh)
 		return -EMSGSIZE;
-	}
-	nlh->nlmsg_flags = nlmsg_flags;
 
 	r = nlmsg_data(nlh);
 	BUG_ON(sk->sk_state == TCP_TIME_WAIT);
 
-	if (ext & (1 << (INET_DIAG_MEMINFO - 1)))
-		minfo = INET_DIAG_PUT(skb, INET_DIAG_MEMINFO, sizeof(*minfo));
-
 	r->idiag_family = sk->sk_family;
 	r->idiag_state = sk->sk_state;
 	r->idiag_timer = 0;
@@ -117,7 +108,8 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 	 * hence this needs to be included regardless of socket family.
 	 */
 	if (ext & (1 << (INET_DIAG_TOS - 1)))
-		RTA_PUT_U8(skb, INET_DIAG_TOS, inet->tos);
+		if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
+			goto errout;
 
 #if IS_ENABLED(CONFIG_IPV6)
 	if (r->idiag_family == AF_INET6) {
@@ -125,24 +117,31 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 
 		*(struct in6_addr *)r->id.idiag_src = np->rcv_saddr;
 		*(struct in6_addr *)r->id.idiag_dst = np->daddr;
+
 		if (ext & (1 << (INET_DIAG_TCLASS - 1)))
-			RTA_PUT_U8(skb, INET_DIAG_TCLASS, np->tclass);
+			if (nla_put_u8(skb, INET_DIAG_TCLASS, np->tclass) < 0)
+				goto errout;
 	}
 #endif
 
 	r->idiag_uid = sock_i_uid(sk);
 	r->idiag_inode = sock_i_ino(sk);
 
-	if (minfo) {
-		minfo->idiag_rmem = sk_rmem_alloc_get(sk);
-		minfo->idiag_wmem = sk->sk_wmem_queued;
-		minfo->idiag_fmem = sk->sk_forward_alloc;
-		minfo->idiag_tmem = sk_wmem_alloc_get(sk);
+	if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
+		struct inet_diag_meminfo minfo = {
+			.idiag_rmem = sk_rmem_alloc_get(sk),
+			.idiag_wmem = sk->sk_wmem_queued,
+			.idiag_fmem = sk->sk_forward_alloc,
+			.idiag_tmem = sk_wmem_alloc_get(sk),
+		};
+
+		if (nla_put(skb, INET_DIAG_MEMINFO, sizeof(minfo), &minfo) < 0)
+			goto errout;
 	}
 
 	if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
 		if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
-			goto rtattr_failure;
+			goto errout;
 
 	if (icsk == NULL) {
 		handler->idiag_get_info(sk, r, NULL);
@@ -169,16 +168,20 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 	}
 #undef EXPIRES_IN_MS
 
-	if (ext & (1 << (INET_DIAG_INFO - 1)))
-		info = INET_DIAG_PUT(skb, INET_DIAG_INFO, sizeof(struct tcp_info));
+	if (ext & (1 << (INET_DIAG_INFO - 1))) {
+		attr = nla_reserve(skb, INET_DIAG_INFO,
+				   sizeof(struct tcp_info));
+		if (!attr)
+			goto errout;
 
-	if ((ext & (1 << (INET_DIAG_CONG - 1))) && icsk->icsk_ca_ops) {
-		const size_t len = strlen(icsk->icsk_ca_ops->name);
-
-		strcpy(INET_DIAG_PUT(skb, INET_DIAG_CONG, len + 1),
-		       icsk->icsk_ca_ops->name);
+		info = nla_data(attr);
 	}
 
+	if ((ext & (1 << (INET_DIAG_CONG - 1))) && icsk->icsk_ca_ops)
+		if (nla_put_string(skb, INET_DIAG_CONG,
+				   icsk->icsk_ca_ops->name) < 0)
+			goto errout;
+
 	handler->idiag_get_info(sk, r, info);
 
 	if (sk->sk_state < TCP_TIME_WAIT &&
@@ -186,11 +189,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		icsk->icsk_ca_ops->get_info(sk, ext, skb);
 
 out:
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-	return skb->len;
+	return nlmsg_end(skb, nlh);
 
-rtattr_failure:
-	nlmsg_trim(skb, b);
+errout:
+	nlmsg_cancel(skb, nlh);
 	return -EMSGSIZE;
 }
 EXPORT_SYMBOL_GPL(inet_sk_diag_fill);
@@ -211,20 +213,16 @@ static int inet_twsk_diag_fill(struct inet_timewait_sock *tw,
 {
 	long tmo;
 	struct inet_diag_msg *r;
-	const unsigned char *previous_tail = skb_tail_pointer(skb);
-	struct nlmsghdr *nlh = nlmsg_put(skb, pid, seq,
-					 unlh->nlmsg_type, sizeof(*r), 0);
+	struct nlmsghdr *nlh;
 
-	if (!nlh) {
-		nlmsg_trim(skb, previous_tail);
+	nlh = nlmsg_put(skb, pid, seq, unlh->nlmsg_type, sizeof(*r),
+			nlmsg_flags);
+	if (!nlh)
 		return -EMSGSIZE;
-	}
 
 	r = nlmsg_data(nlh);
 	BUG_ON(tw->tw_state != TCP_TIME_WAIT);
 
-	nlh->nlmsg_flags = nlmsg_flags;
-
 	tmo = tw->tw_ttd - jiffies;
 	if (tmo < 0)
 		tmo = 0;
@@ -253,8 +251,8 @@ static int inet_twsk_diag_fill(struct inet_timewait_sock *tw,
 		*(struct in6_addr *)r->id.idiag_dst = tw6->tw_v6_daddr;
 	}
 #endif
-	nlh->nlmsg_len = skb_tail_pointer(skb) - previous_tail;
-	return skb->len;
+
+	return nlmsg_end(skb, nlh);
 }
 
 static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
@@ -303,20 +301,20 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *in_s
 	if (err)
 		goto out;
 
-	err = -ENOMEM;
-	rep = alloc_skb(NLMSG_SPACE((sizeof(struct inet_diag_msg) +
-				     sizeof(struct inet_diag_meminfo) +
-				     sizeof(struct tcp_info) + 64)),
-			GFP_KERNEL);
-	if (!rep)
+	rep = nlmsg_new(sizeof(struct inet_diag_msg) +
+			sizeof(struct inet_diag_meminfo) +
+			sizeof(struct tcp_info) + 64, GFP_KERNEL);
+	if (!rep) {
+		err = -ENOMEM;
 		goto out;
+	}
 
 	err = sk_diag_fill(sk, rep, req,
 			   NETLINK_CB(in_skb).pid,
 			   nlh->nlmsg_seq, 0, nlh);
 	if (err < 0) {
 		WARN_ON(err == -EMSGSIZE);
-		kfree_skb(rep);
+		nlmsg_free(rep);
 		goto out;
 	}
 	err = netlink_unicast(sock_diag_nlsk, rep, NETLINK_CB(in_skb).pid,
@@ -597,19 +595,16 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
 {
 	const struct inet_request_sock *ireq = inet_rsk(req);
 	struct inet_sock *inet = inet_sk(sk);
-	unsigned char *b = skb_tail_pointer(skb);
 	struct inet_diag_msg *r;
 	struct nlmsghdr *nlh;
 	long tmo;
 
-	nlh = nlmsg_put(skb, pid, seq, unlh->nlmsg_type, sizeof(*r), 0);
-	if (!nlh) {
-		nlmsg_trim(skb, b);
-		return -1;
-	}
-	nlh->nlmsg_flags = NLM_F_MULTI;
-	r = nlmsg_data(nlh);
+	nlh = nlmsg_put(skb, pid, seq, unlh->nlmsg_type, sizeof(*r),
+			NLM_F_MULTI);
+	if (!nlh)
+		return -EMSGSIZE;
 
+	r = nlmsg_data(nlh);
 	r->idiag_family = sk->sk_family;
 	r->idiag_state = TCP_SYN_RECV;
 	r->idiag_timer = 1;
@@ -637,9 +632,8 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
 		*(struct in6_addr *)r->id.idiag_dst = inet6_rsk(req)->rmt_addr;
 	}
 #endif
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
 
-	return skb->len;
+	return nlmsg_end(skb, nlh);
 }
 
 static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 4/7] ipmr: Do not use RTA_PUT() macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1340788373.git.tgraf@suug.ch>

Also fix a needless skb tailroom check for a 4 bytes area
after after each rtnexthop block.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/ipv4/ipmr.c |   28 ++++++++++++++--------------
 1 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index c94bbc6..b4ac39f 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2006,37 +2006,37 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 {
 	int ct;
 	struct rtnexthop *nhp;
-	u8 *b = skb_tail_pointer(skb);
-	struct rtattr *mp_head;
+	struct nlattr *mp_attr;
 
 	/* If cache is unresolved, don't try to parse IIF and OIF */
 	if (c->mfc_parent >= MAXVIFS)
 		return -ENOENT;
 
-	if (VIF_EXISTS(mrt, c->mfc_parent))
-		RTA_PUT(skb, RTA_IIF, 4, &mrt->vif_table[c->mfc_parent].dev->ifindex);
+	if (VIF_EXISTS(mrt, c->mfc_parent) &&
+	    nla_put_u32(skb, RTA_IIF, mrt->vif_table[c->mfc_parent].dev->ifindex) < 0)
+		return -EMSGSIZE;
 
-	mp_head = (struct rtattr *)skb_put(skb, RTA_LENGTH(0));
+	if (!(mp_attr = nla_nest_start(skb, RTA_MULTIPATH)))
+		return -EMSGSIZE;
 
 	for (ct = c->mfc_un.res.minvif; ct < c->mfc_un.res.maxvif; ct++) {
 		if (VIF_EXISTS(mrt, ct) && c->mfc_un.res.ttls[ct] < 255) {
-			if (skb_tailroom(skb) < RTA_ALIGN(RTA_ALIGN(sizeof(*nhp)) + 4))
-				goto rtattr_failure;
-			nhp = (struct rtnexthop *)skb_put(skb, RTA_ALIGN(sizeof(*nhp)));
+			if (!(nhp = nla_reserve_nohdr(skb, sizeof(*nhp)))) {
+				nla_nest_cancel(skb, mp_attr);
+				return -EMSGSIZE;
+			}
+
 			nhp->rtnh_flags = 0;
 			nhp->rtnh_hops = c->mfc_un.res.ttls[ct];
 			nhp->rtnh_ifindex = mrt->vif_table[ct].dev->ifindex;
 			nhp->rtnh_len = sizeof(*nhp);
 		}
 	}
-	mp_head->rta_type = RTA_MULTIPATH;
-	mp_head->rta_len = skb_tail_pointer(skb) - (u8 *)mp_head;
+
+	nla_nest_end(skb, mp_attr);
+
 	rtm->rtm_type = RTN_MULTICAST;
 	return 1;
-
-rtattr_failure:
-	nlmsg_trim(skb, b);
-	return -EMSGSIZE;
 }
 
 int ipmr_get_route(struct net *net, struct sk_buff *skb,
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 5/7] ip6mr: Do not use RTA_PUT() macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1340788373.git.tgraf@suug.ch>

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/ipv6/ip6mr.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 461e47c..4532973 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2104,8 +2104,9 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
 	if (c->mf6c_parent >= MAXMIFS)
 		return -ENOENT;
 
-	if (MIF_EXISTS(mrt, c->mf6c_parent))
-		RTA_PUT(skb, RTA_IIF, 4, &mrt->vif6_table[c->mf6c_parent].dev->ifindex);
+	if (MIF_EXISTS(mrt, c->mf6c_parent) &&
+	    nla_put_u32(skb, RTA_IIF, mrt->vif6_table[c->mf6c_parent].dev->ifindex) < 0)
+		return -EMSGSIZE;
 
 	mp_head = (struct rtattr *)skb_put(skb, RTA_LENGTH(0));
 
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 6/7] decnet: Do not use RTA_PUT() macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1340788373.git.tgraf@suug.ch>

Also, no need to trim on nlmsg_put() failure, nothing has been added
yet.  We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/decnet/dn_route.c |   61 +++++++++++++++++++++++++++----------------
 net/decnet/dn_table.c |   69 +++++++++++++++++++++++++++++--------------------
 2 files changed, 79 insertions(+), 51 deletions(-)

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index cd584f7..2493ed5 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1515,56 +1515,68 @@ static int dn_rt_fill_info(struct sk_buff *skb, u32 pid, u32 seq,
 	struct dn_route *rt = (struct dn_route *)skb_dst(skb);
 	struct rtmsg *r;
 	struct nlmsghdr *nlh;
-	unsigned char *b = skb_tail_pointer(skb);
 	long expires;
 
 	nlh = nlmsg_put(skb, pid, seq, event, sizeof(*r), flags);
 	if (!nlh)
-		goto out_nlmsg_trim;
+		return -EMSGSIZE;
+
 	r = nlmsg_data(nlh);
 	r->rtm_family = AF_DECnet;
 	r->rtm_dst_len = 16;
 	r->rtm_src_len = 0;
 	r->rtm_tos = 0;
 	r->rtm_table = RT_TABLE_MAIN;
-	RTA_PUT_U32(skb, RTA_TABLE, RT_TABLE_MAIN);
 	r->rtm_type = rt->rt_type;
 	r->rtm_flags = (rt->rt_flags & ~0xFFFF) | RTM_F_CLONED;
 	r->rtm_scope = RT_SCOPE_UNIVERSE;
 	r->rtm_protocol = RTPROT_UNSPEC;
+
 	if (rt->rt_flags & RTCF_NOTIFY)
 		r->rtm_flags |= RTM_F_NOTIFY;
-	RTA_PUT(skb, RTA_DST, 2, &rt->rt_daddr);
+
+	if (nla_put_u32(skb, RTA_TABLE, RT_TABLE_MAIN) < 0 ||
+	    nla_put_le16(skb, RTA_DST, rt->rt_daddr) < 0)
+		goto errout;
+
 	if (rt->fld.saddr) {
 		r->rtm_src_len = 16;
-		RTA_PUT(skb, RTA_SRC, 2, &rt->fld.saddr);
+		if (nla_put_le16(skb, RTA_SRC, rt->fld.saddr) < 0)
+			goto errout;
 	}
-	if (rt->dst.dev)
-		RTA_PUT(skb, RTA_OIF, sizeof(int), &rt->dst.dev->ifindex);
+	if (rt->dst.dev &&
+	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex) < 0)
+		goto errout;
+
 	/*
 	 * Note to self - change this if input routes reverse direction when
 	 * they deal only with inputs and not with replies like they do
 	 * currently.
 	 */
-	RTA_PUT(skb, RTA_PREFSRC, 2, &rt->rt_local_src);
-	if (rt->rt_daddr != rt->rt_gateway)
-		RTA_PUT(skb, RTA_GATEWAY, 2, &rt->rt_gateway);
+	if (nla_put_le16(skb, RTA_PREFSRC, rt->rt_local_src) < 0)
+		goto errout;
+
+	if (rt->rt_daddr != rt->rt_gateway &&
+	    nla_put_le16(skb, RTA_GATEWAY, rt->rt_gateway) < 0)
+		goto errout;
+
 	if (rtnetlink_put_metrics(skb, dst_metrics_ptr(&rt->dst)) < 0)
-		goto rtattr_failure;
+		goto errout;
+
 	expires = rt->dst.expires ? rt->dst.expires - jiffies : 0;
 	if (rtnl_put_cacheinfo(skb, &rt->dst, 0, 0, 0, expires,
 			       rt->dst.error) < 0)
-		goto rtattr_failure;
-	if (dn_is_input_route(rt))
-		RTA_PUT(skb, RTA_IIF, sizeof(int), &rt->fld.flowidn_iif);
+		goto errout;
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-	return skb->len;
+	if (dn_is_input_route(rt) &&
+	    nla_put_u32(skb, RTA_IIF, rt->fld.flowidn_iif) < 0)
+		goto errout;
 
-out_nlmsg_trim:
-rtattr_failure:
-	nlmsg_trim(skb, b);
-	return -1;
+	return nlmsg_end(skb, nlh);
+
+errout:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
 }
 
 /*
@@ -1587,7 +1599,7 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
 	memset(&fld, 0, sizeof(fld));
 	fld.flowidn_proto = DNPROTO_NSP;
 
-	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
 	if (skb == NULL)
 		return -ENOBUFS;
 	skb_reset_mac_header(skb);
@@ -1665,13 +1677,16 @@ int dn_cache_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	struct dn_route *rt;
 	int h, s_h;
 	int idx, s_idx;
+	struct rtmsg *rtm;
 
 	if (!net_eq(net, &init_net))
 		return 0;
 
-	if (NLMSG_PAYLOAD(cb->nlh, 0) < sizeof(struct rtmsg))
+	if (nlmsg_len(cb->nlh) < sizeof(struct rtmsg))
 		return -EINVAL;
-	if (!(((struct rtmsg *)nlmsg_data(cb->nlh))->rtm_flags&RTM_F_CLONED))
+
+	rtm = nlmsg_data(cb->nlh);
+	if (!(rtm->rtm_flags & RTM_F_CLONED))
 		return 0;
 
 	s_h = cb->args[0];
diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c
index 92ec741..16c986a 100644
--- a/net/decnet/dn_table.c
+++ b/net/decnet/dn_table.c
@@ -297,62 +297,75 @@ static int dn_fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
 {
 	struct rtmsg *rtm;
 	struct nlmsghdr *nlh;
-	unsigned char *b = skb_tail_pointer(skb);
 
 	nlh = nlmsg_put(skb, pid, seq, event, sizeof(*rtm), flags);
 	if (!nlh)
-		goto out_nlmsg_trim;
+		return -EMSGSIZE;
+
 	rtm = nlmsg_data(nlh);
 	rtm->rtm_family = AF_DECnet;
 	rtm->rtm_dst_len = dst_len;
 	rtm->rtm_src_len = 0;
 	rtm->rtm_tos = 0;
 	rtm->rtm_table = tb_id;
-	RTA_PUT_U32(skb, RTA_TABLE, tb_id);
 	rtm->rtm_flags = fi->fib_flags;
 	rtm->rtm_scope = scope;
 	rtm->rtm_type  = type;
-	if (rtm->rtm_dst_len)
-		RTA_PUT(skb, RTA_DST, 2, dst);
 	rtm->rtm_protocol = fi->fib_protocol;
-	if (fi->fib_priority)
-		RTA_PUT(skb, RTA_PRIORITY, 4, &fi->fib_priority);
+
+	if (nla_put_u32(skb, RTA_TABLE, tb_id) < 0)
+		goto errout;
+
+	if (rtm->rtm_dst_len &&
+	    nla_put(skb, RTA_DST, 2, dst) < 0)
+		goto errout;
+
+	if (fi->fib_priority &&
+	    nla_put_u32(skb, RTA_PRIORITY, fi->fib_priority) < 0)
+		goto errout;
+
 	if (rtnetlink_put_metrics(skb, fi->fib_metrics) < 0)
-		goto rtattr_failure;
+		goto errout;
+
 	if (fi->fib_nhs == 1) {
-		if (fi->fib_nh->nh_gw)
-			RTA_PUT(skb, RTA_GATEWAY, 2, &fi->fib_nh->nh_gw);
-		if (fi->fib_nh->nh_oif)
-			RTA_PUT(skb, RTA_OIF, sizeof(int), &fi->fib_nh->nh_oif);
+		if (fi->fib_nh->nh_gw &&
+		    nla_put_le16(skb, RTA_GATEWAY, fi->fib_nh->nh_gw) < 0)
+			goto errout;
+
+		if (fi->fib_nh->nh_oif &&
+		    nla_put_u32(skb, RTA_OIF, fi->fib_nh->nh_oif) < 0)
+			goto errout;
 	}
+
 	if (fi->fib_nhs > 1) {
 		struct rtnexthop *nhp;
-		struct rtattr *mp_head;
-		if (skb_tailroom(skb) <= RTA_SPACE(0))
-			goto rtattr_failure;
-		mp_head = (struct rtattr *)skb_put(skb, RTA_SPACE(0));
+		struct nlattr *mp_head;
+
+		if (!(mp_head = nla_nest_start(skb, RTA_MULTIPATH)))
+			goto errout;
 
 		for_nexthops(fi) {
-			if (skb_tailroom(skb) < RTA_ALIGN(RTA_ALIGN(sizeof(*nhp)) + 4))
-				goto rtattr_failure;
-			nhp = (struct rtnexthop *)skb_put(skb, RTA_ALIGN(sizeof(*nhp)));
+			if (!(nhp = nla_reserve_nohdr(skb, sizeof(*nhp))))
+				goto errout;
+
 			nhp->rtnh_flags = nh->nh_flags & 0xFF;
 			nhp->rtnh_hops = nh->nh_weight - 1;
 			nhp->rtnh_ifindex = nh->nh_oif;
-			if (nh->nh_gw)
-				RTA_PUT(skb, RTA_GATEWAY, 2, &nh->nh_gw);
+
+			if (nh->nh_gw &&
+			    nla_put_le16(skb, RTA_GATEWAY, nh->nh_gw) < 0)
+				goto errout;
+
 			nhp->rtnh_len = skb_tail_pointer(skb) - (unsigned char *)nhp;
 		} endfor_nexthops(fi);
-		mp_head->rta_type = RTA_MULTIPATH;
-		mp_head->rta_len = skb_tail_pointer(skb) - (u8 *)mp_head;
+
+		nla_nest_end(skb, mp_head);
 	}
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-	return skb->len;
+	return nlmsg_end(skb, nlh);
 
-out_nlmsg_trim:
-rtattr_failure:
-	nlmsg_trim(skb, b);
+errout:
+	nlmsg_cancel(skb, nlh);
 	return -EMSGSIZE;
 }
 
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 7/7] netlink: Get rid of obsolete rtnetlink macros
From: Thomas Graf @ 2012-06-27  9:36 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1340788373.git.tgraf@suug.ch>

Removes all RTA_GET*() and RTA_PUT*() variations, as well as the
the unused rtattr_strcmp(). Get rid of rtm_get_table() by moving
it to its only user decnet.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/linux/rtnetlink.h |  129 ---------------------------------------------
 net/core/rtnetlink.c      |   13 -----
 net/decnet/dn_fib.c       |    8 +++
 3 files changed, 8 insertions(+), 142 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2c1de89..ea60b08 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -612,12 +612,6 @@ struct tcamsg {
 #include <linux/mutex.h>
 #include <linux/netdevice.h>
 
-static __inline__ int rtattr_strcmp(const struct rtattr *rta, const char *str)
-{
-	int len = strlen(str) + 1;
-	return len > rta->rta_len || memcmp(RTA_DATA(rta), str, len);
-}
-
 extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 group, int echo);
 extern int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid);
 extern void rtnl_notify(struct sk_buff *skb, struct net *net, u32 pid,
@@ -628,122 +622,6 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
 			      u32 id, u32 ts, u32 tsage, long expires,
 			      u32 error);
 
-extern void __rta_fill(struct sk_buff *skb, int attrtype, int attrlen, const void *data);
-
-#define RTA_PUT(skb, attrtype, attrlen, data) \
-({	if (unlikely(skb_tailroom(skb) < (int)RTA_SPACE(attrlen))) \
-		 goto rtattr_failure; \
-   	__rta_fill(skb, attrtype, attrlen, data); }) 
-
-#define RTA_APPEND(skb, attrlen, data) \
-({	if (unlikely(skb_tailroom(skb) < (int)(attrlen))) \
-		goto rtattr_failure; \
-	memcpy(skb_put(skb, attrlen), data, attrlen); })
-
-#define RTA_PUT_NOHDR(skb, attrlen, data) \
-({	RTA_APPEND(skb, RTA_ALIGN(attrlen), data); \
-	memset(skb_tail_pointer(skb) - (RTA_ALIGN(attrlen) - attrlen), 0, \
-	       RTA_ALIGN(attrlen) - attrlen); })
-
-#define RTA_PUT_U8(skb, attrtype, value) \
-({	u8 _tmp = (value); \
-	RTA_PUT(skb, attrtype, sizeof(u8), &_tmp); })
-
-#define RTA_PUT_U16(skb, attrtype, value) \
-({	u16 _tmp = (value); \
-	RTA_PUT(skb, attrtype, sizeof(u16), &_tmp); })
-
-#define RTA_PUT_U32(skb, attrtype, value) \
-({	u32 _tmp = (value); \
-	RTA_PUT(skb, attrtype, sizeof(u32), &_tmp); })
-
-#define RTA_PUT_U64(skb, attrtype, value) \
-({	u64 _tmp = (value); \
-	RTA_PUT(skb, attrtype, sizeof(u64), &_tmp); })
-
-#define RTA_PUT_SECS(skb, attrtype, value) \
-	RTA_PUT_U64(skb, attrtype, (value) / HZ)
-
-#define RTA_PUT_MSECS(skb, attrtype, value) \
-	RTA_PUT_U64(skb, attrtype, jiffies_to_msecs(value))
-
-#define RTA_PUT_STRING(skb, attrtype, value) \
-	RTA_PUT(skb, attrtype, strlen(value) + 1, value)
-
-#define RTA_PUT_FLAG(skb, attrtype) \
-	RTA_PUT(skb, attrtype, 0, NULL);
-
-#define RTA_NEST(skb, type) \
-({	struct rtattr *__start = (struct rtattr *)skb_tail_pointer(skb); \
-	RTA_PUT(skb, type, 0, NULL); \
-	__start;  })
-
-#define RTA_NEST_END(skb, start) \
-({	(start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \
-	(skb)->len; })
-
-#define RTA_NEST_COMPAT(skb, type, attrlen, data) \
-({	struct rtattr *__start = (struct rtattr *)skb_tail_pointer(skb); \
-	RTA_PUT(skb, type, attrlen, data); \
-	RTA_NEST(skb, type); \
-	__start; })
-
-#define RTA_NEST_COMPAT_END(skb, start) \
-({	struct rtattr *__nest = (void *)(start) + NLMSG_ALIGN((start)->rta_len); \
-	(start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \
-	RTA_NEST_END(skb, __nest); \
-	(skb)->len; })
-
-#define RTA_NEST_CANCEL(skb, start) \
-({	if (start) \
-		skb_trim(skb, (unsigned char *) (start) - (skb)->data); \
-	-1; })
-
-#define RTA_GET_U8(rta) \
-({	if (!rta || RTA_PAYLOAD(rta) < sizeof(u8)) \
-		goto rtattr_failure; \
-	*(u8 *) RTA_DATA(rta); })
-
-#define RTA_GET_U16(rta) \
-({	if (!rta || RTA_PAYLOAD(rta) < sizeof(u16)) \
-		goto rtattr_failure; \
-	*(u16 *) RTA_DATA(rta); })
-
-#define RTA_GET_U32(rta) \
-({	if (!rta || RTA_PAYLOAD(rta) < sizeof(u32)) \
-		goto rtattr_failure; \
-	*(u32 *) RTA_DATA(rta); })
-
-#define RTA_GET_U64(rta) \
-({	u64 _tmp; \
-	if (!rta || RTA_PAYLOAD(rta) < sizeof(u64)) \
-		goto rtattr_failure; \
-	memcpy(&_tmp, RTA_DATA(rta), sizeof(_tmp)); \
-	_tmp; })
-
-#define RTA_GET_FLAG(rta) (!!(rta))
-
-#define RTA_GET_SECS(rta) ((unsigned long) RTA_GET_U64(rta) * HZ)
-#define RTA_GET_MSECS(rta) (msecs_to_jiffies((unsigned long) RTA_GET_U64(rta)))
-		
-static inline struct rtattr *
-__rta_reserve(struct sk_buff *skb, int attrtype, int attrlen)
-{
-	struct rtattr *rta;
-	int size = RTA_LENGTH(attrlen);
-
-	rta = (struct rtattr*)skb_put(skb, RTA_ALIGN(size));
-	rta->rta_type = attrtype;
-	rta->rta_len = size;
-	memset(RTA_DATA(rta) + attrlen, 0, RTA_ALIGN(size) - size);
-	return rta;
-}
-
-#define __RTA_PUT(skb, attrtype, attrlen) \
-({ 	if (unlikely(skb_tailroom(skb) < (int)RTA_SPACE(attrlen))) \
-		goto rtattr_failure; \
-   	__rta_reserve(skb, attrtype, attrlen); })
-
 extern void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change);
 
 /* RTNL is used as a global lock for all changes to network configuration  */
@@ -794,13 +672,6 @@ extern void __rtnl_unlock(void);
 	} \
 } while(0)
 
-static inline u32 rtm_get_table(struct rtattr **rta, u8 table)
-{
-	return RTA_GET_U32(rta[RTA_TABLE-1]);
-rtattr_failure:
-	return table;
-}
-
 extern int ndo_dflt_fdb_dump(struct sk_buff *skb,
 			     struct netlink_callback *cb,
 			     struct net_device *dev,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 21318d1..bc8a1cd 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -541,19 +541,6 @@ static const int rta_max[RTM_NR_FAMILIES] =
 	[RTM_FAM(RTM_NEWACTION)]    = TCAA_MAX,
 };
 
-void __rta_fill(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
-{
-	struct rtattr *rta;
-	int size = RTA_LENGTH(attrlen);
-
-	rta = (struct rtattr *)skb_put(skb, RTA_ALIGN(size));
-	rta->rta_type = attrtype;
-	rta->rta_len = size;
-	memcpy(RTA_DATA(rta), data, attrlen);
-	memset(RTA_DATA(rta) + attrlen, 0, RTA_ALIGN(size) - size);
-}
-EXPORT_SYMBOL(__rta_fill);
-
 int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, unsigned int group, int echo)
 {
 	struct sock *rtnl = net->rtnl;
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index 7eaf987..102d610 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -505,6 +505,14 @@ static int dn_fib_check_attr(struct rtmsg *r, struct rtattr **rta)
 	return 0;
 }
 
+static inline u32 rtm_get_table(struct rtattr **rta, u8 table)
+{
+	if (rta[RTA_TABLE - 1])
+		table = nla_get_u32((struct nlattr *) rta[RTA_TABLE - 1]);
+
+	return table;
+}
+
 static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
-- 
1.7.7.6

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox