Netdev List
* Re: [net-next RFC] pktgen: don't wait for the device who doesn't free skb immediately after sent
From: Jason Wang @ 2012-12-03  6:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: rusty, mst, davem, netdev, linux-kernel, virtualization
In-Reply-To: <20121127084919.1587c647@nehalam.linuxnetplumber.net>

On Tuesday, November 27, 2012 08:49:19 AM Stephen Hemminger wrote:
> On Tue, 27 Nov 2012 14:45:13 +0800
> 
> Jason Wang <jasowang@redhat.com> wrote:
> > On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > On Mon, 26 Nov 2012 15:56:52 +0800
> > > 
> > > Jason Wang <jasowang@redhat.com> wrote:
> > >> Some devices do not free the old tx skbs immediately after they have
> > >> been sent (usually in the tx interrupt). One such example is virtio-net,
> > >> which optimizes for virt and only frees the possible old tx skbs during
> > >> the next packet transmission. This would lead pktgen to wait forever on
> > >> the refcount of the skb if no other packet is sent afterwards.
> > >> 
> > >> Solve this issue by introducing a new flag, IFF_TX_SKB_FREE_DELAY,
> > >> which notifies pktgen that the device does not free the skb
> > >> immediately after it has been sent, so that pktgen does not wait for
> > >> the refcount to be one.
> > >> 
> > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > 
> > > Another alternative would be using skb_orphan() and skb->destructor.
> > > There are other cases where skb's are not freed right away.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > Hi Stephen:
> > 
> > Do you mean registering a skb->destructor for pktgen then set and check
> > bits in skb->tx_flag?
> 
> Yes. Register a destructor that does something like update a counter (number
> of packets pending), then just spin while number of packets pending is over
> threshold.

I ran some experiments on this, and it does not work well when clone_skb
is used. For drivers that call skb_orphan() in ndo_start_xmit, the destructor
is called only when the first packet is sent, but what we need to know is
when the last one is sent. Any thoughts on this, or can we just introduce
another flag (we already have something like IFF_TX_SKB_SHARING)?

Thanks
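
To make the problem above concrete, here is a minimal userspace sketch, not kernel code. struct fake_skb, fake_skb_orphan(), fake_driver_xmit() and send_repeated() are hypothetical stand-ins; only the "run the destructor once, then clear it" behaviour is modelled on skb_orphan(). It assumes the sender registers the destructor once and then hands the same skb to the driver repeatedly, as with clone_skb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only what the sketch needs. */
struct fake_skb {
	void (*destructor)(struct fake_skb *skb);
};

/* How many times the destructor has fired. */
static int destructor_calls;

static void count_destructor(struct fake_skb *skb)
{
	(void)skb;
	destructor_calls++;
}

/* Modelled on skb_orphan(): run the destructor once, then clear it. */
static void fake_skb_orphan(struct fake_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
	}
}

/* A driver transmit path that orphans the skb on every call. */
static void fake_driver_xmit(struct fake_skb *skb)
{
	fake_skb_orphan(skb);
}

/*
 * Sender: register the destructor once, then transmit the same skb
 * 'repeats' times. Returns how many times the destructor fired.
 */
static int send_repeated(struct fake_skb *skb, int repeats)
{
	int i;

	assert(repeats > 0);
	destructor_calls = 0;
	skb->destructor = count_destructor;
	for (i = 0; i < repeats; i++)
		fake_driver_xmit(skb);
	return destructor_calls;
}
```

Under these assumptions the destructor fires on the first transmission only, so a counter updated from it cannot tell the sender when the last copy has left the device.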


* Re: [net-next rfc v7 3/3] virtio-net: change the number of queues through ethtool
From: Jason Wang @ 2012-12-03  6:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
	jwhan, shiyer
In-Reply-To: <20121202160906.GB27761@redhat.com>

On Sunday, December 02, 2012 06:09:06 PM Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2012 at 06:16:00PM +0800, Jason Wang wrote:
> > This patch implements the {set|get}_channels methods of ethtool to allow
> > the user to change the number of queues dynamically while the device is
> > running. This lets the user configure it on demand.
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > 
> >  drivers/net/virtio_net.c |   41 +++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 41 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index bcaa6e5..f08ec2a 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1578,10 +1578,51 @@ static struct virtio_driver virtio_net_driver = {
> > 
> >  #endif
> >  };
> > 
> > +/* TODO: Eliminate OOO packets during switching */
> > +static int virtnet_set_channels(struct net_device *dev,
> > +				struct ethtool_channels *channels)
> > +{
> > +	struct virtnet_info *vi = netdev_priv(dev);
> > +	u16 queue_pairs = channels->combined_count;
> > +
> > +	/* We don't support separate rx/tx channels.
> > +	 * We don't allow setting 'other' channels.
> > +	 */
> > +	if (channels->rx_count || channels->tx_count || channels->other_count)
> > +		return -EINVAL;
> > +
> > +	/* Only two modes were support currently */
> > +	if (queue_pairs != vi->max_queue_pairs && queue_pairs != 1)
> > +		return -EINVAL;
> > +
> 
> Why the limitation?

I am not sure that values between 1 and max_queue_pairs are useful, but I
can remove this limitation.
> Also how does userspace discover what the legal values are?

Userspace only checks whether the value is greater than max_queue_pairs.
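
With the two-mode limitation removed, the validation could look roughly like the sketch below. It is a standalone illustration: struct fake_channels is a hypothetical, trimmed-down stand-in for struct ethtool_channels, and check_channels() is not a function from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down view of struct ethtool_channels. */
struct fake_channels {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int other_count;
	unsigned int combined_count;
};

/*
 * Accept any combined count in [1, max_queue_pairs]; reject separate
 * rx/tx channels and 'other' channels, as the patch does.
 */
static int check_channels(const struct fake_channels *ch,
			  unsigned int max_queue_pairs)
{
	assert(max_queue_pairs >= 1);

	if (ch->rx_count || ch->tx_count || ch->other_count)
		return -EINVAL;

	if (ch->combined_count < 1 || ch->combined_count > max_queue_pairs)
		return -EINVAL;

	return 0;
}
```

The upper bound is the same value that get_channels reports as max_combined, so userspace can discover the legal range from that field.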
> 
> > +	vi->curr_queue_pairs = queue_pairs;
> > +	BUG_ON(virtnet_set_queues(vi));
> > +
> > +	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> > +	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> > +
> > +	return 0;
> > +}
> > +
> > +static void virtnet_get_channels(struct net_device *dev,
> > +				 struct ethtool_channels *channels)
> > +{
> > +	struct virtnet_info *vi = netdev_priv(dev);
> > +
> > +	channels->combined_count = vi->curr_queue_pairs;
> > +	channels->max_combined = vi->max_queue_pairs;
> > +	channels->max_other = 0;
> > +	channels->rx_count = 0;
> > +	channels->tx_count = 0;
> > +	channels->other_count = 0;
> > +}
> > +
> > 
> >  static const struct ethtool_ops virtnet_ethtool_ops = {
> >  
> >  	.get_drvinfo = virtnet_get_drvinfo,
> >  	.get_link = ethtool_op_get_link,
> >  	.get_ringparam = virtnet_get_ringparam,
> > 
> > +	.set_channels = virtnet_set_channels,
> > +	.get_channels = virtnet_get_channels,
> > 
> >  };
> >  
> >  static int __init init(void)
> 


* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Jason Wang @ 2012-12-03  6:05 UTC (permalink / raw)
  To: Rusty Russell
  Cc: krkumar2, kvm, mst, netdev, linux-kernel, virtualization,
	bhutchings, jwhan, shiyer
In-Reply-To: <87vccjj3hj.fsf@rustcorp.com.au>

On Monday, December 03, 2012 12:34:08 PM Rusty Russell wrote:
> Jason Wang <jasowang@redhat.com> writes:
> > +static const struct ethtool_ops virtnet_ethtool_ops;
> > +
> > +/*
> > + * Converting between virtqueue no. and kernel tx/rx queue no.
> > + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> > + */
> > +static int vq2txq(struct virtqueue *vq)
> > +{
> > +	int index = virtqueue_get_queue_index(vq);
> > +	return index == 1 ? 0 : (index - 2) / 2;
> > +}
> > +
> > +static int txq2vq(int txq)
> > +{
> > +	return txq ? 2 * txq + 2 : 1;
> > +}
> > +
> > +static int vq2rxq(struct virtqueue *vq)
> > +{
> > +	int index = virtqueue_get_queue_index(vq);
> > +	return index ? (index - 1) / 2 : 0;
> > +}
> > +
> > +static int rxq2vq(int rxq)
> > +{
> > +	return rxq ? 2 * rxq + 1 : 0;
> > +}
> > +
> 
> I thought MST changed the proposed spec to make the control queue always
> the last one, so this logic becomes trivial.

But that may break support for legacy guests. If we boot a legacy single-queue
guest on a 2-queue virtio-net device, it may think vq 2 is the cvq when it is
in fact rx1.
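
The two layouts under discussion can be compared with a small standalone sketch. The first four helpers follow the patch, adapted to take a plain index instead of a struct virtqueue pointer. The alt_* helpers are my own reading of the "control queue last" layout and are hypothetical.

```c
#include <assert.h>

/* Where a legacy single-queue guest expects the control virtqueue. */
#define CVQ_INDEX 2

/* Layout in the patch: 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN */
static int rxq2vq(int rxq)
{
	return rxq ? 2 * rxq + 1 : 0;
}

static int txq2vq(int txq)
{
	return txq ? 2 * txq + 2 : 1;
}

/* Only meaningful for rx virtqueue indices. */
static int vq2rxq(int index)
{
	assert(index != CVQ_INDEX);
	return index ? (index - 1) / 2 : 0;
}

/* Only meaningful for tx virtqueue indices. */
static int vq2txq(int index)
{
	assert(index != CVQ_INDEX);
	return index == 1 ? 0 : (index - 2) / 2;
}

/* Alternative layout: 0:rx0 1:tx0 2:rx1 3:tx1 ... then cvq last. */
static int alt_rxq2vq(int rxq)
{
	return 2 * rxq;
}

static int alt_txq2vq(int txq)
{
	return 2 * txq + 1;
}

static int alt_cvq(int pairs)
{
	assert(pairs >= 1);
	return 2 * pairs;
}
```

In the alternative layout rx1 lands on index 2 once the device exposes two pairs, which is exactly the index a legacy guest treats as the cvq. The layout in the patch keeps the cvq at index 2 for any number of pairs.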
> 
> > +static int virtnet_set_queues(struct virtnet_info *vi)
> > +{
> > +	struct scatterlist sg;
> > +	struct virtio_net_ctrl_rfs s;
> > +	struct net_device *dev = vi->dev;
> > +
> > +	s.virtqueue_pairs = vi->curr_queue_pairs;
> > +	sg_init_one(&sg, &s, sizeof(s));
> > +
> > +	if (!vi->has_cvq)
> > +		return -EINVAL;
> > +
> > +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RFS,
> > +				  VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET, &sg, 1, 0)){
> > +		dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
> > +			 " %d\n", vi->curr_queue_pairs);
> > +		return -EINVAL;
> > +	}
> 
> Where do we check the VIRTIO_NET_F_RFS bit?

Yes, we need this check. I will let the caller do the check and add a
comment.
> 
> >  static int virtnet_probe(struct virtio_device *vdev)
> >  {
> > 
> > -	int err;
> > +	int i, err;
> > 
> >  	struct net_device *dev;
> >  	struct virtnet_info *vi;
> > 
> > +	u16 curr_queue_pairs;
> > +
> > +	/* Find if host supports multiqueue virtio_net device */
> > +	err = virtio_config_val(vdev, VIRTIO_NET_F_RFS,
> > +				offsetof(struct virtio_net_config,
> > +				max_virtqueue_pairs), &curr_queue_pairs);
> > +
> > +	/* We need at least 2 queue's */
> > +	if (err)
> > +		curr_queue_pairs = 1;
> 
> Huh?  Just call this queue_pairs.  It's not curr_ at all...
> 
> > +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> > +		vi->has_cvq = true;
> > +
> > +	/* Use single tx/rx queue pair as default */
> > +	vi->curr_queue_pairs = 1;
> > +	vi->max_queue_pairs = curr_queue_pairs;
> 
> See...

Right, will use max_queue_pairs then.

Thanks
> 
> Cheers,
> Rusty.


* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Jason Wang @ 2012-12-03  5:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: krkumar2, kvm, netdev, linux-kernel, virtualization, bhutchings,
	jwhan, shiyer
In-Reply-To: <20121202160631.GA27761@redhat.com>

On Sunday, December 02, 2012 06:06:31 PM Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2012 at 06:15:59PM +0800, Jason Wang wrote:
> > This adds multiqueue support to the virtio_net driver. In multiqueue
> > mode, the driver expects the number of queue pairs to be equal to the
> > number of vcpus. To eliminate contention between vcpus and
> > virtqueues, per-cpu virtqueue pairs are implemented by:
> > 
> > - selecting the txq based on the smp processor id.
> > - setting the smp affinity hint to the vcpu that owns the queue pair.
> > 
> > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > 
> >  drivers/net/virtio_net.c        |  454 ++++++++++++++++++++++++++++++---------
> >  include/uapi/linux/virtio_net.h |   16 ++
> >  2 files changed, 371 insertions(+), 99 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 7975133..bcaa6e5 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -84,17 +84,25 @@ struct virtnet_info {
> > 
> >  	struct virtio_device *vdev;
> >  	struct virtqueue *cvq;
> >  	struct net_device *dev;
> > 
> > -	struct napi_struct napi;
> > -	struct send_queue sq;
> > -	struct receive_queue rq;
> > +	struct send_queue *sq;
> > +	struct receive_queue *rq;
> > 
> >  	unsigned int status;
> > 
> > +	/* Max # of queue pairs supported by the device */
> > +	u16 max_queue_pairs;
> > +
> > +	/* # of queue pairs currently used by the driver */
> > +	u16 curr_queue_pairs;
> > +
> > 
> >  	/* I like... big packets and I cannot lie! */
> >  	bool big_packets;
> >  	
> >  	/* Host will merge rx buffers for big packets (shake it! shake it!) */
> >  	bool mergeable_rx_bufs;
> > 
> > +	/* Has control virtqueue */
> > +	bool has_cvq;
> > +
> > 
> >  	/* enable config space updates */
> >  	bool config_enable;
> > 
> > @@ -126,6 +134,34 @@ struct padded_vnet_hdr {
> > 
> >  	char padding[6];
> >  
> >  };
> > 
> > +static const struct ethtool_ops virtnet_ethtool_ops;
> > +
> > +/*
> > + * Converting between virtqueue no. and kernel tx/rx queue no.
> > + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> > + */
> 
> Weird, this isn't what spec v5 says: it says
> 0:rx0 1:tx0 2: rx1 3: tx1 .... vcq
> We can change the spec to match but keeping all rx/tx
> together seems a bit prettier?

Oh, I missed checking this part in v5. Thinking about it, if we change the
location of the cvq, we may break support for legacy guests that only support
a single queue. Consider starting a VM with 2 queues but booting a
single-queue legacy guest: it may think vq 2 is the cvq when it is in fact rx1.
> 
> > +static int vq2txq(struct virtqueue *vq)
> > +{
> > +	int index = virtqueue_get_queue_index(vq);
> > +	return index == 1 ? 0 : (index - 2) / 2;
> > +}
> > +
> > +static int txq2vq(int txq)
> > +{
> > +	return txq ? 2 * txq + 2 : 1;
> > +}
> > +
> > +static int vq2rxq(struct virtqueue *vq)
> > +{
> > +	int index = virtqueue_get_queue_index(vq);
> > +	return index ? (index - 1) / 2 : 0;
> > +}
> > +
> > +static int rxq2vq(int rxq)
> > +{
> > +	return rxq ? 2 * rxq + 1 : 0;
> > +}
> > +
> > 
> >  static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
> >  {
> >  
> >  	return (struct skb_vnet_hdr *)skb->cb;
> > 
> > @@ -166,7 +202,7 @@ static void skb_xmit_done(struct virtqueue *vq)
> > 
> >  	virtqueue_disable_cb(vq);
> >  	
> >  	/* We were probably waiting for more output buffers. */
> > 
> > -	netif_wake_queue(vi->dev);
> > +	netif_wake_subqueue(vi->dev, vq2txq(vq));
> > 
> >  }
> >  
> >  static void set_skb_frag(struct sk_buff *skb, struct page *page,
> > 
> > @@ -503,7 +539,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
> > 
> >  static void skb_recv_done(struct virtqueue *rvq)
> >  {
> >  
> >  	struct virtnet_info *vi = rvq->vdev->priv;
> > 
> > -	struct receive_queue *rq = &vi->rq;
> > +	struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
> > 
> >  	/* Schedule NAPI, Suppress further interrupts if successful. */
> >  	if (napi_schedule_prep(&rq->napi)) {
> > 
> > @@ -650,7 +686,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > 
> >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  {
> >  
> >  	struct virtnet_info *vi = netdev_priv(dev);
> > 
> > -	struct send_queue *sq = &vi->sq;
> > +	int qnum = skb_get_queue_mapping(skb);
> > +	struct send_queue *sq = &vi->sq[qnum];
> > 
> >  	int capacity;
> >  	
> >  	/* Free up any pending old buffers before queueing new ones. */
> > 
> > @@ -664,13 +701,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > 
> >  		if (likely(capacity == -ENOMEM)) {
> >  		
> >  			if (net_ratelimit())
> >  			
> >  				dev_warn(&dev->dev,
> > 
> > -					 "TX queue failure: out of memory\n");
> > +					 "TXQ (%d) failure: out of memory\n",
> > +					 qnum);
> > 
> >  		} else {
> >  		
> >  			dev->stats.tx_fifo_errors++;
> >  			if (net_ratelimit())
> >  			
> >  				dev_warn(&dev->dev,
> > 
> > -					 "Unexpected TX queue failure: %d\n",
> > -					 capacity);
> > +					 "Unexpected TXQ (%d) failure: %d\n",
> > +					 qnum, capacity);
> > 
> >  		}
> >  		dev->stats.tx_dropped++;
> >  		kfree_skb(skb);
> > 
> > @@ -685,12 +723,12 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > 
> >  	/* Apparently nice girls don't return TX_BUSY; stop the queue
> >  	
> >  	 * before it gets out of hand.  Naturally, this wastes entries. */
> >  	
> >  	if (capacity < 2+MAX_SKB_FRAGS) {
> > 
> > -		netif_stop_queue(dev);
> > +		netif_stop_subqueue(dev, qnum);
> > 
> >  		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> >  		
> >  			/* More just got used, free them then recheck. */
> >  			capacity += free_old_xmit_skbs(sq);
> >  			if (capacity >= 2+MAX_SKB_FRAGS) {
> > 
> > -				netif_start_queue(dev);
> > +				netif_start_subqueue(dev, qnum);
> > 
> >  				virtqueue_disable_cb(sq->vq);
> >  			
> >  			}
> >  		
> >  		}
> > 
> > @@ -758,23 +796,13 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> > 
> >  static void virtnet_netpoll(struct net_device *dev)
> >  {
> >  
> >  	struct virtnet_info *vi = netdev_priv(dev);
> > 
> > +	int i;
> > 
> > -	napi_schedule(&vi->rq.napi);
> > +	for (i = 0; i < vi->curr_queue_pairs; i++)
> > +		napi_schedule(&vi->rq[i].napi);
> > 
> >  }
> >  #endif
> > 
> > -static int virtnet_open(struct net_device *dev)
> > -{
> > -	struct virtnet_info *vi = netdev_priv(dev);
> > -
> > -	/* Make sure we have some buffers: if oom use wq. */
> > -	if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> > -		schedule_delayed_work(&vi->rq.refill, 0);
> > -
> > -	virtnet_napi_enable(&vi->rq);
> > -	return 0;
> > -}
> > -
> > 
> >  /*
> >  
> >   * Send command via the control virtqueue and check status.  Commands
> >   * supported by the hypervisor, as indicated by feature bits, should
> > 
> > @@ -830,13 +858,53 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
> > 
> >  	rtnl_unlock();
> >  
> >  }
> > 
> > +static int virtnet_set_queues(struct virtnet_info *vi)
> > +{
> > +	struct scatterlist sg;
> > +	struct virtio_net_ctrl_rfs s;
> > +	struct net_device *dev = vi->dev;
> > +
> > +	s.virtqueue_pairs = vi->curr_queue_pairs;
> > +	sg_init_one(&sg, &s, sizeof(s));
> > +
> > +	if (!vi->has_cvq)
> > +		return -EINVAL;
> > +
> > +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RFS,
> > +				  VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET, &sg, 1, 0)){
> > +		dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
> > +			 " %d\n", vi->curr_queue_pairs);
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int virtnet_open(struct net_device *dev)
> > +{
> > +	struct virtnet_info *vi = netdev_priv(dev);
> > +	int i;
> > +
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		/* Make sure we have some buffers: if oom use wq. */
> > +		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> > +			schedule_delayed_work(&vi->rq[i].refill, 0);
> > +		virtnet_napi_enable(&vi->rq[i]);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > 
> >  static int virtnet_close(struct net_device *dev)
> >  {
> >  
> >  	struct virtnet_info *vi = netdev_priv(dev);
> > 
> > +	int i;
> > 
> >  	/* Make sure refill_work doesn't re-enable napi! */
> > 
> > -	cancel_delayed_work_sync(&vi->rq.refill);
> > -	napi_disable(&vi->rq.napi);
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		cancel_delayed_work_sync(&vi->rq[i].refill);
> > +		napi_disable(&vi->rq[i].napi);
> > +	}
> > 
> >  	return 0;
> >  
> >  }
> > 
> > @@ -948,8 +1016,8 @@ static void virtnet_get_ringparam(struct net_device *dev,
> > 
> >  {
> >  
> >  	struct virtnet_info *vi = netdev_priv(dev);
> > 
> > -	ring->rx_max_pending = virtqueue_get_vring_size(vi->rq.vq);
> > -	ring->tx_max_pending = virtqueue_get_vring_size(vi->sq.vq);
> > +	ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0].vq);
> > +	ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0].vq);
> > 
> >  	ring->rx_pending = ring->rx_max_pending;
> >  	ring->tx_pending = ring->tx_max_pending;
> >  
> >  }
> > 
> > @@ -967,12 +1035,6 @@ static void virtnet_get_drvinfo(struct net_device *dev,
> > 
> >  }
> > 
> > -static const struct ethtool_ops virtnet_ethtool_ops = {
> > -	.get_drvinfo = virtnet_get_drvinfo,
> > -	.get_link = ethtool_op_get_link,
> > -	.get_ringparam = virtnet_get_ringparam,
> > -};
> > -
> > 
> >  #define MIN_MTU 68
> >  #define MAX_MTU 65535
> > 
> > @@ -984,6 +1046,20 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> > 
> >  	return 0;
> >  
> >  }
> > 
> > +/* To avoid contending a lock hold by a vcpu who would exit to host, select the
> > + * txq based on the processor id.
> > + */
> > +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> > +{
> > +	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> > +		  smp_processor_id();
> > +
> > +	while (unlikely(txq >= dev->real_num_tx_queues))
> > +		txq -= dev->real_num_tx_queues;
> > +
> > +	return txq;
> > +}
> > +
> > 
> >  static const struct net_device_ops virtnet_netdev = {
> >  
> >  	.ndo_open            = virtnet_open,
> >  	.ndo_stop   	     = virtnet_close,
> > 
> > @@ -995,6 +1071,7 @@ static const struct net_device_ops virtnet_netdev = {
> > 
> >  	.ndo_get_stats64     = virtnet_stats,
> >  	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> >  	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> > 
> > +	.ndo_select_queue     = virtnet_select_queue,
> > 
> >  #ifdef CONFIG_NET_POLL_CONTROLLER
> >  
> >  	.ndo_poll_controller = virtnet_netpoll,
> >  
> >  #endif
> > 
> > @@ -1030,10 +1107,10 @@ static void virtnet_config_changed_work(struct work_struct *work)
> > 
> >  	if (vi->status & VIRTIO_NET_S_LINK_UP) {
> >  	
> >  		netif_carrier_on(vi->dev);
> > 
> > -		netif_wake_queue(vi->dev);
> > +		netif_tx_wake_all_queues(vi->dev);
> > 
> >  	} else {
> >  	
> >  		netif_carrier_off(vi->dev);
> > 
> > -		netif_stop_queue(vi->dev);
> > +		netif_tx_stop_all_queues(vi->dev);
> > 
> >  	}
> >  
> >  done:
> >  	mutex_unlock(&vi->config_lock);
> > 
> > @@ -1046,41 +1123,212 @@ static void virtnet_config_changed(struct virtio_device *vdev)
> > 
> >  	schedule_work(&vi->config_work);
> >  
> >  }
> > 
> > -static int init_vqs(struct virtnet_info *vi)
> > +static void free_receive_bufs(struct virtnet_info *vi)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		while (vi->rq[i].pages)
> > +			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
> > +	}
> > +}
> > +
> > +/* Free memory allocated for send and receive queues */
> > +static void virtnet_free_queues(struct virtnet_info *vi)
> > 
> >  {
> > 
> > -	struct virtqueue *vqs[3];
> > -	vq_callback_t *callbacks[] = { skb_recv_done, skb_xmit_done, NULL};
> > -	const char *names[] = { "input", "output", "control" };
> > -	int nvqs, err;
> > +	kfree(vi->rq);
> > +	vi->rq = NULL;
> > +	kfree(vi->sq);
> > +	vi->sq = NULL;
> > +}
> > 
> > -	/* We expect two virtqueues, receive then send,
> > -	 * and optionally control. */
> > -	nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
> > +static void free_unused_bufs(struct virtnet_info *vi)
> > +{
> > +	void *buf;
> > +	int i;
> > 
> > -	err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names);
> > -	if (err)
> > -		return err;
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		struct virtqueue *vq = vi->sq[i].vq;
> > +		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > +			dev_kfree_skb(buf);
> > +	}
> > 
> > -	vi->rq.vq = vqs[0];
> > -	vi->sq.vq = vqs[1];
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		struct virtqueue *vq = vi->rq[i].vq;
> > 
> > -	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
> > -		vi->cvq = vqs[2];
> > +		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > +			if (vi->mergeable_rx_bufs || vi->big_packets)
> > +				give_pages(&vi->rq[i], buf);
> > +			else
> > +				dev_kfree_skb(buf);
> > +			--vi->rq[i].num;
> > +		}
> > +		BUG_ON(vi->rq[i].num != 0);
> > +	}
> > +}
> > 
> > +static void virtnet_set_affinity(struct virtnet_info *vi, bool set)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		int cpu = set ? i : -1;
> > +		virtqueue_set_affinity(vi->rq[i].vq, cpu);
> > +		virtqueue_set_affinity(vi->sq[i].vq, cpu);
> > +	}
> > +}
> > +
> > +static void virtnet_del_vqs(struct virtnet_info *vi)
> > +{
> > +	struct virtio_device *vdev = vi->vdev;
> > +
> > +	virtnet_set_affinity(vi, false);
> > +
> > +	vdev->config->del_vqs(vdev);
> > +
> > +	virtnet_free_queues(vi);
> > +}
> > +
> > +static int virtnet_find_vqs(struct virtnet_info *vi)
> > +{
> > +	vq_callback_t **callbacks;
> > +	struct virtqueue **vqs;
> > +	int ret = -ENOMEM;
> > +	int i, total_vqs;
> > +	char **names;
> > +
> > +	/*
> > +	 * We expect 1 RX virtqueue followed by 1 TX virtqueue, followd by
> > +	 * possible control virtqueue, followed by RX/TX N-1 queue pairs used
> > +	 * in multiqueue mode.
> > +	 */
> > +	total_vqs = vi->max_queue_pairs * 2 +
> > +		    virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
> > +
> > +	/* Allocate space for find_vqs parameters */
> > +	vqs = kzalloc(total_vqs * sizeof(*vqs), GFP_KERNEL);
> > +	callbacks = kzalloc(total_vqs * sizeof(*callbacks), GFP_KERNEL);
> > +	if (!vqs || !callbacks)
> > +		goto err_mem;
> > +	names = kzalloc(total_vqs * sizeof(*names), GFP_KERNEL);
> > +	if (!names)
> > +		goto err_mem;
> > +
> > +	/* Parameters for control virtqueue, if any */
> > +	if (vi->has_cvq) {
> > +		callbacks[2] = NULL;
> > +		names[2] = kasprintf(GFP_KERNEL, "control");
> > +	}
> > +
> > +	/* Allocate/initialize parameters for send/receive virtqueues */
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		callbacks[rxq2vq(i)] = skb_recv_done;
> > +		callbacks[txq2vq(i)] = skb_xmit_done;
> > +		names[rxq2vq(i)] = kasprintf(GFP_KERNEL, "input.%d", i);
> > +		names[txq2vq(i)] = kasprintf(GFP_KERNEL, "output.%d", i);
> > +	}
> > +
> > +	ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
> > +					 (const char **)names);
> > +	if (ret)
> > +		goto err_names;
> > +
> > +	if (vi->has_cvq) {
> > +		vi->cvq = vqs[2];
> > 
> >  		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VLAN))
> >  		
> >  			vi->dev->features |= NETIF_F_HW_VLAN_FILTER;
> >  	
> >  	}
> > 
> > +
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		vi->rq[i].vq = vqs[rxq2vq(i)];
> > +		vi->sq[i].vq = vqs[txq2vq(i)];
> > +	}
> > +
> > +	kfree(callbacks);
> > +	kfree(vqs);
> > +
> > +	return 0;
> > +
> > +err_names:
> > +	for (i = 0; i < total_vqs * 2; i ++)
> > +		kfree(names[i]);
> > +	kfree(names);
> > +
> > +err_mem:
> > +	kfree(callbacks);
> > +	kfree(vqs);
> > +
> > +	return ret;
> > +}
> > +
> > +static int virtnet_alloc_queues(struct virtnet_info *vi)
> > +{
> > +	int i;
> > +
> > +	vi->sq = kzalloc(sizeof(vi->sq[0]) * vi->max_queue_pairs, GFP_KERNEL);
> > +	vi->rq = kzalloc(sizeof(vi->rq[0]) * vi->max_queue_pairs, GFP_KERNEL);
> > +	if (!vi->rq || !vi->sq)
> > +		goto err;
> > +
> > +	/* setup initial receive and send queue parameters */
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		vi->rq[i].pages = NULL;
> > +		INIT_DELAYED_WORK(&vi->rq[i].refill, refill_work);
> > +		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
> > +			       napi_weight);
> > +
> > +		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
> > +		sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
> > +	}
> > +
> > +
> > 
> >  	return 0;
> > 
> > +
> > +err:
> > +	virtnet_free_queues(vi);
> > +	return -ENOMEM;
> > +}
> > +
> > +static int init_vqs(struct virtnet_info *vi)
> > +{
> > +	int ret;
> > +
> > +	/* Allocate send & receive queues */
> > +	ret = virtnet_alloc_queues(vi);
> > +	if (ret)
> > +		goto err;
> > +
> > +	ret = virtnet_find_vqs(vi);
> > +	if (ret)
> > +		goto err_free;
> > +
> > +	virtnet_set_affinity(vi, true);
> > +	return 0;
> > +
> > +err_free:
> > +	virtnet_free_queues(vi);
> > +err:
> > +	return ret;
> > 
> >  }
> >  
> >  static int virtnet_probe(struct virtio_device *vdev)
> >  {
> > 
> > -	int err;
> > +	int i, err;
> > 
> >  	struct net_device *dev;
> >  	struct virtnet_info *vi;
> > 
> > +	u16 curr_queue_pairs;
> 
> Probably a good idea to rename this max_queue_pairs.

Sure.
> 
> > +
> > +	/* Find if host supports multiqueue virtio_net device */
> > +	err = virtio_config_val(vdev, VIRTIO_NET_F_RFS,
> > +				offsetof(struct virtio_net_config,
> > +				max_virtqueue_pairs), &curr_queue_pairs);
> > +
> > +	/* We need at least 2 queue's */
> > +	if (err)
> > +		curr_queue_pairs = 1;
> 
> Let's also validate against VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN
> and VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX.

Ok.
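
A rough standalone sketch of that validation follows. The numeric bounds are assumptions, since the values of VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN and VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX are not shown in this thread. PAIRS_MIN, PAIRS_MAX and sanitize_max_queue_pairs() are hypothetical names, and falling back to a single pair on an out-of-range value is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bounds: assumed values, not taken from this thread. */
#define PAIRS_MIN 1
#define PAIRS_MAX 0x8000

/*
 * Decide how many queue pairs to use given the result of reading
 * max_virtqueue_pairs from the config space. Falls back to a single
 * pair if the read failed or the device reported an out-of-range value.
 */
static uint16_t sanitize_max_queue_pairs(int config_read_err,
					 uint16_t reported)
{
	uint16_t pairs = reported;

	if (config_read_err)
		pairs = 1;
	else if (reported < PAIRS_MIN || reported > PAIRS_MAX)
		pairs = 1;

	assert(pairs >= PAIRS_MIN && pairs <= PAIRS_MAX);
	return pairs;
}
```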
> 
> >  	/* Allocate ourselves a network device with room for our info */
> > 
> > -	dev = alloc_etherdev(sizeof(struct virtnet_info));
> > +	dev = alloc_etherdev_mq(sizeof(struct virtnet_info), curr_queue_pairs);
> > 
> >  	if (!dev)
> >  	
> >  		return -ENOMEM;
> > 
> > @@ -1126,22 +1374,17 @@ static int virtnet_probe(struct virtio_device *vdev)
> > 
> >  	/* Set up our device-specific information */
> >  	vi = netdev_priv(dev);
> > 
> > -	netif_napi_add(dev, &vi->rq.napi, virtnet_poll, napi_weight);
> > 
> >  	vi->dev = dev;
> >  	vi->vdev = vdev;
> >  	vdev->priv = vi;
> > 
> > -	vi->rq.pages = NULL;
> > 
> >  	vi->stats = alloc_percpu(struct virtnet_stats);
> >  	err = -ENOMEM;
> >  	if (vi->stats == NULL)
> >  	
> >  		goto free;
> > 
> > -	INIT_DELAYED_WORK(&vi->rq.refill, refill_work);
> > 
> >  	mutex_init(&vi->config_lock);
> >  	vi->config_enable = true;
> >  	INIT_WORK(&vi->config_work, virtnet_config_changed_work);
> > 
> > -	sg_init_table(vi->rq.sg, ARRAY_SIZE(vi->rq.sg));
> > -	sg_init_table(vi->sq.sg, ARRAY_SIZE(vi->sq.sg));
> > 
> >  	/* If we can receive ANY GSO packets, we must allocate large ones. */
> >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > 
> > @@ -1152,10 +1395,21 @@ static int virtnet_probe(struct virtio_device *vdev)
> > 
> >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> >  	
> >  		vi->mergeable_rx_bufs = true;
> > 
> > +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> > +		vi->has_cvq = true;
> > +
> > +	/* Use single tx/rx queue pair as default */
> > +	vi->curr_queue_pairs = 1;
> > +	vi->max_queue_pairs = curr_queue_pairs;
> > +
> > +	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> > 
> >  	err = init_vqs(vi);
> >  	if (err)
> >  	
> >  		goto free_stats;
> > 
> > +	netif_set_real_num_tx_queues(dev, 1);
> > +	netif_set_real_num_rx_queues(dev, 1);
> > +
> > 
> >  	err = register_netdev(dev);
> >  	if (err) {
> >  	
> >  		pr_debug("virtio_net: registering device failed\n");
> > 
> > @@ -1163,12 +1417,15 @@ static int virtnet_probe(struct virtio_device *vdev)
> > 
> >  	}
> >  	
> >  	/* Last of all, set up some receive buffers. */
> > 
> > -	try_fill_recv(&vi->rq, GFP_KERNEL);
> > -
> > -	/* If we didn't even get one input buffer, we're useless. */
> > -	if (vi->rq.num == 0) {
> > -		err = -ENOMEM;
> > -		goto unregister;
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		try_fill_recv(&vi->rq[i], GFP_KERNEL);
> > +
> > +		/* If we didn't even get one input buffer, we're useless. */
> > +		if (vi->rq[i].num == 0) {
> > +			free_unused_bufs(vi);
> > +			err = -ENOMEM;
> > +			goto free_recv_bufs;
> > +		}
> > 
> >  	}
> >  	
> >  	/* Assume link up if device can't report link status,
> > 
> > @@ -1181,13 +1438,20 @@ static int virtnet_probe(struct virtio_device *vdev)
> > 
> >  		netif_carrier_on(dev);
> >  	
> >  	}
> > 
> > -	pr_debug("virtnet: registered device %s\n", dev->name);
> > +	pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
> > +		 dev->name, curr_queue_pairs);
> > +
> > 
> >  	return 0;
> > 
> > -unregister:
> > +free_recv_bufs:
> > +	free_receive_bufs(vi);
> > 
> >  	unregister_netdev(dev);
> > 
> > +
> > 
> >  free_vqs:
> > -	vdev->config->del_vqs(vdev);
> > +	for (i = 0; i <curr_queue_pairs; i++)
> > +		cancel_delayed_work_sync(&vi->rq[i].refill);
> > +	virtnet_del_vqs(vi);
> > +
> > 
> >  free_stats:
> >  	free_percpu(vi->stats);
> >  
> >  free:
> > @@ -1195,28 +1459,6 @@ free:
> >  	return err;
> >  
> >  }
> > 
> > -static void free_unused_bufs(struct virtnet_info *vi)
> > -{
> > -	void *buf;
> > -	while (1) {
> > -		buf = virtqueue_detach_unused_buf(vi->sq.vq);
> > -		if (!buf)
> > -			break;
> > -		dev_kfree_skb(buf);
> > -	}
> > -	while (1) {
> > -		buf = virtqueue_detach_unused_buf(vi->rq.vq);
> > -		if (!buf)
> > -			break;
> > -		if (vi->mergeable_rx_bufs || vi->big_packets)
> > -			give_pages(&vi->rq, buf);
> > -		else
> > -			dev_kfree_skb(buf);
> > -		--vi->rq.num;
> > -	}
> > -	BUG_ON(vi->rq.num != 0);
> > -}
> > -
> > 
> >  static void remove_vq_common(struct virtnet_info *vi)
> >  {
> >  
> >  	vi->vdev->config->reset(vi->vdev);
> > 
> > @@ -1224,10 +1466,9 @@ static void remove_vq_common(struct virtnet_info *vi)
> > 
> >  	/* Free unused buffers in both send and recv, if any. */
> >  	free_unused_bufs(vi);
> > 
> > -	vi->vdev->config->del_vqs(vi->vdev);
> > +	free_receive_bufs(vi);
> > 
> > -	while (vi->rq.pages)
> > -		__free_pages(get_a_page(&vi->rq, GFP_KERNEL), 0);
> > +	virtnet_del_vqs(vi);
> > 
> >  }
> >  
> >  static void __devexit virtnet_remove(struct virtio_device *vdev)
> > 
> > @@ -1253,6 +1494,7 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
> > 
> >  static int virtnet_freeze(struct virtio_device *vdev)
> >  {
> >  
> >  	struct virtnet_info *vi = vdev->priv;
> > 
> > +	int i;
> > 
> >  	/* Prevent config work handler from accessing the device */
> >  	mutex_lock(&vi->config_lock);
> > 
> > @@ -1260,10 +1502,14 @@ static int virtnet_freeze(struct virtio_device *vdev)
> >  	mutex_unlock(&vi->config_lock);
> >  	
> >  	netif_device_detach(vi->dev);
> > 
> > -	cancel_delayed_work_sync(&vi->rq.refill);
> > +	for (i = 0; i < vi->max_queue_pairs; i++)
> > +		cancel_delayed_work_sync(&vi->rq[i].refill);
> > 
> >  	if (netif_running(vi->dev))
> > 
> > -		napi_disable(&vi->rq.napi);
> > +		for (i = 0; i < vi->max_queue_pairs; i++) {
> > +			napi_disable(&vi->rq[i].napi);
> > +			netif_napi_del(&vi->rq[i].napi);
> > +		}
> > 
> >  	remove_vq_common(vi);
> > 
> > @@ -1275,24 +1521,28 @@ static int virtnet_freeze(struct virtio_device *vdev)
> >  static int virtnet_restore(struct virtio_device *vdev)
> >  {
> >  
> >  	struct virtnet_info *vi = vdev->priv;
> > 
> > -	int err;
> > +	int err, i;
> > 
> >  	err = init_vqs(vi);
> >  	if (err)
> >  	
> >  		return err;
> >  	
> >  	if (netif_running(vi->dev))
> > 
> > -		virtnet_napi_enable(&vi->rq);
> > +		for (i = 0; i < vi->max_queue_pairs; i++)
> > +			virtnet_napi_enable(&vi->rq[i]);
> > 
> >  	netif_device_attach(vi->dev);
> > 
> > -	if (!try_fill_recv(&vi->rq, GFP_KERNEL))
> > -		schedule_delayed_work(&vi->rq.refill, 0);
> > +	for (i = 0; i < vi->max_queue_pairs; i++)
> > +		if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> > +			schedule_delayed_work(&vi->rq[i].refill, 0);
> > 
> >  	mutex_lock(&vi->config_lock);
> >  	vi->config_enable = true;
> >  	mutex_unlock(&vi->config_lock);
> > 
> > +	BUG_ON(virtnet_set_queues(vi));
> > +
> 
> Won't this always fail when control vq is off?

Yes, will add a check of VIRTIO_NET_F_RFS before calling virtnet_set_queues().
> 
> >  	return 0;
> >  
> >  }
> >  #endif
> > 
> > @@ -1310,7 +1560,7 @@ static unsigned int features[] = {
> > 
> >  	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
> >  	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
> >  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
> > 
> > -	VIRTIO_NET_F_GUEST_ANNOUNCE,
> > +	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_RFS,
> > 
> >  };
> >  
> >  static struct virtio_driver virtio_net_driver = {
> > 
> > @@ -1328,6 +1578,12 @@ static struct virtio_driver virtio_net_driver = {
> > 
> >  #endif
> >  };
> > 
> > +static const struct ethtool_ops virtnet_ethtool_ops = {
> > +	.get_drvinfo = virtnet_get_drvinfo,
> > +	.get_link = ethtool_op_get_link,
> > +	.get_ringparam = virtnet_get_ringparam,
> > +};
> > +
> > 
> >  static int __init init(void)
> >  {
> >  
> >  	return register_virtio_driver(&virtio_net_driver);
> > 
> > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > index 2470f54..6056cec 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -51,6 +51,7 @@
> > 
> >  #define VIRTIO_NET_F_CTRL_RX_EXTRA 20	/* Extra RX mode control support */
> >  #define VIRTIO_NET_F_GUEST_ANNOUNCE 21	/* Guest can announce device on the
> >  					 * network */
> > 
> > +#define VIRTIO_NET_F_RFS	22	/* Device supports multiple TXQ/RXQ */
> > 
> >  #define VIRTIO_NET_S_LINK_UP	1	/* Link is up */
> >  #define VIRTIO_NET_S_ANNOUNCE	2	/* Announcement is needed */
> > 
> > @@ -60,6 +61,8 @@ struct virtio_net_config {
> > 
> >  	__u8 mac[6];
> >  	/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
> >  	__u16 status;
> > 
> > +	/* Total number of RX/TX queues */
> > +	__u16 max_virtqueue_pairs;
> > 
> >  } __attribute__((packed));
> >  
> >  /* This is the first element of the scatter-gather list.  If you don't
> > 
> > @@ -166,4 +169,17 @@ struct virtio_net_ctrl_mac {
> > 
> >  #define VIRTIO_NET_CTRL_ANNOUNCE       3
> >  
> >   #define VIRTIO_NET_CTRL_ANNOUNCE_ACK         0
> > 
> > +/*
> > + * Control multiqueue
> > + *
> > + */
> > +struct virtio_net_ctrl_rfs {
> > +	u16 virtqueue_pairs;
> > +};
> > +
> > +#define VIRTIO_NET_CTRL_RFS   4
> > + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET        0
> > + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MIN        1
> > + #define VIRTIO_NET_CTRL_RFS_VQ_PAIRS_MAX        0x8000
> > +
> > 
> >  #endif /* _LINUX_VIRTIO_NET_H */

^ permalink raw reply

* Re: [net-next rfc v7 1/3] virtio-net: separate fields of sending/receiving queue from virtnet_info
From: Jason Wang @ 2012-12-03  5:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: krkumar2, kvm, mst, netdev, linux-kernel, virtualization,
	bhutchings, jwhan, shiyer
In-Reply-To: <87y5hfj3vl.fsf@rustcorp.com.au>


On Monday, December 03, 2012 12:25:42 PM Rusty Russell wrote:
> Jason Wang <jasowang@redhat.com> writes:
> > To support multiqueue transmitq/receiveq, the first step is to separate
> > the queue-related structures from virtnet_info. This patch introduces the
> > send_queue and receive_queue structures and passes pointers to them as
> > parameters to the functions that handle sending/receiving.
> 
> OK, seems like a straightforward xform: a few nit-picks:
> > +/* Internal representation of a receive virtqueue */
> > +struct receive_queue {
> > +	/* Virtqueue associated with this receive_queue */
> > +	struct virtqueue *vq;
> > +
> > +        struct napi_struct napi;
> > +
> > +        /* Number of input buffers, and max we've ever had. */
> > +        unsigned int num, max;
> 
> Weird whitespace here.
> 

Oh, yes, will fix it.
> > +
> > +	/* Work struct for refilling if we run low on memory. */
> > +	struct delayed_work refill;
> 
> I can't really see the justification for a refill per queue.  Just have
> one work iterate all the queues if it happens, unless it happens often
> (in which case, we need to look harder at this anyway).

But during this kind of iteration, we may need to enable/disable NAPI on
every receive queue, regardless of whether that queue actually needs much
refilling. This may add extra latency.
> 
> >  struct virtnet_info {
> >  
> >  	struct virtio_device *vdev;
> > 
> > -	struct virtqueue *rvq, *svq, *cvq;
> > +	struct virtqueue *cvq;
> > 
> >  	struct net_device *dev;
> >  	struct napi_struct napi;
> 
> You leave napi here, and take it away in the next patch.  I think it's
> supposed to go away now.

Yes, will remove it.

Thanks
> 
> Cheers,
> Rusty.

^ permalink raw reply

* [PATCH 2/4 net-next] tg3: PTP - Implement the ptp api and ethtool functions
From: Michael Chan @ 2012-12-03  3:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, nsujir
In-Reply-To: <1354506171-1646-1-git-send-email-mchan@broadcom.com>

From: Matt Carlson <mcarlson@broadcom.com>

This patch updates the ptp_caps structure with implementation functions.
All the basic clock operations as described in
Documentation/ptp/ptp.txt are supported.

Frequency adjustment is performed in hardware using a 24-bit accumulator
and a programmable correction value. On each clock tick, the correction
value is added to the accumulator; when the accumulator overflows, the
time counter is incremented/decremented and the accumulator is reset.

So the conversion from ppb to correction value is
	ppb * (1 << 24) / 1000000000

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
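Editorial note (below the "---" marker, so not part of the commit message):
the ppb-to-correction formula in the description can be tried out in user
space. The following is only a minimal sketch under stated assumptions:
ppb_to_correction() and CORRECT_MASK are hypothetical names, with
CORRECT_MASK mirroring the 24-bit TG3_EAV_REF_CLK_CORRECT_MASK from the
patch, and plain 64-bit division standing in for the kernel's div_u64().

```c
#include <stdint.h>

/* Low 24 bits; assumed to mirror TG3_EAV_REF_CLK_CORRECT_MASK. */
#define CORRECT_MASK 0xffffffu

/*
 * Hypothetical helper: compute the magnitude of the correction value
 * for a requested adjustment in parts per billion, and report the
 * sign separately (the patch encodes it as a NEG bit in the register).
 */
uint32_t ppb_to_correction(int32_t ppb, int *negative)
{
	/* Widen before negating so INT32_MIN does not overflow. */
	uint64_t mag = ppb < 0 ? (uint64_t)(-(int64_t)ppb) : (uint64_t)ppb;

	*negative = ppb < 0;

	/* ppb * (1 << 24) / 1000000000, truncated to 24 bits. */
	return (uint32_t)((mag << 24) / 1000000000ULL) & CORRECT_MASK;
}
```

With integer division, one correction step corresponds to roughly 59.6 ppb,
so requests smaller than 60 ppb in magnitude come out as zero; in the patch
a zero correction results in 0 being written to the correction control
register. The advertised max_adj of 250000000 ppb maps to 0x400000, which
fits within the 24-bit mask.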
 drivers/net/ethernet/broadcom/tg3.c |  125 ++++++++++++++++++++++++++++++++++-
 1 files changed, 123 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 38047a9..a54d194 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -5519,6 +5519,14 @@ static int tg3_setup_phy(struct tg3 *tp, int force_reset)
 	return err;
 }
 
+
+static u64 tg3_refclk_read(struct tg3 *tp)
+{
+	u64 stamp = tr32(TG3_EAV_REF_CLCK_LSB);
+
+	return stamp | (u64) tr32(TG3_EAV_REF_CLCK_MSB) << 32;
+}
+
 static void tg3_refclk_write(struct tg3 *tp, u64 newval)
 {
 	tw32(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_STOP);
@@ -5527,14 +5535,127 @@ static void tg3_refclk_write(struct tg3 *tp, u64 newval)
 	tw32_f(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_RESUME);
 }
 
+static inline void tg3_full_lock(struct tg3 *tp, int irq_sync);
+static inline void tg3_full_unlock(struct tg3 *tp);
+static int tg3_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
+{
+	struct tg3 *tp = netdev_priv(dev);
+
+	info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
+				SOF_TIMESTAMPING_RX_SOFTWARE |
+				SOF_TIMESTAMPING_SOFTWARE    |
+				SOF_TIMESTAMPING_TX_HARDWARE |
+				SOF_TIMESTAMPING_RX_HARDWARE |
+				SOF_TIMESTAMPING_RAW_HARDWARE;
+
+	if (tp->ptp_clock)
+		info->phc_index = ptp_clock_index(tp->ptp_clock);
+	else
+		info->phc_index = -1;
+
+	info->tx_types = (1 << HWTSTAMP_TX_OFF) |
+			 (1 << HWTSTAMP_TX_ON);
+
+	info->rx_filters = (1 << HWTSTAMP_FILTER_NONE) |
+			   (1 << HWTSTAMP_FILTER_ALL);
+	return 0;
+}
+
+static int tg3_ptp_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
+{
+	struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+	bool neg_adj = false;
+	u32 correction = 0;
+
+	if (ppb < 0) {
+		neg_adj = true;
+		ppb = -ppb;
+	}
+
+	/* Frequency adjustment is performed using hardware with a 24 bit
+	 * accumulator and a programmable correction value. On each clk, the
+	 * correction value gets added to the accumulator and when it
+	 * overflows, the time counter is incremented/decremented.
+	 *
+	 * So conversion from ppb to correction value is
+	 *		ppb * (1 << 24) / 1000000000
+	 */
+	correction = div_u64((u64)ppb * (1 << 24), 1000000000ULL) &
+		     TG3_EAV_REF_CLK_CORRECT_MASK;
+
+	tg3_full_lock(tp, 0);
+
+	if (correction)
+		tw32(TG3_EAV_REF_CLK_CORRECT_CTL,
+		     TG3_EAV_REF_CLK_CORRECT_EN |
+		     (neg_adj ? TG3_EAV_REF_CLK_CORRECT_NEG : 0) | correction);
+	else
+		tw32(TG3_EAV_REF_CLK_CORRECT_CTL, 0);
+
+	tg3_full_unlock(tp);
+
+	return 0;
+}
+
+static int tg3_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
+{
+	struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+	tp->ptp_adjust += delta;
+	return 0;
+}
+
+static int tg3_ptp_gettime(struct ptp_clock_info *ptp, struct timespec *ts)
+{
+	u64 ns;
+	u32 remainder;
+	struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+
+	tg3_full_lock(tp, 0);
+	ns = tg3_refclk_read(tp);
+	tg3_full_unlock(tp);
+	ns += tp->ptp_adjust;
+
+	ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
+	ts->tv_nsec = remainder;
+
+	return 0;
+}
+
+static int tg3_ptp_settime(struct ptp_clock_info *ptp,
+			   const struct timespec *ts)
+{
+	u64 ns;
+	struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
+
+	ns = timespec_to_ns(ts);
+
+	tg3_full_lock(tp, 0);
+	tg3_refclk_write(tp, ns);
+	tg3_full_unlock(tp);
+	tp->ptp_adjust = 0;
+
+	return 0;
+}
+
+static int tg3_ptp_enable(struct ptp_clock_info *ptp,
+			  struct ptp_clock_request *rq, int on)
+{
+	return -EOPNOTSUPP;
+}
+
 static const struct ptp_clock_info tg3_ptp_caps = {
 	.owner		= THIS_MODULE,
 	.name		= "",
-	.max_adj	= 0,
+	.max_adj	= 250000000,
 	.n_alarm	= 0,
 	.n_ext_ts	= 0,
 	.n_per_out	= 0,
 	.pps		= 0,
+	.adjfreq	= tg3_ptp_adjfreq,
+	.adjtime	= tg3_ptp_adjtime,
+	.gettime	= tg3_ptp_gettime,
+	.settime	= tg3_ptp_settime,
+	.enable		= tg3_ptp_enable,
 };
 
 static void tg3_ptp_init(struct tg3 *tp)
@@ -12785,7 +12906,7 @@ static const struct ethtool_ops tg3_ethtool_ops = {
 	.set_rxfh_indir		= tg3_set_rxfh_indir,
 	.get_channels		= tg3_get_channels,
 	.set_channels		= tg3_set_channels,
-	.get_ts_info		= ethtool_op_get_ts_info,
+	.get_ts_info		= tg3_get_ts_info,
 };
 
 static struct rtnl_link_stats64 *tg3_get_stats64(struct net_device *dev,
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/4 net-next] tg3: PTP - Add header definitions, initialization and hw access functions.
From: Michael Chan @ 2012-12-03  3:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, nsujir

From: Matt Carlson <mcarlson@broadcom.com>

This patch adds code to register/unregister the PTP clock and to write
the reference clock. If a chip reset is performed, the hardware clock is
reinitialized with the adjusted kernel time.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
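Editorial note (below the "---" marker, so not part of the commit message):
the init/resume behaviour described above can be modelled in user space.
This is a rough sketch, not driver code: struct clock_model and the model_*
functions are hypothetical, hw_ns stands in for the 64-bit reference clock
register pair and adjust for tp->ptp_adjust.

```c
#include <stdint.h>

struct clock_model {
	uint64_t hw_ns;   /* stand-in for the hardware reference clock */
	int64_t  adjust;  /* stand-in for tp->ptp_adjust (software offset) */
};

/* Like tg3_ptp_init(): start the hardware clock at system time. */
void model_init(struct clock_model *c, uint64_t system_ns)
{
	c->hw_ns = system_ns;
	c->adjust = 0;
}

/*
 * Like tg3_ptp_resume(): after a reset, reload the hardware clock with
 * system time plus the accumulated offset, then clear the offset.
 */
void model_resume(struct clock_model *c, uint64_t system_ns)
{
	/* Unsigned wrap-around makes a negative offset subtract correctly. */
	c->hw_ns = system_ns + (uint64_t)c->adjust;
	c->adjust = 0;
}

/* Effective time as seen by a reader: hardware value plus offset. */
uint64_t model_read(const struct clock_model *c)
{
	return c->hw_ns + (uint64_t)c->adjust;
}
```

The point of the model is that any software offset survives a chip reset by
being folded into the value written back to the hardware clock, after which
the offset starts again from zero.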
 drivers/net/ethernet/broadcom/Kconfig |    1 +
 drivers/net/ethernet/broadcom/tg3.c   |   84 +++++++++++++++++++++++++++++++--
 drivers/net/ethernet/broadcom/tg3.h   |   60 ++++++++++++++++++++++-
 3 files changed, 137 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 4bd416b..f552673 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -102,6 +102,7 @@ config TIGON3
 	depends on PCI
 	select PHYLIB
 	select HWMON
+	select PTP_1588_CLOCK
 	---help---
 	  This driver supports Broadcom Tigon3 based gigabit Ethernet cards.
 
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 5cc976d..38047a9 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -54,6 +54,9 @@
 #include <asm/byteorder.h>
 #include <linux/uaccess.h>
 
+#include <uapi/linux/net_tstamp.h>
+#include <linux/ptp_clock_kernel.h>
+
 #ifdef CONFIG_SPARC
 #include <asm/idprom.h>
 #include <asm/prom.h>
@@ -5516,6 +5519,57 @@ static int tg3_setup_phy(struct tg3 *tp, int force_reset)
 	return err;
 }
 
+static void tg3_refclk_write(struct tg3 *tp, u64 newval)
+{
+	tw32(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_STOP);
+	tw32(TG3_EAV_REF_CLCK_LSB, newval & 0xffffffff);
+	tw32(TG3_EAV_REF_CLCK_MSB, newval >> 32);
+	tw32_f(TG3_EAV_REF_CLCK_CTL, TG3_EAV_REF_CLCK_CTL_RESUME);
+}
+
+static const struct ptp_clock_info tg3_ptp_caps = {
+	.owner		= THIS_MODULE,
+	.name		= "",
+	.max_adj	= 0,
+	.n_alarm	= 0,
+	.n_ext_ts	= 0,
+	.n_per_out	= 0,
+	.pps		= 0,
+};
+
+static void tg3_ptp_init(struct tg3 *tp)
+{
+	if (!tg3_flag(tp, PTP_CAPABLE))
+		return;
+
+	/* Initialize the hardware clock to the system time. */
+	tg3_refclk_write(tp, ktime_to_ns(ktime_get_real()));
+	tp->ptp_adjust = 0;
+
+	tp->ptp_info = tg3_ptp_caps;
+	strncpy(tp->ptp_info.name, tp->dev->name, IFNAMSIZ);
+}
+
+static void tg3_ptp_resume(struct tg3 *tp)
+{
+	if (!tg3_flag(tp, PTP_CAPABLE))
+		return;
+
+	tg3_refclk_write(tp, ktime_to_ns(ktime_get_real()) + tp->ptp_adjust);
+	tp->ptp_adjust = 0;
+}
+
+static void tg3_ptp_fini(struct tg3 *tp)
+{
+	if (!tg3_flag(tp, PTP_CAPABLE) ||
+	    !tp->ptp_clock)
+		return;
+
+	ptp_clock_unregister(tp->ptp_clock);
+	tp->ptp_clock = NULL;
+	tp->ptp_adjust = 0;
+}
+
 static inline int tg3_irq_sync(struct tg3 *tp)
 {
 	return tp->irq_sync;
@@ -6527,6 +6581,8 @@ static inline void tg3_netif_stop(struct tg3 *tp)
 
 static inline void tg3_netif_start(struct tg3 *tp)
 {
+	tg3_ptp_resume(tp);
+
 	/* NOTE: unconditional netif_tx_wake_all_queues is only
 	 * appropriate so long as all callers are assured to
 	 * have free tx slots (such as after tg3_init_hw)
@@ -10364,7 +10420,8 @@ static void tg3_ints_fini(struct tg3 *tp)
 	tg3_flag_clear(tp, ENABLE_TSS);
 }
 
-static int tg3_start(struct tg3 *tp, bool reset_phy, bool test_irq)
+static int tg3_start(struct tg3 *tp, bool reset_phy, bool test_irq,
+		     bool init)
 {
 	struct net_device *dev = tp->dev;
 	int i, err;
@@ -10443,6 +10500,12 @@ static int tg3_start(struct tg3 *tp, bool reset_phy, bool test_irq)
 	tg3_flag_set(tp, INIT_COMPLETE);
 	tg3_enable_ints(tp);
 
+	if (init)
+		tg3_ptp_init(tp);
+	else
+		tg3_ptp_resume(tp);
+
+
 	tg3_full_unlock(tp);
 
 	netif_tx_start_all_queues(dev);
@@ -10540,11 +10603,19 @@ static int tg3_open(struct net_device *dev)
 
 	tg3_full_unlock(tp);
 
-	err = tg3_start(tp, true, true);
+	err = tg3_start(tp, true, true, true);
 	if (err) {
 		tg3_frob_aux_power(tp, false);
 		pci_set_power_state(tp->pdev, PCI_D3hot);
 	}
+
+	if (tg3_flag(tp, PTP_CAPABLE)) {
+		tp->ptp_clock = ptp_clock_register(&tp->ptp_info,
+						   &tp->pdev->dev);
+		if (IS_ERR(tp->ptp_clock))
+			tp->ptp_clock = NULL;
+	}
+
 	return err;
 }
 
@@ -10552,6 +10623,8 @@ static int tg3_close(struct net_device *dev)
 {
 	struct tg3 *tp = netdev_priv(dev);
 
+	tg3_ptp_fini(tp);
+
 	tg3_stop(tp);
 
 	/* Clear stats across close / open calls */
@@ -11454,7 +11527,7 @@ static int tg3_set_channels(struct net_device *dev,
 
 	tg3_carrier_off(tp);
 
-	tg3_start(tp, true, false);
+	tg3_start(tp, true, false, false);
 
 	return 0;
 }
@@ -12507,7 +12580,6 @@ static void tg3_self_test(struct net_device *dev, struct ethtool_test *etest,
 		}
 
 		tg3_full_lock(tp, irq_sync);
-
 		tg3_halt(tp, RESET_KIND_SUSPEND, 1);
 		err = tg3_nvram_lock(tp);
 		tg3_halt_cpu(tp, RX_CPU_BASE);
@@ -16598,8 +16670,8 @@ static void tg3_io_resume(struct pci_dev *pdev)
 	tg3_full_lock(tp, 0);
 	tg3_flag_set(tp, INIT_COMPLETE);
 	err = tg3_restart_hw(tp, 1);
-	tg3_full_unlock(tp);
 	if (err) {
+		tg3_full_unlock(tp);
 		netdev_err(netdev, "Cannot restart hardware after reset.\n");
 		goto done;
 	}
@@ -16610,6 +16682,8 @@ static void tg3_io_resume(struct pci_dev *pdev)
 
 	tg3_netif_start(tp);
 
+	tg3_full_unlock(tp);
+
 	tg3_phy_start(tp);
 
 done:
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index 4534804..d330e81 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -772,7 +772,10 @@
 #define  SG_DIG_MAC_ACK_STATUS		 0x00000004
 #define  SG_DIG_AUTONEG_COMPLETE	 0x00000002
 #define  SG_DIG_AUTONEG_ERROR		 0x00000001
-/* 0x5b8 --> 0x600 unused */
+#define TG3_TX_TSTAMP_LSB		0x000005c0
+#define TG3_TX_TSTAMP_MSB		0x000005c4
+#define  TG3_TSTAMP_MASK		 0x7fffffffffffffff
+/* 0x5c8 --> 0x600 unused */
 #define MAC_TX_MAC_STATE_BASE		0x00000600 /* 16 bytes */
 #define MAC_RX_MAC_STATE_BASE		0x00000610 /* 20 bytes */
 /* 0x624 --> 0x670 unused */
@@ -789,7 +792,36 @@
 #define MAC_RSS_HASH_KEY_7		0x0000068c
 #define MAC_RSS_HASH_KEY_8		0x00000690
 #define MAC_RSS_HASH_KEY_9		0x00000694
-/* 0x698 --> 0x800 unused */
+/* 0x698 --> 0x6b0 unused */
+
+#define TG3_RX_TSTAMP_LSB		0x000006b0
+#define TG3_RX_TSTAMP_MSB		0x000006b4
+/* 0x6b8 --> 0x6c8 unused */
+
+#define TG3_RX_PTP_CTL			0x000006c8
+#define TG3_RX_PTP_CTL_SYNC_EVNT	0x00000001
+#define TG3_RX_PTP_CTL_DELAY_REQ	0x00000002
+#define TG3_RX_PTP_CTL_PDLAY_REQ	0x00000004
+#define TG3_RX_PTP_CTL_PDLAY_RES	0x00000008
+#define TG3_RX_PTP_CTL_ALL_V1_EVENTS	(TG3_RX_PTP_CTL_SYNC_EVNT | \
+					 TG3_RX_PTP_CTL_DELAY_REQ)
+#define TG3_RX_PTP_CTL_ALL_V2_EVENTS	(TG3_RX_PTP_CTL_SYNC_EVNT | \
+					 TG3_RX_PTP_CTL_DELAY_REQ | \
+					 TG3_RX_PTP_CTL_PDLAY_REQ | \
+					 TG3_RX_PTP_CTL_PDLAY_RES)
+#define TG3_RX_PTP_CTL_FOLLOW_UP	0x00000100
+#define TG3_RX_PTP_CTL_DELAY_RES	0x00000200
+#define TG3_RX_PTP_CTL_PDRES_FLW_UP	0x00000400
+#define TG3_RX_PTP_CTL_ANNOUNCE		0x00000800
+#define TG3_RX_PTP_CTL_SIGNALING	0x00001000
+#define TG3_RX_PTP_CTL_MANAGEMENT	0x00002000
+#define TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN	0x00800000
+#define TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN	0x01000000
+#define TG3_RX_PTP_CTL_RX_PTP_V2_EN	(TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN | \
+					 TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN)
+#define TG3_RX_PTP_CTL_RX_PTP_V1_EN	0x02000000
+#define TG3_RX_PTP_CTL_HWTS_INTERLOCK	0x04000000
+/* 0x6cc --> 0x800 unused */
 
 #define MAC_TX_STATS_OCTETS		0x00000800
 #define MAC_TX_STATS_RESV1		0x00000804
@@ -1669,6 +1701,7 @@
 #define  GRC_MODE_HOST_STACKUP		0x00010000
 #define  GRC_MODE_HOST_SENDBDS		0x00020000
 #define  GRC_MODE_HTX2B_ENABLE		0x00040000
+#define  GRC_MODE_TIME_SYNC_ENABLE	0x00080000
 #define  GRC_MODE_NO_TX_PHDR_CSUM	0x00100000
 #define  GRC_MODE_NVRAM_WR_ENABLE	0x00200000
 #define  GRC_MODE_PCIE_TL_SEL		0x00000000
@@ -1771,7 +1804,17 @@
 #define GRC_VCPU_EXT_CTRL_DISABLE_WOL	 0x20000000
 #define GRC_FASTBOOT_PC			0x00006894	/* 5752, 5755, 5787 */
 
-/* 0x6c00 --> 0x7000 unused */
+#define TG3_EAV_REF_CLCK_LSB		0x00006900
+#define TG3_EAV_REF_CLCK_MSB		0x00006904
+#define TG3_EAV_REF_CLCK_CTL		0x00006908
+#define  TG3_EAV_REF_CLCK_CTL_STOP	 0x00000002
+#define  TG3_EAV_REF_CLCK_CTL_RESUME	 0x00000004
+#define TG3_EAV_REF_CLK_CORRECT_CTL	0x00006928
+#define  TG3_EAV_REF_CLK_CORRECT_EN	 (1 << 31)
+#define  TG3_EAV_REF_CLK_CORRECT_NEG	 (1 << 30)
+
+#define TG3_EAV_REF_CLK_CORRECT_MASK	0xffffff
+/* 0x690c --> 0x7000 unused */
 
 /* NVRAM Control registers */
 #define NVRAM_CMD			0x00007000
@@ -2439,6 +2482,7 @@ struct tg3_tx_buffer_desc {
 #define TXD_FLAG_IP_FRAG		0x0008
 #define TXD_FLAG_JMB_PKT		0x0008
 #define TXD_FLAG_IP_FRAG_END		0x0010
+#define TXD_FLAG_HWTSTAMP		0x0020
 #define TXD_FLAG_VLAN			0x0040
 #define TXD_FLAG_COAL_NOW		0x0080
 #define TXD_FLAG_CPU_PRE_DMA		0x0100
@@ -2480,6 +2524,9 @@ struct tg3_rx_buffer_desc {
 #define RXD_FLAG_IP_CSUM		0x1000
 #define RXD_FLAG_TCPUDP_CSUM		0x2000
 #define RXD_FLAG_IS_TCP			0x4000
+#define RXD_FLAG_PTPSTAT_MASK		0x0210
+#define RXD_FLAG_PTPSTAT_PTPV1		0x0010
+#define RXD_FLAG_PTPSTAT_PTPV2		0x0200
 
 	u32				ip_tcp_csum;
 #define RXD_IPCSUM_MASK		0xffff0000
@@ -2970,9 +3017,11 @@ enum TG3_FLAGS {
 	TG3_FLAG_USE_JUMBO_BDFLAG,
 	TG3_FLAG_L1PLLPD_EN,
 	TG3_FLAG_APE_HAS_NCSI,
+	TG3_FLAG_TX_TSTAMP_EN,
 	TG3_FLAG_4K_FIFO_LIMIT,
 	TG3_FLAG_5719_RDMA_BUG,
 	TG3_FLAG_RESET_TASK_PENDING,
+	TG3_FLAG_PTP_CAPABLE,
 	TG3_FLAG_5705_PLUS,
 	TG3_FLAG_IS_5788,
 	TG3_FLAG_5750_PLUS,
@@ -3041,6 +3090,10 @@ struct tg3 {
 	u32				coal_now;
 	u32				msg_enable;
 
+	struct ptp_clock_info		ptp_info;
+	struct ptp_clock		*ptp_clock;
+	s64				ptp_adjust;
+
 	/* begin "tx thread" cacheline section */
 	void				(*write32_tx_mbox) (struct tg3 *, u32,
 							    u32);
@@ -3108,6 +3161,7 @@ struct tg3 {
 	u32				dma_rwctrl;
 	u32				coalesce_mode;
 	u32				pwrmgmt_thresh;
+	u32				rxptpctl;
 
 	/* PCI block */
 	u32				pci_chip_rev_id;
-- 
1.7.1

^ permalink raw reply related

* [PATCH 4/4 net-next] tg3: PTP - Enable the timestamping feature in hardware and fill skb tx/rx timestamps
From: Michael Chan @ 2012-12-03  3:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, nsujir
In-Reply-To: <1354506171-1646-3-git-send-email-mchan@broadcom.com>

From: Matt Carlson <mcarlson@broadcom.com>

This patch implements the hardware timestamping as described in
Documentation/networking/timestamping.txt

Update version to 3.128.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
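Editorial note (below the "---" marker, so not part of the commit message):
the timestamp handling in this patch boils down to joining two 32-bit
register reads into one 64-bit value, masking off the top bit and adding
the software offset. A minimal user-space sketch follows; combine_halves()
and hwclock_to_ns() are hypothetical names, and TSTAMP_MASK is assumed to
mirror TG3_TSTAMP_MASK from tg3.h.

```c
#include <stdint.h>

/* Assumed to mirror TG3_TSTAMP_MASK (top bit cleared). */
#define TSTAMP_MASK 0x7fffffffffffffffULL

/* Join the LSB and MSB register halves into one 64-bit counter value. */
uint64_t combine_halves(uint32_t lsb, uint32_t msb)
{
	return (uint64_t)lsb | ((uint64_t)msb << 32);
}

/*
 * Convert a raw hardware clock value to nanoseconds: mask it, then
 * apply the software offset (tp->ptp_adjust in the patch).
 */
int64_t hwclock_to_ns(uint64_t hwclock, int64_t adjust)
{
	return (int64_t)(hwclock & TSTAMP_MASK) + adjust;
}
```

Casting the MSB half to 64 bits before shifting matters: shifting a 32-bit
value left by 32 is undefined in C, which is why the patch also writes
(u64)tr32(...) << 32.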
 drivers/net/ethernet/broadcom/tg3.c |   57 +++++++++++++++++++++++++++++++---
 1 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index f6e956c..b2ad1c4 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -93,10 +93,10 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits)
 
 #define DRV_MODULE_NAME		"tg3"
 #define TG3_MAJ_NUM			3
-#define TG3_MIN_NUM			127
+#define TG3_MIN_NUM			128
 #define DRV_MODULE_VERSION	\
 	__stringify(TG3_MAJ_NUM) "." __stringify(TG3_MIN_NUM)
-#define DRV_MODULE_RELDATE	"November 14, 2012"
+#define DRV_MODULE_RELDATE	"December 02, 2012"
 
 #define RESET_KIND_SHUTDOWN	0
 #define RESET_KIND_INIT		1
@@ -5658,6 +5658,14 @@ static const struct ptp_clock_info tg3_ptp_caps = {
 	.enable		= tg3_ptp_enable,
 };
 
+static void tg3_hwclock_to_timestamp(struct tg3 *tp, u64 hwclock,
+				     struct skb_shared_hwtstamps *timestamp)
+{
+	memset(timestamp, 0, sizeof(struct skb_shared_hwtstamps));
+	timestamp->hwtstamp  = ns_to_ktime((hwclock & TG3_TSTAMP_MASK) +
+					   tp->ptp_adjust);
+}
+
 static void tg3_ptp_init(struct tg3 *tp)
 {
 	if (!tg3_flag(tp, PTP_CAPABLE))
@@ -5871,6 +5879,16 @@ static void tg3_tx(struct tg3_napi *tnapi)
 			return;
 		}
 
+		if (tnapi->tx_ring[sw_idx].len_flags & TXD_FLAG_HWTSTAMP) {
+			struct skb_shared_hwtstamps timestamp;
+			u64 hwclock = tr32(TG3_TX_TSTAMP_LSB);
+			hwclock |= (u64)tr32(TG3_TX_TSTAMP_MSB) << 32;
+
+			tg3_hwclock_to_timestamp(tp, hwclock, &timestamp);
+
+			skb_tstamp_tx(skb, &timestamp);
+		}
+
 		pci_unmap_single(tp->pdev,
 				 dma_unmap_addr(ri, mapping),
 				 skb_headlen(skb),
@@ -6138,6 +6156,7 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 		dma_addr_t dma_addr;
 		u32 opaque_key, desc_idx, *post_ptr;
 		u8 *data;
+		u64 tstamp = 0;
 
 		desc_idx = desc->opaque & RXD_OPAQUE_INDEX_MASK;
 		opaque_key = desc->opaque & RXD_OPAQUE_RING_MASK;
@@ -6172,6 +6191,14 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 		len = ((desc->idx_len & RXD_LEN_MASK) >> RXD_LEN_SHIFT) -
 		      ETH_FCS_LEN;
 
+		if ((desc->type_flags & RXD_FLAG_PTPSTAT_MASK) ==
+		     RXD_FLAG_PTPSTAT_PTPV1 ||
+		    (desc->type_flags & RXD_FLAG_PTPSTAT_MASK) ==
+		     RXD_FLAG_PTPSTAT_PTPV2) {
+			tstamp = tr32(TG3_RX_TSTAMP_LSB);
+			tstamp |= (u64)tr32(TG3_RX_TSTAMP_MSB) << 32;
+		}
+
 		if (len > TG3_RX_COPY_THRESH(tp)) {
 			int skb_size;
 			unsigned int frag_size;
@@ -6215,6 +6242,10 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget)
 		}
 
 		skb_put(skb, len);
+		if (tstamp)
+			tg3_hwclock_to_timestamp(tp, tstamp,
+						 skb_hwtstamps(skb));
+
 		if ((tp->dev->features & NETIF_F_RXCSUM) &&
 		    (desc->type_flags & RXD_FLAG_TCPUDP_CSUM) &&
 		    (((desc->ip_tcp_csum & RXD_TCPCSUM_MASK)
@@ -7271,6 +7302,12 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		vlan = vlan_tx_tag_get(skb);
 	}
 
+	if ((unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) &&
+	    tg3_flag(tp, TX_TSTAMP_EN)) {
+		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+		base_flags |= TXD_FLAG_HWTSTAMP;
+	}
+
 	len = skb_headlen(skb);
 
 	mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);
@@ -9139,9 +9176,15 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
 	 */
 	tp->grc_mode |= GRC_MODE_NO_TX_PHDR_CSUM;
 
-	tw32(GRC_MODE,
-	     tp->grc_mode |
-	     (GRC_MODE_IRQ_ON_MAC_ATTN | GRC_MODE_HOST_STACKUP));
+	val = GRC_MODE_IRQ_ON_MAC_ATTN | GRC_MODE_HOST_STACKUP;
+	if (tp->rxptpctl)
+		tw32(TG3_RX_PTP_CTL,
+		     tp->rxptpctl | TG3_RX_PTP_CTL_HWTS_INTERLOCK);
+
+	if (tg3_flag(tp, PTP_CAPABLE))
+		val |= GRC_MODE_TIME_SYNC_ENABLE;
+
+	tw32(GRC_MODE, tp->grc_mode | val);
 
 	/* Setup the timer prescalar register.  Clock is always 66Mhz. */
 	val = tr32(GRC_MISC_CFG);
@@ -16565,6 +16608,10 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
 
 	pci_set_drvdata(pdev, dev);
 
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5719 ||
+	    GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5720)
+		tg3_flag_set(tp, PTP_CAPABLE);
+
 	if (tg3_flag(tp, 5717_PLUS)) {
 		/* Resume a low-power mode */
 		tg3_frob_aux_power(tp, false);
-- 
1.7.1

^ permalink raw reply related

* [PATCH 3/4 net-next] tg3: PTP - Add the hardware timestamp ioctl
From: Michael Chan @ 2012-12-03  3:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, nsujir
In-Reply-To: <1354506171-1646-2-git-send-email-mchan@broadcom.com>

From: Matt Carlson <mcarlson@broadcom.com>

This patch implements the SIOCSHWTSTAMP ioctl as described in
Documentation/networking/timestamping.txt

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |   99 +++++++++++++++++++++++++++++++++++
 1 files changed, 99 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index a54d194..f6e956c 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -12755,6 +12755,102 @@ static void tg3_self_test(struct net_device *dev, struct ethtool_test *etest,
 
 }
 
+static int tg3_hwtstamp_ioctl(struct net_device *dev,
+			      struct ifreq *ifr, int cmd)
+{
+	struct tg3 *tp = netdev_priv(dev);
+	struct hwtstamp_config stmpconf;
+
+	if (!tg3_flag(tp, PTP_CAPABLE))
+		return -EINVAL;
+
+	if (copy_from_user(&stmpconf, ifr->ifr_data, sizeof(stmpconf)))
+		return -EFAULT;
+
+	if (stmpconf.flags)
+		return -EINVAL;
+
+	switch (stmpconf.tx_type) {
+	case HWTSTAMP_TX_ON:
+		tg3_flag_set(tp, TX_TSTAMP_EN);
+		break;
+	case HWTSTAMP_TX_OFF:
+		tg3_flag_clear(tp, TX_TSTAMP_EN);
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	switch (stmpconf.rx_filter) {
+	case HWTSTAMP_FILTER_NONE:
+		tp->rxptpctl = 0;
+		break;
+	case HWTSTAMP_FILTER_ALL:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+			       TG3_RX_PTP_CTL_ALL_V1_EVENTS |
+			       TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+			       TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+		break;
+	case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+			       TG3_RX_PTP_CTL_ALL_V1_EVENTS;
+		break;
+	case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+			       TG3_RX_PTP_CTL_SYNC_EVNT;
+		break;
+	case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V1_EN |
+			       TG3_RX_PTP_CTL_DELAY_REQ;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_EVENT:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+			       TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN |
+			       TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN |
+			       TG3_RX_PTP_CTL_ALL_V2_EVENTS;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_SYNC:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+			       TG3_RX_PTP_CTL_SYNC_EVNT;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN |
+			       TG3_RX_PTP_CTL_SYNC_EVNT;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN |
+			       TG3_RX_PTP_CTL_SYNC_EVNT;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_EN |
+			       TG3_RX_PTP_CTL_DELAY_REQ;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L2_EN |
+			       TG3_RX_PTP_CTL_DELAY_REQ;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+		tp->rxptpctl = TG3_RX_PTP_CTL_RX_PTP_V2_L4_EN |
+			       TG3_RX_PTP_CTL_DELAY_REQ;
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	if (netif_running(dev) && tp->rxptpctl)
+		tw32(TG3_RX_PTP_CTL,
+		     tp->rxptpctl | TG3_RX_PTP_CTL_HWTS_INTERLOCK);
+
+	return copy_to_user(ifr->ifr_data, &stmpconf, sizeof(stmpconf)) ?
+		-EFAULT : 0;
+}
+
 static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 {
 	struct mii_ioctl_data *data = if_mii(ifr);
@@ -12805,6 +12901,9 @@ static int tg3_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 		return err;
 
+	case SIOCSHWTSTAMP:
+		return tg3_hwtstamp_ioctl(dev, ifr, cmd);
+
 	default:
 		/* do nothing */
 		break;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next] tuntap: attach queue 0 before registering netdevice
From: Jason Wang @ 2012-12-03  3:19 UTC (permalink / raw)
  To: davem, netdev, linux-kernel, jslaby; +Cc: Jason Wang

We currently attach queue 0 after registering the netdevice, which means
netif_set_real_num_{tx|rx}_queues() is called after registration. Since
tun/tap allows a maximum of 1024 queues, this may inject a huge number of
uevents into userspace, because we create 2048 kobjects and then remove 2046.
Solve this problem by attaching queue 0 and setting the real number of queues
before registering the netdevice.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b44d7b7..cc3f878 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -492,9 +492,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
 
 	tun_set_real_num_queues(tun);
 
-	if (tun->numqueues == 1)
-		netif_carrier_on(tun->dev);
-
 	/* device is allowed to go away first, so no need to hold extra
 	 * refcnt.
 	 */
@@ -1611,6 +1608,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			TUN_USER_FEATURES;
 		dev->features = dev->hw_features;
 
+		err = tun_attach(tun, file);
+		if (err < 0)
+			goto err_free_dev;
+
 		err = register_netdevice(tun->dev);
 		if (err < 0)
 			goto err_free_dev;
@@ -1620,9 +1621,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		    device_create_file(&tun->dev->dev, &dev_attr_group))
 			pr_err("Failed to create tun sysfs files\n");
 
-		err = tun_attach(tun, file);
-		if (err < 0)
-			goto err_free_dev;
+		netif_carrier_on(tun->dev);
 	}
 
 	tun_debug(KERN_INFO, tun, "tun_set_iff\n");
-- 
1.7.1

^ permalink raw reply related

* Re: [net-next rfc v7 2/3] virtio_net: multiqueue support
From: Rusty Russell @ 2012-12-03  2:04 UTC (permalink / raw)
  To: Jason Wang, mst, krkumar2, virtualization, netdev, linux-kernel
  Cc: bhutchings, jwhan, shiyer, kvm
In-Reply-To: <1354011360-39479-3-git-send-email-jasowang@redhat.com>

Jason Wang <jasowang@redhat.com> writes:
> +static const struct ethtool_ops virtnet_ethtool_ops;
> +
> +/*
> + * Converting between virtqueue no. and kernel tx/rx queue no.
> + * 0:rx0 1:tx0 2:cvq 3:rx1 4:tx1 ... 2N+1:rxN 2N+2:txN
> + */
> +static int vq2txq(struct virtqueue *vq)
> +{
> +	int index = virtqueue_get_queue_index(vq);
> +	return index == 1 ? 0 : (index - 2) / 2;
> +}
> +
> +static int txq2vq(int txq)
> +{
> +	return txq ? 2 * txq + 2 : 1;
> +}
> +
> +static int vq2rxq(struct virtqueue *vq)
> +{
> +	int index = virtqueue_get_queue_index(vq);
> +	return index ? (index - 1) / 2 : 0;
> +}
> +
> +static int rxq2vq(int rxq)
> +{
> +	return rxq ? 2 * rxq + 1 : 0;
> +}
> +

I thought MST changed the proposed spec to make the control queue always
the last one, so this logic becomes trivial.
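
A minimal sketch of how trivial the mapping could become, assuming the layout
from the proposed spec where receiveqN is virtqueue 2N, transmitqN is 2N+1 and
the control queue comes after the last transmit queue. The helpers take a plain
index instead of a struct virtqueue so the sketch stands alone; the names mirror
the quoted patch, but this is an illustration, not the driver's code.

```c
/* Sketch only: assumes 0:rx0 1:tx0 2:rx1 3:tx1 ..., with the control
 * queue placed last. A plain int stands in for the value that
 * virtqueue_get_queue_index(vq) would return in the driver.
 */
static int vq2txq(int index)
{
	return (index - 1) / 2;
}

static int txq2vq(int txq)
{
	return txq * 2 + 1;
}

static int vq2rxq(int index)
{
	return index / 2;
}

static int rxq2vq(int rxq)
{
	return rxq * 2;
}
```

With the control queue out of the way, no special case for queue pair 0 is
needed in any of the four helpers.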

> +static int virtnet_set_queues(struct virtnet_info *vi)
> +{
> +	struct scatterlist sg;
> +	struct virtio_net_ctrl_rfs s;
> +	struct net_device *dev = vi->dev;
> +
> +	s.virtqueue_pairs = vi->curr_queue_pairs;
> +	sg_init_one(&sg, &s, sizeof(s));
> +
> +	if (!vi->has_cvq)
> +		return -EINVAL;
> +
> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RFS,
> +				  VIRTIO_NET_CTRL_RFS_VQ_PAIRS_SET, &sg, 1, 0)){
> +		dev_warn(&dev->dev, "Fail to set the number of queue pairs to"
> +			 " %d\n", vi->curr_queue_pairs);
> +		return -EINVAL;
> +	}

Where do we check the VIRTIO_NET_F_RFS bit?
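
One possible shape for such a check, as a standalone model rather than driver
code: gate the command on both the control virtqueue and the negotiated RFS
bit. The struct and helper names below are hypothetical, and the bit number 2
is taken from the proposed spec text (VIRTIO_NET_F_RFS(2)), so treat it as an
assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_RFS 2	/* assumed: bit number from the proposed spec */

/* Hypothetical stand-in for the driver state; not the real virtnet_info. */
struct vi_model {
	uint64_t features;	/* negotiated feature bits */
	bool has_cvq;		/* control virtqueue available */
};

static bool model_has_feature(const struct vi_model *vi, unsigned int bit)
{
	return (vi->features >> bit) & 1;
}

/* Returns 0 if the queue-pairs command may be sent, -EINVAL otherwise. */
static int model_set_queues_check(const struct vi_model *vi)
{
	if (!vi->has_cvq || !model_has_feature(vi, VIRTIO_NET_F_RFS))
		return -EINVAL;
	return 0;
}
```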

>  static int virtnet_probe(struct virtio_device *vdev)
>  {
> -	int err;
> +	int i, err;
>  	struct net_device *dev;
>  	struct virtnet_info *vi;
> +	u16 curr_queue_pairs;
> +
> +	/* Find if host supports multiqueue virtio_net device */
> +	err = virtio_config_val(vdev, VIRTIO_NET_F_RFS,
> +				offsetof(struct virtio_net_config,
> +				max_virtqueue_pairs), &curr_queue_pairs);
> +
> +	/* We need at least 2 queue's */
> +	if (err)
> +		curr_queue_pairs = 1;

Huh?  Just call this queue_pairs.  It's not curr_ at all...

> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> +		vi->has_cvq = true;
> +
> +	/* Use single tx/rx queue pair as default */
> +	vi->curr_queue_pairs = 1;
> +	vi->max_queue_pairs = curr_queue_pairs;

See...

Cheers,
Rusty.

^ permalink raw reply

* Re: [net-next rfc v7 1/3] virtio-net: separate fields of sending/receiving queue from virtnet_info
From: Rusty Russell @ 2012-12-03  1:55 UTC (permalink / raw)
  To: Jason Wang, mst, krkumar2, virtualization, netdev, linux-kernel
  Cc: bhutchings, jwhan, shiyer, kvm
In-Reply-To: <1354011360-39479-2-git-send-email-jasowang@redhat.com>

Jason Wang <jasowang@redhat.com> writes:
> To support multiqueue transmitq/receiveq, the first step is to separate the
> queue-related structures from virtnet_info. This patch introduces the send_queue
> and receive_queue structures and uses pointers to them as the parameter in
> functions handling sending/receiving.

OK, seems like a straightforward xform: a few nit-picks:

> +/* Internal representation of a receive virtqueue */
> +struct receive_queue {
> +	/* Virtqueue associated with this receive_queue */
> +	struct virtqueue *vq;
> +
> +        struct napi_struct napi;
> +
> +        /* Number of input buffers, and max we've ever had. */
> +        unsigned int num, max;

Weird whitespace here.

> +
> +	/* Work struct for refilling if we run low on memory. */
> +	struct delayed_work refill;

I can't really see the justification for a refill per queue.  Just have
one work iterate all the queues if it happens, unless it happens often
(in which case, we need to look harder at this anyway).
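
A rough userspace model of the single-work idea, assuming one refill pass that
walks every receive queue and asks to be rescheduled if any queue is still
short of buffers. All names here (rq_model, model_fill, model_refill_all) and
the 'budget' parameter are hypothetical; in the driver the fill step would be
whatever currently tops up a single receive virtqueue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a receive queue: only the buffer counters. */
struct rq_model {
	unsigned int num;	/* buffers currently posted */
	unsigned int max;	/* ring capacity */
};

/* Tops the queue up and reports whether it is now full. 'budget' models
 * how many buffers the allocator can hand out right now.
 */
static bool model_fill(struct rq_model *rq, unsigned int *budget)
{
	while (rq->num < rq->max && *budget > 0) {
		rq->num++;
		(*budget)--;
	}
	return rq->num == rq->max;
}

/* One work function for the whole device: visit every queue and return
 * true if the work should be rescheduled because some queue is not full.
 */
static bool model_refill_all(struct rq_model *rqs, size_t n,
			     unsigned int *budget)
{
	bool again = false;
	size_t i;

	for (i = 0; i < n; i++)
		if (!model_fill(&rqs[i], budget))
			again = true;
	return again;
}
```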

>  struct virtnet_info {
>  	struct virtio_device *vdev;
> -	struct virtqueue *rvq, *svq, *cvq;
> +	struct virtqueue *cvq;
>  	struct net_device *dev;
>  	struct napi_struct napi;

You leave napi here, and take it away in the next patch.  I think it's
supposed to go away now.

Cheers,
Rusty.

^ permalink raw reply

* Re: [PATCH 1/6] bna: remove useless calls to memset().
From: David Miller @ 2012-12-03  1:33 UTC (permalink / raw)
  To: tipecaml; +Cc: linux-kernel, kernel-janitors, rmody, netdev
In-Reply-To: <1354416022-20189-2-git-send-email-tipecaml@gmail.com>

From: Cyril Roelandt <tipecaml@gmail.com>
Date: Sun,  2 Dec 2012 03:40:17 +0100

> These calls are followed by calls to memcpy() on the same memory area, so they
> can safely be removed.
> 
> Signed-off-by: Cyril Roelandt <tipecaml@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] tcp: don't abort splice() after small transfers
From: David Miller @ 2012-12-03  1:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, w
In-Reply-To: <1354484967.20109.1167.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 02 Dec 2012 13:49:27 -0800

> From: Willy Tarreau <w@1wt.eu>
> 
> TCP coalescing added a regression in splice(socket->pipe) performance,
> for some workloads because of the way tcp_read_sock() is implemented.
> 
> The reason for this is the break when (offset + 1 != skb->len).
> 
> As we released the socket lock, this condition is possible if TCP stack
> added a fragment to the skb, which can happen with TCP coalescing.
> 
> So let's go back to the beginning of the loop when this happens,
> to give a chance to splice more frags per system call.
> 
> Doing so fixes the issue and makes GRO 10% faster than LRO
> on CPU-bound splice() workloads instead of the opposite.
> 
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: fix sparse endianness warnings on sock_common
From: David Miller @ 2012-12-03  1:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, fengguang.wu, ling.ma.program
In-Reply-To: <1354469590.20109.585.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 02 Dec 2012 09:33:10 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> # make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/inet_hashtables.o
> ...
> net/ipv4/inet_hashtables.c:242:7: warning: restricted __portpair degrades to integer
> net/ipv4/inet_hashtables.c:242:7: warning: restricted __addrpair degrades to integer
> ...
> 
> Move __portpair/__addrpair from include/net/inet_hashtables.h
> to include/net/sock.h where we need them in struct sock_common
> 
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 00/13] bnx2x: net-next patch series
From: David Miller @ 2012-12-03  1:23 UTC (permalink / raw)
  To: yuvalmin; +Cc: netdev, eilong, ariele
In-Reply-To: <1354457157-4730-1-git-send-email-yuvalmin@broadcom.com>

From: "Yuval Mintz" <yuvalmin@broadcom.com>
Date: Sun, 2 Dec 2012 16:05:44 +0200

> Hi Dave,
> 
> This patch series contains several small changes to the bnx2x driver,
> including dcb changes, graceful error handling and benign error masking,
> and setting the driver's configuration according to management/nvram
> held values.
> 
> Please consider applying these patches to 'net-next'.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net/mlx4_en: Set number of rx/tx channels using ethtool
From: David Miller @ 2012-12-03  1:23 UTC (permalink / raw)
  To: amirv; +Cc: bhutchings, ogerlitz, oren, netdev
In-Reply-To: <1354456163-10497-4-git-send-email-amirv@mellanox.com>

From: Amir Vadai <amirv@mellanox.com>
Date: Sun,  2 Dec 2012 15:49:23 +0200

> Add support for changing the number of rx/tx channels using
> ethtool ('ethtool -[lL]'). The number of tx channels specified in ethtool
> is the number of rings per user priority, not the total number of tx rings.
> 
> Signed-off-by: Amir Vadai <amirv@mellanox.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2 2/3] net/mlx4_en: Fix TX moderation info loss after set_ringparam is called
From: David Miller @ 2012-12-03  1:23 UTC (permalink / raw)
  To: amirv; +Cc: bhutchings, ogerlitz, oren, netdev
In-Reply-To: <1354456163-10497-3-git-send-email-amirv@mellanox.com>

From: Amir Vadai <amirv@mellanox.com>
Date: Sun,  2 Dec 2012 15:49:22 +0200

> We need to re-set tx moderation information after calling set_ringparam
> else default tx moderation will be used.
> Also avoid related code duplication, by putting it in a utility function.
> 
> Signed-off-by: Amir Vadai <amirv@mellanox.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2 1/3] MAINTAINERS: Add Mellanox ethernet driver - mlx4_en
From: David Miller @ 2012-12-03  1:23 UTC (permalink / raw)
  To: amirv; +Cc: bhutchings, ogerlitz, oren, netdev, yevgenyp
In-Reply-To: <1354456163-10497-2-git-send-email-amirv@mellanox.com>

From: Amir Vadai <amirv@mellanox.com>
Date: Sun,  2 Dec 2012 15:49:21 +0200

> Set mlx4_en maintainer to Amir Vadai instead of Yevgeny Petrilin.
> 
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> Cc: Yevgeny Petrilin <yevgenyp@mellanox.com>

Applied.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2012-12-03  0:36 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) 8139cp leaks memory in error paths, from Francois Romieu.

2) do_tcp_sendpages() cannot handle order > 0 pages, but they can
   certainly arrive there now, fix from Eric Dumazet.

3) Race condition and sysfs fixes in bonding from Nikolay Aleksandrov.

4) Remain-on-Channel fix in mac80211 from Felix Liao.

5) CCK rate calculation fix in iwlwifi, from Emmanuel Grumbach.

Please pull, thanks a lot!

The following changes since commit e9296e89b85604862bd9ec2d54dc43edad775c0d:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2012-11-28 21:54:07 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net master

for you to fetch changes up to 892a925e42adb8192a3c832ad29cbc780fc466f6:

  8139cp: fix coherent mapping leak in error path. (2012-12-01 20:39:17 -0500)

----------------------------------------------------------------
Emmanuel Grumbach (1):
      iwlwifi: fix the basic CCK rates calculation

Eric Dumazet (1):
      tcp: fix crashes in do_tcp_sendpages()

Johannes Berg (1):
      mac80211: fix remain-on-channel (non-)cancelling

John W. Linville (2):
      Merge branch 'for-john' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

françois romieu (1):
      8139cp: fix coherent mapping leak in error path.

nikolay@redhat.com (3):
      bonding: fix miimon and arp_interval delayed work race conditions
      bonding: make arp_ip_target parameter checks consistent with sysfs
      bonding: fix race condition in bonding_store_slaves_active

 drivers/net/bonding/bond_main.c         | 93 +++++++++++++++++++++----------------------------------------------
 drivers/net/bonding/bond_sysfs.c        | 36 +++++++++-----------------
 drivers/net/ethernet/realtek/8139cp.c   | 11 +++++---
 drivers/net/wireless/iwlwifi/dvm/rxon.c | 12 ++++-----
 net/ipv4/tcp.c                          | 15 +++++------
 net/mac80211/offchannel.c               |  2 --
 6 files changed, 61 insertions(+), 108 deletions(-)

^ permalink raw reply

* Re: [GIT] Networking
From: David Miller @ 2012-12-03  0:32 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel
In-Reply-To: <CA+55aFy1sa=D-DrWqNuvjrLW8J0Tw1+GbZhXv6VHTTTxXG_DMA@mail.gmail.com>

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 2 Dec 2012 16:13:30 -0800

> David, Willy pointed me to the recent splice crash fix
> (do_tcp_sendpages and non-0-order pages). It's apparently easily
> user-triggerable.. Should I take the patch directly, or do you have a
> tree to pull. Don't want to make a release with a known oopser..

I have a tree to pull.  Coming in a few minutes.

^ permalink raw reply

* Re: [GIT] Networking
From: Linus Torvalds @ 2012-12-03  0:13 UTC (permalink / raw)
  To: David Miller
  Cc: Andrew Morton, Network Development, Linux Kernel Mailing List
In-Reply-To: <20121128.214732.1634269294133625782.davem@davemloft.net>

David, Willy pointed me to the recent splice crash fix
(do_tcp_sendpages and non-0-order pages). It's apparently easily
user-triggerable.. Should I take the patch directly, or do you have a
tree to pull. Don't want to make a release with a known oopser..

          Linus

^ permalink raw reply

* Re: [PATCHv4] virtio-spec: virtio network device RFS support
From: Rusty Russell @ 2012-12-02 22:46 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang; +Cc: netdev, kvm, virtualization
In-Reply-To: <20121122144645.GA28284@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> writes:
> Add RFS support to virtio network device.
> Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> configuration field max_virtqueue_pairs to detect supported number of
> virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> packet steering for unidirectional protocols.

Hi Michael,

        Sorry for the delay, I took last week off.

> - rename multiqueue -> rfs this is what we support
> - Be more explicit about what driver should do.
> - Simplify layout making VQs functionality depend on feature.
> - Remove unused commands, only leave in programming # of queues

Thanks: this looks really nice now.  Comments are about the text, not
the ideas.

> + 2N+1: transmitqN.
> + 2N+
> +\change_unchanged
> +2:controlq
>  \begin_inset Foot
>  status open

Hmm, controlq after xmit queues... a nice improvement.

> +VIRTIO_NET_F_RFS(2) Device supports Receive Flow Steering.

I think readers would prefer numerical order to historical order here,
so perhaps move this up in the list.

> -layout Two configuration fields are currently defined.
> +layout 
> +\change_deleted 1986246365 1352743300
> +Two
> +\change_inserted 1986246365 1352743301
> +Four
> +\change_unchanged
> + configuration fields are currently defined.

two to four?  I only see three?  And you didn't update the structure to
match...

> + Following this, driver should not transmit new packets on virtqueues other
> + than transmitq0 and device will not steer new packets on virtqueues other
> + than receiveq0.

"Following this" is vague.  After the buffer is consumed by the device.

Should not is kind of meaningless.  Let's make it clear: the device will
not steer new packets to RxqN, nor read from TxqN.

You should probably put in a note about the RFS control in the Device
Initialization section, too, ie. if you have negotiated and want to use
more queues, you must initialize them then wait for the ack of the
CTRL_RFS cmd.

Note: the following hunks didn't apply, but I'm not sure why they're in
this anyway...

> @@ -6152,13 +6385,7 @@ Virtqueues 0:receiveq(port0).
>  status open
>  
>  \begin_layout Plain Layout
> -Ports 
> -\change_inserted 1986246365 1347188327
> -1
> -\change_deleted 1986246365 1347188327
> -2
> -\change_unchanged
> - onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set
> +Ports 12 onwards only if VIRTIO_CONSOLE_F_MULTIPORT is set
>  \end_layout
>  
>  \end_inset
> @@ -6185,13 +6412,8 @@ VIRTIO_CONSOLE_F_SIZE
>  
>  \begin_layout Description
>  VIRTIO_CONSOLE_F_MULTIPORT(1) Device has support for multiple ports; configurati
> -on fields nr_ports and max_nr_ports are valid
> -\change_inserted 1986246365 1347188404
> -; if this bit is negotiated,
> -\change_deleted 1986246365 1347188406
> - and
> -\change_unchanged
> - control virtqueues will be used.
> +on fields nr_ports and max_nr_ports are valid; if this bit is negotiated,
> + and control virtqueues will be used.
>  \end_layout
>  
>  \end_deeper
> @@ -6260,8 +6482,7 @@ If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the driver can
>   spawn multiple ports, not all of which may be attached to a console.
>   Some could be generic ports.
>   In this case, the control virtqueues are enabled and according to the max_nr_po
> -rts configuration-space value, an appropriate number of virtqueues are
> - created.
> +rts configuration-space value, an appropriate number of virtqueues are created.
>   A control message indicating the driver is ready is sent to the host.
>   The host can then send control messages for adding new ports to the device.
>   After creating and initializing each port, a VIRTIO_CONSOLE_PORT_READY
> @@ -6699,14 +6920,9 @@ The driver constructs an array of addresses of memory pages it has previously
>  \end_layout
>  
>  \begin_layout Enumerate
> -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is 
> -\change_inserted 1986246365 1347188540
> -negotiated
> -\change_deleted 1986246365 1347188542
> -set
> -\change_unchanged
> -, the guest may not use these requested pages until that descriptor in the
> - deflateq has been used by the device.
> +If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiatedset, the guest
> + may not use these requested pages until that descriptor in the deflateq
> + has been used by the device.
>  \end_layout
>  
>  \begin_layout Enumerate


Cheers,
Rusty.

^ permalink raw reply

