Linux virtualization list
 help / color / mirror / Atom feed
* Re: Question regarding Virtio Console and Remoteproc
From: Amit Shah @ 2012-06-05  9:43 UTC (permalink / raw)
  To: Sjur BRENDELAND
  Cc: Linus Walleij, Arnd Bergmann,
	virtualization@lists.linux-foundation.org
In-Reply-To: <81C3A93C17462B4BBD7E272753C10579232F4FB338@EXDCVYMBSTM005.EQ1STM.local>

Hi Sjur,

On (Fri) 01 Jun 2012 [09:31:30], Sjur BRENDELAND wrote:
> Hi Amit and Rusty,
> 
> I've been looking into the possibility of using the Virtio Console
> Driver together with the remoteproc framework to communicate with
> ST-Ericsson modem over shared memory.
> 
> It seems like Virtio Console would be a good fit, except for a issue
> with buffer allocation. Due to HW limitations the STE-Modem cannot
> access kernel memory (no IOMMU and limited address range). Instead
> we have a designated shared memory region used for IPC. 
> 
> Due to this I cannot use kmalloc() for buffer allocation, but I
> have to allocate buffers from the memory region shared with the
> modem.
> 
> In remoteproc this is solved by using dma_alloc_coherent() for all
> memory to be shared with the modem. This works fine for me, because
> I can pass the IPC memory region to dma_declare_coherent_memory() 
> so dma_alloc_coherent() will allocate from this memory region.
> 
> I think I can solve this issue in Virtio Console by changing calls
> to kmalloc() to something like:
> 
> 	if (virtio_has_feature(vdev, VIRTIO_CONSOLE_USE_DMA_MEM)) {
> 		dma_addr_t dma;
> 		buf = dma_alloc_coherent(dev, size, &dma, GFP_KERNEL);
> 	} else
> 		buf = kmalloc(count, GFP_KERNEL);
> 
> I'd like to get the opinion from you virtualization folks on this!
> If you think it looks reasonable I might start cooking some patches...

I don't have a problem with this.

Thanks,
		Amit

^ permalink raw reply

* Re: [net-next RFC PATCH] virtio_net: collect satistics and export through ethtool
From: Michael S. Tsirkin @ 2012-06-05 10:10 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120605083841.11878.69056.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Tue, Jun 05, 2012 at 04:38:41PM +0800, Jason Wang wrote:
> Satistics counters is useful for debugging and performance optimization, so this
> patch lets virtio_net driver collect following and export them to userspace
> through "ethtool -S":
> 
> - number of packets sent/received
> - number of bytes sent/received
> - number of callbacks for tx/rx
> - number of kick for tx/rx
> - number of bytes/packets queued for tx
> 
> As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were exposed
> like:
> 
> NIC statistics:
>      tx_bytes[0]: 2551
>      tx_packets[0]: 12
>      tx_kick[0]: 12
>      tx_callbacks[0]: 1
>      tx_queued_packets[0]: 12
>      tx_queued_bytes[0]: 3055
>      rx_bytes[0]: 0
>      rx_packets[0]: 0
>      rx_kick[0]: 0
>      rx_callbacks[0]: 0
>      tx_bytes[1]: 5575
>      tx_packets[1]: 37
>      tx_kick[1]: 38
>      tx_callbacks[1]: 0
>      tx_queued_packets[1]: 38
>      tx_queued_bytes[1]: 5217
>      rx_bytes[1]: 4175
>      rx_packets[1]: 25
>      rx_kick[1]: 1
>      rx_callbacks[1]: 16
>      tx_bytes: 8126
>      tx_packets: 49
>      tx_kick: 50
>      tx_callbacks: 1
>      tx_queued_packets: 50
>      tx_queued_bytes: 8272
>      rx_bytes: 4175
>      rx_packets: 25
>      rx_kick: 1
>      rx_callbacks: 16
> 
> TODO:
> 
> - more satistics
> - unitfy the ndo_get_stats64 and get_ethtool_stats
> - calculate the pending bytes/pkts
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c |  130 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 127 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..7ab0cc1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -41,6 +41,10 @@ module_param(gso, bool, 0444);
>  #define VIRTNET_SEND_COMMAND_SG_MAX    2
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
> +#define VIRTNET_STAT_OFF(m) offsetof(struct virtnet_stats, m)
> +#define VIRTNET_STAT(stat, i) (*((u64 *)((char *)stat + \

What's going on? Why cast to char *?

> +			       virtnet_stats_str_attr[i].stat_offset)))

These are confusing unless you see what virtnet_stats_str_attr
is so please move them near that definition.

> +
>  struct virtnet_stats {
>  	struct u64_stats_sync syncp;
>  	u64 tx_bytes;
> @@ -48,8 +52,33 @@ struct virtnet_stats {
>  
>  	u64 rx_bytes;
>  	u64 rx_packets;
> +
> +	u64 tx_kick;
> +	u64 rx_kick;
> +	u64 tx_callbacks;
> +	u64 rx_callbacks;
> +	u64 tx_queued_packets;
> +	u64 tx_queued_bytes;
> +};

I have an idea (not a must): why don't we simply create an enum
enum virtnet_stats {
	VIRTNET_TX_KICK,
	VIRTNET_RX_KICK,
	....
	VIRTNET_MAX_STAT,
}


now stats can just do
	stats->data[VIRTNET_RX_KICK] instead of stats->rx_kick
which is not a big problem, but copying them in bulk
becomes straight-forward, no need for macros at all.

If we decide to do this, needs to be a separate patch,
then this one on top.

> +
> +static struct {

static const.

> +	char string[ETH_GSTRING_LEN];
> +	int stat_offset;
> +} virtnet_stats_str_attr[] = {
> +	{ "tx_bytes", VIRTNET_STAT_OFF(tx_bytes)},
> +	{ "tx_packets", VIRTNET_STAT_OFF(tx_packets)},
> +	{ "tx_kick", VIRTNET_STAT_OFF(tx_kick)},
> +	{ "tx_callbacks", VIRTNET_STAT_OFF(tx_callbacks)},
> +	{ "tx_queued_packets", VIRTNET_STAT_OFF(tx_queued_packets)},
> +	{ "tx_queued_bytes", VIRTNET_STAT_OFF(tx_queued_bytes)},
> +	{ "rx_bytes" , VIRTNET_STAT_OFF(rx_bytes)},
> +	{ "rx_packets", VIRTNET_STAT_OFF(rx_packets)},
> +	{ "rx_kick", VIRTNET_STAT_OFF(rx_kick)},
> +	{ "rx_callbacks", VIRTNET_STAT_OFF(rx_callbacks)},

VIRTNET_STAT_OFF does not save much here, but if you are after
saving characters then make the macro instanciate the string
as well.

>  };
>  
> +#define VIRTNET_NUM_STATS ARRAY_SIZE(virtnet_stats_str_attr)
> +

if you pass virtnet_stats_str_attr to VIRTNET_STAT macro,
then it's explicit and VIRTNET_NUM_STATS won't be needed either.

>  struct virtnet_info {
>  	struct virtio_device *vdev;
>  	struct virtqueue *rvq, *svq, *cvq;
> @@ -142,6 +171,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>  static void skb_xmit_done(struct virtqueue *svq)
>  {
>  	struct virtnet_info *vi = svq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->tx_callbacks++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Suppress further interrupts. */
>  	virtqueue_disable_cb(svq);
> @@ -461,6 +495,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
>  	bool oom;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  
>  	do {
>  		if (vi->mergeable_rx_bufs)
> @@ -477,13 +512,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  	} while (err > 0);
>  	if (unlikely(vi->num > vi->max))
>  		vi->max = vi->num;
> -	virtqueue_kick(vi->rvq);
> +	if (virtqueue_kick_prepare(vi->rvq)) {
> +		virtqueue_notify(vi->rvq);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->rx_kick++;
> +		u64_stats_update_end(&stats->syncp);
> +	}
>  	return !oom;
>  }
>  
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->rx_callbacks++;
> +	u64_stats_update_end(&stats->syncp);
> +
>  	/* Schedule NAPI, Suppress further interrupts if successful. */
>  	if (napi_schedule_prep(&vi->napi)) {
>  		virtqueue_disable_cb(rvq);
> @@ -626,7 +672,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  	int capacity;
> +	bool kick;
>  
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(vi);
> @@ -651,7 +699,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		kfree_skb(skb);
>  		return NETDEV_TX_OK;
>  	}
> -	virtqueue_kick(vi->svq);
> +
> +	kick = virtqueue_kick_prepare(vi->svq);
> +	if (kick)

probably
	 if (unlikely(kick))

> +		virtqueue_notify(vi->svq);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	if (kick)

this too

> +		stats->tx_kick++;
> +	stats->tx_queued_bytes += skb->len;
> +	stats->tx_queued_packets++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Don't wait up for transmitted skbs to be freed. */
>  	skb_orphan(skb);
> @@ -926,7 +984,6 @@ static void virtnet_get_ringparam(struct net_device *dev,
>  
>  }
>  
> -
>  static void virtnet_get_drvinfo(struct net_device *dev,
>  				struct ethtool_drvinfo *info)
>  {
> @@ -939,10 +996,77 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>  
>  }
>  
> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
> +{
> +	int i, cpu;
> +	switch (stringset) {
> +	case ETH_SS_STATS:
> +		for_each_possible_cpu(cpu)
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				sprintf(buf, "%s[%u]",
> +					virtnet_stats_str_attr[i].string, cpu);
> +				buf += ETH_GSTRING_LEN;
> +			}
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			memcpy(buf, virtnet_stats_str_attr[i].string,
> +				ETH_GSTRING_LEN);
> +			buf += ETH_GSTRING_LEN;
> +		}
> +		break;
> +	}
> +}
> +
> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		return VIRTNET_NUM_STATS * (num_online_cpus() + 1);

This will allocate buffers for online cpus only, but the above
will fill them in for all possible cpus.
Will this overrun some buffer?

> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static void virtnet_get_ethtool_stats(struct net_device *dev,
> +				struct ethtool_stats *stats, u64 *buf)

The coding style says
	Descendants are always substantially shorter than the parent and
	are placed substantially to the right.

you can't call it substantially to the right if it's to the left of
the opening '('  :), so please indent it aligning on the opening.

> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int cpu, i;
> +	unsigned int start;
> +	struct virtnet_stats sample, total;
> +
> +	memset(&total, 0, sizeof(total));
> +	memset(&sample, 0, sizeof(sample));
> +
> +	for_each_possible_cpu(cpu) {
> +		struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
> +		do {
> +			start = u64_stats_fetch_begin(&stats->syncp);
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				VIRTNET_STAT(&sample, i) =
> +					VIRTNET_STAT(stats, i);

when you feel the need to break lines like this - don't :)
use an inline function instead.

> +

kill empty line here
> +			}

don't put {} around single statements pls.

> +		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			*buf = VIRTNET_STAT(&sample, i);
> +			VIRTNET_STAT(&total, i) += VIRTNET_STAT(stats, i);
> +			buf++;
> +		}
> +	}
> +
> +	for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +		*buf = VIRTNET_STAT(&total, i);
> +		buf++;
> +	}
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
> +	.get_strings = virtnet_get_strings,
> +	.get_sset_count = virtnet_get_sset_count,
>  };
>  
>  #define MIN_MTU 68

^ permalink raw reply

* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Konrad Rzeszutek Wilk @ 2012-06-05 16:49 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Russell King, akpm,
	nikunj, peterz, linux-kernel, x86, vatsa, virtualization, rjw,
	yong.zhang0, Ingo Molnar, Jan Beulich, H. Peter Anvin,
	Keir Fraser, Thomas Gleixner, paulmck, mingo
In-Reply-To: <4FCA5608.2010404@linux.vnet.ibm.com>

On Sat, Jun 02, 2012 at 11:36:00PM +0530, Srivatsa S. Bhat wrote:
> On 06/01/2012 09:06 PM, Jan Beulich wrote:
> 
> >>>> On 01.06.12 at 17:13, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
> > wrote:
> >> On 06/01/2012 06:29 PM, Jan Beulich wrote:
> >>
> >>>>>> On 01.06.12 at 11:11, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
> >>> wrote:
> >>>> xen_play_dead calls cpu_bringup() which looks weird, because xen_play_dead()
> >>>> is invoked in the cpu down path, whereas cpu_bringup() (as the name 
> >>>> suggests) is useful in the cpu bringup path.
> >>>
> >>> This might not be correct - the code as it is without this change is
> >>> safe even when the vCPU gets onlined back later by an external
> >>> entity (e.g. the Xen tool stack), and it would in that case resume
> >>> at the return point of the VCPUOP_down hypercall. That might
> >>> be a heritage from the original XenoLinux tree though, and be
> >>> meaningless in pv-ops context - Jeremy, Konrad?
> >>>
> >>> Possibly it was bogus/unused even in that original tree - Keir?
> >>>
> >>
> >>
> >> Thanks for your comments Jan!
> >>
> >> In case this change is wrong, the other method I had in mind was to call
> >> cpu_bringup_and_idle() in xen_play_dead(). (Even ARM does something similar,
> >> in the sense that it runs the cpu bringup code including cpu_idle(), in the
> >> cpu offline path, namely the cpu_die() function). Would that approach work
> >> for xen as well? If yes, then we wouldn't have any issues to convert xen to
> >> generic code.
> > 
> > No, that wouldn't work either afaict - the function is expected
> > to return.
> > 
> 
> 
> Ok.. So, I would love to hear a confirmation about whether this patch (which
> removes cpu_bringup() in xen_play_dead()) will break things or it is good as is.

I think it will break - are these patches available on some git tree to test them out?

> 
> If its not correct, then we can probably make __cpu_post_online() return an int,
> with the meaning:
> 
> 0 => success, go ahead and call cpu_idle()
> non-zero => stop here, thanks for your services so far.. now leave the rest to me.
> 
> So all other archs will return 0, Xen will return non-zero, and it will handle
> when to call cpu_idle() and when not to do so.

The call-chain (this is taken from 41bd956de3dfdc3a43708fe2e0c8096c69064a1e):

    cpu_bringup_and_idle:
     \- cpu_bringup
      |   \-[preempt_disable]
      |
      |- cpu_idle
           \- play_dead [assuming the user offlined the VCPU]
           |     \
           |     +- (xen_play_dead)
           |          \- HYPERVISOR_VCPU_off [so VCPU is dead, once user
           |          |                       onlines it starts from here]
           |          \- cpu_bringup [preempt_disable]
           |
           +- preempt_enable_no_reschedule()
           +- schedule()
           \- preempt_enable()

Which I think is a bit different from your use-case?

> 
> Might sound a bit ugly, but I don't see much other option. Suggestions are
> appreciated!
> 
> Regards,
> Srivatsa S. Bhat

^ permalink raw reply

* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Srivatsa S. Bhat @ 2012-06-05 17:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Russell King, akpm,
	nikunj, peterz, linux-kernel, x86, vatsa, virtualization, rjw,
	yong.zhang0, Ingo Molnar, Jan Beulich, H. Peter Anvin,
	Keir Fraser, Thomas Gleixner, paulmck, mingo
In-Reply-To: <20120605164957.GA1997@phenom.dumpdata.com>

On 06/05/2012 10:19 PM, Konrad Rzeszutek Wilk wrote:

> On Sat, Jun 02, 2012 at 11:36:00PM +0530, Srivatsa S. Bhat wrote:
>> On 06/01/2012 09:06 PM, Jan Beulich wrote:
>>
>>>>>> On 01.06.12 at 17:13, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>>> wrote:
>>>> On 06/01/2012 06:29 PM, Jan Beulich wrote:
>>>>
>>>>>>>> On 01.06.12 at 11:11, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>>>>> wrote:
>>>>>> xen_play_dead calls cpu_bringup() which looks weird, because xen_play_dead()
>>>>>> is invoked in the cpu down path, whereas cpu_bringup() (as the name 
>>>>>> suggests) is useful in the cpu bringup path.
>>>>>
>>>>> This might not be correct - the code as it is without this change is
>>>>> safe even when the vCPU gets onlined back later by an external
>>>>> entity (e.g. the Xen tool stack), and it would in that case resume
>>>>> at the return point of the VCPUOP_down hypercall. That might
>>>>> be a heritage from the original XenoLinux tree though, and be
>>>>> meaningless in pv-ops context - Jeremy, Konrad?
>>>>>
>>>>> Possibly it was bogus/unused even in that original tree - Keir?
>>>>>
>>>>
>>>>
>>>> Thanks for your comments Jan!
>>>>
>>>> In case this change is wrong, the other method I had in mind was to call
>>>> cpu_bringup_and_idle() in xen_play_dead(). (Even ARM does something similar,
>>>> in the sense that it runs the cpu bringup code including cpu_idle(), in the
>>>> cpu offline path, namely the cpu_die() function). Would that approach work
>>>> for xen as well? If yes, then we wouldn't have any issues to convert xen to
>>>> generic code.
>>>
>>> No, that wouldn't work either afaict - the function is expected
>>> to return.
>>>
>>
>>
>> Ok.. So, I would love to hear a confirmation about whether this patch (which
>> removes cpu_bringup() in xen_play_dead()) will break things or it is good as is.
> 
> I think it will break - are these patches available on some git tree to test them out?
> 


Oh, sorry I haven't hosted them on any git tree.. 

>>
>> If its not correct, then we can probably make __cpu_post_online() return an int,
>> with the meaning:
>>
>> 0 => success, go ahead and call cpu_idle()
>> non-zero => stop here, thanks for your services so far.. now leave the rest to me.
>>
>> So all other archs will return 0, Xen will return non-zero, and it will handle
>> when to call cpu_idle() and when not to do so.
> 
> The call-chain (this is taken from 41bd956de3dfdc3a43708fe2e0c8096c69064a1e):
> 
>     cpu_bringup_and_idle:
>      \- cpu_bringup
>       |   \-[preempt_disable]
>       |
>       |- cpu_idle
>            \- play_dead [assuming the user offlined the VCPU]
>            |     \
>            |     +- (xen_play_dead)
>            |          \- HYPERVISOR_VCPU_off [so VCPU is dead, once user
>            |          |                       onlines it starts from here]
>            |          \- cpu_bringup [preempt_disable]
>            |
>            +- preempt_enable_no_reschedule()
>            +- schedule()
>            \- preempt_enable()
> 
> Which I think is a bit different from your use-case?
> 


Yes, when this patch is applied, the call to cpu_bringup() after HYPERVISOR_VCPU_off
gets removed. So it will look like this:

    cpu_bringup_and_idle:
     \- cpu_bringup
      |   \-[preempt_disable]
      |
      |- cpu_idle
           \- play_dead [assuming the user offlined the VCPU]
           |     \
           |     +- (xen_play_dead)
           |          \- HYPERVISOR_VCPU_off [so VCPU is dead, once user
           |                                  onlines it starts from here]
           |
           |
           +- preempt_enable_no_reschedule()
           +- schedule()
           \- preempt_enable()


And hence we wouldn't have the preempt imbalance, hence no need for the
extra preempt_enable() in xen_play_dead().

So, basically our problem is this:
The generic smp booting code calls cpu_idle() after setting the cpu in
cpu_online_mask etc, because this call to cpu_idle() is used in every
architecture. But just for xen, that too only in the cpu down path, this
call is omitted - which makes it difficult to convert xen to the generic
smp booting framework.

So I proposed 3 solutions, of which we can choose whichever that doesn't
break stuff and whichever looks the least ugly:

1. Omit the call to cpu_bringup() after HYPERVISOR_VCPU_off (which this
patch does).

2. Or, invoke cpu_bringup_and_idle() after HYPERVISOR_VCPU_off (Jan said
this might not work since we are expected to return). ARM actually does
something like this (it does the complete bringup including idle in the
cpu down path).

3. Or, Just for the sake of Xen, convert the __cpu_post_online() function to
return an int - Xen will return non-zero and do the next steps itself,
also deciding between whether or not to call cpu_idle(). All other archs
will just return 0, and allow the generic smp booting code to continue
on to cpu_idle().

Please let me know your thoughts.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Thomas Gleixner @ 2012-06-05 17:40 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Russell King, nikunj,
	Konrad Rzeszutek Wilk, peterz, linux-kernel, x86, vatsa,
	virtualization, rjw, yong.zhang0, Ingo Molnar, Jan Beulich,
	H. Peter Anvin, Keir Fraser, akpm, paulmck, mingo
In-Reply-To: <4FCA5608.2010404@linux.vnet.ibm.com>

On Sat, 2 Jun 2012, Srivatsa S. Bhat wrote:
> Ok.. So, I would love to hear a confirmation about whether this patch (which
> removes cpu_bringup() in xen_play_dead()) will break things or it is good as is.
> 
> If its not correct, then we can probably make __cpu_post_online() return an int,
> with the meaning:
> 
> 0 => success, go ahead and call cpu_idle()
> non-zero => stop here, thanks for your services so far.. now leave the rest to me.
> 
> So all other archs will return 0, Xen will return non-zero, and it will handle
> when to call cpu_idle() and when not to do so.
> 
> Might sound a bit ugly, but I don't see much other option. Suggestions are
> appreciated!

Yes, it's butt ugly. 

You are tripping over the main misconception of the current hotplug
code: It's asymetric.

So people added warts and workarounds like the xen one. What you are
proposing is another wart and workaround.

The real way to avoid it, is to have the symetric state machine in
place first and then convert everything to that instead of introducing
an intermediate state which resembles the existing state.

One of the main things we need to do to make it symetric is to kill
the play_dead() thing in the idle loop and make idle a function which
returns on cpu_should_die().

Give me a day or two and I get you a working version of that. (Up is
functional, just down refuses to play along)

Thanks,

	tglx

^ permalink raw reply

* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Srivatsa S. Bhat @ 2012-06-05 17:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Russell King, nikunj,
	Konrad Rzeszutek Wilk, peterz, linux-kernel, x86, vatsa,
	virtualization, rjw, yong.zhang0, Ingo Molnar, Jan Beulich,
	H. Peter Anvin, Keir Fraser, akpm, paulmck, mingo
In-Reply-To: <alpine.LFD.2.02.1206051921320.3086@ionos>

On 06/05/2012 11:10 PM, Thomas Gleixner wrote:

> On Sat, 2 Jun 2012, Srivatsa S. Bhat wrote:
>> Ok.. So, I would love to hear a confirmation about whether this patch (which
>> removes cpu_bringup() in xen_play_dead()) will break things or it is good as is.
>>
>> If its not correct, then we can probably make __cpu_post_online() return an int,
>> with the meaning:
>>
>> 0 => success, go ahead and call cpu_idle()
>> non-zero => stop here, thanks for your services so far.. now leave the rest to me.
>>
>> So all other archs will return 0, Xen will return non-zero, and it will handle
>> when to call cpu_idle() and when not to do so.
>>
>> Might sound a bit ugly, but I don't see much other option. Suggestions are
>> appreciated!
> 
> Yes, it's butt ugly. 
> 
> You are tripping over the main misconception of the current hotplug
> code: It's asymetric.
> 
> So people added warts and workarounds like the xen one. What you are
> proposing is another wart and workaround.
> 
> The real way to avoid it, is to have the symetric state machine in
> place first and then convert everything to that instead of introducing
> an intermediate state which resembles the existing state.
> 
> One of the main things we need to do to make it symetric is to kill
> the play_dead() thing in the idle loop and make idle a function which
> returns on cpu_should_die().
> 
> Give me a day or two and I get you a working version of that. (Up is
> functional, just down refuses to play along)
> 


Oh great! So, then I'll wait for your patches and then adapt this patchset
to your model then. Let me know if I can help out with something..

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [net-next RFC PATCH] virtio_net: collect satistics and export through ethtool
From: Jason Wang @ 2012-06-06  5:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120605101037.GA19834@redhat.com>

On 06/05/2012 06:10 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 05, 2012 at 04:38:41PM +0800, Jason Wang wrote:
>> Satistics counters is useful for debugging and performance optimization, so this
>> patch lets virtio_net driver collect following and export them to userspace
>> through "ethtool -S":
>>
>> - number of packets sent/received
>> - number of bytes sent/received
>> - number of callbacks for tx/rx
>> - number of kick for tx/rx
>> - number of bytes/packets queued for tx
>>
>> As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were exposed
>> like:
>>
>> NIC statistics:
>>       tx_bytes[0]: 2551
>>       tx_packets[0]: 12
>>       tx_kick[0]: 12
>>       tx_callbacks[0]: 1
>>       tx_queued_packets[0]: 12
>>       tx_queued_bytes[0]: 3055
>>       rx_bytes[0]: 0
>>       rx_packets[0]: 0
>>       rx_kick[0]: 0
>>       rx_callbacks[0]: 0
>>       tx_bytes[1]: 5575
>>       tx_packets[1]: 37
>>       tx_kick[1]: 38
>>       tx_callbacks[1]: 0
>>       tx_queued_packets[1]: 38
>>       tx_queued_bytes[1]: 5217
>>       rx_bytes[1]: 4175
>>       rx_packets[1]: 25
>>       rx_kick[1]: 1
>>       rx_callbacks[1]: 16
>>       tx_bytes: 8126
>>       tx_packets: 49
>>       tx_kick: 50
>>       tx_callbacks: 1
>>       tx_queued_packets: 50
>>       tx_queued_bytes: 8272
>>       rx_bytes: 4175
>>       rx_packets: 25
>>       rx_kick: 1
>>       rx_callbacks: 16
>>
>> TODO:
>>
>> - more satistics
>> - unitfy the ndo_get_stats64 and get_ethtool_stats
>> - calculate the pending bytes/pkts
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c |  130 +++++++++++++++++++++++++++++++++++++++++++++-
>>   1 files changed, 127 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 5214b1e..7ab0cc1 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -41,6 +41,10 @@ module_param(gso, bool, 0444);
>>   #define VIRTNET_SEND_COMMAND_SG_MAX    2
>>   #define VIRTNET_DRIVER_VERSION "1.0.0"
>>
>> +#define VIRTNET_STAT_OFF(m) offsetof(struct virtnet_stats, m)
>> +#define VIRTNET_STAT(stat, i) (*((u64 *)((char *)stat + \
> What's going on? Why cast to char *?

It's used to let the pointer advance at the unit of bytes instead of the 
whole stat strcuture.
>> +			       virtnet_stats_str_attr[i].stat_offset)))
> These are confusing unless you see what virtnet_stats_str_attr
> is so please move them near that definition.

ok.
>> +
>>   struct virtnet_stats {
>>   	struct u64_stats_sync syncp;
>>   	u64 tx_bytes;
>> @@ -48,8 +52,33 @@ struct virtnet_stats {
>>
>>   	u64 rx_bytes;
>>   	u64 rx_packets;
>> +
>> +	u64 tx_kick;
>> +	u64 rx_kick;
>> +	u64 tx_callbacks;
>> +	u64 rx_callbacks;
>> +	u64 tx_queued_packets;
>> +	u64 tx_queued_bytes;
>> +};
> I have an idea (not a must): why don't we simply create an enum
> enum virtnet_stats {
> 	VIRTNET_TX_KICK,
> 	VIRTNET_RX_KICK,
> 	....
> 	VIRTNET_MAX_STAT,
> }
>
>
> now stats can just do
> 	stats->data[VIRTNET_RX_KICK] instead of stats->rx_kick
> which is not a big problem, but copying them in bulk
> becomes straight-forward, no need for macros at all.
>
> If we decide to do this, needs to be a separate patch,
> then this one on top.

Make sense, would do this.
>> +
>> +static struct {
> static const.
>
>> +	char string[ETH_GSTRING_LEN];
>> +	int stat_offset;
>> +} virtnet_stats_str_attr[] = {
>> +	{ "tx_bytes", VIRTNET_STAT_OFF(tx_bytes)},
>> +	{ "tx_packets", VIRTNET_STAT_OFF(tx_packets)},
>> +	{ "tx_kick", VIRTNET_STAT_OFF(tx_kick)},
>> +	{ "tx_callbacks", VIRTNET_STAT_OFF(tx_callbacks)},
>> +	{ "tx_queued_packets", VIRTNET_STAT_OFF(tx_queued_packets)},
>> +	{ "tx_queued_bytes", VIRTNET_STAT_OFF(tx_queued_bytes)},
>> +	{ "rx_bytes" , VIRTNET_STAT_OFF(rx_bytes)},
>> +	{ "rx_packets", VIRTNET_STAT_OFF(rx_packets)},
>> +	{ "rx_kick", VIRTNET_STAT_OFF(rx_kick)},
>> +	{ "rx_callbacks", VIRTNET_STAT_OFF(rx_callbacks)},
> VIRTNET_STAT_OFF does not save much here, but if you are after
> saving characters then make the macro instanciate the string
> as well.
>
>>   };
>>
>> +#define VIRTNET_NUM_STATS ARRAY_SIZE(virtnet_stats_str_attr)
>> +
> if you pass virtnet_stats_str_attr to VIRTNET_STAT macro,
> then it's explicit and VIRTNET_NUM_STATS won't be needed either.

It's used to report the number of satistics through .get_sset_count.
>
>>   struct virtnet_info {
>>   	struct virtio_device *vdev;
>>   	struct virtqueue *rvq, *svq, *cvq;
>> @@ -142,6 +171,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>>   static void skb_xmit_done(struct virtqueue *svq)
>>   {
>>   	struct virtnet_info *vi = svq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->tx_callbacks++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Suppress further interrupts. */
>>   	virtqueue_disable_cb(svq);
>> @@ -461,6 +495,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   {
>>   	int err;
>>   	bool oom;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>
>>   	do {
>>   		if (vi->mergeable_rx_bufs)
>> @@ -477,13 +512,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   	} while (err>  0);
>>   	if (unlikely(vi->num>  vi->max))
>>   		vi->max = vi->num;
>> -	virtqueue_kick(vi->rvq);
>> +	if (virtqueue_kick_prepare(vi->rvq)) {
>> +		virtqueue_notify(vi->rvq);
>> +		u64_stats_update_begin(&stats->syncp);
>> +		stats->rx_kick++;
>> +		u64_stats_update_end(&stats->syncp);
>> +	}
>>   	return !oom;
>>   }
>>
>>   static void skb_recv_done(struct virtqueue *rvq)
>>   {
>>   	struct virtnet_info *vi = rvq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->rx_callbacks++;
>> +	u64_stats_update_end(&stats->syncp);
>> +
>>   	/* Schedule NAPI, Suppress further interrupts if successful. */
>>   	if (napi_schedule_prep(&vi->napi)) {
>>   		virtqueue_disable_cb(rvq);
>> @@ -626,7 +672,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>>   static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>   	struct virtnet_info *vi = netdev_priv(dev);
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>   	int capacity;
>> +	bool kick;
>>
>>   	/* Free up any pending old buffers before queueing new ones. */
>>   	free_old_xmit_skbs(vi);
>> @@ -651,7 +699,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   		kfree_skb(skb);
>>   		return NETDEV_TX_OK;
>>   	}
>> -	virtqueue_kick(vi->svq);
>> +
>> +	kick = virtqueue_kick_prepare(vi->svq);
>> +	if (kick)
> probably
> 	 if (unlikely(kick))
>
>> +		virtqueue_notify(vi->svq);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	if (kick)
> this too
>
>> +		stats->tx_kick++;
>> +	stats->tx_queued_bytes += skb->len;
>> +	stats->tx_queued_packets++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Don't wait up for transmitted skbs to be freed. */
>>   	skb_orphan(skb);
>> @@ -926,7 +984,6 @@ static void virtnet_get_ringparam(struct net_device *dev,
>>
>>   }
>>
>> -
>>   static void virtnet_get_drvinfo(struct net_device *dev,
>>   				struct ethtool_drvinfo *info)
>>   {
>> @@ -939,10 +996,77 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>>
>>   }
>>
>> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
>> +{
>> +	int i, cpu;
>> +	switch (stringset) {
>> +	case ETH_SS_STATS:
>> +		for_each_possible_cpu(cpu)
>> +			for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +				sprintf(buf, "%s[%u]",
>> +					virtnet_stats_str_attr[i].string, cpu);
>> +				buf += ETH_GSTRING_LEN;
>> +			}
>> +		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +			memcpy(buf, virtnet_stats_str_attr[i].string,
>> +				ETH_GSTRING_LEN);
>> +			buf += ETH_GSTRING_LEN;
>> +		}
>> +		break;
>> +	}
>> +}
>> +
>> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
>> +{
>> +	switch (sset) {
>> +	case ETH_SS_STATS:
>> +		return VIRTNET_NUM_STATS * (num_online_cpus() + 1);
> This will allocate buffers for online cpus only, but the above
> will fill them in for all possible cpus.
> Will this overrun some buffer?
>

Yes, a typo here, should be num_possible_cpus().
Thanks
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +}
>> +
>> +static void virtnet_get_ethtool_stats(struct net_device *dev,
>> +				struct ethtool_stats *stats, u64 *buf)
> The coding style says
> 	Descendants are always substantially shorter than the parent and
> 	are placed substantially to the right.
>
> you can't call it substantially to the right if it's to the left of
> the opening '('  :), so please indent it aligning on the opening.

Looks like something wrong in my emacs c-style confiugration, would 
check it.
>> +{
>> +	struct virtnet_info *vi = netdev_priv(dev);
>> +	int cpu, i;
>> +	unsigned int start;
>> +	struct virtnet_stats sample, total;
>> +
>> +	memset(&total, 0, sizeof(total));
>> +	memset(&sample, 0, sizeof(sample));
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
>> +		do {
>> +			start = u64_stats_fetch_begin(&stats->syncp);
>> +			for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +				VIRTNET_STAT(&sample, i) =
>> +					VIRTNET_STAT(stats, i);
> when you feel the need to break lines like this - don't :)
> use an inline function instead.
>
>> +
> kill empty line here
>> +			}
> don't put {} around single statements pls.

Sure
>> +		} while (u64_stats_fetch_retry(&stats->syncp, start));
>> +		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +			*buf = VIRTNET_STAT(&sample, i);
>> +			VIRTNET_STAT(&total, i) += VIRTNET_STAT(stats, i);
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +		*buf = VIRTNET_STAT(&total, i);
>> +		buf++;
>> +	}
>> +}
>> +
>>   static const struct ethtool_ops virtnet_ethtool_ops = {
>>   	.get_drvinfo = virtnet_get_drvinfo,
>>   	.get_link = ethtool_op_get_link,
>>   	.get_ringparam = virtnet_get_ringparam,
>> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
>> +	.get_strings = virtnet_get_strings,
>> +	.get_sset_count = virtnet_get_sset_count,
>>   };
>>
>>   #define MIN_MTU 68
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [vmw_vmci RFC 00/11] VMCI for Linux
From: Greg KH @ 2012-06-06  5:06 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Andy King, linux-kernel, virtualization, dsouders,
	Andrew Stiegmann (stieg), akpm, cschamp
In-Reply-To: <20120605070251.GA28032@dtor-ws.eng.vmware.com>

On Tue, Jun 05, 2012 at 12:02:51AM -0700, Dmitry Torokhov wrote:
> Hi Greg,
> 
> On Mon, Jun 04, 2012 at 03:57:57PM -0700, Greg KH wrote:
> > On Fri, Jun 01, 2012 at 08:33:02AM -0700, Andy King wrote:
> > > Greg,
> > > 
> > > Thanks so much for the comments and apologies for the delayed response.
> > > 
> > > > Don't we have something like this already for KVM and maybe Xen?
> > > > virtio?  Can't you use that code instead of a new block of code that
> > > > is only used by vmware users?  It has virtual pci devices which
> > > > should give you what you want/need here, right?
> > > >
> > > > If not, why doesn't that work for you?  Would it be easier to just
> > > > extend it?
> > > 
> > > The VMCI virtual device for which this driver is intended has been
> > > around a lot longer than this submission might suggest.  The virtual
> > > hardware was released in a product before Rusty sent his RFC and
> > > quite a bit before it made it to mainline; there was, regrettably,
> > > no virtio then.
> > > 
> > > As such, it was designed to be its own transport, and it's something
> > > that is now very much fixed at the hardware level (enhancements
> > > not withstanding), and which we have to support all the way back.
> > 
> > What "hardware" are you refering to here?
> 
> The virtual hardware that is currently shipping and has been shipping
> for a few years.
> 
> > 
> > > In addition to that, our hypervisor endpoints are written using
> > > the existing device backend; virtio doesn't currently make a lot of
> > > sense for them, and would require a lot of additional work.
> > > 
> > > All of this is unfortunate.  While I agree that virtio is certainly
> > > the right approach, and we need to avoid this proliferation, I think
> > > at this point we'd really like to try and upstream this in its current
> > > form.  There's certainly the possibility going forwards that we could
> > > add a glue layer, such that other clients could use virtio if they're
> > > willing to write their own hypervisor endpoints.
> > > 
> > > Does that sound reasonable?
> > 
> > Not really, why should we take an interface that is tied to something
> > that you are saying isn't something we should be using?
> 
> That is not what Andy said. If virtio was available when we started
> shipping VMCI then we certainly could have used that, but since it
> wasn't there we invented something else.

Ok, that makes sense.

> > Don't you also
> > have control over the hypervisor side of things in order to properly
> > design this type of thing?
> 
> We do not have a time machine to go back and change products that we
> already shipped to the customers. It is probably the same story as with
> Hyper-V's vmbus which is not virtio.
> 
> Besides, virtio is not available on non-Linux guests with we have to
> support as well, and than affected the design decisions in hypervisor
> layer that have been made several years ago.

Ok, thanks for clearing that up, I was confused here.

greg k-h

^ permalink raw reply

* Re: [net-next RFC PATCH] virtio_net: collect satistics and export through ethtool
From: Michael S. Tsirkin @ 2012-06-06  6:49 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <4FCEE45B.8080800@redhat.com>

On Wed, Jun 06, 2012 at 01:02:19PM +0800, Jason Wang wrote:
> On 06/05/2012 06:10 PM, Michael S. Tsirkin wrote:
> >On Tue, Jun 05, 2012 at 04:38:41PM +0800, Jason Wang wrote:
> >>Satistics counters is useful for debugging and performance optimization, so this
> >>patch lets virtio_net driver collect following and export them to userspace
> >>through "ethtool -S":
> >>
> >>- number of packets sent/received
> >>- number of bytes sent/received
> >>- number of callbacks for tx/rx
> >>- number of kick for tx/rx
> >>- number of bytes/packets queued for tx
> >>
> >>As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were exposed
> >>like:
> >>
> >>NIC statistics:
> >>      tx_bytes[0]: 2551
> >>      tx_packets[0]: 12
> >>      tx_kick[0]: 12
> >>      tx_callbacks[0]: 1
> >>      tx_queued_packets[0]: 12
> >>      tx_queued_bytes[0]: 3055
> >>      rx_bytes[0]: 0
> >>      rx_packets[0]: 0
> >>      rx_kick[0]: 0
> >>      rx_callbacks[0]: 0
> >>      tx_bytes[1]: 5575
> >>      tx_packets[1]: 37
> >>      tx_kick[1]: 38
> >>      tx_callbacks[1]: 0
> >>      tx_queued_packets[1]: 38
> >>      tx_queued_bytes[1]: 5217
> >>      rx_bytes[1]: 4175
> >>      rx_packets[1]: 25
> >>      rx_kick[1]: 1
> >>      rx_callbacks[1]: 16
> >>      tx_bytes: 8126
> >>      tx_packets: 49
> >>      tx_kick: 50
> >>      tx_callbacks: 1
> >>      tx_queued_packets: 50
> >>      tx_queued_bytes: 8272
> >>      rx_bytes: 4175
> >>      rx_packets: 25
> >>      rx_kick: 1
> >>      rx_callbacks: 16
> >>
> >>TODO:
> >>
> >>- more satistics
> >>- unitfy the ndo_get_stats64 and get_ethtool_stats
> >>- calculate the pending bytes/pkts
> >>
> >>Signed-off-by: Jason Wang<jasowang@redhat.com>
> >>---
> >>  drivers/net/virtio_net.c |  130 +++++++++++++++++++++++++++++++++++++++++++++-
> >>  1 files changed, 127 insertions(+), 3 deletions(-)
> >>
> >>diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>index 5214b1e..7ab0cc1 100644
> >>--- a/drivers/net/virtio_net.c
> >>+++ b/drivers/net/virtio_net.c
> >>@@ -41,6 +41,10 @@ module_param(gso, bool, 0444);
> >>  #define VIRTNET_SEND_COMMAND_SG_MAX    2
> >>  #define VIRTNET_DRIVER_VERSION "1.0.0"
> >>
> >>+#define VIRTNET_STAT_OFF(m) offsetof(struct virtnet_stats, m)
> >>+#define VIRTNET_STAT(stat, i) (*((u64 *)((char *)stat + \
> >What's going on? Why cast to char *?
> 
> It's used to let the pointer advance at the unit of bytes instead of
> the whole stat strcuture.

Make offset be in units of sizeof u64 and you won't need
this hack.
Also, macro parameters must be surrounded with ().

> >>+			       virtnet_stats_str_attr[i].stat_offset)))
> >These are confusing unless you see what virtnet_stats_str_attr
> >is so please move them near that definition.
> 
> ok.
> >>+
> >>  struct virtnet_stats {
> >>  	struct u64_stats_sync syncp;
> >>  	u64 tx_bytes;
> >>@@ -48,8 +52,33 @@ struct virtnet_stats {
> >>
> >>  	u64 rx_bytes;
> >>  	u64 rx_packets;
> >>+
> >>+	u64 tx_kick;
> >>+	u64 rx_kick;
> >>+	u64 tx_callbacks;
> >>+	u64 rx_callbacks;
> >>+	u64 tx_queued_packets;
> >>+	u64 tx_queued_bytes;
> >>+};
> >I have an idea (not a must): why don't we simply create an enum
> >enum virtnet_stats {
> >	VIRTNET_TX_KICK,
> >	VIRTNET_RX_KICK,
> >	....
> >	VIRTNET_MAX_STAT,
> >}
> >
> >
> >now stats can just do
> >	stats->data[VIRTNET_RX_KICK] instead of stats->rx_kick
> >which is not a big problem, but copying them in bulk
> >becomes straight-forward, no need for macros at all.
> >
> >If we decide to do this, needs to be a separate patch,
> >then this one on top.
> 
> Make sense, would do this.
> >>+
> >>+static struct {
> >static const.
> >
> >>+	char string[ETH_GSTRING_LEN];
> >>+	int stat_offset;
> >>+} virtnet_stats_str_attr[] = {
> >>+	{ "tx_bytes", VIRTNET_STAT_OFF(tx_bytes)},
> >>+	{ "tx_packets", VIRTNET_STAT_OFF(tx_packets)},
> >>+	{ "tx_kick", VIRTNET_STAT_OFF(tx_kick)},
> >>+	{ "tx_callbacks", VIRTNET_STAT_OFF(tx_callbacks)},
> >>+	{ "tx_queued_packets", VIRTNET_STAT_OFF(tx_queued_packets)},
> >>+	{ "tx_queued_bytes", VIRTNET_STAT_OFF(tx_queued_bytes)},
> >>+	{ "rx_bytes" , VIRTNET_STAT_OFF(rx_bytes)},
> >>+	{ "rx_packets", VIRTNET_STAT_OFF(rx_packets)},
> >>+	{ "rx_kick", VIRTNET_STAT_OFF(rx_kick)},
> >>+	{ "rx_callbacks", VIRTNET_STAT_OFF(rx_callbacks)},
> >VIRTNET_STAT_OFF does not save much here, but if you are after
> >saving characters then make the macro instanciate the string
> >as well.
> >
> >>  };
> >>
> >>+#define VIRTNET_NUM_STATS ARRAY_SIZE(virtnet_stats_str_attr)
> >>+
> >if you pass virtnet_stats_str_attr to VIRTNET_STAT macro,
> >then it's explicit and VIRTNET_NUM_STATS won't be needed either.
> 
> It's used to report the number of satistics through .get_sset_count.

Yes but you can then open-code.

> >
> >>  struct virtnet_info {
> >>  	struct virtio_device *vdev;
> >>  	struct virtqueue *rvq, *svq, *cvq;
> >>@@ -142,6 +171,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
> >>  static void skb_xmit_done(struct virtqueue *svq)
> >>  {
> >>  	struct virtnet_info *vi = svq->vdev->priv;
> >>+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >>+
> >>+	u64_stats_update_begin(&stats->syncp);
> >>+	stats->tx_callbacks++;
> >>+	u64_stats_update_end(&stats->syncp);
> >>
> >>  	/* Suppress further interrupts. */
> >>  	virtqueue_disable_cb(svq);
> >>@@ -461,6 +495,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
> >>  {
> >>  	int err;
> >>  	bool oom;
> >>+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >>
> >>  	do {
> >>  		if (vi->mergeable_rx_bufs)
> >>@@ -477,13 +512,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
> >>  	} while (err>  0);
> >>  	if (unlikely(vi->num>  vi->max))
> >>  		vi->max = vi->num;
> >>-	virtqueue_kick(vi->rvq);
> >>+	if (virtqueue_kick_prepare(vi->rvq)) {
> >>+		virtqueue_notify(vi->rvq);
> >>+		u64_stats_update_begin(&stats->syncp);
> >>+		stats->rx_kick++;
> >>+		u64_stats_update_end(&stats->syncp);
> >>+	}
> >>  	return !oom;
> >>  }
> >>
> >>  static void skb_recv_done(struct virtqueue *rvq)
> >>  {
> >>  	struct virtnet_info *vi = rvq->vdev->priv;
> >>+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >>+
> >>+	u64_stats_update_begin(&stats->syncp);
> >>+	stats->rx_callbacks++;
> >>+	u64_stats_update_end(&stats->syncp);
> >>+
> >>  	/* Schedule NAPI, Suppress further interrupts if successful. */
> >>  	if (napi_schedule_prep(&vi->napi)) {
> >>  		virtqueue_disable_cb(rvq);
> >>@@ -626,7 +672,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> >>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >>  {
> >>  	struct virtnet_info *vi = netdev_priv(dev);
> >>+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >>  	int capacity;
> >>+	bool kick;
> >>
> >>  	/* Free up any pending old buffers before queueing new ones. */
> >>  	free_old_xmit_skbs(vi);
> >>@@ -651,7 +699,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >>  		kfree_skb(skb);
> >>  		return NETDEV_TX_OK;
> >>  	}
> >>-	virtqueue_kick(vi->svq);
> >>+
> >>+	kick = virtqueue_kick_prepare(vi->svq);
> >>+	if (kick)
> >probably
> >	 if (unlikely(kick))
> >
> >>+		virtqueue_notify(vi->svq);
> >>+
> >>+	u64_stats_update_begin(&stats->syncp);
> >>+	if (kick)
> >this too
> >
> >>+		stats->tx_kick++;
> >>+	stats->tx_queued_bytes += skb->len;
> >>+	stats->tx_queued_packets++;
> >>+	u64_stats_update_end(&stats->syncp);
> >>
> >>  	/* Don't wait up for transmitted skbs to be freed. */
> >>  	skb_orphan(skb);
> >>@@ -926,7 +984,6 @@ static void virtnet_get_ringparam(struct net_device *dev,
> >>
> >>  }
> >>
> >>-
> >>  static void virtnet_get_drvinfo(struct net_device *dev,
> >>  				struct ethtool_drvinfo *info)
> >>  {
> >>@@ -939,10 +996,77 @@ static void virtnet_get_drvinfo(struct net_device *dev,
> >>
> >>  }
> >>
> >>+static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
> >>+{
> >>+	int i, cpu;
> >>+	switch (stringset) {
> >>+	case ETH_SS_STATS:
> >>+		for_each_possible_cpu(cpu)
> >>+			for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
> >>+				sprintf(buf, "%s[%u]",
> >>+					virtnet_stats_str_attr[i].string, cpu);
> >>+				buf += ETH_GSTRING_LEN;
> >>+			}
> >>+		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
> >>+			memcpy(buf, virtnet_stats_str_attr[i].string,
> >>+				ETH_GSTRING_LEN);
> >>+			buf += ETH_GSTRING_LEN;
> >>+		}
> >>+		break;
> >>+	}
> >>+}
> >>+
> >>+static int virtnet_get_sset_count(struct net_device *dev, int sset)
> >>+{
> >>+	switch (sset) {
> >>+	case ETH_SS_STATS:
> >>+		return VIRTNET_NUM_STATS * (num_online_cpus() + 1);
> >This will allocate buffers for online cpus only, but the above
> >will fill them in for all possible cpus.
> >Will this overrun some buffer?
> >
> 
> Yes, a typo here, should be num_possible_cpus().
> Thanks
> >>+	default:
> >>+		return -EOPNOTSUPP;
> >>+	}
> >>+}
> >>+
> >>+static void virtnet_get_ethtool_stats(struct net_device *dev,
> >>+				struct ethtool_stats *stats, u64 *buf)
> >The coding style says
> >	Descendants are always substantially shorter than the parent and
> >	are placed substantially to the right.
> >
> >you can't call it substantially to the right if it's to the left of
> >the opening '('  :), so please indent it aligning on the opening.
> 
> Looks like something wrong in my emacs c-style confiugration, would
> check it.
> >>+{
> >>+	struct virtnet_info *vi = netdev_priv(dev);
> >>+	int cpu, i;
> >>+	unsigned int start;
> >>+	struct virtnet_stats sample, total;
> >>+
> >>+	memset(&total, 0, sizeof(total));
> >>+	memset(&sample, 0, sizeof(sample));
> >>+
> >>+	for_each_possible_cpu(cpu) {
> >>+		struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
> >>+		do {
> >>+			start = u64_stats_fetch_begin(&stats->syncp);
> >>+			for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
> >>+				VIRTNET_STAT(&sample, i) =
> >>+					VIRTNET_STAT(stats, i);
> >when you feel the need to break lines like this - don't :)
> >use an inline function instead.
> >
> >>+
> >kill empty line here
> >>+			}
> >don't put {} around single statements pls.
> 
> Sure
> >>+		} while (u64_stats_fetch_retry(&stats->syncp, start));
> >>+		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
> >>+			*buf = VIRTNET_STAT(&sample, i);
> >>+			VIRTNET_STAT(&total, i) += VIRTNET_STAT(stats, i);
> >>+			buf++;
> >>+		}
> >>+	}
> >>+
> >>+	for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
> >>+		*buf = VIRTNET_STAT(&total, i);
> >>+		buf++;
> >>+	}
> >>+}
> >>+
> >>  static const struct ethtool_ops virtnet_ethtool_ops = {
> >>  	.get_drvinfo = virtnet_get_drvinfo,
> >>  	.get_link = ethtool_op_get_link,
> >>  	.get_ringparam = virtnet_get_ringparam,
> >>+	.get_ethtool_stats = virtnet_get_ethtool_stats,
> >>+	.get_strings = virtnet_get_strings,
> >>+	.get_sset_count = virtnet_get_sset_count,
> >>  };
> >>
> >>  #define MIN_MTU 68
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [V2 RFC net-next PATCH 1/2] virtio_net: convert the statistics into array
From: Jason Wang @ 2012-06-06  7:52 UTC (permalink / raw)
  To: netdev, rusty, virtualization, linux-kernel, mst

Currently, we store the statistics in the independent fields of virtnet_stats,
this is not scalable when we want to add more counters. As suggested by Michael,
this patch convert it to an array and use the enum as the index to access them.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |   30 +++++++++++++++++-------------
 1 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..6e4aa6f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,13 +41,17 @@ module_param(gso, bool, 0444);
 #define VIRTNET_SEND_COMMAND_SG_MAX    2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
 
+enum virtnet_stats_type {
+	VIRTNET_TX_BYTES,
+	VIRTNET_TX_PACKETS,
+	VIRTNET_RX_BYTES,
+	VIRTNET_RX_PACKETS,
+	VIRTNET_NUM_STATS,
+};
+
 struct virtnet_stats {
 	struct u64_stats_sync syncp;
-	u64 tx_bytes;
-	u64 tx_packets;
-
-	u64 rx_bytes;
-	u64 rx_packets;
+	u64 data[VIRTNET_NUM_STATS];
 };
 
 struct virtnet_info {
@@ -301,8 +305,8 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
 	hdr = skb_vnet_hdr(skb);
 
 	u64_stats_update_begin(&stats->syncp);
-	stats->rx_bytes += skb->len;
-	stats->rx_packets++;
+	stats->data[VIRTNET_RX_BYTES] += skb->len;
+	stats->data[VIRTNET_RX_PACKETS]++;
 	u64_stats_update_end(&stats->syncp);
 
 	if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
@@ -566,8 +570,8 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
 		pr_debug("Sent skb %p\n", skb);
 
 		u64_stats_update_begin(&stats->syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
+		stats->data[VIRTNET_TX_BYTES] += skb->len;
+		stats->data[VIRTNET_TX_PACKETS]++;
 		u64_stats_update_end(&stats->syncp);
 
 		tot_sgs += skb_vnet_hdr(skb)->num_sg;
@@ -704,10 +708,10 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 
 		do {
 			start = u64_stats_fetch_begin(&stats->syncp);
-			tpackets = stats->tx_packets;
-			tbytes   = stats->tx_bytes;
-			rpackets = stats->rx_packets;
-			rbytes   = stats->rx_bytes;
+			tpackets = stats->data[VIRTNET_TX_PACKETS];
+			tbytes   = stats->data[VIRTNET_TX_BYTES];
+			rpackets = stats->data[VIRTNET_RX_PACKETS];
+			rbytes   = stats->data[VIRTNET_RX_BYTES];
 		} while (u64_stats_fetch_retry(&stats->syncp, start));
 
 		tot->rx_packets += rpackets;

^ permalink raw reply related

* [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Jason Wang @ 2012-06-06  7:52 UTC (permalink / raw)
  To: netdev, rusty, virtualization, linux-kernel, mst
In-Reply-To: <20120606075208.29081.75284.stgit@amd-6168-8-1.englab.nay.redhat.com>

Satistics counters is useful for debugging and performance optimization, so this
patch lets virtio_net driver collect following and export them to userspace
through "ethtool -S":

- number of packets sent/received
- number of bytes sent/received
- number of callbacks for tx/rx
- number of kick for tx/rx
- number of bytes/packets queued for tx

As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were
collected like:

NIC statistics:
     tx_bytes[0]: 1731209929
     tx_packets[0]: 60685
     tx_kicks[0]: 63
     tx_callbacks[0]: 73
     tx_queued_bytes[0]: 1935749360
     tx_queued_packets[0]: 80652
     rx_bytes[0]: 2695648
     rx_packets[0]: 40767
     rx_kicks[0]: 1
     rx_callbacks[0]: 2077
     tx_bytes[1]: 9105588697
     tx_packets[1]: 344150
     tx_kicks[1]: 162
     tx_callbacks[1]: 905
     tx_queued_bytes[1]: 8901049412
     tx_queued_packets[1]: 324184
     rx_bytes[1]: 23679828
     rx_packets[1]: 358770
     rx_kicks[1]: 6
     rx_callbacks[1]: 17717
     tx_bytes: 10836798626
     tx_packets: 404835
     tx_kicks: 225
     tx_callbacks: 978
     tx_queued_bytes: 10836798772
     tx_queued_packets: 404836
     rx_bytes: 26375476
     rx_packets: 399537
     rx_kicks: 7
     rx_callbacks: 19794

TODO:

- more statistics
- calculate the pending bytes/pkts

Signed-off-by: Jason Wang <jasowang@redhat.com>

---
Changes from v1:

- style & typo fixs
- convert the statistics fields to array
- use unlikely()
---
 drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6e4aa6f..909a0a7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
 enum virtnet_stats_type {
 	VIRTNET_TX_BYTES,
 	VIRTNET_TX_PACKETS,
+	VIRTNET_TX_KICKS,
+	VIRTNET_TX_CBS,
+	VIRTNET_TX_Q_BYTES,
+	VIRTNET_TX_Q_PACKETS,
 	VIRTNET_RX_BYTES,
 	VIRTNET_RX_PACKETS,
+	VIRTNET_RX_KICKS,
+	VIRTNET_RX_CBS,
 	VIRTNET_NUM_STATS,
 };
 
@@ -54,6 +60,21 @@ struct virtnet_stats {
 	u64 data[VIRTNET_NUM_STATS];
 };
 
+static struct {
+	char string[ETH_GSTRING_LEN];
+} virtnet_stats_str_attr[] = {
+	{ "tx_bytes" },
+	{ "tx_packets" },
+	{ "tx_kicks" },
+	{ "tx_callbacks" },
+	{ "tx_queued_bytes" },
+	{ "tx_queued_packets" },
+	{ "rx_bytes" },
+	{ "rx_packets" },
+	{ "rx_kicks" },
+	{ "rx_callbacks" },
+};
+
 struct virtnet_info {
 	struct virtio_device *vdev;
 	struct virtqueue *rvq, *svq, *cvq;
@@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
 static void skb_xmit_done(struct virtqueue *svq)
 {
 	struct virtnet_info *vi = svq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+
+	u64_stats_update_begin(&stats->syncp);
+	stats->data[VIRTNET_TX_CBS]++;
+	u64_stats_update_end(&stats->syncp);
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(svq);
@@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 {
 	int err;
 	bool oom;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 
 	do {
 		if (vi->mergeable_rx_bufs)
@@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 	} while (err > 0);
 	if (unlikely(vi->num > vi->max))
 		vi->max = vi->num;
-	virtqueue_kick(vi->rvq);
+	if (virtqueue_kick_prepare(vi->rvq)) {
+		virtqueue_notify(vi->rvq);
+		u64_stats_update_begin(&stats->syncp);
+		stats->data[VIRTNET_RX_KICKS]++;
+		u64_stats_update_end(&stats->syncp);
+	}
 	return !oom;
 }
 
 static void skb_recv_done(struct virtqueue *rvq)
 {
 	struct virtnet_info *vi = rvq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+
+	u64_stats_update_begin(&stats->syncp);
+	stats->data[VIRTNET_RX_CBS]++;
+	u64_stats_update_end(&stats->syncp);
+
 	/* Schedule NAPI, Suppress further interrupts if successful. */
 	if (napi_schedule_prep(&vi->napi)) {
 		virtqueue_disable_cb(rvq);
@@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	int capacity;
+	bool kick;
 
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(vi);
@@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
-	virtqueue_kick(vi->svq);
+
+	kick = virtqueue_kick_prepare(vi->svq);
+	if (unlikely(kick))
+		virtqueue_notify(vi->svq);
+
+	u64_stats_update_begin(&stats->syncp);
+	if (unlikely(kick))
+		stats->data[VIRTNET_TX_KICKS]++;
+	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
+	stats->data[VIRTNET_TX_Q_PACKETS]++;
+	u64_stats_update_end(&stats->syncp);
 
 	/* Don't wait up for transmitted skbs to be freed. */
 	skb_orphan(skb);
@@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
 
 }
 
+static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+	int i, cpu;
+	switch (stringset) {
+	case ETH_SS_STATS:
+		for_each_possible_cpu(cpu)
+			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+				sprintf(buf, "%s[%u]",
+					virtnet_stats_str_attr[i].string, cpu);
+				buf += ETH_GSTRING_LEN;
+			}
+		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+			memcpy(buf, virtnet_stats_str_attr[i].string,
+				ETH_GSTRING_LEN);
+			buf += ETH_GSTRING_LEN;
+		}
+		break;
+	}
+}
+
+static int virtnet_get_sset_count(struct net_device *dev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void virtnet_get_ethtool_stats(struct net_device *dev,
+				      struct ethtool_stats *stats, u64 *buf)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int cpu, i;
+	unsigned int start;
+	struct virtnet_stats sample, total;
+
+	memset(&total, 0, sizeof(total));
+
+	for_each_possible_cpu(cpu) {
+		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
+		do {
+			start = u64_stats_fetch_begin(&s->syncp);
+			memcpy(&sample.data, &s->data,
+			       sizeof(u64) * VIRTNET_NUM_STATS);
+		} while (u64_stats_fetch_retry(&s->syncp, start));
+
+		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+			*buf = sample.data[i];
+			total.data[i] += sample.data[i];
+			buf++;
+		}
+	}
+
+	memcpy(buf, &total.data, sizeof(u64) * VIRTNET_NUM_STATS);
+}
+
 static const struct ethtool_ops virtnet_ethtool_ops = {
 	.get_drvinfo = virtnet_get_drvinfo,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = virtnet_get_ringparam,
+	.get_ethtool_stats = virtnet_get_ethtool_stats,
+	.get_strings = virtnet_get_strings,
+	.get_sset_count = virtnet_get_sset_count,
 };
 
 #define MIN_MTU 68

^ permalink raw reply related

* Re: [V2 RFC net-next PATCH 1/2] virtio_net: convert the statistics into array
From: Eric Dumazet @ 2012-06-06  8:22 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, mst, linux-kernel, virtualization
In-Reply-To: <20120606075208.29081.75284.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, 2012-06-06 at 15:52 +0800, Jason Wang wrote:
> Currently, we store the statistics in the independent fields of virtnet_stats,
> this is not scalable when we want to add more counters. As suggested by Michael,
> this patch convert it to an array and use the enum as the index to access them.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c |   30 +++++++++++++++++-------------
>  1 files changed, 17 insertions(+), 13 deletions(-)
> 

>  struct virtnet_stats {
>  	struct u64_stats_sync syncp;
> -	u64 tx_bytes;
> -	u64 tx_packets;
> -
> -	u64 rx_bytes;
> -	u64 rx_packets;
> +	u64 data[VIRTNET_NUM_STATS];
>  };
>  

Interesting, but I fear you'll have a lot of problems.

Current code is buggy, and you are adding more possible races.

We could have one cpu doing the :

       u64_stats_update_begin(&stats->syncp);
       stats->rx_bytes += skb->len;
       stats->rx_packets++;
       u64_stats_update_end(&stats->syncp);

And another one doing :

       u64_stats_update_begin(&stats->syncp);
       stats->tx_bytes += skb->len;
       stats->tx_packets++;
       u64_stats_update_end(&stats->syncp);
 
And one syncp sequence increment can be lost, since both cpus are
basically doing this at the same time :

    write_seqcount_begin(&syncp->seq);

I'll send a fix in a separate thread.

^ permalink raw reply

* Re: [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Michael S. Tsirkin @ 2012-06-06  8:27 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120606075217.29081.30713.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, Jun 06, 2012 at 03:52:17PM +0800, Jason Wang wrote:
> Satistics counters is useful for debugging and performance optimization, so this
> patch lets virtio_net driver collect following and export them to userspace
> through "ethtool -S":
> 
> - number of packets sent/received
> - number of bytes sent/received
> - number of callbacks for tx/rx
> - number of kick for tx/rx
> - number of bytes/packets queued for tx
> 
> As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were
> collected like:
> 
> NIC statistics:
>      tx_bytes[0]: 1731209929
>      tx_packets[0]: 60685
>      tx_kicks[0]: 63
>      tx_callbacks[0]: 73
>      tx_queued_bytes[0]: 1935749360
>      tx_queued_packets[0]: 80652
>      rx_bytes[0]: 2695648
>      rx_packets[0]: 40767
>      rx_kicks[0]: 1
>      rx_callbacks[0]: 2077
>      tx_bytes[1]: 9105588697
>      tx_packets[1]: 344150
>      tx_kicks[1]: 162
>      tx_callbacks[1]: 905
>      tx_queued_bytes[1]: 8901049412
>      tx_queued_packets[1]: 324184
>      rx_bytes[1]: 23679828
>      rx_packets[1]: 358770
>      rx_kicks[1]: 6
>      rx_callbacks[1]: 17717
>      tx_bytes: 10836798626
>      tx_packets: 404835
>      tx_kicks: 225
>      tx_callbacks: 978
>      tx_queued_bytes: 10836798772
>      tx_queued_packets: 404836
>      rx_bytes: 26375476
>      rx_packets: 399537
>      rx_kicks: 7
>      rx_callbacks: 19794
> 
> TODO:
> 
> - more statistics
> - calculate the pending bytes/pkts
>

Do we need that? pending is (queued - packets), no?
 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> ---
> Changes from v1:
> 
> - style & typo fixs
> - convert the statistics fields to array
> - use unlikely()
> ---
>  drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 113 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6e4aa6f..909a0a7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
>  enum virtnet_stats_type {
>  	VIRTNET_TX_BYTES,
>  	VIRTNET_TX_PACKETS,
> +	VIRTNET_TX_KICKS,
> +	VIRTNET_TX_CBS,
> +	VIRTNET_TX_Q_BYTES,
> +	VIRTNET_TX_Q_PACKETS,
>  	VIRTNET_RX_BYTES,
>  	VIRTNET_RX_PACKETS,
> +	VIRTNET_RX_KICKS,
> +	VIRTNET_RX_CBS,
>  	VIRTNET_NUM_STATS,
>  };
>  
> @@ -54,6 +60,21 @@ struct virtnet_stats {
>  	u64 data[VIRTNET_NUM_STATS];
>  };
>  
> +static struct {

static const?

> +	char string[ETH_GSTRING_LEN];
> +} virtnet_stats_str_attr[] = {
> +	{ "tx_bytes" },
> +	{ "tx_packets" },
> +	{ "tx_kicks" },
> +	{ "tx_callbacks" },
> +	{ "tx_queued_bytes" },
> +	{ "tx_queued_packets" },
> +	{ "rx_bytes" },
> +	{ "rx_packets" },
> +	{ "rx_kicks" },
> +	{ "rx_callbacks" },
> +};
> +
>  struct virtnet_info {
>  	struct virtio_device *vdev;
>  	struct virtqueue *rvq, *svq, *cvq;
> @@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>  static void skb_xmit_done(struct virtqueue *svq)
>  {
>  	struct virtnet_info *vi = svq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_TX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Suppress further interrupts. */
>  	virtqueue_disable_cb(svq);
> @@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
>  	bool oom;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  
>  	do {
>  		if (vi->mergeable_rx_bufs)
> @@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  	} while (err > 0);
>  	if (unlikely(vi->num > vi->max))
>  		vi->max = vi->num;
> -	virtqueue_kick(vi->rvq);
> +	if (virtqueue_kick_prepare(vi->rvq)) {

if (unlikely())
also move stats here where they are actually used?

> +		virtqueue_notify(vi->rvq);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->data[VIRTNET_RX_KICKS]++;
> +		u64_stats_update_end(&stats->syncp);
> +	}
>  	return !oom;
>  }
>  
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_RX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
> +
>  	/* Schedule NAPI, Suppress further interrupts if successful. */
>  	if (napi_schedule_prep(&vi->napi)) {
>  		virtqueue_disable_cb(rvq);
> @@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  	int capacity;
> +	bool kick;
>  
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(vi);
> @@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		kfree_skb(skb);
>  		return NETDEV_TX_OK;
>  	}
> -	virtqueue_kick(vi->svq);
> +
> +	kick = virtqueue_kick_prepare(vi->svq);
> +	if (unlikely(kick))
> +		virtqueue_notify(vi->svq);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	if (unlikely(kick))
> +		stats->data[VIRTNET_TX_KICKS]++;
> +	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
> +	stats->data[VIRTNET_TX_Q_PACKETS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Don't wait up for transmitted skbs to be freed. */
>  	skb_orphan(skb);
> @@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>  
>  }
>  
> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
> +{
> +	int i, cpu;
> +	switch (stringset) {
> +	case ETH_SS_STATS:
> +		for_each_possible_cpu(cpu)
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				sprintf(buf, "%s[%u]",
> +					virtnet_stats_str_attr[i].string, cpu);
> +				buf += ETH_GSTRING_LEN;

I would do
	 ret = snprintf(buf, ETH_GSTRING_LEN, ...)
	 BUG_ON(ret >= ETH_GSTRING_LEN);
here to make it more robust.

> +			}
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			memcpy(buf, virtnet_stats_str_attr[i].string,
> +				ETH_GSTRING_LEN);
> +			buf += ETH_GSTRING_LEN;
> +		}

		So why not just memcpy the whole array there?
		memcpy(buf, virtnet_stats_str_attr,
		       sizeof virtnet_stats_str_attr);

> +		break;
> +	}
> +}
> +
> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:

also add
	BUILD_BUG_ON(VIRTNET_NUM_STATS != (sizeof virtnet_stats_str_attr) / ETH_GSTRING_LEN);


> +		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static void virtnet_get_ethtool_stats(struct net_device *dev,
> +				      struct ethtool_stats *stats, u64 *buf)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int cpu, i;
> +	unsigned int start;
> +	struct virtnet_stats sample, total;
> +
> +	memset(&total, 0, sizeof(total));

sizeof total
when operand is a variable,
to distinguish from when it is a type.

> +
> +	for_each_possible_cpu(cpu) {
> +		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
> +		do {
> +			start = u64_stats_fetch_begin(&s->syncp);
> +			memcpy(&sample.data, &s->data,
> +			       sizeof(u64) * VIRTNET_NUM_STATS);
> +		} while (u64_stats_fetch_retry(&s->syncp, start));
> +
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			*buf = sample.data[i];
> +			total.data[i] += sample.data[i];
> +			buf++;
> +		}
> +	}
> +
> +	memcpy(buf, &total.data, sizeof(u64) * VIRTNET_NUM_STATS);
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
> +	.get_strings = virtnet_get_strings,
> +	.get_sset_count = virtnet_get_sset_count,
>  };
>  
>  #define MIN_MTU 68

^ permalink raw reply

* [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06  8:35 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, Stephen Hemminger

From: Eric Dumazet <edumazet@google.com>

commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
on 32bit arches.

We must use separate syncp for rx and tx path as they can be run at the
same time on different cpus. Thus one sequence increment can be lost and
readers spin forever.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..f18149a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -42,7 +42,8 @@ module_param(gso, bool, 0444);
 #define VIRTNET_DRIVER_VERSION "1.0.0"
 
 struct virtnet_stats {
-	struct u64_stats_sync syncp;
+	struct u64_stats_sync tx_syncp;
+	struct u64_stats_sync rx_syncp;
 	u64 tx_bytes;
 	u64 tx_packets;
 
@@ -300,10 +301,10 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
 
 	hdr = skb_vnet_hdr(skb);
 
-	u64_stats_update_begin(&stats->syncp);
+	u64_stats_update_begin(&stats->rx_syncp);
 	stats->rx_bytes += skb->len;
 	stats->rx_packets++;
-	u64_stats_update_end(&stats->syncp);
+	u64_stats_update_end(&stats->rx_syncp);
 
 	if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		pr_debug("Needs csum!\n");
@@ -565,10 +566,10 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
 	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->syncp);
+		u64_stats_update_begin(&stats->tx_syncp);
 		stats->tx_bytes += skb->len;
 		stats->tx_packets++;
-		u64_stats_update_end(&stats->syncp);
+		u64_stats_update_end(&stats->tx_syncp);
 
 		tot_sgs += skb_vnet_hdr(skb)->num_sg;
 		dev_kfree_skb_any(skb);
@@ -703,12 +704,16 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 		u64 tpackets, tbytes, rpackets, rbytes;
 
 		do {
-			start = u64_stats_fetch_begin(&stats->syncp);
+			start = u64_stats_fetch_begin(&stats->tx_syncp);
 			tpackets = stats->tx_packets;
 			tbytes   = stats->tx_bytes;
+		} while (u64_stats_fetch_retry(&stats->tx_syncp, start));
+
+		do {
+			start = u64_stats_fetch_begin(&stats->rx_syncp);
 			rpackets = stats->rx_packets;
 			rbytes   = stats->rx_bytes;
-		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		} while (u64_stats_fetch_retry(&stats->rx_syncp, start));
 
 		tot->rx_packets += rpackets;
 		tot->tx_packets += tpackets;

^ permalink raw reply related

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06  8:45 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338971724.2760.3913.camel@edumazet-glaptop>

On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> on 32bit arches.
> 
> We must use separate syncp for rx and tx path as they can be run at the
> same time on different cpus. Thus one sequence increment can be lost and
> readers spin forever.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> ---

Just to make clear : even using percpu stats/syncp, we have no guarantee
that write_seqcount_begin() is done with one instruction. [1]

It is OK on x86 if "incl" instruction is generated by the compiler, but
on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
be interrupted.

So if you are 100% sure all paths are safe against preemption/BH, then
this patch is not needed, but a big comment in the code would avoid
adding possible races in the future.

[1] If done with one instruction, we still have a race, since a reader
might see an even sequence and conclude no writer is inside the critical
section. So read values could be wrong.

^ permalink raw reply

* Re: [PATCH v3] virtio_blk: unlock vblk->lock during kick
From: Stefan Hajnoczi @ 2012-06-06  9:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: khoa, Stefan Hajnoczi, kvm, virtualization
In-Reply-To: <20120604111507.GB28673@redhat.com>

On Mon, Jun 4, 2012 at 12:15 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 01, 2012 at 10:13:06AM +0100, Stefan Hajnoczi wrote:
>> Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed that.
>> To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
>> irqs with spin_unlock_irq().  If the caller of blk_run_queue() had irqs
>> disabled and we enable them again this could be a problem, right?  Can someone
>> more familiar with kernel locking comment?
>
> Why take the risk?  What's the advantage of enabling them here? VCPU is
> not running while the hypervisor is processing the notification anyway.
> And the next line returns from the function so the interrupts will get
> enabled.

I agree.  After looking through the code more following Asias' call
chain, I'm happy to use spin_unlock() and not worry about the irq
part.

Will fix.

Stefan

^ permalink raw reply

* Re: [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Michael S. Tsirkin @ 2012-06-06  9:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120606075217.29081.30713.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, Jun 06, 2012 at 03:52:17PM +0800, Jason Wang wrote:
> Satistics counters is useful for debugging and performance optimization, so this
> patch lets virtio_net driver collect following and export them to userspace
> through "ethtool -S":
> 
> - number of packets sent/received
> - number of bytes sent/received
> - number of callbacks for tx/rx
> - number of kick for tx/rx
> - number of bytes/packets queued for tx
> 
> As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were
> collected like:
> 
> NIC statistics:
>      tx_bytes[0]: 1731209929
>      tx_packets[0]: 60685
>      tx_kicks[0]: 63
>      tx_callbacks[0]: 73
>      tx_queued_bytes[0]: 1935749360
>      tx_queued_packets[0]: 80652
>      rx_bytes[0]: 2695648
>      rx_packets[0]: 40767
>      rx_kicks[0]: 1
>      rx_callbacks[0]: 2077
>      tx_bytes[1]: 9105588697
>      tx_packets[1]: 344150
>      tx_kicks[1]: 162
>      tx_callbacks[1]: 905
>      tx_queued_bytes[1]: 8901049412
>      tx_queued_packets[1]: 324184
>      rx_bytes[1]: 23679828
>      rx_packets[1]: 358770
>      rx_kicks[1]: 6
>      rx_callbacks[1]: 17717
>      tx_bytes: 10836798626
>      tx_packets: 404835
>      tx_kicks: 225
>      tx_callbacks: 978
>      tx_queued_bytes: 10836798772
>      tx_queued_packets: 404836
>      rx_bytes: 26375476
>      rx_packets: 399537
>      rx_kicks: 7
>      rx_callbacks: 19794
> 
> TODO:
> 
> - more statistics
> - calculate the pending bytes/pkts
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> ---
> Changes from v1:
> 
> - style & typo fixs
> - convert the statistics fields to array
> - use unlikely()
> ---
>  drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 113 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6e4aa6f..909a0a7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
>  enum virtnet_stats_type {
>  	VIRTNET_TX_BYTES,
>  	VIRTNET_TX_PACKETS,
> +	VIRTNET_TX_KICKS,
> +	VIRTNET_TX_CBS,
> +	VIRTNET_TX_Q_BYTES,
> +	VIRTNET_TX_Q_PACKETS,

What about counting the time we spend with queue
stopped and # of times we stop the queue?

>  	VIRTNET_RX_BYTES,
>  	VIRTNET_RX_PACKETS,
> +	VIRTNET_RX_KICKS,
> +	VIRTNET_RX_CBS,

What about a counter for oom on rx?

>  	VIRTNET_NUM_STATS,
>  };
>  
> @@ -54,6 +60,21 @@ struct virtnet_stats {
>  	u64 data[VIRTNET_NUM_STATS];
>  };
>  
> +static struct {
> +	char string[ETH_GSTRING_LEN];
> +} virtnet_stats_str_attr[] = {
> +	{ "tx_bytes" },
> +	{ "tx_packets" },
> +	{ "tx_kicks" },
> +	{ "tx_callbacks" },
> +	{ "tx_queued_bytes" },
> +	{ "tx_queued_packets" },
> +	{ "rx_bytes" },
> +	{ "rx_packets" },
> +	{ "rx_kicks" },
> +	{ "rx_callbacks" },
> +};
> +
>  struct virtnet_info {
>  	struct virtio_device *vdev;
>  	struct virtqueue *rvq, *svq, *cvq;
> @@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>  static void skb_xmit_done(struct virtqueue *svq)
>  {
>  	struct virtnet_info *vi = svq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_TX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Suppress further interrupts. */
>  	virtqueue_disable_cb(svq);
> @@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
>  	bool oom;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  
>  	do {
>  		if (vi->mergeable_rx_bufs)
> @@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  	} while (err > 0);
>  	if (unlikely(vi->num > vi->max))
>  		vi->max = vi->num;
> -	virtqueue_kick(vi->rvq);
> +	if (virtqueue_kick_prepare(vi->rvq)) {
> +		virtqueue_notify(vi->rvq);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->data[VIRTNET_RX_KICKS]++;
> +		u64_stats_update_end(&stats->syncp);
> +	}
>  	return !oom;
>  }
>  
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_RX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
> +

This data path so not entirely free.
I am guessing the overhead is not measureable but
did you check?

An alternative is to count when napi callbacks
are envoked. If we also count when weight was exceeded
we get almost the same result.


>  	/* Schedule NAPI, Suppress further interrupts if successful. */
>  	if (napi_schedule_prep(&vi->napi)) {
>  		virtqueue_disable_cb(rvq);
> @@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  	int capacity;
> +	bool kick;
>  
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(vi);
> @@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		kfree_skb(skb);
>  		return NETDEV_TX_OK;
>  	}
> -	virtqueue_kick(vi->svq);
> +
> +	kick = virtqueue_kick_prepare(vi->svq);
> +	if (unlikely(kick))
> +		virtqueue_notify(vi->svq);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	if (unlikely(kick))
> +		stats->data[VIRTNET_TX_KICKS]++;
> +	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
> +	stats->data[VIRTNET_TX_Q_PACKETS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Don't wait up for transmitted skbs to be freed. */
>  	skb_orphan(skb);
> @@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>  
>  }
>  
> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
> +{
> +	int i, cpu;
> +	switch (stringset) {
> +	case ETH_SS_STATS:
> +		for_each_possible_cpu(cpu)
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				sprintf(buf, "%s[%u]",
> +					virtnet_stats_str_attr[i].string, cpu);
> +				buf += ETH_GSTRING_LEN;
> +			}
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			memcpy(buf, virtnet_stats_str_attr[i].string,
> +				ETH_GSTRING_LEN);
> +			buf += ETH_GSTRING_LEN;
> +		}
> +		break;
> +	}
> +}
> +
> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static void virtnet_get_ethtool_stats(struct net_device *dev,
> +				      struct ethtool_stats *stats, u64 *buf)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int cpu, i;
> +	unsigned int start;
> +	struct virtnet_stats sample, total;
> +
> +	memset(&total, 0, sizeof(total));
> +
> +	for_each_possible_cpu(cpu) {
> +		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
> +		do {
> +			start = u64_stats_fetch_begin(&s->syncp);
> +			memcpy(&sample.data, &s->data,
> +			       sizeof(u64) * VIRTNET_NUM_STATS);
> +		} while (u64_stats_fetch_retry(&s->syncp, start));
> +
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			*buf = sample.data[i];
> +			total.data[i] += sample.data[i];
> +			buf++;
> +		}
> +	}
> +
> +	memcpy(buf, &total.data, sizeof(u64) * VIRTNET_NUM_STATS);
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
> +	.get_strings = virtnet_get_strings,
> +	.get_sset_count = virtnet_get_sset_count,
>  };
>  
>  #define MIN_MTU 68

^ permalink raw reply

* Re: [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Jason Wang @ 2012-06-06  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120606082752.GA12767@redhat.com>

On 06/06/2012 04:27 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 03:52:17PM +0800, Jason Wang wrote:
>> Satistics counters is useful for debugging and performance optimization, so this
>> patch lets virtio_net driver collect following and export them to userspace
>> through "ethtool -S":
>>
>> - number of packets sent/received
>> - number of bytes sent/received
>> - number of callbacks for tx/rx
>> - number of kick for tx/rx
>> - number of bytes/packets queued for tx
>>
>> As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were
>> collected like:
>>
>> NIC statistics:
>>       tx_bytes[0]: 1731209929
>>       tx_packets[0]: 60685
>>       tx_kicks[0]: 63
>>       tx_callbacks[0]: 73
>>       tx_queued_bytes[0]: 1935749360
>>       tx_queued_packets[0]: 80652
>>       rx_bytes[0]: 2695648
>>       rx_packets[0]: 40767
>>       rx_kicks[0]: 1
>>       rx_callbacks[0]: 2077
>>       tx_bytes[1]: 9105588697
>>       tx_packets[1]: 344150
>>       tx_kicks[1]: 162
>>       tx_callbacks[1]: 905
>>       tx_queued_bytes[1]: 8901049412
>>       tx_queued_packets[1]: 324184
>>       rx_bytes[1]: 23679828
>>       rx_packets[1]: 358770
>>       rx_kicks[1]: 6
>>       rx_callbacks[1]: 17717
>>       tx_bytes: 10836798626
>>       tx_packets: 404835
>>       tx_kicks: 225
>>       tx_callbacks: 978
>>       tx_queued_bytes: 10836798772
>>       tx_queued_packets: 404836
>>       rx_bytes: 26375476
>>       rx_packets: 399537
>>       rx_kicks: 7
>>       rx_callbacks: 19794
>>
>> TODO:
>>
>> - more statistics
>> - calculate the pending bytes/pkts
>>
> Do we need that? pending is (queued - packets), no?
>   

No, if we choose to calculate by tools.
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
>>
>> ---
>> Changes from v1:
>>
>> - style&  typo fixs
>> - convert the statistics fields to array
>> - use unlikely()
>> ---
>>   drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
>>   1 files changed, 113 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 6e4aa6f..909a0a7 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
>>   enum virtnet_stats_type {
>>   	VIRTNET_TX_BYTES,
>>   	VIRTNET_TX_PACKETS,
>> +	VIRTNET_TX_KICKS,
>> +	VIRTNET_TX_CBS,
>> +	VIRTNET_TX_Q_BYTES,
>> +	VIRTNET_TX_Q_PACKETS,
>>   	VIRTNET_RX_BYTES,
>>   	VIRTNET_RX_PACKETS,
>> +	VIRTNET_RX_KICKS,
>> +	VIRTNET_RX_CBS,
>>   	VIRTNET_NUM_STATS,
>>   };
>>
>> @@ -54,6 +60,21 @@ struct virtnet_stats {
>>   	u64 data[VIRTNET_NUM_STATS];
>>   };
>>
>> +static struct {
> static const?
>

Sorry, forget this.
>> +	char string[ETH_GSTRING_LEN];
>> +} virtnet_stats_str_attr[] = {
>> +	{ "tx_bytes" },
>> +	{ "tx_packets" },
>> +	{ "tx_kicks" },
>> +	{ "tx_callbacks" },
>> +	{ "tx_queued_bytes" },
>> +	{ "tx_queued_packets" },
>> +	{ "rx_bytes" },
>> +	{ "rx_packets" },
>> +	{ "rx_kicks" },
>> +	{ "rx_callbacks" },
>> +};
>> +
>>   struct virtnet_info {
>>   	struct virtio_device *vdev;
>>   	struct virtqueue *rvq, *svq, *cvq;
>> @@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>>   static void skb_xmit_done(struct virtqueue *svq)
>>   {
>>   	struct virtnet_info *vi = svq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->data[VIRTNET_TX_CBS]++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Suppress further interrupts. */
>>   	virtqueue_disable_cb(svq);
>> @@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   {
>>   	int err;
>>   	bool oom;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>
>>   	do {
>>   		if (vi->mergeable_rx_bufs)
>> @@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   	} while (err>  0);
>>   	if (unlikely(vi->num>  vi->max))
>>   		vi->max = vi->num;
>> -	virtqueue_kick(vi->rvq);
>> +	if (virtqueue_kick_prepare(vi->rvq)) {
> if (unlikely())
> also move stats here where they are actually used?

Sure.
>> +		virtqueue_notify(vi->rvq);
>> +		u64_stats_update_begin(&stats->syncp);
>> +		stats->data[VIRTNET_RX_KICKS]++;
>> +		u64_stats_update_end(&stats->syncp);
>> +	}
>>   	return !oom;
>>   }
>>
>>   static void skb_recv_done(struct virtqueue *rvq)
>>   {
>>   	struct virtnet_info *vi = rvq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->data[VIRTNET_RX_CBS]++;
>> +	u64_stats_update_end(&stats->syncp);
>> +
>>   	/* Schedule NAPI, Suppress further interrupts if successful. */
>>   	if (napi_schedule_prep(&vi->napi)) {
>>   		virtqueue_disable_cb(rvq);
>> @@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>>   static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>   	struct virtnet_info *vi = netdev_priv(dev);
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>   	int capacity;
>> +	bool kick;
>>
>>   	/* Free up any pending old buffers before queueing new ones. */
>>   	free_old_xmit_skbs(vi);
>> @@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   		kfree_skb(skb);
>>   		return NETDEV_TX_OK;
>>   	}
>> -	virtqueue_kick(vi->svq);
>> +
>> +	kick = virtqueue_kick_prepare(vi->svq);
>> +	if (unlikely(kick))
>> +		virtqueue_notify(vi->svq);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	if (unlikely(kick))
>> +		stats->data[VIRTNET_TX_KICKS]++;
>> +	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
>> +	stats->data[VIRTNET_TX_Q_PACKETS]++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Don't wait up for transmitted skbs to be freed. */
>>   	skb_orphan(skb);
>> @@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>>
>>   }
>>
>> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
>> +{
>> +	int i, cpu;
>> +	switch (stringset) {
>> +	case ETH_SS_STATS:
>> +		for_each_possible_cpu(cpu)
>> +			for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +				sprintf(buf, "%s[%u]",
>> +					virtnet_stats_str_attr[i].string, cpu);
>> +				buf += ETH_GSTRING_LEN;
> I would do
> 	 ret = snprintf(buf, ETH_GSTRING_LEN, ...)
> 	 BUG_ON(ret>= ETH_GSTRING_LEN);
> here to make it more robust.

Ok.
>> +			}
>> +		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +			memcpy(buf, virtnet_stats_str_attr[i].string,
>> +				ETH_GSTRING_LEN);
>> +			buf += ETH_GSTRING_LEN;
>> +		}
> 		So why not just memcpy the whole array there?
> 		memcpy(buf, virtnet_stats_str_attr,
> 		       sizeof virtnet_stats_str_attr);
>
>> +		break;
>> +	}
>> +}
>> +
>> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
>> +{
>> +	switch (sset) {
>> +	case ETH_SS_STATS:
> also add
> 	BUILD_BUG_ON(VIRTNET_NUM_STATS != (sizeof virtnet_stats_str_attr) / ETH_GSTRING_LEN);
>

Ok.
>> +		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +}
>> +
>> +static void virtnet_get_ethtool_stats(struct net_device *dev,
>> +				      struct ethtool_stats *stats, u64 *buf)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(dev);
>> +	int cpu, i;
>> +	unsigned int start;
>> +	struct virtnet_stats sample, total;
>> +
>> +	memset(&total, 0, sizeof(total));
> sizeof total
> when operand is a variable,
> to distinguish from when it is a type.

Sure.
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
>> +		do {
>> +			start = u64_stats_fetch_begin(&s->syncp);
>> +			memcpy(&sample.data,&s->data,
>> +			       sizeof(u64) * VIRTNET_NUM_STATS);
>> +		} while (u64_stats_fetch_retry(&s->syncp, start));
>> +
>> +		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +			*buf = sample.data[i];
>> +			total.data[i] += sample.data[i];
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	memcpy(buf,&total.data, sizeof(u64) * VIRTNET_NUM_STATS);
>> +}
>> +
>>   static const struct ethtool_ops virtnet_ethtool_ops = {
>>   	.get_drvinfo = virtnet_get_drvinfo,
>>   	.get_link = ethtool_op_get_link,
>>   	.get_ringparam = virtnet_get_ringparam,
>> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
>> +	.get_strings = virtnet_get_strings,
>> +	.get_sset_count = virtnet_get_sset_count,
>>   };
>>
>>   #define MIN_MTU 68

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Jason Wang @ 2012-06-06  9:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: mst, netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338972341.2760.3944.camel@edumazet-glaptop>

On 06/06/2012 04:45 PM, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet<edumazet@google.com>
>>
>> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
>> on 32bit arches.
>>
>> We must use separate syncp for rx and tx path as they can be run at the
>> same time on different cpus. Thus one sequence increment can be lost and
>> readers spin forever.
>>
>> Signed-off-by: Eric Dumazet<edumazet@google.com>
>> Cc: Stephen Hemminger<shemminger@vyatta.com>
>> Cc: Michael S. Tsirkin<mst@redhat.com>
>> Cc: Jason Wang<jasowang@redhat.com>
>> ---
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

Thanks for explaing, current virtio-net is safe I think. But the patch 
is still needed as my patch would update the statistics in irq.
>
> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 11:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338972341.2760.3944.camel@edumazet-glaptop>

On Wed, Jun 06, 2012 at 10:45:41AM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> > on 32bit arches.
> > 
> > We must use separate syncp for rx and tx path as they can be run at the
> > same time on different cpus. Thus one sequence increment can be lost and
> > readers spin forever.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Stephen Hemminger <shemminger@vyatta.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > ---
> 
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
> 
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
> 
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

We currently do all stats either on napi callback or from
start_xmit callback.
This makes them safe, yes?

> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
> 
> 

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 13:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606111357.GA15070@redhat.com>

On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:

> We currently do all stats either on napi callback or from
> start_xmit callback.
> This makes them safe, yes?

Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
include/linux/u64_stats_sync.h section 6)

 * 6) If counter might be written by an interrupt, readers should block interrupts.
 *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
 *     read partial values)

Yes, its tricky...

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..705aaa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 		u64 tpackets, tbytes, rpackets, rbytes;
 
 		do {
-			start = u64_stats_fetch_begin(&stats->syncp);
+			start = u64_stats_fetch_begin_bh(&stats->syncp);
 			tpackets = stats->tx_packets;
 			tbytes   = stats->tx_bytes;
 			rpackets = stats->rx_packets;
 			rbytes   = stats->rx_bytes;
-		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));
 
 		tot->rx_packets += rpackets;
 		tot->tx_packets += tpackets;

^ permalink raw reply related

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 14:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338988210.2760.4485.camel@edumazet-glaptop>

On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> 
> > We currently do all stats either on napi callback or from
> > start_xmit callback.
> > This makes them safe, yes?
> 
> Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> include/linux/u64_stats_sync.h section 6)
> 
>  * 6) If counter might be written by an interrupt, readers should block interrupts.
>  *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
>  *     read partial values)
> 
> Yes, its tricky...

Sounds good, but I have a question: this realies on counters
being atomic on 64 bit.
Would not it be better to always use a seqlock even on 64 bit?
This way counters would actually be correct and in sync.
As it is if we want e.g. average packet size,
we can not rely e.g. on it being bytes/packets.

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..705aaa7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
>  		u64 tpackets, tbytes, rpackets, rbytes;
>  
>  		do {
> -			start = u64_stats_fetch_begin(&stats->syncp);
> +			start = u64_stats_fetch_begin_bh(&stats->syncp);
>  			tpackets = stats->tx_packets;
>  			tbytes   = stats->tx_bytes;
>  			rpackets = stats->rx_packets;
>  			rbytes   = stats->rx_bytes;
> -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));
>  
>  		tot->rx_packets += rpackets;
>  		tot->tx_packets += tpackets;
> 

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Stephen Hemminger @ 2012-06-06 15:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Eric Dumazet, netdev, linux-kernel, virtualization
In-Reply-To: <20120606144941.GA17092@redhat.com>

On Wed, 6 Jun 2012 17:49:42 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Sounds good, but I have a question: this realies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

This has not been a requirement on real physical devices; therefore
the added overhead is not really justified.

Many network cards use counters in hardware to count packets/bytes
and there is no expectation of atomic access there.

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 15:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606144941.GA17092@redhat.com>

On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> > 
> > > We currently do all stats either on napi callback or from
> > > start_xmit callback.
> > > This makes them safe, yes?
> > 
> > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> > include/linux/u64_stats_sync.h section 6)
> > 
> >  * 6) If counter might be written by an interrupt, readers should block interrupts.
> >  *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> >  *     read partial values)
> > 
> > Yes, its tricky...
> 
> Sounds good, but I have a question: this realies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

When this stuff was discussed, we chose to have a nop on 64bits.

Your point has little to do with 64bit stats, it was already like that
with 'long int' counters.

Consider average driver doing :

dev->stats.rx_bytes += skb->len;
dev->stats.rx_packets++;

A concurrent reader can read an updated rx_bytes and a 'previous'
rx_packets one.

'fixing' this requires a lot of work and memory barriers (in all
drivers), for a very litle gain (at most one packet error)

u64_stats_sync was really meant to be 0-cost on 64bit arches.

^ permalink raw reply

* Re: [PATCH v3] virtio_blk: unlock vblk->lock during kick
From: Stefan Hajnoczi @ 2012-06-06 15:25 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: khoa, Stefan Hajnoczi, kvm, virtualization
In-Reply-To: <20120604111134.GA28673@redhat.com>

On Mon, Jun 4, 2012 at 12:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Jun 01, 2012 at 10:13:06AM +0100, Stefan Hajnoczi wrote:
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 774c31d..d674977 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
>>               issued++;
>>       }
>>
>> -     if (issued)
>> -             virtqueue_kick(vblk->vq);
>> +     if (!issued)
>> +             return;
>> +
>> +     if (virtqueue_kick_prepare(vblk->vq)) {
>> +             spin_unlock_irq(vblk->disk->queue->queue_lock);
>> +             virtqueue_notify(vblk->vq);
>
> If blk_done runs and completes the request at this point,
> can hot unplug then remove the queue?
> If yes will we get a use after free?

This is a difficult question, I haven't been able to decide one way or
another.  The use-after-free is the
spin_lock_irq(vblk->disk->queue->queue_lock).

It still doesn't explain why existing drivers are doing this.  In nbd,
for example, I can't see anything preventing the same situation in
drivers/block/nbd.c:do_nbd_request() between wake_up() and
spin_lock_irq(q->queue_lock).  If the request completes (like in your
example scenario) then the module remove code path has no way of
knowing there is still a thread in do_nbd_request().

Any ideas?

Stefan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox