netdev.vger.kernel.org archive mirror
* virtio-net: tx queue was stopped
@ 2015-03-15  6:50 Linhaifeng
  2015-03-15  8:40 ` Michael S. Tsirkin
  0 siblings, 1 reply; 5+ messages in thread
From: Linhaifeng @ 2015-03-15  6:50 UTC (permalink / raw)
  To: netdev; +Cc: Michael S. Tsirkin, lilijun, liuyongan@huawei.com, lixiao (H)

Hi, Michael

I tested the start_xmit function with the following code and found that the tx queue's state is stopped and it can't send any packets anymore.

static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	... ...


        capacity = 10;	//########## test code : force to call netif_stop_queue

        if (capacity < 2+MAX_SKB_FRAGS) {
                netif_stop_queue(dev);

                if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
                        /* More just got used, free them then recheck. */
                        capacity += free_old_xmit_skbs(vi);
                        dev_warn(&dev->dev, "free_old_xmit_skbs capacity =%d MAX_SKB_FRAGS=%d", capacity, MAX_SKB_FRAGS);

                        capacity = 10;		//########## test code : force not to call  netif_start_queue

                        if (capacity >= 2+MAX_SKB_FRAGS) {
                                netif_start_queue(dev);
                                virtqueue_disable_cb(vi->svq);
                        } else {
				//########## OTOH, if we often enter this branch, the tx queue may stay stopped.
			}
			
                }

		//########## Should we start the queue here? I found that sometimes skb_xmit_done runs before netif_stop_queue; when that happens the queue stays
		//########## stopped and I have to reload the virtio-net module to restore the network.

        }
	
}
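
For reference, the completion callback I mean looks roughly like this
(paraphrased from memory of the driver version I am testing, so exact
names may differ; newer multiqueue kernels use netif_wake_subqueue here):

static void skb_xmit_done(struct virtqueue *svq)
{
	struct virtnet_info *vi = svq->vdev->priv;

	/* Suppress further interrupts. */
	virtqueue_disable_cb(svq);

	/* We were probably waiting for more output buffers. */
	netif_wake_queue(vi->dev);
}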

ping 9.62.1.2 -i 0.1
64 bytes from 9.62.1.2: icmp_seq=19 ttl=64 time=0.115 ms
64 bytes from 9.62.1.2: icmp_seq=20 ttl=64 time=0.101 ms
64 bytes from 9.62.1.2: icmp_seq=21 ttl=64 time=0.094 ms
64 bytes from 9.62.1.2: icmp_seq=22 ttl=64 time=0.098 ms
64 bytes from 9.62.1.2: icmp_seq=23 ttl=64 time=0.097 ms
64 bytes from 9.62.1.2: icmp_seq=24 ttl=64 time=0.095 ms
64 bytes from 9.62.1.2: icmp_seq=25 ttl=64 time=0.095 ms
....
ping:  sendmsg:  No buffer space available
ping:  sendmsg:  No buffer space available
ping:  sendmsg:  No buffer space available
ping:  sendmsg:  No buffer space available
ping:  sendmsg:  No buffer space available
ping:  sendmsg:  No buffer space available
....

-- 
Regards,
Haifeng


* Re: virtio-net: tx queue was stopped
  2015-03-15  6:50 virtio-net: tx queue was stopped Linhaifeng
@ 2015-03-15  8:40 ` Michael S. Tsirkin
  2015-03-16  9:24   ` Linhaifeng
  0 siblings, 1 reply; 5+ messages in thread
From: Michael S. Tsirkin @ 2015-03-15  8:40 UTC (permalink / raw)
  To: Linhaifeng
  Cc: netdev, lilijun, liuyongan@huawei.com, lixiao (H), virtualization,
	Rusty Russell

On Sun, Mar 15, 2015 at 02:50:27PM +0800, Linhaifeng wrote:
> Hi, Michael
> 
> I tested the start_xmit function with the following code and found that the tx queue's state is stopped and it can't send any packets anymore.

Why don't you Cc all maintainers on this email?
Pls check the file MAINTAINERS for the full list.
I added Cc for now.

> 
> static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> 	... ...
> 
> 
>         capacity = 10;	//########## test code : force to call netif_stop_queue
> 
>         if (capacity < 2+MAX_SKB_FRAGS) {
>                 netif_stop_queue(dev);

So you changed the code to make it think we are out of capacity; now it
stops the queue.

> 
>                 if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
>                         /* More just got used, free them then recheck. */
>                         capacity += free_old_xmit_skbs(vi);
>                         dev_warn(&dev->dev, "free_old_xmit_skbs capacity =%d MAX_SKB_FRAGS=%d", capacity, MAX_SKB_FRAGS);
> 
>                         capacity = 10;		//########## test code : force not to call  netif_start_queue
> 
>                         if (capacity >= 2+MAX_SKB_FRAGS) {
>                                 netif_start_queue(dev);
>                                 virtqueue_disable_cb(vi->svq);
>                         } else {
> 				//########## OTOH, if we often enter this branch, the tx queue may stay stopped.
> 			}

and changed it here so it won't restart the queue even when the host has
consumed all the buffers.
Unsurprisingly, this makes the driver not work.


> 			
>                 }
> 
> 		//########## Should we start the queue here? I found that sometimes skb_xmit_done runs before netif_stop_queue; when that happens the queue stays
> 		//########## stopped and I have to reload the virtio-net module to restore the network.

With or without your changes?
Is this the condition you describe?


        if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {

---> at this point, skb_xmit_done runs. this does:
        /* Suppress further interrupts. */
        virtqueue_disable_cb(vq);

        /* We were probably waiting for more output buffers. */
        netif_wake_subqueue(vi->dev, vq2txq(vq));
--->



                netif_stop_subqueue(dev, qnum);

---> queue is now stopped

                if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {

----> this re-enables interrupts, after an interrupt skb_xmit_done
	will run again.

                        /* More just got used, free them then recheck. */
                        free_old_xmit_skbs(sq);
                        if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
                                netif_start_subqueue(dev, qnum);
                                virtqueue_disable_cb(sq->vq);
                        }
                }
        }


I can't see a race condition from your description above.

>         }
> 	
> }
> 
> ping 9.62.1.2 -i 0.1
> 64 bytes from 9.62.1.2: icmp_seq=19 ttl=64 time=0.115 ms
> 64 bytes from 9.62.1.2: icmp_seq=20 ttl=64 time=0.101 ms
> 64 bytes from 9.62.1.2: icmp_seq=21 ttl=64 time=0.094 ms
> 64 bytes from 9.62.1.2: icmp_seq=22 ttl=64 time=0.098 ms
> 64 bytes from 9.62.1.2: icmp_seq=23 ttl=64 time=0.097 ms
> 64 bytes from 9.62.1.2: icmp_seq=24 ttl=64 time=0.095 ms
> 64 bytes from 9.62.1.2: icmp_seq=25 ttl=64 time=0.095 ms
> ....
> ping:  sendmsg:  No buffer space available
> ping:  sendmsg:  No buffer space available
> ping:  sendmsg:  No buffer space available
> ping:  sendmsg:  No buffer space available
> ping:  sendmsg:  No buffer space available
> ping:  sendmsg:  No buffer space available
> ....
> 
> -- 
> Regards,
> Haifeng

I can't say what your code-changing experiment shows.
It might be better to introduce a delay by calling something like
cpu_relax at specific points (maybe multiple times in a loop).
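
Something along these lines, for example (an untested sketch against the
current multiqueue code; the loop count is arbitrary, it only widens the
window):

        if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
                int i;

                /* Test hack: spin between the capacity check and
                 * netif_stop_subqueue() so that a racing skb_xmit_done()
                 * becomes much easier to hit. */
                for (i = 0; i < 10000; i++)
                        cpu_relax();

                netif_stop_subqueue(dev, qnum);
                /* ... rest of the slow path unchanged ... */
        }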

-- 
MST


* Re: virtio-net: tx queue was stopped
  2015-03-15  8:40 ` Michael S. Tsirkin
@ 2015-03-16  9:24   ` Linhaifeng
  2015-03-16 12:26     ` Michael S. Tsirkin
  0 siblings, 1 reply; 5+ messages in thread
From: Linhaifeng @ 2015-03-16  9:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, lilijun, liuyongan@huawei.com, lixiao (H), virtualization,
	Rusty Russell



On 2015/3/15 16:40, Michael S. Tsirkin wrote:
> On Sun, Mar 15, 2015 at 02:50:27PM +0800, Linhaifeng wrote:
>> Hi, Michael
>>
>> I tested the start_xmit function with the following code and found that the tx queue's state is stopped and it can't send any packets anymore.
> 
> Why don't you Cc all maintainers on this email?
> Pls check the file MAINTAINERS for the full list.
> I added Cc for now.
> 

Thank you.

>>
>> static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>> {
>> 	... ...
>>
>>
>>         capacity = 10;	//########## test code : force to call netif_stop_queue
>>
>>         if (capacity < 2+MAX_SKB_FRAGS) {
>>                 netif_stop_queue(dev);
> 
> So you changed the code to make it think we are out of capacity; now it
> stops the queue.
> 
>>
>>                 if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
>>                         /* More just got used, free them then recheck. */
>>                         capacity += free_old_xmit_skbs(vi);
>>                         dev_warn(&dev->dev, "free_old_xmit_skbs capacity =%d MAX_SKB_FRAGS=%d", capacity, MAX_SKB_FRAGS);
>>
>>                         capacity = 10;		//########## test code : force not to call  netif_start_queue
>>
>>                         if (capacity >= 2+MAX_SKB_FRAGS) {
>>                                 netif_start_queue(dev);
>>                                 virtqueue_disable_cb(vi->svq);
>>                         } else {
>> 				//########## OTOH, if we often enter this branch, the tx queue may stay stopped.
>> 			}
> 
> and changed it here so it won't restart the queue even when the host has
> consumed all the buffers.
> Unsurprisingly, this makes the driver not work.
> 
> 
>> 			
>>                 }
>>
>> 		//########## Should we start the queue here? I found that sometimes skb_xmit_done runs before netif_stop_queue; when that happens the queue stays
>> 		//########## stopped and I have to reload the virtio-net module to restore the network.
> 
> With or without your changes?

without

> Is this the condition you describe?
> 
> 
>         if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> 
> ---> at this point, skb_xmit_done runs. this does:
>         /* Suppress further interrupts. */
>         virtqueue_disable_cb(vq);
> 
>         /* We were probably waiting for more output buffers. */
>         netif_wake_subqueue(vi->dev, vq2txq(vq));
> --->
> 
> 
> 

Because I use vhost-user (poll mode) with virtio_net, at this point vhost
has already received all the packets.

>                 netif_stop_subqueue(dev, qnum);
> 
> ---> queue is now stopped
> 
>                 if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> 
> ----> this re-enables interrupts, after an interrupt skb_xmit_done
> 	will run again.
> 

Before netif_stop_subqueue was called, vhost had already received all the
packets, so virtio_net will never receive any skb_xmit_done.

If vhost is in poll mode, do we still need to stop the tx queue?
Can I add a flag VHOST_F_POLL_MODE to support poll-mode vhost (vhost-user)?

>                         /* More just got used, free them then recheck. */
>                         free_old_xmit_skbs(sq);
>                         if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>                                 netif_start_subqueue(dev, qnum);
>                                 virtqueue_disable_cb(sq->vq);
>                         }
>                 }
>         }
> 
> 
> I can't see a race condition from your description above.
> 
>>         }
>> 	
>> }
>>
>> ping 9.62.1.2 -i 0.1
>> 64 bytes from 9.62.1.2: icmp_seq=19 ttl=64 time=0.115 ms
>> 64 bytes from 9.62.1.2: icmp_seq=20 ttl=64 time=0.101 ms
>> 64 bytes from 9.62.1.2: icmp_seq=21 ttl=64 time=0.094 ms
>> 64 bytes from 9.62.1.2: icmp_seq=22 ttl=64 time=0.098 ms
>> 64 bytes from 9.62.1.2: icmp_seq=23 ttl=64 time=0.097 ms
>> 64 bytes from 9.62.1.2: icmp_seq=24 ttl=64 time=0.095 ms
>> 64 bytes from 9.62.1.2: icmp_seq=25 ttl=64 time=0.095 ms
>> ....
>> ping:  sendmsg:  No buffer space available
>> ping:  sendmsg:  No buffer space available
>> ping:  sendmsg:  No buffer space available
>> ping:  sendmsg:  No buffer space available
>> ping:  sendmsg:  No buffer space available
>> ping:  sendmsg:  No buffer space available
>> ....
>>
>> -- 
>> Regards,
>> Haifeng
> 
> I can't say what your code-changing experiment shows.
> It might be better to introduce a delay by calling something like
> cpu_relax at specific points (maybe multiple times in a loop).
> 



-- 
Regards,
Haifeng


* Re: virtio-net: tx queue was stopped
  2015-03-16  9:24   ` Linhaifeng
@ 2015-03-16 12:26     ` Michael S. Tsirkin
  2015-03-20  9:23       ` Linhaifeng
  0 siblings, 1 reply; 5+ messages in thread
From: Michael S. Tsirkin @ 2015-03-16 12:26 UTC (permalink / raw)
  To: Linhaifeng
  Cc: netdev, lilijun, virtualization, liuyongan@huawei.com, lixiao (H)

On Mon, Mar 16, 2015 at 05:24:07PM +0800, Linhaifeng wrote:
> 
> 
> On 2015/3/15 16:40, Michael S. Tsirkin wrote:
> > On Sun, Mar 15, 2015 at 02:50:27PM +0800, Linhaifeng wrote:
> >> Hi, Michael
> >>
> >> I tested the start_xmit function with the following code and found that the tx queue's state is stopped and it can't send any packets anymore.
> > 
> > Why don't you Cc all maintainers on this email?
> > Pls check the file MAINTAINERS for the full list.
> > I added Cc for now.
> > 
> 
> Thank you.
> 
> >>
> >> static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >> {
> >> 	... ...
> >>
> >>
> >>         capacity = 10;	//########## test code : force to call netif_stop_queue
> >>
> >>         if (capacity < 2+MAX_SKB_FRAGS) {
> >>                 netif_stop_queue(dev);
> > 
> > So you changed the code to make it think we are out of capacity; now it
> > stops the queue.
> > 
> >>
> >>                 if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> >>                         /* More just got used, free them then recheck. */
> >>                         capacity += free_old_xmit_skbs(vi);
> >>                         dev_warn(&dev->dev, "free_old_xmit_skbs capacity =%d MAX_SKB_FRAGS=%d", capacity, MAX_SKB_FRAGS);
> >>
> >>                         capacity = 10;		//########## test code : force not to call  netif_start_queue
> >>
> >>                         if (capacity >= 2+MAX_SKB_FRAGS) {
> >>                                 netif_start_queue(dev);
> >>                                 virtqueue_disable_cb(vi->svq);
> >>                         } else {
> >> 				//########## OTOH, if we often enter this branch, the tx queue may stay stopped.
> >> 			}
> > 
> > and changed it here so it won't restart the queue even when the host has
> > consumed all the buffers.
> > Unsurprisingly, this makes the driver not work.
> > 
> > 
> >> 			
> >>                 }
> >>
> >> 		//########## Should we start the queue here? I found that sometimes skb_xmit_done runs before netif_stop_queue; when that happens the queue stays
> >> 		//########## stopped and I have to reload the virtio-net module to restore the network.
> > 
> > With or without your changes?
> 
> without
> 
> > Is this the condition you describe?
> > 
> > 
> >         if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> > 
> > ---> at this point, skb_xmit_done runs. this does:
> >         /* Suppress further interrupts. */
> >         virtqueue_disable_cb(vq);
> > 
> >         /* We were probably waiting for more output buffers. */
> >         netif_wake_subqueue(vi->dev, vq2txq(vq));
> > --->
> > 
> > 
> > 
> 
> Because I use vhost-user (poll mode) with virtio_net, at this point vhost
> has already received all the packets.

Most likely a vhost-user bug then.

> >                 netif_stop_subqueue(dev, qnum);
> > 
> > ---> queue is now stopped
> > 
> >                 if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> > 
> > ----> this re-enables interrupts, after an interrupt skb_xmit_done
> > 	will run again.
> > 
> 
> Before netif_stop_subqueue was called, vhost had already received all the
> packets, so virtio_net will never receive any skb_xmit_done.

And completed them in the used ring?
In that case virtqueue_enable_cb_delayed will return false,
so we'll call free_old_xmit_skbs below, and restart the ring.

> If vhost is in poll mode, do we still need to stop the tx queue?
> Can I add a flag VHOST_F_POLL_MODE to support poll-mode vhost (vhost-user)?

Host just needs to be spec-compliant.
It must send interrupts unless they are disabled.
So this sounds like a VHOST_F_FIX_A_BUG to me. Just fix races in
vhost-user code, and no need for extra flags.
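
For reference, with VIRTIO_RING_F_EVENT_IDX the interrupt rule boils down
to the vring_need_event() check from the spec (the same helper as in
include/uapi/linux/virtio_ring.h). The standalone demo wrapped around it
below is only illustrative; the index values are made up:

#include <stdint.h>
#include <stdio.h>

/* Signal iff the new used index has moved past the event index the
 * driver published (all arithmetic is mod 2^16). */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx,
			    uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}

int main(void)
{
	uint16_t used_event = 100;	/* driver: "wake me after 100" */

	/* Device moved used_idx from 90 to 105: it crossed used_event,
	 * so it must send the interrupt -> prints 1. */
	printf("%d\n", vring_need_event(used_event, 105, 90));

	/* Device moved used_idx from 90 to 95: not yet -> prints 0. */
	printf("%d\n", vring_need_event(used_event, 95, 90));
	return 0;
}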


> >                         /* More just got used, free them then recheck. */
> >                         free_old_xmit_skbs(sq);
> >                         if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> >                                 netif_start_subqueue(dev, qnum);
> >                                 virtqueue_disable_cb(sq->vq);
> >                         }
> >                 }
> >         }
> > 
> > 
> > I can't see a race condition from your description above.
> > 
> >>         }
> >> 	
> >> }
> >>
> >> ping 9.62.1.2 -i 0.1
> >> 64 bytes from 9.62.1.2: icmp_seq=19 ttl=64 time=0.115 ms
> >> 64 bytes from 9.62.1.2: icmp_seq=20 ttl=64 time=0.101 ms
> >> 64 bytes from 9.62.1.2: icmp_seq=21 ttl=64 time=0.094 ms
> >> 64 bytes from 9.62.1.2: icmp_seq=22 ttl=64 time=0.098 ms
> >> 64 bytes from 9.62.1.2: icmp_seq=23 ttl=64 time=0.097 ms
> >> 64 bytes from 9.62.1.2: icmp_seq=24 ttl=64 time=0.095 ms
> >> 64 bytes from 9.62.1.2: icmp_seq=25 ttl=64 time=0.095 ms
> >> ....
> >> ping:  sendmsg:  No buffer space available
> >> ping:  sendmsg:  No buffer space available
> >> ping:  sendmsg:  No buffer space available
> >> ping:  sendmsg:  No buffer space available
> >> ping:  sendmsg:  No buffer space available
> >> ping:  sendmsg:  No buffer space available
> >> ....
> >>
> >> -- 
> >> Regards,
> >> Haifeng
> > 
> > I can't say what your code-changing experiment shows.
> > It might be better to introduce a delay by calling something like
> > cpu_relax at specific points (maybe multiple times in a loop).
> > 
> 
> 
> 
> -- 
> Regards,
> Haifeng


* Re: virtio-net: tx queue was stopped
  2015-03-16 12:26     ` Michael S. Tsirkin
@ 2015-03-20  9:23       ` Linhaifeng
  0 siblings, 0 replies; 5+ messages in thread
From: Linhaifeng @ 2015-03-20  9:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, lilijun, liuyongan@huawei.com, lixiao (H), virtualization,
	Rusty Russell



On 2015/3/16 20:26, Michael S. Tsirkin wrote:
> Host just needs to be spec-compliant.
> It must send interrupts unless they are disabled.
> So this sounds like a VHOST_F_FIX_A_BUG to me. Just fix races in
> vhost-user code, and no need for extra flags.

Michael,

Thank you for your suggestion.
I will fix the races in the vhost-user code.
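
Concretely, the plan is to stop skipping the guest notification: after
filling the used ring, check the suppression state and kick the callfd
when needed. A rough sketch of the flag-based case (the struct and
function names are made up for the sketch; VIRTIO_RING_F_EVENT_IDX
handling and the memory barrier needed before reading the flags are
omitted):

#include <sys/eventfd.h>
#include <linux/virtio_ring.h>

struct txq {
	struct vring vring;	/* guest tx vring mapped into our process */
	int callfd;		/* eventfd wired to the guest's tx interrupt */
};

/* Called after we add one or more buffers to the used ring. */
static void maybe_signal_guest(struct txq *vq)
{
	/* The guest sets VRING_AVAIL_F_NO_INTERRUPT in avail->flags from
	 * virtqueue_disable_cb() and clears it in virtqueue_enable_cb*();
	 * honour that instead of never kicking at all. */
	if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
		eventfd_write(vq->callfd, 1);
}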

-- 
Regards,
Haifeng


Thread overview: 5 messages
2015-03-15  6:50 virtio-net: tx queue was stopped Linhaifeng
2015-03-15  8:40 ` Michael S. Tsirkin
2015-03-16  9:24   ` Linhaifeng
2015-03-16 12:26     ` Michael S. Tsirkin
2015-03-20  9:23       ` Linhaifeng
