public inbox for netdev@vger.kernel.org
* Receive processing stops when dev->poll returns 1
@ 2010-08-05 14:20 Usha Srinivasan
  2010-08-05 16:04 ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Usha Srinivasan @ 2010-08-05 14:20 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello,
I have run into an interesting and frustrating problem which I've not been able to resolve. I am hoping someone can help me.  

I have a network driver which sets its dev->weight to 100 (like ipoib) and when it processes 100 received packets, following the rules, it decrements dev->quota and *budget and returns 1 without calling netif_rx_complete.  When my driver does that, all processing of incoming packets for all interfaces comes to a halt.  

How do I know this?  Because, as soon as my driver returns 1 from dev->poll, I lose my putty session and eth0 stops working; eth0 counters show that it stops receiving packets, though it is able to transmit.  My own device stops receiving packets.  I have scoured the code for ipoib and other network devices and I see no difference in what my driver does.  I have tried lowering the weight for ipoib & eth0, hoping to reproduce the problem with those devices, but no luck.

One guess is that net_rx_action spent more than 1 tick processing all the incoming packets for all interfaces it polled; I verified that my driver by itself does not spend that much. When this happens, net_rx_action exits after marking NET_RX_SOFTIRQ as pending.  So one would expect it to be called again later, but my guess is that this doesn't happen, which results in a stoppage of incoming packets. Is that possible and, if so, what is the fix?

static void net_rx_action(struct softirq_action *h)
{
        struct softnet_data *queue = &__get_cpu_var(softnet_data);
        unsigned long start_time = jiffies;
        int budget = netdev_budget;
        void *have;

        local_irq_disable();

        while (!list_empty(&queue->poll_list)) {
                struct net_device *dev;

                if (budget <= 0 || jiffies - start_time > 1)
                        goto softnet_break;

                local_irq_enable();

                dev = list_entry(queue->poll_list.next,
                                 struct net_device, poll_list);
                have = netpoll_poll_lock(dev);

                if (dev->quota <= 0 || dev->poll(dev, &budget)) {
                        netpoll_poll_unlock(have);
                        local_irq_disable();
                        list_move_tail(&dev->poll_list, &queue->poll_list);
                        if (dev->quota < 0)
                                dev->quota += dev->weight;
                        else
                                dev->quota = dev->weight;
                } else {
                        netpoll_poll_unlock(have);
                        dev_put(dev);
                        local_irq_disable();
                }
        }
out:
        local_irq_enable();
        return;

softnet_break:
        __get_cpu_var(netdev_rx_stat).time_squeeze++;
        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
        goto out;
}
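
The exit test in the loop above can be modelled in user space. The sketch below is an illustration only: ticks_elapsed and should_break are hypothetical names, and plain unsigned long values stand in for jiffies. It shows that the unsigned subtraction stays correct when the tick counter wraps, and that the loop gives up once the budget is spent or more than one tick has passed.

```c
#include <limits.h>

/* Hypothetical helper: ticks elapsed between two readings of a
 * free-running unsigned counter. Unsigned subtraction is modular,
 * so the result is still right if the counter wrapped in between. */
static unsigned long ticks_elapsed(unsigned long start, unsigned long now)
{
    return now - start;
}

/* Mirrors the condition "budget <= 0 || jiffies - start_time > 1". */
static int should_break(int budget, unsigned long start, unsigned long now)
{
    return budget <= 0 || ticks_elapsed(start, now) > 1;
}
```

With these definitions, a start reading of ULONG_MAX and a current reading of 1 counts as two elapsed ticks, so the break condition fires even across a wrap.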

I have run into this problem on four systems running RHEL5, SLES10 or SLES 11.  The above describes what happens in RHEL5/SLES10.  This is different in SLES11, wherein dev->poll has been replaced by netif_napi_add and the poll function returns done without quota/budget manipulation; yet, I run into the same behavior. 
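
The quota/budget bookkeeping described for RHEL5/SLES10 can be sketched as a small user-space model. Everything here is an assumption made for illustration: struct sim_dev, sim_old_poll and sim_after_poll are hypothetical names, and setting on_list to 0 merely stands in for netif_rx_complete(). It is not the kernel code, only the arithmetic of the old contract.

```c
/* Hypothetical stand-in for the pre-2.6.24 struct net_device fields. */
struct sim_dev {
    int weight;   /* per-round allowance, e.g. 100 */
    int quota;    /* what is left of the allowance */
    int pending;  /* packets waiting in the simulated ring */
    int on_list;  /* 1 while the device is on the poll list */
};

/* Old-style poll: handle up to min(*budget, quota) packets, charge
 * both counters, return 0 if the limit was not reached, else 1. */
static int sim_old_poll(struct sim_dev *dev, int *budget)
{
    int limit = *budget < dev->quota ? *budget : dev->quota;
    int done = dev->pending < limit ? dev->pending : limit;

    dev->pending -= done;
    dev->quota -= done;
    *budget -= done;

    if (done < limit) {
        dev->on_list = 0;   /* models netif_rx_complete() */
        return 0;
    }
    return 1;               /* limit reached: stay on the poll list */
}

/* What the caller does after a nonzero return, as in the listing:
 * the device is requeued and its quota is replenished. */
static void sim_after_poll(struct sim_dev *dev, int more)
{
    if (!more)
        return;
    if (dev->quota < 0)
        dev->quota += dev->weight;
    else
        dev->quota = dev->weight;
}
```

Starting from 250 pending packets with weight 100 and a budget of 300, this model returns 1 twice and 0 on the third call, which is the sequence the old interface expects from a driver that follows the rules.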

Any help appreciated! Thanks in advance!

Usha

___________________
Usha Srinivasan
Software Engineer
QLogic Corporation
780 5th Ave, Suite A
King of Prussia, PA 19406
(610) 233-4844
(610) 233-4777 (Fax)
(610) 233-4838 (Main Desk)



* Re: Receive processing stops when dev->poll returns 1
  2010-08-05 14:20 Receive processing stops when dev->poll returns 1 Usha Srinivasan
@ 2010-08-05 16:04 ` Stephen Hemminger
  2010-08-05 16:11   ` Usha Srinivasan
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2010-08-05 16:04 UTC (permalink / raw)
  To: Usha Srinivasan; +Cc: netdev@vger.kernel.org

On Thu, 5 Aug 2010 09:20:03 -0500
Usha Srinivasan <usha.srinivasan@qlogic.com> wrote:

> Hello,
> I have run into an interesting and frustrating problem which I've not been able to resolve. I am hoping someone can help me.  
> 
> I have a network driver which sets its dev->weight to 100 (like ipoib) and when it processes 100 received packets, following the rules, it decrements dev->quota and *budget and returns 1 without calling netif_rx_complete.  When my driver does that, all processing of incoming packets for all interfaces comes to a halt.  
> 
> How do I know this?  Because, as soon as my driver returns 1 to dev->poll, I lose my putty session and eth0 stops working; eth0 counters show that it stops receiving packets, though it is able to transmit.  My own device stops receiving packets.  I have scoured the code for ipoib and other network devices and I see no difference in what my driver does.  I have tried to lower weight for ipoib & eth0 hoping to reproduce with those device it but no luck.

You may be looking at old documentation on how NAPI works.
In NAPI <= 2.6.23, the driver changed dev->quota and budget
and returned 0 or 1.

For current kernels, the NAPI poll has changed.
Using your example,
  dev->weight = 100
  budget would be 100
If your network driver processes 100 packets, it should return 100
and call napi_complete().


* RE: Receive processing stops when dev->poll returns 1
  2010-08-05 16:04 ` Stephen Hemminger
@ 2010-08-05 16:11   ` Usha Srinivasan
  2010-08-05 16:16     ` Stephen Hemminger
  2010-08-05 16:22     ` Stephen Hemminger
  0 siblings, 2 replies; 8+ messages in thread
From: Usha Srinivasan @ 2010-08-05 16:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org

Thanks for your response. What you said is exactly what my driver is doing:


<= 2.6.23 
Calls netif_rx_complete if done < budget; decrements quota & *budget by done; returns 0 if done < budget and 1 otherwise.

When 1 is returned, I encounter the problem I described.

> 2.6.23
Calls napi_complete if done < budget; returns done.

When done==budget, I encounter the problem I described.

Any ideas?


* Re: Receive processing stops when dev->poll returns 1
  2010-08-05 16:11   ` Usha Srinivasan
@ 2010-08-05 16:16     ` Stephen Hemminger
  2010-08-05 16:22     ` Stephen Hemminger
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2010-08-05 16:16 UTC (permalink / raw)
  To: Usha Srinivasan; +Cc: netdev@vger.kernel.org

On Thu, 5 Aug 2010 11:11:51 -0500
Usha Srinivasan <usha.srinivasan@qlogic.com> wrote:

> Thanks for your response. What you said is exactly what my driver is doing:
> 
> 
> <= 2.6.23 
> Calls netif_rx_complete if done < budget; decrements quota & *budget by done; returns 0 if done < budget and 1 otherwise.
> 
> When 1 is returned, I encounter the problem I described)
> 
> > 2.6.23
> Calls napi-complete if done < budget; returns done.
> 
> When done==budget, I encounter the problem I described.
> 

Your driver did not call napi_complete (and re-enable interrupts).


* Re: Receive processing stops when dev->poll returns 1
  2010-08-05 16:11   ` Usha Srinivasan
  2010-08-05 16:16     ` Stephen Hemminger
@ 2010-08-05 16:22     ` Stephen Hemminger
  2010-08-05 16:36       ` Usha Srinivasan
  1 sibling, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2010-08-05 16:22 UTC (permalink / raw)
  To: Usha Srinivasan; +Cc: netdev@vger.kernel.org

On Thu, 5 Aug 2010 11:11:51 -0500
Usha Srinivasan <usha.srinivasan@qlogic.com> wrote:

> Thanks for your response. What you said is exactly what my driver is doing:
> 
> 
> <= 2.6.23 
> Calls netif_rx_complete if done < budget; decrements quota & *budget by done; returns 0 if done < budget and 1 otherwise.
> 
> When 1 is returned, I encounter the problem I described)
> 
> > 2.6.23
> Calls napi-complete if done < budget; returns done.
> 
> When done==budget, I encounter the problem I described.
> 
> Any ideas?

Ignore last mail...

If done == budget, the poll will be called again (after other drivers).
If the quantum is exhausted, then it gets deferred to the ksoftirqd
thread.

One possibility is that the driver is looking at the wrong parameter
for budget and is exceeding the requested value. Please post your code.
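
The rule for kernels after 2.6.23 can be sketched as a user-space model. This is an illustration under stated assumptions, not driver code: struct sim_napi and sim_napi_poll are hypothetical names, and clearing the scheduled flag only stands in for napi_complete().

```c
/* Hypothetical stand-in for a NAPI context. */
struct sim_napi {
    int pending;    /* packets waiting in the simulated ring */
    int scheduled;  /* 1 while the context is on the poll list */
};

/* New-style poll: never handle more than budget packets, return the
 * number handled, and complete only when the budget was not used up. */
static int sim_napi_poll(struct sim_napi *n, int budget)
{
    int done = n->pending < budget ? n->pending : budget;

    n->pending -= done;
    if (done < budget)
        n->scheduled = 0;   /* models napi_complete() */
    return done;
}
```

With 250 packets pending and a budget of 100, the model returns 100, 100 and then 50. Only the last call completes, so the first two leave the context scheduled and it is polled again.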


* RE: Receive processing stops when dev->poll returns 1
  2010-08-05 16:22     ` Stephen Hemminger
@ 2010-08-05 16:36       ` Usha Srinivasan
  2010-08-05 17:37         ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Usha Srinivasan @ 2010-08-05 16:36 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org

I have compared the code in my driver to code in other drivers and they are quite similar. Here is my code:

int vnic_napi_poll(struct napi_struct *napi, int budget)
{
    int i, n, t, done = 0;

poll_more:
    while (done < budget) {
        int max = (budget - done);
        t = min(<max-supported-by-driver>, max);
        n = get-completions(comp_list);
        for (i = 0; i < n; i++, done++)
            handle_completions(comp_list[i]);
        /* fewer completions than requested: nothing more is pending */
        if (n != t)
            break;
    }
    if (done < budget) {
        netif_rx_complete(dev, napi);
        /* check again just to be sure */
        if (more-completions()) {
            if (netif_rx_reschedule(dev, napi))
                goto poll_more;
        }
    }
    return done;
}

***********************
BACKPORTED version:
***********************
int vnic_poll(struct net_device *dev, int *budget)
{
    int max = min(*budget, dev->quota);
    int i, n, t, ret, done = 0;

poll_more:
    while (max) {
        t = min(<max-supported-by-driver>, max);
        n = get-completions(comp_list);
        for (i = 0; i < n; i++, --max, done++)
            handle_completions(comp_list[i]);
        /* fewer completions than requested: nothing more is pending */
        if (n != t)
            break;
    }
    if (max) {
        netif_rx_complete(dev);
        /* check again just to be sure */
        if (more-completions()) {
            if (netif_rx_reschedule(dev, napi))
                goto poll_more;
        }
        ret = 0;
    } else
        ret = 1;

    dev->quota  -= done;
    *budget     -= done;

    return ret;
}

***********************

* Re: Receive processing stops when dev->poll returns 1
  2010-08-05 16:36       ` Usha Srinivasan
@ 2010-08-05 17:37         ` Stephen Hemminger
  2010-08-05 18:11           ` Usha Srinivasan
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2010-08-05 17:37 UTC (permalink / raw)
  To: Usha Srinivasan; +Cc: netdev@vger.kernel.org

On Thu, 5 Aug 2010 11:36:26 -0500
Usha Srinivasan <usha.srinivasan@qlogic.com> wrote:

>         int max = (budget - done);
>         t = min(<max-supported-by-driver>, max);
>         n = get-completions(comp_list);

You need to handle all completions pending in the poll, the code will
not call you back. So this min() is the problem.

-- 


* RE: Receive processing stops when dev->poll returns 1
  2010-08-05 17:37         ` Stephen Hemminger
@ 2010-08-05 18:11           ` Usha Srinivasan
  0 siblings, 0 replies; 8+ messages in thread
From: Usha Srinivasan @ 2010-08-05 18:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org

Stephen,
The min is inside a while loop; it is used purely to limit the number of completions that are retrieved at a time.  The outer while loop ensures that all the completions are handled until the budget is reached or there are no completions left. Please look again at the code I sent you.
Usha
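
The loop structure described here can be sketched in user space. The sketch assumes that the per-call limit t is what gets passed to the retrieval routine, which the posted pseudocode does not show explicitly. SIM_CHUNK, sim_get_completions and sim_chunked_poll are hypothetical names, and a simple counter stands in for the completion queue.

```c
#define SIM_CHUNK 16  /* hypothetical per-call retrieval limit */

/* Hypothetical stand-in for fetching up to `want` completions. */
static int sim_get_completions(int *pending, int want)
{
    int n = *pending < want ? *pending : want;

    *pending -= n;
    return n;
}

/* Outer loop: keep fetching chunks until the budget is reached or a
 * short read shows that the queue is empty. */
static int sim_chunked_poll(int *pending, int budget)
{
    int done = 0;

    while (done < budget) {
        int room = budget - done;
        int t = room < SIM_CHUNK ? room : SIM_CHUNK;
        int n = sim_get_completions(pending, t);

        done += n;
        if (n != t)     /* short read: nothing more is pending */
            break;
    }
    return done;
}
```

In this model the min() only bounds each chunk. With 37 completions pending and a budget of 100 all 37 are handled, and with 250 pending exactly 100 are handled and 150 remain for the next poll.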
