From: Jan-Bernd Themann <ossthema@de.ibm.com>
To: netdev <netdev@vger.kernel.org>
Cc: Thomas Klein <tklein@de.ibm.com>,
	Jan-Bernd Themann <themann@de.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-ppc <linuxppc-dev@ozlabs.org>,
	Christoph Raisch <raisch@de.ibm.com>,
	Marcus Eder <meder@de.ibm.com>,
	Stefan Roscher <stefan.roscher@de.ibm.com>
Subject: RFC: issues concerning the next NAPI interface
Date: Fri, 24 Aug 2007 15:59:16 +0200
Message-ID: <200708241559.17055.ossthema@de.ibm.com>

Hi,

when I tried to get the eHEA driver working with the new interface,
the following issues came up.

1) The current implementation of netif_rx_schedule, netif_rx_complete
   and net_rx_action has the following problem: netif_rx_schedule
   sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the
   poll_list. net_rx_action checks NAPI_STATE_SCHED and, if it is set,
   adds the device to the poll_list again (as well). netif_rx_complete
   clears NAPI_STATE_SCHED.
   If an interrupt handler calls netif_rx_schedule on CPU 2
   after netif_rx_complete has been called on CPU 1 (and the poll
   function has not returned yet), the NAPI instance will be added
   twice to the poll_list (by netif_rx_schedule and net_rx_action).
   Problems occur when netif_rx_complete is then called twice for the
   device (BUG() is called).

2) If an ethernet chip supports multiple receive queues, the queues are
   currently all processed on the CPU where the interrupt comes in. This
   is because netif_rx_schedule will always add the rx queue to that
   CPU's napi poll_list. The result under heavy pressure is that all
   queues will gather on the weakest CPU (the one with the highest CPU
   load) after some time, as they will stay there until the entire
   queue is emptied. On SMP systems this behaviour is not desired. It
   should also work well without interrupt pinning.
   It would be nice if it were possible to schedule queues to other
   CPUs, or at least to use interrupts to put the queue on another CPU
   (not nice, as you never know which one you will hit).
   I'm not sure how bad the tradeoff would be.

3) On modern systems the incoming packets are processed very fast.
   Especially on SMP systems, when we use multiple queues, we process
   only a few packets per napi poll cycle. So NAPI does not work very
   well here and the interrupt rate is still high. What we would need
   is some sort of timer polling mode which will schedule a device
   after a certain amount of time in high load situations. With high
   precision timers this could work well. The usual current timers are
   too slow. A finer granularity would be needed to keep the latency
   down (and the queue length moderate).

What do you think?

Thanks,
Jan-Bernd
