* Question about way that NICs deliver packets to the kernel
@ 2010-07-15 14:24 Junchang Wang
2010-07-15 14:33 ` Ben Hutchings
2010-07-15 21:12 ` Francois Romieu
0 siblings, 2 replies; 10+ messages in thread
From: Junchang Wang @ 2010-07-15 14:24 UTC (permalink / raw)
To: romieu, netdev
Hi list,
My understanding of the way that NICs deliver packets to the kernel is
as follows. Correct me if any of this is wrong. Thanks.

1) The device buffer is fixed. When the kernel is notified of the arrival of a
new packet, it dynamically allocates a new skb and copies the packet into it.
For example, 8139too.

2) The device buffer is mapped with streaming DMA. When the kernel is
notified of the arrival of a new packet, it unmaps the previously mapped region.
There is no memcpy; the additional cost is the streaming DMA
map/unmap operations. For example, e100 and e1000.
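In rough kernel-style code, the two schemes look something like this (a condensed
sketch, not taken from any particular driver; RX_BUF_SIZE, the ring handling and
the function names are placeholders):

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/pci.h>

#define RX_BUF_SIZE 1536	/* placeholder ring-buffer size */

/* Scheme 1: copy each received frame into a freshly allocated skb and
 * leave the fixed device buffer in place for reuse. */
static void rx_copy_scheme(struct net_device *dev, void *rx_buf, int pkt_size)
{
	struct sk_buff *skb = dev_alloc_skb(pkt_size + NET_IP_ALIGN);

	if (!skb)
		return;				/* drop on allocation failure */
	skb_reserve(skb, NET_IP_ALIGN);		/* align the IP header */
	memcpy(skb->data, rx_buf, pkt_size);
	skb_put(skb, pkt_size);
	skb->protocol = eth_type_trans(skb, dev);
	netif_receive_skb(skb);
	/* rx_buf stays owned by the NIC and is reused as-is */
}

/* Scheme 2: the skb was streaming-DMA mapped when the ring was filled;
 * on completion, unmap it and hand it straight to the stack. */
static void rx_unmap_scheme(struct net_device *dev, struct pci_dev *pdev,
			    struct sk_buff *skb, dma_addr_t addr, int pkt_size)
{
	pci_unmap_single(pdev, addr, RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
	skb_put(skb, pkt_size);
	skb->protocol = eth_type_trans(skb, dev);
	netif_receive_skb(skb);
	/* a replacement skb must now be allocated and pci_map_single()'d
	 * to refill this ring slot */
}

In both cases the per-packet cost is either a memcpy of pkt_size bytes
(scheme 1) or an unmap plus the later mapping of a replacement buffer
(scheme 2).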
Here come my questions:

1) Is there a principle indicating which one is better? Are streaming DMA
map/unmap operations more expensive than a memcpy?

2) Why is r8169 biased towards the first approach even though it supports both? I
converted r8169 to the second one and got a 5% performance boost. Below is the
result of running a netperf TCP_STREAM test with a 1.6 KB message size:
              scheme 1    scheme 2    Imp.
   r8169        683M        718M       5%
The following patch shows what I did:
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 239d7ef..707876f 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -4556,15 +4556,9 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 
 			rtl8169_rx_csum(skb, desc);
 
-			if (rtl8169_try_rx_copy(&skb, tp, pkt_size, addr)) {
-				pci_dma_sync_single_for_device(pdev, addr,
-					pkt_size, PCI_DMA_FROMDEVICE);
-				rtl8169_mark_to_asic(desc, tp->rx_buf_sz);
-			} else {
-				pci_unmap_single(pdev, addr, tp->rx_buf_sz,
-						 PCI_DMA_FROMDEVICE);
-				tp->Rx_skbuff[entry] = NULL;
-			}
+			pci_unmap_single(pdev, addr, tp->rx_buf_sz,
+					 PCI_DMA_FROMDEVICE);
+			tp->Rx_skbuff[entry] = NULL;
 
 			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);
Thanks in advance.
--Junchang
* Re: Question about way that NICs deliver packets to the kernel
2010-07-15 14:24 Question about way that NICs deliver packets to the kernel Junchang Wang
@ 2010-07-15 14:33 ` Ben Hutchings
2010-07-15 15:59 ` Stephen Hemminger
2010-07-16 7:05 ` Junchang Wang
2010-07-15 21:12 ` Francois Romieu
1 sibling, 2 replies; 10+ messages in thread
From: Ben Hutchings @ 2010-07-15 14:33 UTC (permalink / raw)
To: Junchang Wang; +Cc: romieu, netdev
On Thu, 2010-07-15 at 22:24 +0800, Junchang Wang wrote:
> Hi list,
> My understanding of the way that NICs deliver packets to the kernel is
> as follows. Correct me if any of this is wrong. Thanks.
>
> 1) The device buffer is fixed. When the kernel is notified of the arrival of a
> new packet, it dynamically allocates a new skb and copies the packet into it.
> For example, 8139too.
>
> 2) The device buffer is mapped with streaming DMA. When the kernel is
> notified of the arrival of a new packet, it unmaps the previously mapped region.
> There is no memcpy; the additional cost is the streaming DMA
> map/unmap operations. For example, e100 and e1000.
>
> Here come my questions:
> 1) Is there a principle indicating which one is better? Are streaming DMA
> map/unmap operations more expensive than a memcpy?
DMA should result in lower CPU usage and higher maximum performance.
> 2) Why is r8169 biased towards the first approach even though it supports both? I
> converted r8169 to the second one and got a 5% performance boost. Below is the
> result of running a netperf TCP_STREAM test with a 1.6 KB message size:
>
>               scheme 1    scheme 2    Imp.
>    r8169        683M        718M       5%
[...]
You should also compare the CPU usage.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: Question about way that NICs deliver packets to the kernel
2010-07-15 14:33 ` Ben Hutchings
@ 2010-07-15 15:59 ` Stephen Hemminger
2010-07-16 7:05 ` Junchang Wang
1 sibling, 0 replies; 10+ messages in thread
From: Stephen Hemminger @ 2010-07-15 15:59 UTC (permalink / raw)
To: Ben Hutchings; +Cc: Junchang Wang, romieu, netdev
On Thu, 15 Jul 2010 15:33:37 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:
> On Thu, 2010-07-15 at 22:24 +0800, Junchang Wang wrote:
> > Hi list,
> > My understanding of the way that NICs deliver packets to the kernel is
> > as follows. Correct me if any of this is wrong. Thanks.
> >
> > 1) The device buffer is fixed. When the kernel is notified of the arrival of a
> > new packet, it dynamically allocates a new skb and copies the packet into it.
> > For example, 8139too.
> >
> > 2) The device buffer is mapped with streaming DMA. When the kernel is
> > notified of the arrival of a new packet, it unmaps the previously mapped region.
> > There is no memcpy; the additional cost is the streaming DMA
> > map/unmap operations. For example, e100 and e1000.
> >
> > Here come my questions:
> > 1) Is there a principle indicating which one is better? Are streaming DMA
> > map/unmap operations more expensive than a memcpy?
>
> DMA should result in lower CPU usage and higher maximum performance.
>
> > 2) Why is r8169 biased towards the first approach even though it supports both? I
> > converted r8169 to the second one and got a 5% performance boost. Below is the
> > result of running a netperf TCP_STREAM test with a 1.6 KB message size:
> >
> >               scheme 1    scheme 2    Imp.
> >    r8169        683M        718M       5%
> [...]
>
> You should also compare the CPU usage.
Also, many drivers copy small receives into a new buffer,
which saves space and often gives better performance.
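To make that concrete, the usual "copybreak" pattern looks roughly like this
(a sketch only; rx_copybreak and RX_BUF_SIZE are illustrative, and real drivers
usually expose the threshold as a module parameter):

#include <linux/skbuff.h>
#include <linux/pci.h>

#define RX_BUF_SIZE 1536		/* placeholder ring-buffer size */
static int rx_copybreak = 200;		/* illustrative threshold */

/* Small frames: copy into a right-sized skb and keep the mapped ring
 * buffer for reuse.  Large frames: unmap and pass the ring skb up,
 * leaving the caller to refill the slot. */
static struct sk_buff *rx_copybreak_sketch(struct pci_dev *pdev,
					   struct sk_buff *ring_skb,
					   dma_addr_t addr, int pkt_size)
{
	if (pkt_size < rx_copybreak) {
		struct sk_buff *copy = dev_alloc_skb(pkt_size + NET_IP_ALIGN);

		if (copy) {
			skb_reserve(copy, NET_IP_ALIGN);
			pci_dma_sync_single_for_cpu(pdev, addr, pkt_size,
						    PCI_DMA_FROMDEVICE);
			skb_copy_from_linear_data(ring_skb, copy->data, pkt_size);
			pci_dma_sync_single_for_device(pdev, addr, pkt_size,
						       PCI_DMA_FROMDEVICE);
			return copy;	/* ring_skb stays in the ring, still mapped */
		}
	}
	pci_unmap_single(pdev, addr, RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
	return ring_skb;		/* caller must allocate and map a new skb */
}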
* Re: Question about way that NICs deliver packets to the kernel
2010-07-15 14:24 Question about way that NICs deliver packets to the kernel Junchang Wang
2010-07-15 14:33 ` Ben Hutchings
@ 2010-07-15 21:12 ` Francois Romieu
2010-07-16 7:35 ` Junchang Wang
1 sibling, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2010-07-15 21:12 UTC (permalink / raw)
To: Junchang Wang; +Cc: netdev
Junchang Wang <junchangwang@gmail.com> :
[...]
> 2) Why is r8169 biased towards the first approach even though it supports both?
It is a simple, straightforward fix against an 8169 hardware bug.
See commit c0cd884af045338476b8e69a61fceb3f34ff22f1.
--
Ueimor
* Re: Question about way that NICs deliver packets to the kernel
2010-07-15 14:33 ` Ben Hutchings
2010-07-15 15:59 ` Stephen Hemminger
@ 2010-07-16 7:05 ` Junchang Wang
2010-07-16 17:58 ` Rick Jones
1 sibling, 1 reply; 10+ messages in thread
From: Junchang Wang @ 2010-07-16 7:05 UTC (permalink / raw)
To: Ben Hutchings; +Cc: romieu, netdev
>
> You should also compare the CPU usage.
>
> Ben.
>
Hi Ben,
I added the -c and -C options to netperf's command line. The result is as follows:

                 scheme 1    scheme 2    Imp.
   Throughput:     683M        718M       5%
   CPU usage:      47.8%       45.6%
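(For reference, the invocation was along these lines - the exact address and
message size here are illustrative:

   netperf -H 192.168.2.1 -t TCP_STREAM -l 10 -c -C -- -m 1600

where -c/-C request local and remote CPU utilization and the test-specific
-m option sets the send size.)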
That really surprised me, because the "top" command showed the CPU usage
fluctuating between 0.5% and 1.5% rather than between 45% and 50%.
How can I get the exact CPU usage?
Thanks.
--
--Junchang
* Re: Question about way that NICs deliver packets to the kernel
2010-07-15 21:12 ` Francois Romieu
@ 2010-07-16 7:35 ` Junchang Wang
0 siblings, 0 replies; 10+ messages in thread
From: Junchang Wang @ 2010-07-16 7:35 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev
> It is a simple, straightforward fix against a 8169 hardware bug.
>
> See commit c0cd884af045338476b8e69a61fceb3f34ff22f1.
>
Fortunately, it seems my device is unaffected by this issue. :)
Thanks Francois.
--
--Junchang
* Re: Question about way that NICs deliver packets to the kernel
2010-07-16 7:05 ` Junchang Wang
@ 2010-07-16 17:58 ` Rick Jones
2010-07-20 1:15 ` Junchang Wang
0 siblings, 1 reply; 10+ messages in thread
From: Rick Jones @ 2010-07-16 17:58 UTC (permalink / raw)
To: Junchang Wang; +Cc: Ben Hutchings, romieu, netdev
Junchang Wang wrote:
>>You should also compare the CPU usage.
>>
>>Ben.
>>
>
> Hi Ben,
> I added the -c and -C options to netperf's command line. The result is as follows:
>
>                  scheme 1    scheme 2    Imp.
>    Throughput:     683M        718M       5%
>    CPU usage:      47.8%       45.6%
>
> That really surprised me, because the "top" command showed the CPU usage
> fluctuating between 0.5% and 1.5% rather than between 45% and 50%.
Can you tell us a bit more about the system, and which version of netperf you
are using? Any chance that the CPU utilization you were looking at in top was
just that being charged to netperf the process? "Network processing" does not
often get charged to the responsible process, so netperf reports system-wide CPU
utilization on the assumption it is the only thing causing the CPUs to be utilized.
happy benchmarking,
rick jones
* Re: Question about way that NICs deliver packets to the kernel
2010-07-16 17:58 ` Rick Jones
@ 2010-07-20 1:15 ` Junchang Wang
2010-07-20 17:16 ` Rick Jones
0 siblings, 1 reply; 10+ messages in thread
From: Junchang Wang @ 2010-07-20 1:15 UTC (permalink / raw)
To: Rick Jones; +Cc: Ben Hutchings, romieu, netdev
On Fri, Jul 16, 2010 at 10:58:46AM -0700, Rick Jones wrote:
>>Hi Ben,
>>I added the -c and -C options to netperf's command line. The result is as follows:
>>
>>                 scheme 1    scheme 2    Imp.
>>   Throughput:     683M        718M       5%
>>   CPU usage:      47.8%       45.6%
>>
>>That really surprised me, because the "top" command showed the CPU usage
>>fluctuating between 0.5% and 1.5% rather than between 45% and 50%.
>
Hi Rick,
Very sorry for my late reply. I just recovered from final exams. :)
>Can you tell us a bit more about the system, and which version of
>netperf you are using?
The target machine is a Pentium Dual-Core E2200 desktop with an r8169
gigabit NIC. (I couldn't find a better server with an old PCI slot.)

The other machine is a Nehalem-based system with an Intel 82576 NIC.

The target machine runs netserver and the Nehalem machine runs netperf.
The version of netperf is 2.4.5.
>Any chance that the CPU utilization you were
>looking at in top was just that being charged to netperf the process?
What I see on the target machine is as follows:
top - 21:37:12 up 21 min, 6 users, load average: 0.43, 0.28, 0.19
Tasks: 152 total, 2 running, 149 sleeping, 0 stopped, 1 zombie
Cpu(s): 2.3%us, 1.5%sy, 0.1%ni, 89.5%id, 2.7%wa, 0.0%hi, 3.9%si, 0.0%
Mem: 2074064k total, 690200k used, 1383864k free, 39372k buffers
Swap: 2096476k total, 0k used, 2096476k free, 435044k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3916 root 20 0 2228 584 296 R 84.6 0.0 0:07.12 netserver
It shows that the CPU usage of the target machine is around 10%,
while the Nehalem machine's report is as follows:
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.1 (192.168.2.1) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.05       679.79   1.63     48.27    1.571   11.634
It shows that the CPU usage of the target machine is 48.27%.
>"Network processing" does not often get charged to the responsible
>process, so netperf reports system-wide CPU utilization on the
>assumption it is the only thing causing the CPUs to be utilized.
My understanding of your comments is:

1) Except when it runs in ksoftirqd, network processing cannot be counted correctly,
because it runs in interrupt contexts that do not get charged to the right
process. So "top" misses a lot of CPU usage in high-interrupt-rate network
situations.

2) As mentioned in netperf's manual, netperf uses /proc/stat on Linux
to retrieve the time spent idle. In other words, it accounts for CPU time
spent in all other modes, including hardware interrupt, software interrupt,
etc., making the CPU usage more accurate in high-interrupt situations
(see the sketch after this list).

3) Since most processes on the target machine are sleeping, the CPU usage
of network processing is actually very close to 48.27%. Right?

Correct me if any of these are incorrect. Thanks.
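As a rough illustration of that /proc/stat-based accounting, a standalone
userspace sketch (of the idea only, not netperf's actual code) would be:

/* Sample the aggregate "cpu" line of /proc/stat twice and treat everything
 * that is not idle as "used".  Illustrative only -- netperf's own code also
 * handles per-CPU lines, iowait, calibration, etc. */
#include <stdio.h>
#include <unistd.h>

static void read_cpu_line(unsigned long long *idle, unsigned long long *total)
{
	unsigned long long v[10] = { 0 };
	FILE *f = fopen("/proc/stat", "r");
	int i;

	*idle = 0;
	*total = 0;
	if (!f)
		return;
	fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
	       &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
	       &v[6], &v[7], &v[8], &v[9]);
	fclose(f);

	*idle = v[3];		/* fields: user nice system IDLE iowait irq softirq ... */
	for (i = 0; i < 10; i++)
		*total += v[i];
}

int main(void)
{
	unsigned long long i1, t1, i2, t2;

	read_cpu_line(&i1, &t1);
	sleep(10);		/* measurement interval */
	read_cpu_line(&i2, &t2);
	printf("CPU utilization: %.1f%%\n",
	       100.0 * (1.0 - (double)(i2 - i1) / (double)(t2 - t1)));
	return 0;
}

Run alongside a test, this should report roughly the same system-wide figure
that netperf prints as remote CPU utilization.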
--Junchang
* Re: Question about way that NICs deliver packets to the kernel
2010-07-20 1:15 ` Junchang Wang
@ 2010-07-20 17:16 ` Rick Jones
2010-07-25 14:18 ` Junchang Wang
0 siblings, 1 reply; 10+ messages in thread
From: Rick Jones @ 2010-07-20 17:16 UTC (permalink / raw)
To: Junchang Wang; +Cc: Ben Hutchings, romieu, netdev
Junchang Wang wrote:
> On Fri, Jul 16, 2010 at 10:58:46AM -0700, Rick Jones wrote:
>
>>>Hi Ben,
>>>I added the -c and -C options to netperf's command line. The result is as follows:
>>>
>>>                 scheme 1    scheme 2    Imp.
>>>   Throughput:     683M        718M       5%
>>>   CPU usage:      47.8%       45.6%
>>>
>>>That really surprised me, because the "top" command showed the CPU usage
>>>fluctuating between 0.5% and 1.5% rather than between 45% and 50%.
>>
>
> Hi rick,
> very sorry for my late reply. Just recovered from the final exam.:)
>
>
>>Can you tell us a bit more about the system, and which version of
>>netperf you are using?
>
>
> The target machine is a Pentium Dual-Core E2200 desktop with an r8169
> gigabit NIC. (I couldn't find a better server with an old PCI slot.)
>
> The other machine is a Nehalem-based system with an Intel 82576 NIC.
>
> The target machine runs netserver and the Nehalem machine runs netperf.
> The version of netperf is 2.4.5.
>
>
>>Any chance that the CPU utilization you were
>>looking at in top was just that being charged to netperf the process?
>
>
> What I see on the target machine is as follows:
>
> top - 21:37:12 up 21 min, 6 users, load average: 0.43, 0.28, 0.19
> Tasks: 152 total, 2 running, 149 sleeping, 0 stopped, 1 zombie
> Cpu(s): 2.3%us, 1.5%sy, 0.1%ni, 89.5%id, 2.7%wa, 0.0%hi, 3.9%si, 0.0%
> Mem: 2074064k total, 690200k used, 1383864k free, 39372k buffers
> Swap: 2096476k total, 0k used, 2096476k free, 435044k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3916 root 20 0 2228 584 296 R 84.6 0.0 0:07.12 netserver
You said this was a dual-core system, right? So two cores, no threads? If so,
then that does look odd - if netserver is consuming 84% of a CPU (core) and
there are only two CPUs (cores) in the system, how the system can be 89.5% idle
is beyond me. The 48% reported by netperf below makes better sense. If you press
"1" while top is running, it should start to show per-CPU statistics.
> It shows that the CPU usage of the target machine is around 10%,
> while the Nehalem machine's report is as follows:
>
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.1 (192.168.2.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  16384  16384    10.05       679.79   1.63     48.27    1.571   11.634
>
> It shows that the CPU usage of the target machine is 48.27%.
Clearly something is out of joint - let's go off-list (or on to
netperf-talk@netperf.org) and hash that out to see what may be happening. It
will probably involve variations on grabbing the top-of-trunk, adding the debug
option etc.
>
>
>>"Network processing" does not often get charged to the responsible
>>process, so netperf reports system-wide CPU utilization on the
>>assumption it is the only thing causing the CPUs to be utilized.
>
>
> My understanding of your comments is:
> 1) Except when it runs in ksoftirqd, network processing cannot be counted correctly,
> because it runs in interrupt contexts that do not get charged to the right
> process. So "top" misses a lot of CPU usage in high-interrupt-rate network
> situations.
Top *shouldn't* miss it as far as reporting overall CPU utilization goes. It just may
not be charged to the process on whose behalf the work is done.
> 2) As mentioned in netperf's manual, netperf uses /proc/stat on Linux
> to retrieve the time spent idle. In other words, it accounts for CPU time
> spent in all other modes, including hardware interrupt, software interrupt,
> etc., making the CPU usage more accurate in high-interrupt situations.
That is the theory. In practice however... while the top output you've
provided looks like there is an "issue" in top, netperf has been known to have a
bug or three.
> 3) Since most processes on the target machine are sleeping, the CPU usage
> of network processing is actually very close to 48.27%. Right?
I do not expect there to be a huge discrepancy between the overall CPU
utilization reported by top and the CPU utilization reported by netperf. That
there seems to be such a discrepancy has me wanting to make certain that netperf
is operating correctly.
happy benchmarking,
rick jones
>
> Correct me if any of these are incorrect. Thanks.
>
> --Junchang
* Re: Question about way that NICs deliver packets to the kernel
2010-07-20 17:16 ` Rick Jones
@ 2010-07-25 14:18 ` Junchang Wang
0 siblings, 0 replies; 10+ messages in thread
From: Junchang Wang @ 2010-07-25 14:18 UTC (permalink / raw)
To: Rick Jones, netdev; +Cc: Ben Hutchings, romieu
Hi list,
> Clearly something is out of joint - let's go off-list (or on to
> netperf-talk@netperf.org) and hash that out to see what may be happening.
> It will probably involve variations on grabbing the top-of-trunk, adding
> the debug option etc.
>
The discrepancy between netperf and top has been worked out.
It turns out top produces misleading data when I send its output to a file.
For example, "top -b > output" gives the bogus numbers I reported earlier in its
first iteration.
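(Presumably that is because top's first batch-mode sample has nothing earlier to
diff against, so it shows percentages averaged since boot; the later iterations
reflect the current load. Something like

   top -b -n 3 -d 5 > output

and reading the second or third snapshot avoids the problem.)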
Actually, the report of top should be:
top - 21:37:15 up 21 min, 6 users, load average: 0.43, 0.28, 0.19
Tasks: 152 total, 2 running, 149 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.2%us, 5.4%sy, 0.0%ni, 50.9%id, 0.0%wa, 0.0%hi, 43.5%si, 0.0%
Mem: 2074064k total, 690192k used, 1383872k free, 39372k buffers
Swap: 2096476k total, 0k used, 2096476k free, 435056k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3916 root 20 0 2228 584 296 R 86.3 0.0 0:09.72 netserver
I think 50.9% system idle makes sense, because this is a dual-core system
and netserver is consuming 86.3% of a core. On average, the CPU usage
of the whole system reported by top can be regarded as between 46.2% and
50.1%.

netperf's report of 48% is right, and it confirms that "there is no huge
discrepancy between the overall CPU utilization reported by top and the CPU
utilization reported by netperf."
Thanks Rick.
--
--Junchang
Thread overview: 10+ messages
2010-07-15 14:24 Question about way that NICs deliver packets to the kernel Junchang Wang
2010-07-15 14:33 ` Ben Hutchings
2010-07-15 15:59 ` Stephen Hemminger
2010-07-16 7:05 ` Junchang Wang
2010-07-16 17:58 ` Rick Jones
2010-07-20 1:15 ` Junchang Wang
2010-07-20 17:16 ` Rick Jones
2010-07-25 14:18 ` Junchang Wang
2010-07-15 21:12 ` Francois Romieu
2010-07-16 7:35 ` Junchang Wang