Re: How many copies to get from NIC RX to user read()?

* Re: How many copies to get from NIC RX to user read()?
From: niv @ 2002-07-10 15:52 UTC
  To: hurwitz; +Cc: linux-kernel

> I could've sworn I heard the stack was single-copy 
> on both the TX and RX sides. But, it doesn't look to 
> me like it is. Rather, it looks like there is one copy 
> in tcp_rcv_established() (via tcp_copy_to_iovec()), and a
> second copy in tcp_recvmsg() (which is called when the 
> user calls read()). Both of these copies are, I believe, 
> done by skb_copy_datagram_iovec().

tcp_recvmsg() only copies data that is already sitting on the
receive_queue or the backlog queue. tcp_rcv_established() either
copies the data directly into the iovec or queues the skb onto the
receive_queue or backlog queue for tcp_recvmsg() to complete the
work. So there aren't two copies of the same data happening; it is
just a question of which function does the work, depending on
whether a process is currently doing a read or not.
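
A minimal userspace sketch of that split, in case it helps. Everything
in it (sock_model, rx_segment(), read_model(), the fixed-size array
standing in for receive_queue) is a hypothetical name made up for
illustration, not kernel code. It only models the claim above: each
segment's payload is copied exactly once, by whichever side gets to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define QUEUE_MAX 8

struct segment {
    const char *data;   /* payload as it sits in the packet buffer */
    size_t len;
};

struct sock_model {
    struct segment queue[QUEUE_MAX]; /* stands in for receive_queue */
    size_t queued;
    char *reader_buf;   /* non-NULL while a process is blocked in read() */
    size_t reader_cap;
    size_t reader_got;
    unsigned copies;    /* payload copies performed so far */
};

/* Models the tcp_rcv_established() side: copy now if a reader is
 * waiting and has room, otherwise just queue a reference. */
static void rx_segment(struct sock_model *sk, struct segment seg)
{
    if (sk->reader_buf && seg.len <= sk->reader_cap - sk->reader_got) {
        memcpy(sk->reader_buf + sk->reader_got, seg.data, seg.len);
        sk->reader_got += seg.len;
        sk->copies++;
        return;
    }
    assert(sk->queued < QUEUE_MAX);
    sk->queue[sk->queued++] = seg;  /* stores a reference, no payload copy */
}

/* Models the tcp_recvmsg() side: copy out whatever was queued earlier. */
static size_t read_model(struct sock_model *sk, char *buf, size_t cap)
{
    size_t got = 0, i = 0;

    while (i < sk->queued && sk->queue[i].len <= cap - got) {
        memcpy(buf + got, sk->queue[i].data, sk->queue[i].len);
        got += sk->queue[i].len;
        sk->copies++;
        i++;
    }
    /* Drop the consumed entries from the front of the queue. */
    memmove(sk->queue, sk->queue + i, (sk->queued - i) * sizeof sk->queue[0]);
    sk->queued -= i;
    return got;
}
```

Feeding one segment with no reader and then calling read_model() bumps
the copy counter once; feeding a segment while reader_buf is set also
bumps it once and leaves the queue empty.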

hth,

thanks,
Nivedita


* Re: How many copies to get from NIC RX to user read()?
From: Hurwitz Justin W. @ 2002-07-10 21:15 UTC
  To: niv; +Cc: linux-kernel


So, to make sure I have this right:

When the data is processed from the NIC
  tcp_rcv_established() is called in processing it
    if a user process is waiting on the socket
      iovec copy data to the user
    else
      copy it to receive_queue or backlog_queue

When the user tries to read (in any way) from a socket
  iovec copy from receive_queue or backlog_queue


I.e., if the user is ready for the data, dump it straight from SKBs. Else,
don't waste SKBs on a lazy (or busy) user and copy the data to a queue.

If this is right, I'm happy :) If it's wrong, please correct. 

Thx,
--Gus

On Wed, 10 Jul 2002 niv@us.ibm.com wrote:

> 
> > I could've sworn I heard the stack was single-copy 
> > on both the TX and RX sides. But, it doesn't look to 
> > me like it is. Rather, it looks like there is one copy 
> > in tcp_rcv_established() (via tcp_copy_to_iovec()), and a
> > second copy in tcp_recvmsg() (which is called when the 
> > user calls read()). Both of these copies are, I believe, 
> > done by skb_copy_datagram_iovec().
> 
> tcp_recvmsg() only does the copy from the receive_queue
> or the backlog queue. tcp_rcv_established() does the copy
> directly into the iovec or queues it onto the receive_queue 
> or backlog queue for tcp_recvmsg() to complete the work. So 
> there aren't two copies of the same data happening, just a 
> question of one or the other function doing the work depending 
> on whether there is currently a process doing a read or not.
> 
> hth,
> 
> thanks,
> Nivedita
> 
> 



* Re: How many copies to get from NIC RX to user read()?
From: Nivedita Singhvi @ 2002-07-10 21:27 UTC
  To: Hurwitz Justin W.; +Cc: linux-kernel


> So, to make sure I have this right:
> 
> When the data is processed from the NIC
>   tcp_rcv_established() is called in processing it
>     if a user process is waiting on the socket
>       iovec copy data to the user
>     else
>       copy it to receive_queue or backlog_queue

Well, we append the skb to the tail of the queue.
This is not a copy operation (just a few instructions).
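
To illustrate why a tail append costs only a few instructions, here is
a hedged sketch of a singly linked packet queue. The struct and function
names (pkt, pkt_queue, queue_tail(), dequeue_head()) are hypothetical;
as far as I know the kernel's own helper for this is __skb_queue_tail(),
which works on a doubly linked list, so treat this as a simplified
model and not that code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an skb and its queue head. */
struct pkt {
    struct pkt *next;
    const void *data;   /* payload stays where it already is */
    size_t len;
};

struct pkt_queue {
    struct pkt *head;
    struct pkt *tail;
    size_t qlen;
};

/* Append p to the tail: a handful of pointer stores, no payload copy. */
static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
}

/* Remove and return the oldest packet, or NULL if the queue is empty. */
static struct pkt *dequeue_head(struct pkt_queue *q)
{
    struct pkt *p = q->head;

    if (!p)
        return NULL;
    q->head = p->next;
    if (!q->head)
        q->tail = NULL;
    p->next = NULL;
    q->qlen--;
    return p;
}
```

The data pointer that comes back from dequeue_head() is the same one
that went in, which is the point: queueing moves a reference, and the
bytes are only touched later when the reader copies them out.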

> When the user tries to read (in any way) from a socket
>   iovec copy from receive_queue or backlog_queue
> 
> 
> I.e., if the user is ready for the data, dump it straight from SKBs.
> Else, don't waste SKBs on a lazy (or busy) user and copy the data to
> a queue.

yep.

> If this is right, I'm happy :) If it's wrong, please correct. 
> 
> Thx,
> --Gus

I should add that my reading of the code is hardly
authoritative :). caveat emptor...

thanks,
Nivedita





* How many copies to get from NIC RX to user read()?
From: Hurwitz Justin W. @ 2002-07-09 22:29 UTC
  To: linux-kernel

Please Cc: me in your responses.

The story so far: 

I've been continuing to muck around with the stack, trying both to improve
overall performance, and specifically to improve rx relative to tx
performance, primarily in gig-and-beyond (e.g., Quadrics)  environments.

To this end, I have begun by profiling and analyzing the RX side stack.
The profiling is being done as I write, and the analysis is what prompts
me to write.

The direct question:

How many times is data copied between the time that it is received at the
NIC and when the user's call to read() returns the data?

The reason for the question:

I could've sworn I heard the stack was single-copy on both the TX and RX
sides. But, it doesn't look to me like it is. Rather, it looks like there
is one copy in tcp_rcv_established() (via tcp_copy_to_iovec()), and a
second copy in tcp_recvmsg() (which is called when the user calls read()).
Both of these copies are, I believe, done by skb_copy_datagram_iovec().

The ancillary questions:

If I am wrong about this, does anyone care to publicly humiliate me by
telling me how/why (and possibly calling me stupid)?

If I am right about this, is there a specific reason that it is
implemented this way? Are there any thoughts on changing it? Our specific
inclination is to keep the skbs around until the user calls read(), at
which point we do an iovec memcpy to the userspace buffer, eliminating a
copy. The danger here is that if the user doesn't read from the socket,
this might needlessly tie up skbs. To avoid this, we can implement either
a watermark or a timeout for skb consolidation: if the user doesn't call
read() soon enough, or before too many skbs are in use, we copy the skbs
to a socket buffer as usual.
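
A rough sketch of the watermark/timeout check described above, purely
to make the proposal concrete. The names (hold_policy,
should_consolidate()) and the idea of tracking the age of the oldest
held skb in milliseconds are assumptions for illustration; nothing
like this is claimed to exist in the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy knobs for how long unread skbs may be held. */
struct hold_policy {
    size_t max_held;        /* watermark: most skbs we let a socket hold */
    unsigned max_age_ms;    /* timeout: oldest held skb may be this old */
};

/* Returns nonzero when held skbs should be consolidated into a socket
 * buffer because the reader is too slow by either measure. */
static int should_consolidate(const struct hold_policy *p,
                              size_t held, unsigned oldest_age_ms)
{
    assert(p != NULL);
    return held > p->max_held || oldest_age_ms > p->max_age_ms;
}
```

With a policy of { 4, 200 }, a socket holding five skbs, or one whose
oldest skb has waited 201 ms, would trip the check; exactly four skbs
at exactly 200 ms would not.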

Cheers,
--Gus


* Re: How many copies to get from NIC RX to user read()?
From: Alan Cox @ 2002-07-09 23:11 UTC
  To: Hurwitz Justin W.; +Cc: linux-kernel

> How many times is data copied between the time that it is received at the
> NIC and when the user's call to read() returns the data?

Optimal case for TCP

NIC->buffer (DMA)
buffer->user (CPU)

IFF the TCP checksum can be done by the card
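
The checksum condition matters because, without hardware offload, the
CPU has to read every payload byte to verify the TCP checksum. A
software stack can still keep this to a single CPU pass by summing the
data while it copies it. The sketch below shows that idea in plain
userspace C using the 16-bit one's-complement sum from RFC 1071; the
function name copy_and_csum() is made up for this example and it is
not the kernel's routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst and return the folded 16-bit
 * one's-complement sum of the data (not inverted). Bytes are summed
 * as big-endian 16-bit words; an odd trailing byte is padded with
 * zero. The 32-bit accumulator is safe for len up to 64 KiB. */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len <= 65536);
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {              /* odd trailing byte */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    while (sum >> 16)           /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

For the byte sequence 00 01 f2 03 f4 f5 f6 f7 (the worked example in
RFC 1071) the function returns 0xddf2 and dst ends up holding the same
eight bytes.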



* Re: How many copies to get from NIC RX to user read()?
From: Matti Aarnio @ 2002-07-10  8:29 UTC
  To: Hurwitz Justin W.; +Cc: linux-kernel

On Tue, Jul 09, 2002 at 04:29:35PM -0600, Hurwitz Justin W. wrote:
> Please Cc: me in your responses.
> 
> The story so far: 
> 
> I've been continuing to muck around with the stack, trying both to improve
> overall performance, and specifically to improve rx relative to tx
> performance, primarily in gig-and-beyond (e.g., Quadrics)  environments.
...
> The direct question:
> 
> How many times is data copied between the time that it is received at the
> NIC and when the user's call to read() returns the data?
 
> The reason for the question:
> 
> I could've sworn I heard the stack was single-copy on both the TX and RX
> sides. But, it doesn't look to me like it is. Rather, it looks like there
> is one copy in tcp_rcv_established() (via tcp_copy_to_iovec()), and a
> second copy in tcp_recvmsg() (which is called when the user calls read()).
> Both of these copies are, I believe, done by skb_copy_datagram_iovec().

  I suspect that in many cases there is a third copy right in the network
  card driver, to realign the data so that the TCP frame begins at a
  32-bit boundary. Perhaps that is only for RISC CPU systems (e.g. Alpha,
  primarily).

  Can the GigE cards do ethernet-frame reception pre-alignment so that,
  after the 14-byte ethernet header, the TCP frame begins at a 32-bit
  boundary?

...
> Cheers,
> --Gus

/Matti Aarnio


* Re: How many copies to get from NIC RX to user read()?
From: David S. Miller @ 2002-07-11  2:12 UTC
  To: matti.aarnio; +Cc: hurwitz, linux-kernel

   From: Matti Aarnio <matti.aarnio@zmailer.org>
   Date: Wed, 10 Jul 2002 11:29:16 +0300

     I suspect that in many cases there is a third copy right in the network
     card driver, to realign the data so that the TCP frame begins at a
     32-bit boundary. Perhaps that is only for RISC CPU systems (e.g. Alpha,
     primarily).
   
     Can the GigE cards do ethernet-frame reception pre-alignment so that,
     after the 14-byte ethernet header, the TCP frame begins at a 32-bit
     boundary?

All gigabit chips allow the receive DMA buffer to start on a 2-byte
aligned boundary; the one exception is the ns83820. Andi Kleen had some
ideas for how to deal with even the ns83820-type chips without copying
anything more than the headers (i.e., not the data portion).
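
The arithmetic behind the 2-byte offset, as a small sketch. The
Ethernet header is 14 bytes, so a frame that starts on a 4-byte
boundary leaves whatever follows the header 2 bytes off; starting the
DMA 2 bytes into the buffer puts it at offset 16. Drivers typically
get this effect by calling skb_reserve(skb, 2) before handing the
buffer to the card. The helper name and constant below are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HEADER_LEN 14   /* dst MAC (6) + src MAC (6) + ethertype (2) */

/* Given the byte offset at which the frame starts inside a 4-byte
 * aligned buffer, return how far past a 4-byte boundary the first
 * byte after the Ethernet header lands (0 means aligned). */
static size_t header_misalignment(size_t frame_offset)
{
    return (frame_offset + ETH_HEADER_LEN) % 4;
}
```

header_misalignment(0) is 2, which is the case that would force a
realigning copy on strict-alignment CPUs, while header_misalignment(2)
is 0.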
