netdev.vger.kernel.org archive mirror
* MSG_ZEROCOPY_FIXED
@ 2020-11-08 17:04 Victor Stewart
  2020-11-11  0:09 ` MSG_ZEROCOPY_FIXED Jonathan Lemon
  0 siblings, 1 reply; 5+ messages in thread
From: Victor Stewart @ 2020-11-08 17:04 UTC
  To: netdev

hi all,

i'm seeking input / comment on the idea of implementing full fledged
zerocopy UDP networking that uses persistent buffers allocated in
userspace... before I go off on a solo tangent with my first patches
lol.

i'm sure there's been lots of thought/discussion on this before. of
course Willem added MSG_ZEROCOPY on the send path (pin buffers on
demand / per send). and something similar to what I speak of exists
with TCP_ZEROCOPY_RECEIVE.

i envision something like a new flag like MSG_ZEROCOPY_FIXED that
"does the right thing" in the send vs recv paths.

Victor


* Re: MSG_ZEROCOPY_FIXED
  2020-11-08 17:04 MSG_ZEROCOPY_FIXED Victor Stewart
@ 2020-11-11  0:09 ` Jonathan Lemon
  2020-11-11  0:20   ` MSG_ZEROCOPY_FIXED Victor Stewart
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Lemon @ 2020-11-11  0:09 UTC
  To: Victor Stewart; +Cc: netdev

On Sun, Nov 08, 2020 at 05:04:41PM +0000, Victor Stewart wrote:
> hi all,
> 
> i'm seeking input / comment on the idea of implementing full fledged
> zerocopy UDP networking that uses persistent buffers allocated in
> userspace... before I go off on a solo tangent with my first patches
> lol.
> 
> i'm sure there's been lots of thought/discussion on this before. of
> course Willem added MSG_ZEROCOPY on the send path (pin buffers on
> demand / per send). and something similar to what I speak of exists
> with TCP_ZEROCOPY_RECEIVE.
> 
> i envision something like a new flag like MSG_ZEROCOPY_FIXED that
> "does the right thing" in the send vs recv paths.

See the netgpu patches that I posted earlier; these will handle
protocol independent zerocopy sends/receives.  I do have a working
UDP receive implementation which will be posted with an updated
patchset.
--
Jonathan


* Re: MSG_ZEROCOPY_FIXED
  2020-11-11  0:09 ` MSG_ZEROCOPY_FIXED Jonathan Lemon
@ 2020-11-11  0:20   ` Victor Stewart
  2020-11-11  0:59     ` MSG_ZEROCOPY_FIXED Jonathan Lemon
  2020-11-13 16:41     ` MSG_ZEROCOPY_FIXED David Ahern
  0 siblings, 2 replies; 5+ messages in thread
From: Victor Stewart @ 2020-11-11  0:20 UTC
  To: Jonathan Lemon; +Cc: netdev

On Wed, Nov 11, 2020 at 12:09 AM Jonathan Lemon
<jonathan.lemon@gmail.com> wrote:
>
> On Sun, Nov 08, 2020 at 05:04:41PM +0000, Victor Stewart wrote:
> > hi all,
> >
> > i'm seeking input / comment on the idea of implementing full fledged
> > zerocopy UDP networking that uses persistent buffers allocated in
> > userspace... before I go off on a solo tangent with my first patches
> > lol.
> >
> > i'm sure there's been lots of thought/discussion on this before. of
> > course Willem added MSG_ZEROCOPY on the send path (pin buffers on
> > demand / per send). and something similar to what I speak of exists
> > with TCP_ZEROCOPY_RECEIVE.
> >
> > i envision something like a new flag like MSG_ZEROCOPY_FIXED that
> > "does the right thing" in the send vs recv paths.
>
> See the netgpu patches that I posted earlier; these will handle
> protocol independent zerocopy sends/receives.  I do have a working
> UDP receive implementation which will be posted with an updated
> patchset.

amazing i'll check it out. thanks.

does your udp zerocopy receive use mmap-ed buffers then vm_insert_pfn
/ remap_pfn_range to remap the physical pages of the received payload
into the memory submitted by recvmsg for reception?

https://lore.kernel.org/io-uring/acc66238-0d27-cd22-dac4-928777a8efbc@gmail.com/T/#t

^^ and check the thread from today on the io_uring mailing list going
into the mechanics of zerocopy sendmsg i have in mind.

(TLDR; i think it should be io_uring "only" so that we can collapse it
into a single completion event, aka when the NIC ACKs the
transmission. and exploiting the asynchrony of io_uring is the only
way to do this? so you'd submit your sendmsg operation to io_uring and
instead of receiving a completion event when the send gets enqueued,
you'd only get it upon failure or NIC ACK).

> --
> Jonathan


* Re: MSG_ZEROCOPY_FIXED
  2020-11-11  0:20   ` MSG_ZEROCOPY_FIXED Victor Stewart
@ 2020-11-11  0:59     ` Jonathan Lemon
  2020-11-13 16:41     ` MSG_ZEROCOPY_FIXED David Ahern
  1 sibling, 0 replies; 5+ messages in thread
From: Jonathan Lemon @ 2020-11-11  0:59 UTC
  To: Victor Stewart; +Cc: netdev

On Wed, Nov 11, 2020 at 12:20:22AM +0000, Victor Stewart wrote:
> On Wed, Nov 11, 2020 at 12:09 AM Jonathan Lemon
> <jonathan.lemon@gmail.com> wrote:
> >
> > On Sun, Nov 08, 2020 at 05:04:41PM +0000, Victor Stewart wrote:
> > > hi all,
> > >
> > > i'm seeking input / comment on the idea of implementing full fledged
> > > zerocopy UDP networking that uses persistent buffers allocated in
> > > userspace... before I go off on a solo tangent with my first patches
> > > lol.
> > >
> > > i'm sure there's been lots of thought/discussion on this before. of
> > > course Willem added MSG_ZEROCOPY on the send path (pin buffers on
> > > demand / per send). and something similar to what I speak of exists
> > > with TCP_ZEROCOPY_RECEIVE.
> > >
> > > i envision something like a new flag like MSG_ZEROCOPY_FIXED that
> > > "does the right thing" in the send vs recv paths.
> >
> > See the netgpu patches that I posted earlier; these will handle
> > protocol independent zerocopy sends/receives.  I do have a working
> > UDP receive implementation which will be posted with an updated
> > patchset.
> 
> amazing i'll check it out. thanks.
> 
> does your udp zerocopy receive use mmap-ed buffers then vm_insert_pfn
> / remap_pfn_range to remap the physical pages of the received payload
> into the memory submitted by recvmsg for reception?

The application mmaps buffers, which are then pinned into the kernel.
The NIC receives directly into the buffers and then notifies the application.

For completions, the mechanism that I prefer is having one of the
sends tagged with an SO_NOTIFY message.  Then a completion notification
is generated when the buffer corresponding to the NOTIFY is released by
the protocol stack.

The notifications could be posted as an io_uring CQE.  (work TBD)

> https://lore.kernel.org/io-uring/acc66238-0d27-cd22-dac4-928777a8efbc@gmail.com/T/#t
> 
> ^^ and check the thread from today on the io_uring mailing list going
> into the mechanics of zerocopy sendmsg i have in mind.
> 
> (TLDR; i think it should be io_uring "only" so that we can collapse it
> into a single completion event, aka when the NIC ACKs the
> transmission. and exploiting the asynchrony of io_uring is the only
> way to do this? so you'd submit your sendmsg operation to io_uring and
> instead of receiving a completion event when the send gets enqueued,
> you'd only get it upon failure or NIC ACK).

I think it's likely better to have two completions:
  "this buffer has been submitted", and 
  "this buffer is released by the protocol".

This simplifies handling of errors, cancellations, and short writes.
-- 
Jonathan
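
[Editor's sketch, not part of the original message.] The two-completion
model described above can be illustrated with a small piece of userspace
bookkeeping. Everything here is hypothetical: the names zc_buf, zc_event
and zc_complete are invented for illustration and do not correspond to
any existing kernel or io_uring interface. The sketch also assumes that
a failed submission never pins the buffer, so no release event follows
it; a real design might decide otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not an existing interface. */
enum zc_event {
    ZC_SUBMITTED,   /* "this buffer has been submitted" */
    ZC_RELEASED     /* "this buffer is released by the protocol" */
};

struct zc_buf {
    bool submitted; /* the stack accepted (or rejected) the send */
    bool released;  /* the stack no longer references the memory */
    int  result;    /* bytes accepted, or a negative error code */
};

/*
 * Record one completion for a buffer.
 * Returns true once the application may safely reuse the buffer.
 */
static bool zc_complete(struct zc_buf *b, enum zc_event ev, int res)
{
    assert(b != NULL);

    if (ev == ZC_SUBMITTED) {
        b->submitted = true;
        b->result = res;
        /* Assumption: a failed submission pinned nothing, so there
         * will be no separate release event to wait for. */
        if (res < 0)
            b->released = true;
    } else {
        b->released = true;
    }
    return b->submitted && b->released;
}
```

With this split, errors and short writes are visible at the first
event (via result), while buffer reuse is gated only on the second.
A single merged completion would have to carry both pieces of
information at once.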


* Re: MSG_ZEROCOPY_FIXED
  2020-11-11  0:20   ` MSG_ZEROCOPY_FIXED Victor Stewart
  2020-11-11  0:59     ` MSG_ZEROCOPY_FIXED Jonathan Lemon
@ 2020-11-13 16:41     ` David Ahern
  1 sibling, 0 replies; 5+ messages in thread
From: David Ahern @ 2020-11-13 16:41 UTC
  To: Victor Stewart, Jonathan Lemon; +Cc: netdev

On 11/10/20 5:20 PM, Victor Stewart wrote:
> On Wed, Nov 11, 2020 at 12:09 AM Jonathan Lemon
> <jonathan.lemon@gmail.com> wrote:
>>
>> On Sun, Nov 08, 2020 at 05:04:41PM +0000, Victor Stewart wrote:
>>> hi all,
>>>
>>> i'm seeking input / comment on the idea of implementing full fledged
>>> zerocopy UDP networking that uses persistent buffers allocated in
>>> userspace... before I go off on a solo tangent with my first patches
>>> lol.
>>>
>>> i'm sure there's been lots of thought/discussion on this before. of
>>> course Willem added MSG_ZEROCOPY on the send path (pin buffers on
>>> demand / per send). and something similar to what I speak of exists
>>> with TCP_ZEROCOPY_RECEIVE.
>>>
>>> i envision something like a new flag like MSG_ZEROCOPY_FIXED that
>>> "does the right thing" in the send vs recv paths.
>>
>> See the netgpu patches that I posted earlier; these will handle
>> protocol independent zerocopy sends/receives.  I do have a working
>> UDP receive implementation which will be posted with an updated
>> patchset.
> 
> amazing i'll check it out. thanks.
> 
> does your udp zerocopy receive use mmap-ed buffers then vm_insert_pfn
> / remap_pfn_range to remap the physical pages of the received payload
> into the memory submitted by recvmsg for reception?
> 
> https://lore.kernel.org/io-uring/acc66238-0d27-cd22-dac4-928777a8efbc@gmail.com/T/#t
> 
> ^^ and check the thread from today on the io_uring mailing list going
> into the mechanics of zerocopy sendmsg i have in mind.
> 
> (TLDR; i think it should be io_uring "only" so that we can collapse it
> into a single completion event, aka when the NIC ACKs the
> transmission. and exploiting the asynchrony of io_uring is the only
> way to do this? so you'd submit your sendmsg operation to io_uring and
> instead of receiving a completion event when the send gets enqueued,
> you'd only get it upon failure or NIC ACK).
> 

Do you have a working implementation? Right now, io_uring send / sendmsg
can return incomplete sends (only part of the buffer is sent):

https://lore.kernel.org/io-uring/5324a8ca-bd5c-0599-d4d3-1e837338a7b5@gmail.com/

That will need to be solved. A simple solution is to track the offset
and resubmit:

https://lore.kernel.org/io-uring/fb72cffc-87f9-6072-3f3a-6648aacd310e@gmail.com/
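
[Editor's sketch, not part of the original message.] The "track the
offset and resubmit" idea can be shown with plain blocking send(2) on a
stream socket. This is only an illustration of the bookkeeping, not the
code from the linked thread: send_all is a hypothetical helper name, and
with io_uring the same offset update would happen each time a CQE
reports fewer bytes than requested, followed by a new SQE for the
remainder. It applies to stream sockets; a UDP datagram is sent whole or
not at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: keep sending until the whole buffer has gone
 * out, advancing an offset after every short send.
 * Returns len on success, or -1 with errno set by send().
 */
static ssize_t send_all(int fd, const void *buf, size_t len, int flags)
{
    const char *p = buf;
    size_t off = 0;

    assert(buf != NULL || len == 0);

    while (off < len) {
        ssize_t n = send(fd, p + off, len - off, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* real error: give up, errno is set */
        }
        off += (size_t)n;   /* short send: resubmit only the remainder */
    }
    return (ssize_t)off;
}
```

For zerocopy sends the offset tracking is the easy part; the harder
question is that the buffer must stay untouched until every resubmitted
piece has been released by the stack.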

