linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	cl@linux.com, rppt@linux.vnet.ibm.com, netdev@vger.kernel.org,
	linux-mm@kvack.org, willemdebruijn.kernel@gmail.com,
	bjorn.topel@intel.com, magnus.karlsson@intel.com,
	alexander.duyck@gmail.com, mgorman@techsingularity.net,
	tom@herbertland.com, bblanco@plumgrid.com, tariqt@mellanox.com,
	saeedm@mellanox.com, jesse.brandeburg@intel.com, METH@il.ibm.com,
	vyasevich@gmail.com, brouer@redhat.com
Subject: Re: Designing a safe RX-zero-copy Memory Model for Networking
Date: Wed, 14 Dec 2016 10:39:14 +0100	[thread overview]
Message-ID: <20161214103914.3a9ebbbf@redhat.com> (raw)
In-Reply-To: <58505535.1080908@gmail.com>

On Tue, 13 Dec 2016 12:08:21 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

> On 16-12-13 11:53 AM, David Miller wrote:
> > From: John Fastabend <john.fastabend@gmail.com>
> > Date: Tue, 13 Dec 2016 09:43:59 -0800
> >   
> >> What does "zero-copy send packet-pages to the application/socket that
> >> requested this" mean? At the moment on x86 page-flipping appears to be
> >> more expensive than memcpy (I can post some data shortly) and shared
> >> memory was proposed and rejected for security reasons when we were
> >> working on bifurcated driver.  
> > 
> > The whole idea is that we map all the active RX ring pages into
> > userspace from the start.
> > 
> > And just how Jesper's page pool work will avoid DMA map/unmap,
> > it will also avoid changing the userspace mapping of the pages
> > as well.
> > 
> > Thus avoiding the TLB/VM overhead altogether.
> >   

Exactly.  It is worth mentioning that pages entering the page pool need
to be cleared (measured cost 143 cycles), in order to not leak any
kernel info.  The primary focus of this design is to make sure not to
leak kernel info to userspace, but with an "exclusive" mode also
support isolation between applications.


> I get this but it requires applications to be isolated. The pages from
> a queue can not be shared between multiple applications in different
> trust domains. And the application has to be cooperative meaning it
> can't "look" at data that has not been marked by the stack as OK. In
> these schemes we tend to end up with something like virtio/vhost or
> af_packet.

I expect 3 modes, when enabling RX-zero-copy on a page_pool. The first
two would require CAP_NET_ADMIN privileges.  All modes have a trust
domain id, that need to match e.g. when page reach the socket.

Mode-1 "Shared": Application choose lowest isolation level, allowing
 multiple application to mmap VMA area.

Mode-2 "Single-user": Application request it want to be the only user
 of the RX queue.  This blocks other application to mmap VMA area.

Mode-3 "Exclusive": Application request to own RX queue.  Packets are
 no longer allowed for normal netstack delivery.

Notice mode-2 still requires CAP_NET_ADMIN, because packets/pages are
still allowed to travel netstack and thus can contain packet data from
other normal applications.  This is part of the design, to share the
NIC between netstack and an accelerated userspace application using RX
zero-copy delivery.


> Any ACLs/filtering/switching/headers need to be done in hardware or
> the application trust boundaries are broken.

The software solution outlined allow the application to make the choice
of what trust boundary it wants.

The "exclusive" mode-3 make most sense together with HW filters.
Already today, we support creating a new RX queue based on ethtool
ntuple HW filter and then you simply attach your application that queue
in mode-3, and have full isolation.

 
> If the above can not be met then a copy is needed. What I am trying
> to tease out is the above comment along with other statements like
> this "can be done with out HW filter features".

Does this address your concerns?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-12-14  9:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-05 14:31 Designing a safe RX-zero-copy Memory Model for Networking Jesper Dangaard Brouer
2016-12-12  8:38 ` Mike Rapoport
2016-12-12  9:40   ` Jesper Dangaard Brouer
2016-12-12 14:14     ` Mike Rapoport
2016-12-12 14:49       ` John Fastabend
2016-12-12 17:13         ` Jesper Dangaard Brouer
2016-12-12 18:06           ` Christoph Lameter
2016-12-13 16:10             ` Jesper Dangaard Brouer
2016-12-13 16:36               ` Christoph Lameter
2016-12-13 17:43               ` John Fastabend
2016-12-13 19:53                 ` David Miller
2016-12-13 20:08                   ` John Fastabend
2016-12-14  9:39                     ` Jesper Dangaard Brouer [this message]
2016-12-14 16:32                       ` John Fastabend
2016-12-14 16:45                         ` Alexander Duyck
2016-12-14 21:29                           ` Jesper Dangaard Brouer
2016-12-14 22:45                             ` Alexander Duyck
2016-12-15  8:28                               ` Jesper Dangaard Brouer
2016-12-15 15:59                                 ` Alexander Duyck
2016-12-15 16:38                                 ` Christoph Lameter
2016-12-14 21:04                         ` Jesper Dangaard Brouer
2016-12-13 18:39               ` Hannes Frederic Sowa
2016-12-14 17:00                 ` Christoph Lameter
2016-12-14 17:37                   ` David Laight
2016-12-14 19:43                     ` Christoph Lameter
2016-12-14 20:37                       ` Hannes Frederic Sowa
2016-12-14 21:22                         ` Christoph Lameter
2016-12-13  9:42         ` Mike Rapoport
2016-12-12 15:10       ` Jesper Dangaard Brouer
2016-12-13  8:43         ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161214103914.3a9ebbbf@redhat.com \
    --to=brouer@redhat.com \
    --cc=METH@il.ibm.com \
    --cc=alexander.duyck@gmail.com \
    --cc=bblanco@plumgrid.com \
    --cc=bjorn.topel@intel.com \
    --cc=cl@linux.com \
    --cc=davem@davemloft.net \
    --cc=jesse.brandeburg@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=magnus.karlsson@intel.com \
    --cc=mgorman@techsingularity.net \
    --cc=netdev@vger.kernel.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    --cc=tom@herbertland.com \
    --cc=vyasevich@gmail.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).