netdev.vger.kernel.org archive mirror
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Tom Herbert <tom@herbertland.com>
Cc: Brenden Blanco <bblanco@plumgrid.com>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Ari Saha <as754m@att.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Or Gerlitz <gerlitz.or@gmail.com>,
	john fastabend <john.fastabend@gmail.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Thomas Graf <tgraf@suug.ch>,
	Daniel Borkmann <daniel@iogearbox.net>,
	brouer@redhat.com
Subject: Re: [PATCH v6 01/12] bpf: add XDP prog type for early driver filter
Date: Mon, 11 Jul 2016 13:36:32 +0200
Message-ID: <20160711133632.483bf2cb@redhat.com>
In-Reply-To: <CALx6S36GiWpAsQTePsX4E8kaVTpzjWhwf8T7XKo_UTQv_8-nyw@mail.gmail.com>

On Sun, 10 Jul 2016 15:27:38 -0500
Tom Herbert <tom@herbertland.com> wrote:

> On Sun, Jul 10, 2016 at 8:37 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > On Sat, 9 Jul 2016 08:47:52 -0500
> > Tom Herbert <tom@herbertland.com> wrote:
> >  
> >> On Sat, Jul 9, 2016 at 3:14 AM, Jesper Dangaard Brouer
> >> <brouer@redhat.com> wrote:  
> >> > On Thu,  7 Jul 2016 19:15:13 -0700
> >> > Brenden Blanco <bblanco@plumgrid.com> wrote:
> >> >  
> >> >> Add a new bpf prog type that is intended to run in early stages of the
> >> >> packet rx path. Only minimal packet metadata will be available, hence a
> >> >> new context type, struct xdp_md, is exposed to userspace. So far only
> >> >> expose the packet start and end pointers, and only in read mode.
> >> >>
> >> >> An XDP program must return one of the well known enum values, all other
> >> >> return codes are reserved for future use. Unfortunately, this
> >> >> restriction is hard to enforce at verification time, so take the
> >> >> approach of warning at runtime when such programs are encountered. The
> >> >> driver can choose to implement unknown return codes however it wants,
> >> >> but must invoke the warning helper with the action value.  
> >> >
> >> > I believe we should define stronger semantics for unknown/future
> >> > return codes than the ones stated above:
> >> >  "driver can choose to implement unknown return codes however it wants"
> >> >
> >> > The mlx4 driver implementation in:
> >> >  [PATCH v6 04/12] net/mlx4_en: add support for fast rx drop bpf program
> >> >
> >> > On Thu,  7 Jul 2016 19:15:16 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
> >> >  
> >> >> +             /* A bpf program gets first chance to drop the packet. It may
> >> >> +              * read bytes but not past the end of the frag.
> >> >> +              */
> >> >> +             if (prog) {
> >> >> +                     struct xdp_buff xdp;
> >> >> +                     dma_addr_t dma;
> >> >> +                     u32 act;
> >> >> +
> >> >> +                     dma = be64_to_cpu(rx_desc->data[0].addr);
> >> >> +                     dma_sync_single_for_cpu(priv->ddev, dma,
> >> >> +                                             priv->frag_info[0].frag_size,
> >> >> +                                             DMA_FROM_DEVICE);
> >> >> +
> >> >> +                     xdp.data = page_address(frags[0].page) +
> >> >> +                                                     frags[0].page_offset;
> >> >> +                     xdp.data_end = xdp.data + length;
> >> >> +
> >> >> +                     act = bpf_prog_run_xdp(prog, &xdp);
> >> >> +                     switch (act) {
> >> >> +                     case XDP_PASS:
> >> >> +                             break;
> >> >> +                     default:
> >> >> +                             bpf_warn_invalid_xdp_action(act);
> >> >> +                     case XDP_DROP:
> >> >> +                             goto next;
> >> >> +                     }
> >> >> +             }  
> >> >
> >> > Thus, the mlx4 choice is to drop packets for unknown/future return codes.
> >> >
> >> > I think this is the wrong choice.  I think the choice should be
> >> > XDP_PASS, to pass the packet up the stack.
> >> >
> >> > I find "XDP_DROP" problematic because it happens so early in the driver
> >> > that we lose all possibility of debugging which packets get dropped.  We
> >> > get a single kernel log warning, but we can no longer inspect the
> >> > packets.  By defaulting to XDP_PASS, all the normal stack tools (e.g.
> >> > tcpdump) are available.
> >> >  
> >>
> >> It's an API issue though not a problem with the packet. Allowing
> >> unknown return codes to pass seems like a major security problem also.  
> >
> > We have the full power and flexibility of the normal Linux stack to
> > drop these packets.  And from a usability perspective it gives insight
> > into what is wrong, plus counter metrics.  Would you rather blindly drop
> > e.g. 0.01% of the packets in your data-centers without knowing?
> >  
> This is not blindly dropping packets; the bad action should be logged,
> counters incremented, and packet could be passed to the stack as an
> error if deeper inspection is needed. 

Well, the patch only logs a single warning.  There is no method of
counting or passing to the stack in this proposal.  And adding such
things is a performance regression risk, and a DoS vector in itself.
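
To make the two fallback policies concrete, here is a minimal
user-space sketch.  It is not driver code: the enum values, the
function names and the warn-once stand-in are all hypothetical and
only mirror the shape of the quoted mlx4 switch statement.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative action codes; the values are NOT taken from the patch. */
enum sketch_action { SKETCH_DROP = 1, SKETCH_PASS = 2 };

static int warned;	/* mimics warn-once: only the first bad code is logged */

/* Hypothetical stand-in for bpf_warn_invalid_xdp_action(). */
static void sketch_warn_invalid_action(unsigned int act)
{
	if (!warned) {
		warned = 1;
		fprintf(stderr, "illegal XDP return value %u\n", act);
	}
}

/* Policy of the quoted mlx4 hunk: unknown codes end up as a drop. */
static enum sketch_action unknown_drops(unsigned int act)
{
	switch (act) {
	case SKETCH_PASS:
		return SKETCH_PASS;
	default:
		sketch_warn_invalid_action(act);
		/* fall through */
	case SKETCH_DROP:
		return SKETCH_DROP;
	}
}

/* Alternative argued for in this mail: unknown codes end up as a pass. */
static enum sketch_action unknown_passes(unsigned int act)
{
	switch (act) {
	case SKETCH_DROP:
		return SKETCH_DROP;
	default:
		sketch_warn_invalid_action(act);
		/* fall through */
	case SKETCH_PASS:
		return SKETCH_PASS;
	}
}

/* Small self-check showing where the two policies differ. */
static void sketch_demo(void)
{
	/* Known codes behave identically under both policies. */
	assert(unknown_drops(SKETCH_PASS) == SKETCH_PASS);
	assert(unknown_passes(SKETCH_PASS) == SKETCH_PASS);
	assert(unknown_drops(SKETCH_DROP) == SKETCH_DROP);
	assert(unknown_passes(SKETCH_DROP) == SKETCH_DROP);

	/* Only an unknown code (99 is arbitrary) tells them apart. */
	assert(unknown_drops(99) == SKETCH_DROP);
	assert(unknown_passes(99) == SKETCH_PASS);
}
```

In both variants the only trace left behind is the one-time warning;
neither keeps a per-packet counter, which is the gap described above.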

> IMO, I would rather drop
> something not understood than accept it-- determinism is a goal also.
> 
> > We already talk about XDP as an offload mechanism.  Normally, an (XDP)
> > "offload" program that cannot be supported should be rejected at load
> > time, e.g. by the validator.  BUT we cannot validate all eBPF return
> > codes, because they can originate from a table lookup.  Thus, we _do_
> > allow programs to be loaded with future, unknown return codes.
> >  This then corresponds to only part of the program being offloadable,
> > so the natural response is to fall back and handle the rest in the
> > non-offloaded slower path.
> >
> > I see the XDP_PASS fallback as a natural way of supporting loading
> > newer/future programs on older "versions" of XDP.  
> 
> Then in this model we could only add codes that allow passing packets.
> For instance, what if a new return code means "Drop this packet and
> log it as critical because if you receive it the stack will crash"?

Drop is drop. I don't see why we would need to drop in a "new" way.
If you need to log a critical event, do it in the eBPF program.

> ;-) IMO ignoring something not understood for the sake of
> extensibility is a red herring. In the long run doing this actually
> limits our ability to extend things for both APIs and protocols (a
> great example of this is VXLAN, which mandates that unknown flags are
> ignored on RX, so VXLAN-GPE has to be a new incompatible protocol to get
> a next protocol field).
> 
> >   E.g. I can have an XDP program that has a valid filter protection
> > mechanism, but also uses a newer mechanism, and my server fleet contains
> > NICs from different vendors, some of which only support the filter part.  Then I
> > want to avoid having to compile and maintain different XDP/eBPF
> > programs per NIC vendor. (Instead I prefer having a Linux stack
> > fallback mechanism, and transparently XDP offload as much as the NIC
> > driver supports).
> >  
> As Brenden points out, fallbacks easily become DOS vectors.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
