Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled


From: Florian Westphal <fw@strlen.de>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>,
	Vasily Averin <vvs@parallels.com>,
	netfilter-devel@vger.kernel.org,
	Stephen Hemminger <stephen@networkplumber.org>,
	Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
Date: Sun, 4 May 2014 02:23:17 +0200
Message-ID: <20140504002317.GD3514@breakpoint.cc>
In-Reply-To: <20140503233908.GA6297@localhost>

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > ---[patch rfc]---
> > > Currently the bridge can silently drop IPv4 fragments.
> > > If a node has loaded the nf_defrag_ipv4 module but not nf_conntrack_ipv4,
> > > br_nf_pre_routing defragments incoming IPv4 fragments, but the skb->nfct
> > > check in br_nf_dev_queue_xmit does not allow the reassembled packet to be
> > > re-fragmented, so it is dropped in br_dev_queue_push_xmit without
> > > incrementing any failure counters.
> > > 
> > > According to Patrick McHardy, the bridge should not defragment and
> > > refragment packets unless conntrack is enabled.
> > > 
> > > This patch adds a per-network-namespace flag to control IPv4
> > > defragmentation in the bridge.
> > > 
> > > Signed-off-by: Vasily Averin <vvs@openvz.org>
> > 
> > Are we sure this is required, rather than just removing the skb->nfct
> > test in br_nf_dev_queue_xmit() and being done with it?
> > 
> > Because that seems a lot saner to me; I fail to see how
> > 
> > if (skb->protocol == htons(ETH_P_IP) &&
> >     skb->len + nf_bridge_mtu_reduction(skb) > skb->dev->mtu &&
> >     !skb_is_gso(skb)) {
> > 
> > would evaluate as 'true' without the nf_defrag_ipv4 module loaded.
> > 
> > [ it's from the br_nf_dev_queue_xmit function ]
> 
> I think we may still see IP packets larger than the MTU in that path.
> It would be a rare case, since it requires the bridge to have a different
> (smaller) MTU than the sender, but it is still possible. The
> is_skb_forwardable() check in the current tree snapshot comes a bit later,
> so if we remove that skb->nfct test, the bridge will fragment large packets.

I have to confess that I never tried that out; I assumed the NIC
would toss the over-MTU frame.

> In general, I believe bridges should silently drop packets that are
> larger than the MTU and should perform no fragmentation handling:
> no gathering and no [re]fragmentation. They are transparent devices
> that operate at layer 2.

Not sure I agree. Silent drops are bad (or perhaps I misunderstand you;
if we do a 'silent drop' in br_nf_dev_queue_xmit, there should at least be
a MIB counter of some sort).

The last part stands of course, I agree that bridges should be
transparent and not do frag handling etc.
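A minimal user-space sketch of "drop, but account for it" instead of a silent drop. `struct br_xmit_stats` and `br_xmit_check_len()` are hypothetical names, not kernel API; in the kernel the counter would presumably be a MIB or device statistic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port counters standing in for a real MIB counter. */
struct br_xmit_stats {
	unsigned long tx_ok;
	unsigned long tx_dropped_oversize;
};

/*
 * Decide whether a frame of 'len' bytes may be sent on a port with the
 * given 'mtu'. Oversized non-GSO frames are still dropped, but each drop
 * is counted instead of disappearing silently.
 */
bool br_xmit_check_len(struct br_xmit_stats *st, size_t len,
		       size_t mtu, bool is_gso)
{
	if (len > mtu && !is_gso) {
		st->tx_dropped_oversize++;
		return false;
	}
	st->tx_ok++;
	return true;
}
```

GSO frames are exempt here because they are segmented later, mirroring the !skb_is_gso() test quoted above.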

> The conntrack case is a special case that forces us to enable
> fragmentation handling since we get sort of a bridge that inspects
> layer 3 and 4 packet information. So we have sort of, let's call it, a
> mutant bridge.

Yes 8-/

> We also have the TPROXY target and the socket match; they seem to
> require defragmentation as well, and I'm afraid the skb->nfct check will
> not help in those cases. I think we need some counter to know how many
> clients require the gathering + fragmentation code, so that if there is
> at least one, we enable it.
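A user-space sketch of the users-counter idea, assuming C11 atomics. The struct and function names are hypothetical and this is not the kernel implementation: each client that needs reassembled packets (conntrack, TPROXY, xt_socket) would take a reference while active and release it when it goes away, and the bridge would gather/refragment only while the count is non-zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-namespace state: number of active defrag users. */
struct br_defrag_ns {
	atomic_int defrag_users;
};

/* A client that needs defragmented packets registers itself. */
void br_defrag_get(struct br_defrag_ns *ns)
{
	atomic_fetch_add(&ns->defrag_users, 1);
}

/* The client goes away (e.g. module unload or rule removal). */
void br_defrag_put(struct br_defrag_ns *ns)
{
	atomic_fetch_sub(&ns->defrag_users, 1);
}

/* Gather and refragment only while at least one user needs it. */
bool br_defrag_enabled(struct br_defrag_ns *ns)
{
	return atomic_load(&ns->defrag_users) > 0;
}
```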

Last time I tried TPROXY on top of bridge it was a pain in the neck.

Essentially one has to build a 'brouter' and force packets
up the stack (DROP via ebtables in the broute table).

Such packets will not be seen by the bridge since they're routed
normally via the ip stack for local delivery.

(-j TPROXY needs policy routing for the redirect to work).

It is also rather fragile in my experience: because ebtables only sees
Ethernet frames, a rule like 'broute DROP only for tcp port 80' doesn't work
universally, since we don't see netfilter-defragmented packets at that stage.
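The fragility follows from the IPv4 header layout: only the fragment with offset 0 carries the TCP header, so a port match on unreassembled traffic can never match later fragments. A minimal sketch, assuming a raw IPv4 packet in a byte buffer; `ipv4_tcp_dport()` is a hypothetical helper, not part of ebtables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_PROTO_TCP        6
#define IPV4_FRAG_OFFSET_MASK 0x1fff	/* low 13 bits of bytes 6-7 */

/*
 * Store the TCP destination port in *dport and return true if it can be
 * read from this (not reassembled) IPv4 packet; return false otherwise.
 */
bool ipv4_tcp_dport(const uint8_t *pkt, size_t len, uint16_t *dport)
{
	if (len < 20 || (pkt[0] >> 4) != 4)
		return false;

	size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
	uint16_t frag = (uint16_t)((pkt[6] << 8) | pkt[7]);

	if (ihl < 20 || pkt[9] != IPV4_PROTO_TCP)
		return false;
	if (frag & IPV4_FRAG_OFFSET_MASK)	/* non-first fragment: no TCP header */
		return false;
	if (len < ihl + 4)			/* need source + destination port */
		return false;

	*dport = (uint16_t)((pkt[ihl + 2] << 8) | pkt[ihl + 3]);
	return true;
}
```

A first fragment (MF set, offset 0) still yields the port; every later fragment of the same datagram returns false, which is why such a rule only catches part of the traffic.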

All things considered, I think that just doing the re-fragmentation (i.e.
just removing the skb->nfct test) is really the least-sucky of the options
we have.

If you do IP NAT/TPROXY/conntrack on bridges you're already asking for varying
degrees of layering violations, so I think it would at least be preferable to
have one that "works" :-)
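A simplified user-space model of the xmit decision, contrasting the current test (which includes skb->nfct) with the proposal to drop it. The types and functions are hypothetical: `has_nfct` stands in for `skb->nfct != NULL`, `is_ipv4` for the ETH_P_IP check, and `len` for `skb->len + nf_bridge_mtu_reduction(skb)`.

```c
#include <stdbool.h>
#include <stddef.h>

enum br_xmit_action {
	BR_XMIT_PASS,		/* hand the frame to the port as-is */
	BR_XMIT_REFRAGMENT,	/* run it through IPv4 fragmentation first */
};

struct br_frame {
	bool is_ipv4;
	bool is_gso;
	bool has_nfct;		/* models skb->nfct != NULL */
	size_t len;		/* models skb->len + nf_bridge_mtu_reduction() */
};

/* Current rule: refragment only if conntrack attached an entry. */
enum br_xmit_action br_xmit_current(const struct br_frame *f, size_t mtu)
{
	if (f->has_nfct && f->is_ipv4 && f->len > mtu && !f->is_gso)
		return BR_XMIT_REFRAGMENT;
	return BR_XMIT_PASS;
}

/* Proposed rule: the same test without the skb->nfct condition. */
enum br_xmit_action br_xmit_proposed(const struct br_frame *f, size_t mtu)
{
	if (f->is_ipv4 && f->len > mtu && !f->is_gso)
		return BR_XMIT_REFRAGMENT;
	return BR_XMIT_PASS;
}
```

With nf_defrag_ipv4 loaded but no conntrack, a reassembled 3000-byte frame on a 1500-byte-MTU port gets BR_XMIT_PASS under the current rule (and, per the patch description, is then dropped in br_dev_queue_push_xmit), whereas the proposed rule refragments it.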
