netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 -next 0/5] don't exceed original maximum fragment size when refragmenting
@ 2015-05-04 20:54 Florian Westphal
  2015-05-04 20:54 ` [PATCH V2 -next 1/5] ip: reject too-big defragmented DF-skb when forwarding Florian Westphal
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Florian Westphal @ 2015-05-04 20:54 UTC (permalink / raw)
  To: netdev; +Cc: hannes, jesse

Hello,

We would like to propose this patchset again. Only minor details
changed since the last version, we incorporated the suggestion from
Jesse to always store the size of the largest fragment received,
regardless of the DF bit.

Thus we never generate bigger fragments as originally received
regardless if DF is set ot not.

We would like to summarize the current discussion on this topic and
again would like you to consider applying this patchset to net-next:

Several proposals were suggested:

#1 employ GRO engine
	- Reassembly would only work within one napi poll run. But
	  reassembly must happen even independently of the interface
	  the frame gets received. Delays cause single fragments to
	  arrive in different napi runs, which wouldn't be aggregated.

	- We would have to kill the 1:1 correspondence between
          aggregation and segmentation: within the TCP protocol we can
          stop aggregating frames at any point without any harm
          because of it being a streaming protocol. Fragmentation is
          different in the way that we need to reassemble the complete
          packet before processing, we cannot make sense of 'half skbs'.

#2 keep fragments attached to reassembled

The idea is to attach the original skbs to the reassembled one, so the
networking stack can choose which ones to use depending on the use
case. Forwarding would operate on the original ones while code dealing
with PACKET_HOST frames would use the reassembled one.

	- We have the overhead to carry more skbs around, which we
          currently don't do.

	- This information cannot be stored in any of the currently
          available fields in the skb or shared_info. That said, a new
          pointer would be necessary in every skb, independently if it
          is fragmented or not. This change does impact fast path and
          skb size.

	- sometimes using reassembled skb or the original ones could
          lead to TOCTTOU attacks in some situations, like packet is
          split in the TCP header, core stacks sees complete
          reassembled TCP packet but netfilter only part of the
          header, so different decisions might be done

	- it does impact fast path in netfilter for every packet:
	  pskb_may_pull is not enough to check if we can eat enough of
	  the header, actually because of overlapping or duplicate
	  fragments we have to touch all those fragments, thus
	  creating new slow paths in netfilter

	- all netfilter helpers would need to adapt in case e.g. a
          udp packet containing a sip message is fragmented.

	- in case we change fragment size, we don't have clear
          semantics and the only behaviour which makes sense is what
          this patchset does (i.e., refragment).

	- still, even such complex change does not allow us to act as
          transparent router/bridge: we still have to queue up
          fragments; in case we cannot reassemble we have to drop
          them (else firewall bypass is possible).

#3 max_frag_size vector

As it is based on the idea of keep fragments attached to reassembly it
inherits a lot of the problems stated in section #2.

	- Still needs an additional way to store this information in
          the skb, thus enlarging a structure we try to shrink.

	- TOCTTOU attacks are not possible because we do inspect the
          same data all the time

	- ... but at the same time, we cannot deal with overlapping or
          duplicated fragments (without making this complex again)

For years the linux kernel never correctly handled fragmented packets
in forwarding L3 or L2 cases. We never heard any complaints. These
patches try to make Linux a better internet citizen, correctly
handling some edge cases, without harming core code and affecting
performance.

Thus we consider our proposed patches superior in all aspects. We are
happy to discuss any ideas how to solve this otherwise.

We investigated alternate approaches to allow transparent refragmentation
for the common case of "well-formed" (i.e., non-overlapping, no duplicates, ..)
fragments.  Unfortunately it involves removing an ip defragmentation
optimization in case netfilter conntrack is active.

The two patches that enable this are included as [RFC] as part of this series
so they can be discussed.

Thanks,
Hannes, Florian

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-04 23:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-04 20:54 [PATCH V2 -next 0/5] don't exceed original maximum fragment size when refragmenting Florian Westphal
2015-05-04 20:54 ` [PATCH V2 -next 1/5] ip: reject too-big defragmented DF-skb when forwarding Florian Westphal
2015-05-04 20:54 ` [PATCH V2 -next 2/5] ipv6: don't increase size when refragmenting forwarded ipv6 skbs Florian Westphal
2015-05-04 20:54 ` [PATCH V2 -next 3/5] ip_fragment: don't remove df bit from defragmented packets Florian Westphal
2015-05-04 20:54 ` [PATCH RFC 4/5] net: ip_fragment: attempt to preserve frag sizes for netfilter defragmented skbs Florian Westphal
2015-05-04 20:54 ` [PATCH RFC 5/5] net: set DF bit on fragment list skbs if DF was set earlier Florian Westphal
2015-05-04 23:29 ` [PATCH V2 -next 0/5] don't exceed original maximum fragment size when refragmenting David Miller
2015-05-04 23:48   ` Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).