* disturbing MTU experiment
From: Don Cohen @ 2003-04-08 16:44 UTC
To: netfilter-devel
(Let me know if there's a more appropriate place to send this.)
My understanding of IPv4 is that
- the forwarding path between two machines might have links of
different MTUs,
- when an IP datagram marked DF is too big for one of those links, the
machine that would otherwise forward it is supposed to return an
ICMP unreachable reply
- the sending machine is supposed to react to that ICMP reply, either
by sending a smaller datagram or by allowing fragmentation or of
course, giving up.
The following experiment ought to exhibit this behavior:
internet --- eth0:linux_firewall:eth1 --- client
- on linux firewall do
ifconfig eth1 mtu 1400
- on client machine try to use the internet, e.g., run a web browser
When I do this I find, to my dismay, that MANY sites don't work!
A tcpdump on the firewall shows the web server in the internet sending
a DF packet of size 1500, the firewall sending the ICMP reply, and the
server ignoring it, i.e., resending the large packet over and over.
1. Is there something wrong with my experiment or is a large part of
the internet really broken in this way? Do others out there see the
same thing?
If I am correct, then
2. What's the cause of this breakage? Are servers filtering ICMP due
to attacks? Are they behind firewalls that don't know how to forward
the ICMP packets? Are ISPs filtering the ICMP packets? Are the
servers themselves running broken network code? If so, which are the
OS's with this defect?
3. Are links with MTU < 1500 extremely rare in the internet?
The fact that the internet mostly seems to work would suggest that.
Surely this has not always been the case. When did it happen?
Wouldn't this break various tunneling protocols? At least those that
try to encapsulate one packet in one IP packet. Maybe those are also
not much used?
4. What can be done about it? For instance, what would happen if
the machines that send ICMP replies were to go ahead and fragment
the DF packets (perhaps instead of sending out a second ICMP to the
same offender)?
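As an illustration of the behavior described above, here is a minimal Python sketch of a sender doing path MTU discovery. It is a toy model, not real network code: the function names, the `icmp_delivered` switch and the retry limit are all assumptions made for the example. It shows that when the ICMP "fragmentation needed" reply reaches the sender, the packet size converges on the smallest link MTU, and that when the reply is filtered the sender keeps resending the same oversized packet until it gives up.

```python
from typing import Optional, Sequence


def first_bottleneck(size: int, link_mtus: Sequence[int]) -> Optional[int]:
    """Return the MTU of the first link too small for a DF packet, else None."""
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None


def discover_pmtu(link_mtus: Sequence[int], icmp_delivered: bool = True,
                  start: int = 1500, max_tries: int = 5) -> Optional[int]:
    """Toy sender: return the packet size that finally gets through, or None."""
    size = start
    for _ in range(max_tries):
        bottleneck = first_bottleneck(size, link_mtus)
        if bottleneck is None:
            return size  # the packet fits every link on the path
        if icmp_delivered:
            # The ICMP error carries the next-hop MTU; shrink to it.
            size = bottleneck
        # Otherwise the sender never learns why the packet vanished and
        # simply retransmits at the same size on the next iteration.
    return None


if __name__ == "__main__":
    path = [1500, 1400]  # e.g. eth0 at 1500, eth1 lowered to 1400
    print(discover_pmtu(path))                        # 1400
    print(discover_pmtu(path, icmp_delivered=False))  # None
```

The second call models the symptom seen in the tcpdump: the large packet is resent over and over and the connection never makes progress.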
* Re: disturbing MTU experiment
From: netfilter @ 2003-04-08 17:47 UTC
To: netfilter-devel
On Tue, Apr 08, 2003 at 09:44:08AM -0700, Don Cohen wrote:
>
> (Let me know if there's a more appropriate place to send this.)
A general [[I|i]inter]networking mailing list perhaps, but I am sure
everybody here is familiar with the phenomenon you describe.
> My understanding of IPv4 is that
> - the forwarding path between two machines might have links of
> different MTUs,
> - when an IP datagram marked DF is too big for one of those links, the
> machine that would otherwise forward it is supposed to return an
> ICMP unreachable reply
And include a suggestion as to how big the datagram should be to pass
through the link.
> - the sending machine is supposed to react to that ICMP reply, either
> by sending a smaller datagram or by allowing fragmentation or of
> course, giving up.
>
> The following experiment ought to exhibit this behavior:
>
> internet --- eth0:linux_firewall:eth1 --- client
>
> - on linux firewall do
> ifconfig eth1 mtu 1400
> - on client machine try to use the internet, e.g., run a web browser
We all know what is coming... :-)
> When I do this I find, to my dismay, that MANY sites don't work!
Get in line buddy. :-)
> A tcpdump on the firewall shows the web server in the internet sending
> a DF packet of size 1500, the firewall sending the ICMP reply, and the
> server ignoring it, i.e., resending the large packet over and over.
Yup. Happens all too disturbingly often.
> 1. Is there something wrong with my experiment or is a large part of
> the internet really broken in this way? Do others out there see the
> same thing?
No, yes and yes.
> 2. What's the cause of this breakage?
It's called a "PMTU blackhole".
> Are servers filtering ICMP due
> to attacks?
Naive packet filter administrators are filtering out the ICMPs, yes.
Instead of understanding for themselves what they should filter out,
they follow "recipes" and listen to all the FUD about how
"dangerous" ICMP is and just filter it all, lock, stock and barrel.
> Are they behind firewalls that don't know how to forward
> the ICMP packets?
Oh, the firewalls know how to forward them, their administrators have
just disallowed it.
> Are ISPs filtering the ICMP packets?
This could be to a certain extent a problem, but in most cases it's in
the last mile (i.e. between the ISP and the end node).
> 3. Are links with MTU < 1500 extremely rare in the internet?
Good question.
> The fact that the internet mostly seems to work would suggest that.
Not really. It could suggest that OR it could suggest that the portion
of the Internet that works is run by competent admins. But of course,
it's a combination of the two in reality.
> Surely this has not always been the case. When did it happen?
When the Internet stopped being a friendly community and firewalls
were being erected to keep the evils out.
> 4. What can be done about it?
There was/is a project on the Internet to identify and notify owners
of PMTU blackholes. I don't recall the URL however. Maybe some
Googlin' will turn it up.
In any case, Googlin' for "PMTU blackhole" will turn up lots more
information for you.
b.
--
Brian J. Murrell
* Re: disturbing MTU experiment
From: Patrick Schaaf @ 2003-04-08 18:40 UTC
To: netfilter-devel
> > 3. Are links with MTU < 1500 extremely rare in the internet?
>
> Good question.
They are increasingly common. PPPoE DSL links run at 1496, and all kinds
of tunneling result in even smaller MTUs, e.g. 1454 for maximally
conservative L2TP links.
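The reduced MTUs mentioned in this thread follow from simple header arithmetic. The Python sketch below is illustrative only: the function and constant names are made up for the example, and the overheads assume IPv4 and TCP headers without options, a basic 4-byte GRE header, and 8 bytes for PPPoE (6-byte PPPoE header plus 2-byte PPP protocol field). Under those assumptions GRE over a 1500-byte Ethernet link gives 1476, matching the GRE figure reported elsewhere in the thread, while PPPoE works out to 1492, so the 1496 quoted here may be slightly off.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

GRE_OVERHEAD = IPV4_HEADER + 4  # outer IPv4 header + basic GRE header
PPPOE_OVERHEAD = 6 + 2          # PPPoE header + PPP protocol field


def tunnel_mtu(outer_mtu: int, overhead: int) -> int:
    """MTU left for the inner packet after encapsulation overhead."""
    return outer_mtu - overhead


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(tunnel_mtu(1500, GRE_OVERHEAD))    # 1476
    print(tunnel_mtu(1500, PPPOE_OVERHEAD))  # 1492
    print(tcp_mss(1500))                     # 1460
    print(tcp_mss(1400))                     # 1360
```

A server that advertises or assumes an MSS of 1460 will therefore emit 1500-byte packets, which is exactly what gets stuck behind any of these smaller links when the ICMP reply is lost.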
> > 4. What can be done about it?
>
> There was/is a project on the Internet to identify and notify owners
> of PMTU blackholes. I don't recall the URL however.
http://mss.phildev.net/
http://lartc.org/howto/lartc.cookbook.mtu-mss.html
best regards
Patrick
* Re: disturbing MTU experiment
From: Jim Fleming @ 2003-04-08 18:46 UTC
To: Don Cohen, netfilter-devel
Are you on the AM Internet or the FM InterNAT ?
128-bit DNS AAAA Record Flag Day Formats
2003:[IPv4]:[SDLL.OFFF.FFFF.TTTT]:[64-bit IPv8 or IPv16 Persistent Address]
[YMDD]:[IPv4]:[SDLL.OFFF.FFFF.TTTT]:[64-bit IPv8 or IPv16 Persistent Address]
1-bit to set the Reserved/Spare ("AM/FM") bit in Fragment Offset [S]
1-bit to set the Don't Fragment (DF) bit [D]
2-bits to select 1 of 4 common TTL values (255, 128, 32, 8) [LL]
1-bit for Options Control [O]
7-bits to set the Identification Field(dst) [FFFFFFF]
4-bits to set the TOS(dst) Field [TTTT]
Default SDLL.OFFF.FFFF.TTTT = 0000.0000.0000.0000
FFF.FFFF.TTTT = GGG.SSSS.SSSS
http://www.ntia.doc.gov/ntiahome/domainname/130dftmail/unir.txt
IPv8
0QQQQGGGSSSSSSSS[32-bits][Port]
IPv16
0QQQQGGGSSSSSSSS[32-bits][Port]
1AAAAAAAAAAAAAAA[32-bits][Port]
A...A=ASN=32769...65535
Jim Fleming
http://www.IPv8.info
* Re: disturbing MTU experiment
From: Martin Josefsson @ 2003-04-08 20:43 UTC
To: Don Cohen; +Cc: Netfilter-devel
On Tue, 2003-04-08 at 18:44, Don Cohen wrote:
I'm just going to note that the NAT code in iptables appears to have a
small bug with regard to ICMP errors. I've seen it with links that
have a smaller MTU (1476, GRE tunnels) and packets with the DF flag set.
Host A <-> router B <- GRE tunnel -> router C <-> host D
If I add a DNAT-rule in router B so that certain connections are
redirected to host D things will go wrong.
Host A sends a 1500-byte packet to the IP it thinks it's connected to, and
router B will try to send that to host D and will fail because of the
smaller MTU and DF flag.
Router B then generates an icmp fragmentation needed error-message and
sends it to host A. This is where we have the problem.
The icmp-error packet which was generated by router B has both the
internal header and the external header rewritten. The internal header
becomes correct (indicating the ip host A thinks it's connected to) but
the external header (the ip-header of the icmp-error) will be wrong. It
was also rewritten and now has the ip host A thinks it's connected to as
source-ip... oops, should be the ip of router B. The Linux ip-stack doesn't
accept this icmp packet (haven't checked if conntrack thinks it qualifies
as related or not).
The NAT code will do the right thing for ICMP errors that are passing
through the router but not for locally generated errors. I haven't gone
through the code enough yet to know how to solve it best...
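The header rewriting described above can be modeled in a few lines of Python. This is only a sketch of the reported symptom, not the netfilter code: the `IcmpError` type, both function names and all addresses (taken from the documentation ranges) are hypothetical. It contrasts the expected result, where only the embedded header has the DNAT undone, with the reported one, where the outer source address is rewritten as well.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IcmpError:
    outer_src: str  # sender of the ICMP error itself
    outer_dst: str  # host the error is delivered to
    inner_src: str  # source in the embedded header of the offending packet
    inner_dst: str  # destination in the embedded header


def un_dnat_expected(err: IcmpError, real_dst: str, virtual_dst: str) -> IcmpError:
    """Undo the DNAT in the embedded header only; keep the outer source."""
    if err.inner_dst != real_dst:
        return err
    return replace(err, inner_dst=virtual_dst)


def un_dnat_as_reported(err: IcmpError, real_dst: str, virtual_dst: str) -> IcmpError:
    """Symptom from the report: the outer source is rewritten as well."""
    if err.inner_dst != real_dst:
        return err
    return replace(err, inner_dst=virtual_dst, outer_src=virtual_dst)


# Hypothetical addresses, for illustration only.
HOST_A, ROUTER_B = "192.0.2.10", "192.0.2.1"
VIRTUAL, HOST_D = "198.51.100.80", "203.0.113.80"

# Error generated locally on router B after DNAT changed the destination to D.
raw = IcmpError(outer_src=ROUTER_B, outer_dst=HOST_A,
                inner_src=HOST_A, inner_dst=HOST_D)

if __name__ == "__main__":
    print(un_dnat_expected(raw, HOST_D, VIRTUAL).outer_src)     # 192.0.2.1
    print(un_dnat_as_reported(raw, HOST_D, VIRTUAL).outer_src)  # 198.51.100.80
```

In both cases the embedded destination ends up as the address host A believes it is talking to; the difference is only in who the error claims to come from.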
--
/Martin
* Re: disturbing MTU experiment
From: netfilter @ 2003-04-08 20:46 UTC
To: netfilter-devel
On Tue, Apr 08, 2003 at 08:40:51PM +0200, Patrick Schaaf wrote:
>
> They are increasingly common.
Of course, this is the correct answer. I was thinking about the link
at the other end, but of course, that is irrelevant.
> PPPoE DSL links run at 1496,
Right.
> and all kinds
> of tunneling result in even smaller MTUs, e.g. 1454 for maximally
> conservative L2TP links.
Right.
> http://mss.phildev.net/
That's the one I was thinking of!
> http://lartc.org/howto/lartc.cookbook.mtu-mss.html
And this is a very workable (even though it's a band-aid to fix other
people's brokenness) solution.
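The band-aid in question is usually implemented as MSS clamping: a router on the small-MTU path lowers the MSS option in forwarded TCP SYN packets so that neither end ever sends a segment too large for the path. The Python sketch below shows only the option rewrite, under stated assumptions: `clamp_mss` is a hypothetical name, the 40-byte allowance assumes IPv4 and TCP headers without options, and a real implementation would also have to recompute the TCP checksum.

```python
def clamp_mss(options: bytes, path_mtu: int) -> bytes:
    """Return the TCP options with any MSS option lowered to fit path_mtu."""
    limit = max(path_mtu - 40, 0)  # 20-byte IPv4 header + 20-byte TCP header
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation is a single byte
            i += 1
            continue
        if i + 1 >= len(out):  # truncated option, stop parsing
            break
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break              # malformed option, leave the rest untouched
        if kind == 2 and length == 4:  # Maximum Segment Size option
            mss = int.from_bytes(out[i + 2:i + 4], "big")
            if mss > limit:
                out[i + 2:i + 4] = limit.to_bytes(2, "big")
        i += length
    return bytes(out)


if __name__ == "__main__":
    # MSS 1460, NOP, window scale 7: a typical SYN option prefix.
    syn_options = bytes([2, 4]) + (1460).to_bytes(2, "big") + bytes([1, 3, 3, 7])
    clamped = clamp_mss(syn_options, 1400)
    print(int.from_bytes(clamped[2:4], "big"))  # 1360
```

Because the clamp happens during the handshake, it works even when the remote side drops every ICMP reply, which is why it helps against blackholes without actually fixing them.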
b.
--
Brian J. Murrell
* Re: disturbing MTU experiment
From: Tom Marshall @ 2003-04-08 21:57 UTC
To: netfilter-devel
This is obviously flamebait. Jim Fleming is currently in his second
suspension from posting on the IETF lists[1]. I thought that was enough
but apparently it's driven him to other lists in pursuit of arguments.
To answer the question, Jim, I don't even know if we're on the same planet.
Don't bother educating me on that issue because I won't see the reply.
[1] http://www1.ietf.org/mail-archive/ietf/Current/msg19652.html
On Tue, Apr 08, 2003 at 01:46:11PM -0500, Jim Fleming wrote:
> Are you on the AM Internet or the FM InterNAT ?
>
> [irrelevant stuff removed]
>
> Jim Fleming
> http://www.IPv8.info
--
A great many people think they are thinking when they are merely
rearranging their prejudices.
-- William James
* NF_IP_PRE_ROUTING hook
From: Dirk Morris @ 2003-04-08 22:08 UTC
To: netfilter-devel
I have hacked up some hooks and targets over the past few months.
They all seem to work fine except at the NF_IP_PRE_ROUTING hook
(fine in FORWARD and POST ROUTING).
Is there something different about hooks/targets working there?
They are all accessing the network headers in the skb;
maybe those aren't filled out until after PREROUTING?
They consistently cause a kernel panic, but it's hard for me to see where.
Thanks for any help.
-dirk
//* */ dmorris (* www.neogenen.com *)
main(){int _=0;for(;_!=1687193639&&putchar(" \
dn\nc@oge.m"[abs(_%11)]);_=(_*42913)+115127);}