netdev.vger.kernel.org archive mirror
* linux bridge and MTU
@ 2008-10-29 13:24 Michael Tokarev
  2008-10-29 15:26 ` Stephen Hemminger
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Tokarev @ 2008-10-29 13:24 UTC
  To: netdev

There's an interesting interaction between different
MTU (max transmission unit) values on interfaces
which are bridged together.  I'm trying to understand
how it works.

Suppose there are 2 interfaces in the bridge: one with the
standard 1500 MTU and another with, say, 3500.

As far as I can see, the bridge interface sets its MTU to
the smallest of all the components, which seems
to be the right thing to do.

But now the question is - is it possible to communicate
over the interface with larger MTU using full frames?

For example, here is a tcpdump of a single ping-pong
"pair" between host "B", which is connected to the larger-MTU
interface, and host "A", which runs the bridge described
above, using 3000-byte packets:

IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 3028) B > A: ICMP echo request, id 35331, seq 2, length 3008

IP (tos 0x0, ttl 64, id 39747, offset 0, flags [+], proto ICMP (1), length 1500) A > B: ICMP echo reply, id 35331, seq 2, length 1480
IP (tos 0x0, ttl 64, id 39747, offset 1480, flags [+], proto ICMP (1), length 1500) A > B: icmp
IP (tos 0x0, ttl 64, id 39747, offset 2960, flags [none], proto ICMP (1), length 68) A > B: icmp

So the reply comes back in 3 fragments, in accordance with the
1500 MTU of the bridge interface.
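
The fragment sizes in the trace follow from plain IPv4 fragmentation
arithmetic.  A minimal sketch in Python, assuming a 20-byte IP header
without options; the function name is hypothetical and this is not the
kernel's code, only the same calculation:

```python
def fragment_ipv4(total_length, mtu, header_len=20):
    """Return a list of (offset, fragment_total_length, more_fragments)."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = total_length - header_len
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        fragments.append((offset, size + header_len, more))
        offset += size
    return fragments


# The 3028-byte echo reply sent through a 1500-MTU device.
for offset, length, more in fragment_ipv4(3028, 1500):
    flags = "+" if more else "none"
    print(f"offset {offset}, flags [{flags}], length {length}")
```

For a 3028-byte packet and an MTU of 1500 this yields fragments at
offsets 0, 1480 and 2960 with lengths 1500, 1500 and 68, the same
values as in the dump.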

When forwarding from B to some host C connected to the other
interface with the standard 1500 MTU, host A correctly sends a
"fragmentation required" ICMP back, so that part works.
Also, host A obviously is able to receive larger frames.
But it can't SEND larger frames, even if the underlying
interface has the proper MTU settings?

Is there a way to achieve this?

Thanks!

/mjt

* Re: linux bridge and MTU
  2008-10-29 13:24 linux bridge and MTU Michael Tokarev
@ 2008-10-29 15:26 ` Stephen Hemminger
  2008-10-29 20:31   ` Michael Tokarev
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2008-10-29 15:26 UTC
  To: Michael Tokarev; +Cc: netdev

On Wed, 29 Oct 2008 16:24:51 +0300
Michael Tokarev <mjt@tls.msk.ru> wrote:

> There's an interesting interaction between different
> MTU (max transmission unit) values on interfaces
> which are bridged together.  I'm trying to understand
> how it works.
> 
> Suppose there are 2 interfaces in the bridge: one with the
> standard 1500 MTU and another with, say, 3500.
> 
> As far as I can see, the bridge interface sets its MTU to
> the smallest of all the components, which seems
> to be the right thing to do.
> 
> But now the question is - is it possible to communicate
> over the interface with larger MTU using full frames?
> 
> For example, here is a tcpdump of a single ping-pong
> "pair" between host "B", which is connected to the larger-MTU
> interface, and host "A", which runs the bridge described
> above, using 3000-byte packets:
> 
> IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 3028) B > A: ICMP echo request, id 35331, seq 2, length 3008
> 
> IP (tos 0x0, ttl 64, id 39747, offset 0, flags [+], proto ICMP (1), length 1500) A > B: ICMP echo reply, id 35331, seq 2, length 1480
> IP (tos 0x0, ttl 64, id 39747, offset 1480, flags [+], proto ICMP (1), length 1500) A > B: icmp
> IP (tos 0x0, ttl 64, id 39747, offset 2960, flags [none], proto ICMP (1), length 68) A > B: icmp
> 
> So, the reply comes in 3 packets according to 1500 MTU of
> the bridge interface.
> 
> When forwarding from B to some host C connected to the other
> interface with standard 1500 mtu, host A correctly sends
> "fragmentation required" ICMP back, so that part works.
> Also, host A obviously is able to receive larger frames.
> But it can't SEND larger frames, even if the underlying
> interface has proper MTU settings?
> 
> Is there a way to achieve this?
> 

The bridge is a pure level 2 switch. It tries to conform to the 802.1D standard
and is therefore agnostic of higher-level protocols. To quote the spec:

---------------------

6.3.8 Maximum Service Data Unit Size
The Maximum Service Data Unit Size that can be supported by an IEEE 802 LAN varies with the MAC
method and its associated parameters (speed, electrical characteristics, etc.). It may be constrained by the
owner of the LAN. The Maximum Service Data Unit Size supported by a Bridge between two LANs is the
smaller of that supported by the LANs. No attempt is made by a Bridge to relay a frame to a LAN that does
not support the size of Service Data Unit conveyed by that frame.

---------------------
You might be able to do something with netfilter.

* Re: linux bridge and MTU
  2008-10-29 15:26 ` Stephen Hemminger
@ 2008-10-29 20:31   ` Michael Tokarev
  2008-10-29 20:44     ` Stephen Hemminger
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Tokarev @ 2008-10-29 20:31 UTC
  To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger wrote:
> On Wed, 29 Oct 2008 16:24:51 +0300
> Michael Tokarev <mjt@tls.msk.ru> wrote:
> 
>> There's an interesting interaction between different
>> MTU (max transmission unit) values on interfaces
>> which are bridged together.  I'm trying to understand
>> how it works.
[exchanging larger packets between different interfaces
  on the same bridge]

> The bridge is a pure level 2 switch. It tries to conform to the 802.1d standard
> and therefore is agnostic of higher level protocols. To quote spec

Yes, it is.  But in linux, a bridge is not just that, it's ALSO
a (virtual) network interface, with its own IP address(es),
netmask(s) and so on.  *And* with its own MTU value.
> 
> ---------------------
> 
> 6.3.8 Maximum Service Data Unit Size
> The Maximum Service Data Unit Size that can be supported by an IEEE 802 LAN varies with the MAC
> method and its associated parameters (speed, electrical characteristics, etc.). It may be constrained by the
> owner of the LAN. The Maximum Service Data Unit Size supported by a Bridge between two LANs is the
> smaller of that supported by the LANs. No attempt is made by a Bridge to relay a frame to a LAN that does
> not support the size of Service Data Unit conveyed by that frame.

Yes, that's what I observed: the MTU of the bridge *interface*
is set to the minimum MTU of all interfaces "connected to" this
bridge.  That part works as expected.

However, my question was somewhat different.  A host "external"
to the bridge is able to send larger packets (provided its individual
interface has sufficient MTU).  But the host that is home to
that bridge cannot, and can't even reply to larger packets.  Or
rather, it does not even TRY to do so, so to say, knowing in advance
that the MTU is smaller than that.

What I'd expect from the bridge code is something like this: set
the MTU of the bridge device to the LARGEST mtu of all the interfaces,
but tell the networking stack to fragment a packet ONLY when that
packet will go to the smaller-MTU interface.  Since the bridge in
linux is NOT a pure level 2 thing, it is much smarter than
that, and at least knows about MTU and routing.

Ok, let's see how it works when one of the "external" hosts,
connected to the larger-MTU interface, sends a large packet to another
host connected to the same bridge but on the smaller-MTU port
(hosts B and C in the above example):

   B <=== MTU=3000 ===> A (bridge) <=== MTU=1500 ====> C

B sends a large packet to C.  According to the MTU of its
local network segment, it sends out a 3000-byte packet.
And immediately receives an ICMP from A telling "fragmentation
needed".  So it corrects the MTU and goes on with smaller packets.

When B sends out a packet destined to A, or even to another
host connected to the same bridge and also with larger MTU,
the packet goes just fine.

I.e., 2 hosts on the "larger-MTU part" of the bridge can send
and receive larger packets.  This is true ONLY when the
sending side is NOT the host running the bridge.  When
the sending host is A, it can't send larger packets.  Which
is somewhat strange, as A, unlike all the others, knows
the whole picture and has a much better chance to "work right".

> You might be able to do something with netfilter.

The whole thing has nothing to do with netfilter, unless I
misunderstood what you meant.

Thanks!

/mjt

* Re: linux bridge and MTU
  2008-10-29 20:31   ` Michael Tokarev
@ 2008-10-29 20:44     ` Stephen Hemminger
  2008-10-29 21:10       ` Michael Tokarev
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2008-10-29 20:44 UTC
  To: Michael Tokarev; +Cc: netdev

On Wed, 29 Oct 2008 23:31:56 +0300
Michael Tokarev <mjt@tls.msk.ru> wrote:

> Stephen Hemminger wrote:
> > On Wed, 29 Oct 2008 16:24:51 +0300
> > Michael Tokarev <mjt@tls.msk.ru> wrote:
> > 
> >> There's an interesting interaction between different
> >> MTU (max transmission unit) values on interfaces
> >> which are bridged together.  I'm trying to understand
> >> how it works.
> [exchanging larger packets between different interfaces
>   on the same bridge]
> 
> > The bridge is a pure level 2 switch. It tries to conform to the 802.1d standard
> > and therefore is agnostic of higher level protocols. To quote spec
> 
> Yes it is.  But in linux, bridge is not just that, it's ALSO
> a (virtual) network interface, with its own IP address(es),
> netmask(s) and so on.  *And* with the MTU value.
> > 
> > ---------------------
> > 
> > 6.3.8 Maximum Service Data Unit Size
> > The Maximum Service Data Unit Size that can be supported by an IEEE 802 LAN varies with the MAC
> > method and its associated parameters (speed, electrical characteristics, etc.). It may be constrained by the
> > owner of the LAN. The Maximum Service Data Unit Size supported by a Bridge between two LANs is the
> > smaller of that supported by the LANs. No attempt is made by a Bridge to relay a frame to a LAN that does
> > not support the size of Service Data Unit conveyed by that frame.
> 
> Yes that's what I observed, -- the MTU of the bridge *interface*
> is set to the minimum MTU of all interfaces "connected to" this
> bridge.  That part works as expected.
> 
> However, my question was somewhat different.  A host "external"
> to the bridge is able to send larger packets (provided its individual
> interface has sufficient MTU).  But the host that is home to
> that bridge cannot, and can't even reply to larger packets.  Or
> rather, it does not even TRY to do so, so to say, knowing in advance
> that the MTU is smaller than that.
> 
> What I'd expect from the bridge code is something like this: set
> the MTU of the bridge device to the LARGEST mtu of all the interfaces,
> but tell the networking stack to fragment a packet ONLY when that
> packet will go to the smaller-MTU interface.  Since the bridge in
> linux is NOT a pure level 2 thing, it is much smarter than
> that, and at least knows about MTU and routing.

The bridge device has no special back channel to the networking stack.
It can only advertise one MTU for the local interface. 
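
A minimal sketch of that model in Python, following the 802.1D text
quoted earlier rather than the actual bridge code.  The function names
and the port names "to_B" and "to_C" are hypothetical, and the sizes are
layer 3 packet sizes with the Ethernet header left out:

```python
def bridge_device_mtu(port_mtus):
    """The single MTU the bridge device advertises to the local stack."""
    return min(port_mtus.values())


def relay_allowed(packet_len, egress_port, port_mtus):
    """A relayed frame either fits the egress port or is dropped.

    There is no fragmentation at this level: the bridge never
    looks above the MAC layer.
    """
    return packet_len <= port_mtus[egress_port]


ports = {"to_B": 3000, "to_C": 1500}

# Locally generated traffic is capped by the one advertised MTU,
# regardless of which port it will leave through.
print("bridge device MTU:", bridge_device_mtu(ports))

# Relayed traffic is checked against the egress port only.
print("3000 bytes out of to_B:", relay_allowed(3000, "to_B", ports))
print("3000 bytes out of to_C:", relay_allowed(3000, "to_C", ports))
```

In this model a 3000-byte packet can be relayed towards B but not
towards C, while anything the host itself sends through the bridge
device is limited to 1500.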


> Ok, let's see how it works when one of the "external" hosts,
> connected to the larger-MTU interface, sends a large packet to another
> host connected to the same bridge but on the smaller-MTU port
> (hosts B and C in the above example):
> 
>    B <=== MTU=3000 ===> A (bridge) <=== MTU=1500 ====> C
> 
> B sends a large packet to C.  According to the MTU of its
> local network segment, it sends out a 3000-byte packet.
> And immediately receives an ICMP from A telling "fragmentation
> needed".  So it corrects the MTU and goes on with smaller packets.

A's IP stack never sees the packet. The bridge just drops it.


> When B sends out a packet destined to A, or even to another
> host connected to the same bridge and also with larger MTU,
> the packet goes just fine.
> 
> I.e., 2 hosts on the "larger-MTU part" of the bridge can send
> and receive larger packets.  This is true ONLY when the
> sending side is NOT the host running the bridge.  When
> the sending host is A, it can't send larger packets.  Which
> is somewhat strange, as A, unlike all the others, knows
> the whole picture and has a much better chance to "work right".
> 
> > You might be able to do something with netfilter.
> 
> The whole thing has nothing to do with netfilter.  If I didn't
> misunderstand what you meant.
> 

The reason I mentioned netfilter is that it provides a way to
load special rules on a per-interface/per-direction basis to alter
behaviour. It is the tool for adding non-standard behaviour.
One could argue that firewalling is really just one case of non-standard
behaviour.

* Re: linux bridge and MTU
  2008-10-29 20:44     ` Stephen Hemminger
@ 2008-10-29 21:10       ` Michael Tokarev
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Tokarev @ 2008-10-29 21:10 UTC
  To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger wrote:
> On Wed, 29 Oct 2008 23:31:56 +0300
> Michael Tokarev <mjt@tls.msk.ru> wrote:
[]
>> Ok, let's see how it works when one of the "external" hosts,
>> connected to the larger-MTU interface, sends a large packet to another
>> host connected to the same bridge but on the smaller-MTU port
>> (hosts B and C in the above example):
>>
>>    B <=== MTU=3000 ===> A (bridge) <=== MTU=1500 ====> C
>>
>> B sends a large packet to C.  According to the MTU of its
>> local network segment, it sends out a 3000-byte packet.
>> And immediately receives an ICMP from A telling "fragmentation
>> needed".  So it corrects the MTU and goes on with smaller packets.
> 
> A's IP stack never sees the packet. The bridge just drops it.

No.  A sees the IP just fine.  In all cases, as long as the
received packet fits within the MTU of the *interface*
it is received on.  Not the bridge, but the interface
"connected to" the bridge.  And that was my main point:
why can A receive large packets just fine, but not SEND them
back?  (As you figured, I understand the underlying mechanics and
see where it all comes from, but it still looks pretty...
strange, unnatural.)

>> When B sends out a packet destined to A, or even to another
>> host connected to the same bridge and also with larger MTU,
>> the packet goes just fine.

^^^^^^
here

[]
>>> You might be able to do something with netfilter.
>> The whole thing has nothing to do with netfilter.  If I didn't
>> misunderstand what you meant.
> 
> The reason I mentioned netfilter is that it provides a way to
> load special rules on a per-interface/per-direction basis to alter
> behaviour. It is the tool for adding non-standard behaviour.
> One could argue that firewalling is really just one case of non-standard
> behaviour.

That would work if the bridge *interface* allowed its MTU to
be set manually.  I'd set the bridge MTU to the max and use
netfilter (or even routing rules, since each routing
entry includes an mtu value) to lower the MTU where appropriate.
But the bridge does not allow altering the MTU at all.
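
A minimal sketch of the idea in Python: the device carries the largest
MTU and the most specific matching route may lower it.  The function
name, the route table and the addresses (from the 192.0.2.0/24
documentation range) are hypothetical, and the kernel's real route
lookup and path-MTU handling are more involved than this:

```python
import ipaddress


def effective_mtu(dest, device_mtu, routes):
    """Pick the MTU for dest: the device MTU, lowered by the best route.

    routes is a list of (prefix, mtu_or_None); the longest matching
    prefix wins, and None means the route sets no mtu of its own.
    """
    addr = ipaddress.ip_address(dest)
    best_net, best_mtu = None, None
    for prefix, route_mtu in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_mtu = net, route_mtu
    if best_mtu is None:
        return device_mtu
    return min(device_mtu, best_mtu)


# Hypothetical layout: the whole segment is reachable via the bridge,
# but the hosts in 192.0.2.16/28 sit behind the 1500-MTU port.
routes = [
    ("192.0.2.0/24", None),
    ("192.0.2.16/28", 1500),
]

print(effective_mtu("192.0.2.5", 3000, routes))   # host on the large-MTU port
print(effective_mtu("192.0.2.17", 3000, routes))  # host on the 1500-MTU port
```

With a device MTU of 3000, the first lookup keeps 3000 and the second
is lowered to 1500 by the more specific route.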

For the record: this all started when I saw how inefficient
the network is between a virtual guest system and the host,
running kvm with hw virtualization and all the optimizations
I was able to find.  I noticed the default MTU value
and thought that increasing it seemed like a good idea:
it should help performance due to fewer context
switches and the like.  After some tweaks on the kvm/virtio/net
side (there was no mtu handler for virtio_net devices),
I was able to increase the MTU value, and the thing just...
flew, with performance increasing almost linearly with
the MTU.  But it does not quite "want" to
work one step further.  I know there are work-arounds
for that, such as using separate networks for host<=>guest
and guest<=>the_rest_of_the_world, but that seems
over-complicated.
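
A rough sketch in Python of why a larger MTU means less per-packet
work: the number of packets needed for a given transfer falls roughly
in inverse proportion to the MTU.  The 40-byte overhead (IPv4 plus TCP
headers without options) and the function name are assumptions, and
this counts packets only; it does not model context switches or
measure throughput:

```python
def packets_needed(nbytes, mtu, overhead=40):
    """Number of packets to carry nbytes of payload at a given MTU."""
    per_packet = mtu - overhead
    if per_packet <= 0:
        raise ValueError("MTU too small for the assumed headers")
    # Integer ceiling division, avoiding floating point.
    return -(-nbytes // per_packet)


transfer = 1024 * 1024  # 1 MiB of payload
for mtu in (1500, 3000, 9000):
    print(f"MTU {mtu}: {packets_needed(transfer, mtu)} packets")
```

If each packet costs a roughly fixed amount of host<=>guest work,
halving the packet count should roughly halve that cost, which would
be consistent with the scaling described above.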

Thanks!

/mjt
