From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Chen Subject: Re: [Fwd: Re: [PATCH] Multicast packet reassembly can fail] Date: Wed, 28 Oct 2009 13:25:39 -0500 Message-ID: <1256754339.3153.481.camel@linux-1lbu> References: <1256740748.3153.418.camel@linux-1lbu> <4AE86420.3040607@gmail.com> <1256749559.3153.447.camel@linux-1lbu> <4AE87ECD.7080408@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: mhuth@mvista.com, netdev@vger.kernel.org To: Eric Dumazet Return-path: Received: from hu47.mvista.com ([206.112.117.47]:56257 "HELO gateway-1237.mvista.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with SMTP id S1753221AbZJ1SR4 convert rfc822-to-8bit (ORCPT ); Wed, 28 Oct 2009 14:17:56 -0400 In-Reply-To: <4AE87ECD.7080408@gmail.com> Sender: netdev-owner@vger.kernel.org List-ID: On Wed, 2009-10-28 at 18:26 +0100, Eric Dumazet wrote: > Steve Chen a =C3=A9crit : > > On Wed, 2009-10-28 at 16:32 +0100, Eric Dumazet wrote: > >> If each fragment is received twice on host, once by eth0, once by = eth1, > >> should we deliver datagram once or twice ? > >=20 > > The application received it once. IIRC the duplicate packet is dro= p in > > the routing code. > >=20 > >> Once should be enough, even if in the non fragmented case, it will > >> be delivered twice (kernel cannot detect duplicates, user app migh= t do itself) > >=20 > > Routing code drops the duplicate packet for none-fragmented case as > > well. >=20 > Really ? How so ? Receiving two copies of the same packet is legal. I will have to double check exactly where the packet drop happens. I thought it was somewhere in routing, but it could be in netfilter. >=20 > >=20 > >> > >>> For this specific case, src/dst address, protocol, IP ID and frag= ment > >>> offset are all identical. The only difference is the ingress int= erface. > >>> A good follow up question would be why would anyone in their righ= t mind > >>> multicast to the same destination? well, I don't know. I can no= t get > >>> the people who reported the problem to tell me either. Since so= meone > >>> found the need to do this, perhaps others may find it useful too= =2E > >>> > >> Then, if a 2000 bytes message is fragmented in two packets, one co= ming > >> from eth0, one coming from eth1, I suspect your patch drops the me= ssage, > >> unless eth0/eth1 are part of a bonding device... > >=20 > > Actually, the patch tries to prevent packet drop for this exact > > scenario. Please consider the following scenarios > > 1. Packet comes in the fragment reassemble code in the following o= rder > > (eth0 frag1), (eth0 frag2), (eth1 frag1), (eth1 frag2) > > Packet from both interfaces get reassembled and gets further proces= sed. >=20 > Yes your patch does this, so each multicast application receives two = copies of the > same datagram. >=20 > >=20 > > 2. Packet can some times arrive in (perhaps other orders as well) > > (eth0 frag1), (eth1 frag1), (eth0 frag2), (eth1 frag2) > > Without this patch, eth0 frag 1/2 are overwritten by eth1 frag1/2, = and > > packet from eth1 is dropped in the routing code. >=20 > Really ? how so ? I dont see how it can happen, unless you use RPF ? >=20 > current situation should be : >=20 > (eth0 frag1) : We create a context, store frag1 in it > (eth1 frag1) : We find this context, and drop frag1 since we already = have the data > (maybe the bug is here, if we cannot cope with a du= plicate ?) > (eth0 frag2) : We find this context, store frag2 -> complete datagram= and deliver it > (eth1 frag2) : We find context, drop frag2 since datagram was complet= ed. Yes, this is exactly what is happening in the current code. >=20 > (or maybe we create a new context that will timeout la= ter, maybe this is your problem ?) >=20 > Net effect : We deliver the datagram correctly. >=20 >=20 > >=20 > >> That would break common routing setups, using two links to aggrega= te bandwidth ? > >=20 > > I don't believe it would. The aggregate bandwidth will work the sa= me as > > before. The attributes (src/dst addr, protocol, interface, etc.) s= hould > > generate a unique hash key. If hash collision should happen with t= he > > addition of iif << 5, the code still compare the original src addr = along > > with interface number, so there should be no issues. >=20 > What about the obvious : >=20 > (eth0 frag1), (eth1 frag2) >=20 > Your patch creates two contexts since hashes are different, > that will timeout and no packet delivered at all >=20 I see the point you are making. I assumed, probably incorrectly, that since eth0 and eth1 have different IP address. I would get a complete series of fragments for each interface. Perhaps, I should really be looking up the stack to see why packets were dropped. Please correct m= e if I'm mistaken. The normal behavior is that application should be receiving either 2 (scenario 1) or 1 (scenario 2) packets. Regards, Steve