From mboxrd@z Thu Jan 1 00:00:00 1970
From: dormando
Subject: Re: [PATCH] ipv4: fix a race in ip4_datagram_release_cb()
Date: Sun, 22 Jun 2014 12:07:22 -0700 (PDT)
Message-ID:
References: <1402407781.3645.426.camel@edumazet-glaptop2.roam.corp.google.com>
 <1402448128.3645.437.camel@edumazet-glaptop2.roam.corp.google.com>
 <1402449173.3645.440.camel@edumazet-glaptop2.roam.corp.google.com>
 <1402450009.3645.444.camel@edumazet-glaptop2.roam.corp.google.com>
 <1402466090.3645.456.camel@edumazet-glaptop2.roam.corp.google.com>
 <1402490462.3645.463.camel@edumazet-glaptop2.roam.corp.google.com>
 <1402492373.3645.466.camel@edumazet-glaptop2.roam.corp.google.com>
 <1402544609.3645.473.camel@edumazet-glaptop2.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: Alexey Preobrazhensky , Steffen Klassert , David Miller ,
 paulmck@linux.vnet.ibm.com, netdev@vger.kernel.org, Kostya Serebryany ,
 Dmitry Vyukov , Lars Bull , Eric Dumazet , Bruce Curtis ,
 =?ISO-8859-2?Q?Maciej_=AFenczykowski?= , Alexei Starovoitov
To: Eric Dumazet
Return-path:
Received: from rydia.net ([69.46.88.68]:55214 "EHLO mail.rydia.net"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751838AbaFVTHZ (ORCPT ); Sun, 22 Jun 2014 15:07:25 -0400
In-Reply-To: <1402544609.3645.473.camel@edumazet-glaptop2.roam.corp.google.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, 11 Jun 2014, Eric Dumazet wrote:

> On Wed, 2014-06-11 at 18:55 -0700, dormando wrote:
>
> > I sent the udpkill utility in an off-list mail (in case that got binned by
> > anyone).
> >
> > Just threw this patch on top of the other two, on 3.10.42. udpkill's been
> > running for an hour without fault. I've just put traffic back onto the
> > machine am leaving udpkill enabled for a while longer.
> >
> > So, this is an improvement :)
>
> Nice. I suspect regression came with 3.6 ip route cache removal, but I
> am lazy to point the exact commit.

Update on testing:

I only have two machines that crash on their own frequently (more like
one, even). Unfortunately something happened to the datacenter it's in and
it was offline for a week.

The machine normally crashes after 1.5-4d, averaging 2d. It has now run
for about three days in total without a new crash.

I've also had the kernel running in another datacenter for ~10 days, but
it takes 30-150 days to crash in that one.

So, inconclusive, but still promising. If the machine survives the week it
probably means the crash is fixed, or at least greatly reduced in
frequency.

I saw that one of your patches got queued for stable, but all three were
necessary to fix udpkill. What's your plan for cleanup/upstreaming?

Did you folks end up running udpkill under the tester thing?

thanks,
-Dormando