From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Jones
Subject: Re: __pskb_pull_tail oops from 2.6.35
Date: Mon, 3 Oct 2011 12:13:46 -0400
Message-ID: <20111003161346.GA30201@redhat.com>
References: <20110927200328.GA22678@redhat.com>
 <20110927.160804.528213323197711241.davem@davemloft.net>
 <20110927201500.GA27713@redhat.com>
 <20110927.161848.1967387021236457958.davem@davemloft.net>
 <20110927202405.GB27713@redhat.com>
 <1317155839.2472.5.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: David Miller , netdev@vger.kernel.org
To: Eric Dumazet
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:22447 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1756673Ab1JCQNv (ORCPT ); Mon, 3 Oct 2011 12:13:51 -0400
Content-Disposition: inline
In-Reply-To: <1317155839.2472.5.camel@edumazet-laptop>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, Sep 27, 2011 at 10:37:19PM +0200, Eric Dumazet wrote:

> > > > It looks like it died in put_page..
> > > >
> > > > <1>[ 262.574991] IP: [] put_page+0x10/0x7c
> > > >
> > > > which is only called in one place..
> > > >
> > > > 1267         for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> > > > 1268                 if (skb_shinfo(skb)->frags[i].size <= eat) {
> > > > 1269                         put_page(skb_shinfo(skb)->frags[i].page);
> > > > 1270                         eat -= skb_shinfo(skb)->frags[i].size;
> > > > 1271                 } else {
> > >
> > > That's a pretty serious corruption, all frag array entries from 0 to
> > > nr_frags should have valid, non-NULL page pointers.
> > >
> > > Maybe a LRO/GRO bug? There were a couple of those.
> >
> > I'll see if I can talk him into trying a self-built kernel, as we're not
> > rebasing f14 at this point in its life-cycle. If it turns out to still affect
> > 3.x, I'll bring it up again.
>
> This could be a struct skb_shared_info -> nr_frags corruption
>
> (Something was overflowing skb head and overflowing very beginning of
> skb_shared_info in rare circumstances)
>
> We had such bug in the past, I cant remember details right now.

Just to close this discussion, the user reported that he built a 3.1.0rc7
kernel, and couldn't reproduce this bug any more, so it was something that
got fixed that didn't make it to the longterm stable releases.

	Dave
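
For anyone reading this thread later, here is a rough userspace sketch of why
a corrupted nr_frags would end up in put_page() with a bad pointer. It is not
the kernel code: struct shinfo, struct frag, MAX_FRAGS, put_page_checked() and
eat_frags() are all made-up stand-ins, and the sketch simply stops at the first
partially consumed fragment because the else branch is not shown in the quote
above. The point it tries to illustrate is that an unused frag slot is zeroed,
so its size of 0 always passes the "size <= eat" test and its NULL page gets
released.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 4	/* hypothetical bound, stands in for the kernel's limit */

struct page { int refcount; };		/* stand-in for struct page */
struct frag { struct page *page; unsigned int size; };
struct shinfo {				/* stand-in for skb_shared_info */
	unsigned int nr_frags;
	struct frag frags[MAX_FRAGS];
};

/* Stand-in for put_page(): reports a NULL page instead of oopsing. */
static int put_page_checked(struct page *p)
{
	if (p == NULL)
		return -1;
	p->refcount--;
	return 0;
}

/*
 * Walk the frag array the way the quoted loop does, releasing every
 * fragment that is fully covered by 'eat'. Returns the number of pages
 * released, or -1 if a NULL page pointer was about to be released.
 * The extra MAX_FRAGS bound only keeps this demo inside the array.
 */
static int eat_frags(struct shinfo *sh, unsigned int eat)
{
	int released = 0;

	assert(sh != NULL);
	for (unsigned int i = 0; i < sh->nr_frags && i < MAX_FRAGS; i++) {
		if (sh->frags[i].size <= eat) {
			if (put_page_checked(sh->frags[i].page) < 0)
				return -1;
			eat -= sh->frags[i].size;
			released++;
		} else {
			break;	/* partially consumed frag: sketch stops here */
		}
	}
	return released;
}
```

With two valid fragments of 100 and 200 bytes and nr_frags == 2, eating 150
bytes releases only the first page. If nr_frags is bumped to 3 while only two
slots are populated, eating enough to cover both fragments reaches the zeroed
third slot and eat_frags() returns -1, which is where the real code would have
dereferenced a NULL page.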