From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Harout Hedeshian" Subject: RE: [RFC] Handling device free after a packet is passed to the network stack Date: Mon, 30 Mar 2015 08:19:47 -0600 Message-ID: <004401d06af4$94620280$bd260780$@codeaurora.org> References: <1427510968.25985.159.camel@edumazet-glaptop2.roam.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8BIT Cc: To: "'Eric Dumazet'" , Return-path: Received: from smtp.codeaurora.org ([198.145.29.96]:53738 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751923AbbC3OTu convert rfc822-to-8bit (ORCPT ); Mon, 30 Mar 2015 10:19:50 -0400 In-Reply-To: <1427510968.25985.159.camel@edumazet-glaptop2.roam.corp.google.com> Content-Language: en-us Sender: netdev-owner@vger.kernel.org List-ID: > -----Original Message----- > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] > On Behalf Of Eric Dumazet > Sent: Friday, March 27, 2015 8:49 PM > To: subashab@codeaurora.org > Cc: netdev@vger.kernel.org > Subject: Re: [RFC] Handling device free after a packet is passed to the > network stack > > On Fri, 2015-03-27 at 19:57 +0000, subashab@codeaurora.org wrote: > > We have been coming across a couple of scenarios where the device is > > freed and the corresponding packets which were already queued up the > > stack encounter crashes when they find that contents of skb->dev are > > no longer valid. > > > > Specifically, we have observed an instance where a cpu hotplug occurs > > along with the network driver module unloading. When the packets are > > being queued up the stack using netif_rx_ni from dev_cpu_callback, > > get_rps_cpus crashes as it encounters invalid data at skb->dev since > > it would have been freed. > > > > We would like to know if the kernel provides some mechanisms to > > safeguard against such scenarios. > > This is supposed to be handled in flush_backlog() (net/core/dev.c) It looks like this only takes care of packets in the softnet_data queues (or am I missing something). Wouldn't we run into similar issues if the packet were stuck in, for example, TCP OFO socket queue (not sure if anything tries to access skb->dev after that)? Or perhaps some other queue which may later try to dereference skb->dev. It seems to me that the only way to be absolutely sure that there is no reference on this net_device would be to have the drivers dev_hold() before earch netif_rx/netif_rx_ni/netif_receive_skb; and then later have __kfree_skb() do a dev_put(). Of course, this approach has a number of issues: - 2 extra operations per packet - Anything holding onto an SKB can prevent a net_device from deregistering/freeing - requires skbs to have a valid dev before they can be freed (or we simply null check this field in __kfree_skb()) > Maybe you hit some race condition in on_each_cpu(). > > Really, I do not think we have tested such stress conditions ever. > > cpu hotplug was very unstable 2 years ago (last time I tried it, this > was constantly failing outside of networking land) > > I would try following patch, although I am not sure what would prevent > dev_cpu_callback() being called too late. > > diff --git a/net/core/dev.c b/net/core/dev.c index > a0408d497dae04e7caa145f05c915b058aa2d356..fa74887adfe10709bce55ccf1e34d6 > 5c6b7a8fba 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -7149,11 +7149,17 @@ static int dev_cpu_callback(struct > notifier_block *nfb, > > /* Process offline CPU's input_pkt_queue */ > while ((skb = __skb_dequeue(&oldsd->process_queue))) { > - netif_rx_ni(skb); > + if (skb->dev->reg_state != NETREG_REGISTERED) > + kfree_skb(skb); > + else > + netif_rx_ni(skb); > input_queue_head_incr(oldsd); > } > while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) { > - netif_rx_ni(skb); > + if (skb->dev->reg_state != NETREG_REGISTERED) > + kfree_skb(skb); > + else > + netif_rx_ni(skb); > input_queue_head_incr(oldsd); > } > > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in the > body of a message to majordomo@vger.kernel.org More majordomo info at > http://vger.kernel.org/majordomo-info.html