From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Borkmann Subject: Re: Kernel crash while using tc script Date: Wed, 20 May 2015 13:53:26 +0200 Message-ID: <555C75B6.7040407@iogearbox.net> References: <555BAE72.40609@iogearbox.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: netdev , tgraf To: Vijay Subramanian Return-path: Received: from www62.your-server.de ([213.133.104.62]:51764 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751182AbbETLx2 (ORCPT ); Wed, 20 May 2015 07:53:28 -0400 In-Reply-To: <555BAE72.40609@iogearbox.net> Sender: netdev-owner@vger.kernel.org List-ID: On 05/19/2015 11:43 PM, Daniel Borkmann wrote: > On 05/19/2015 10:11 PM, Vijay Subramanian wrote: >> Hi, >> >> It seems latest net-next kernel crashes while unloading modules. >> Please see simple script below to reproduce the crash. >> >> =============================== >> >> #!/bin/bash >> >> while true; do >> >> # modules will be loaded automatically >> >> tc qdisc add dev eth1 root handle 1: prio >> >> tc filter add dev eth1 parent 1: u32 match u32 0 0 flowid 1 >> >> >> tc qdisc del dev eth1 root >> >> rmmod cls_u32 >> >> rmmod sch_prio >> >> >> done >> >> ========================= >> >> It seems there is some refcounting or locking issue issue. I am unable >> to easily post the dump but sometimes it points to crashes in various >> functions in prio_class_ops (sch_prio.c), such as prio_walk(), >> prio_dump_class etc. Probably, sch_prio call back functions are >> invoked when they should not be. >> >> I bisected this down to following commit: >> >> commit 78fd1d0ab072d4d9b5f0b7c14a1516665170b565 >> Author: Thomas Graf >> Date: Tue Oct 21 22:05:38 2014 +0200 >> netlink: Re-add locking to netlink_lookup() and seq walker >> >> If there are suggestions for me to try or you need more info, let me know. > > Hmm, seems rather like the synchronize_net() removal in netlink_release() > must have uncovered a bug elsewhere. What seems to be happening is a race here between RCU and classifier module refcounting, i.e. between the two commands (under the requirement at least one u32 classifier is attached to the qdisc): tc qdisc del dev foo root rmmod cls_u32 On qdisc deletion, prio_destroy() is called which invokes tcf_destroy_chain() that walks all classifiers and calls tcf_destroy() on them. You can take any of them, it's not u32 specific, for example, cls_bpf_destroy(): They register an RCU callback, e.g. call_rcu(&prog->rcu, __cls_bpf_delete_prog) for later invocation. The problem however is that this callback very likely is executed *after* the module_put() from tcf_destroy()! That means, kernel thinks that no-one is holding reference on that module anymore and thus rmmod cls_u32 can succeed. Problem is that when RCU callback is invoked by the kernel, that address of the function is not valid anymore, as the module already disappeared from under us. The synchronize_net() that was called when the netlink socket (for removing the qdisc) got destroyed, was hiding that from us. ;) Out of curiosity, I did a test by only moving __cls_bpf_delete_prog() callback into a built-in section that won't disappear on module unload and the panic is not triggered anymore. Will be cooking a patch.