From mboxrd@z Thu Jan 1 00:00:00 1970
From: Girish Moodalbail
Subject: Re: KASAN: use-after-free Read in rds_tcp_dev_event
Date: Tue, 14 Nov 2017 10:02:59 -0800
Message-ID:
References: <001a1148d244ade0aa055d6a69b9@google.com> <9e71dff9-7ba8-a3c2-6862-fb8557546a54@oracle.com> <20171114132221.GB1980@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: syzbot , davem@davemloft.net, netdev@vger.kernel.org, rds-devel@oss.oracle.com, santosh.shilimkar@oracle.com, syzkaller-bugs@googlegroups.com
To: Sowmini Varadhan
Return-path:
Received: from userp1040.oracle.com ([156.151.31.81]:37834 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755934AbdKNSB2 (ORCPT ); Tue, 14 Nov 2017 13:01:28 -0500
In-Reply-To: <20171114132221.GB1980@oracle.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 11/14/17 5:22 AM, Sowmini Varadhan wrote:
>
>
> A few questions.
>
> - First off, why am I not seeing the original mail in this thread
> even when I search the mail archives, e.g.,
> https://lkml.org/lkml/2017/11/13/954
>
> - Girish Moodalbail writes:
>
>> The issue here is that we are trying to reference a network namespace
>> (struct net *) that is long gone (i.e., L532 below -- c_net is the culprit).
>
> The netns is not "long gone", we are still processing
> the NETDEV_UNREGISTER_FINAL for loopback.

Obviously, I was not talking about the current namespace. Say there are two
namespaces, ns1 and ns2, and both have RDS connections. Deleting ns1 will be
fine. However, when ns2 is being deleted, the rds_tcp_dev_event() callback
walks through the global list, and some nodes in that list will still refer
to ns1 (which is "long gone").

If you read my earlier email, I was talking about ns1, which is already gone,
and which we are trying to access from ns2.

~Girish

> As I said in my
> earlier mail, the idea is to extract the list of unique conns
> that belong to the netns and then destroy both the conn, and
> all associated paths. Thus there can only be a single thread
> going through rds_tcp_kill_sock at any time (since we should
> only get the unregister_final/loopback one time for the netns).
> (See also comment block in rds_tcp_dev_event about network activity
> quiescing). Thus there should be no concurrency issue.
>
> However when I just checked this, there may be some rds connection
> refcounting bug. When I quickly tested this, I'm not seeing the
> expected calls to conn_path_destroy. I'll need some time to take
> a look, this has been known to work, so something got broken along
> the way
>
>> I think we should move away from global list to a per-namespace list. The
>> global list is used only in two places (both of which are per-namespace
>> operations):
>
> let's first understand the real root-cause before we start
> redesigning data-structures.
>
> --Sowmini
>
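
The per-namespace list proposed in the thread can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the names model_netns, model_conn, model_conn_add and model_netns_kill are hypothetical and are not the RDS/TCP structures or functions, and the model ignores locking and refcounting entirely. It shows only the property under discussion: when each namespace owns its own list, tearing down ns2 never visits a node that belonged to ns1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_conn {
    struct model_conn *next;   /* link in the owning namespace's list */
    int id;
};

struct model_netns {
    struct model_conn *conns;  /* per-namespace list head (no global list) */
    int nconns;
};

/* Attach a new connection to the namespace that owns it. */
static struct model_conn *model_conn_add(struct model_netns *ns, int id)
{
    struct model_conn *c = malloc(sizeof(*c));

    assert(c != NULL);
    c->id = id;
    c->next = ns->conns;
    ns->conns = c;
    ns->nconns++;
    return c;
}

/*
 * Tear down one namespace. Only this namespace's own list is walked,
 * so no node owned by another (possibly already freed) namespace is
 * ever dereferenced. Returns the number of connections destroyed.
 */
static int model_netns_kill(struct model_netns *ns)
{
    int destroyed = 0;

    while (ns->conns) {
        struct model_conn *c = ns->conns;

        ns->conns = c->next;
        free(c);
        destroyed++;
    }
    ns->nconns = 0;
    return destroyed;
}
```

In this model, killing ns1 and later ns2 touches two disjoint lists. With a single global list, the ns2 teardown would instead have to inspect every node, including any that still point at ns1.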