From: David Miller
Subject: Re: [net-next 1/1] tipc: fix race condition at topology server receive
Date: Tue, 16 Jan 2018 14:42:54 -0500 (EST)
Message-ID: <20180116.144254.1952039888762980221.davem@davemloft.net>
References: <1516035388-18818-1-git-send-email-jon.maloy@ericsson.com>
In-Reply-To: <1516035388-18818-1-git-send-email-jon.maloy@ericsson.com>
To: jon.maloy@ericsson.com
Cc: netdev@vger.kernel.org, mohan.krishna.ghanta.krishnamurthy@ericsson.com, tung.q.nguyen@dektech.com.au, hoang.h.le@dektech.com.au, canh.d.luu@dektech.com.au, ying.xue@windriver.com, tipc-discussion@lists.sourceforge.net

From: Jon Maloy
Date: Mon, 15 Jan 2018 17:56:28 +0100

> We have identified a race condition during reception of socket
> events and messages in the topology server.
>
> - The function tipc_close_conn() is releasing the corresponding
>   struct tipc_subscriber instance without considering that there
>   may still be items in the receive work queue. When those are
>   scheduled, in the function tipc_receive_from_work(), they are
>   using the subscriber pointer stored in struct tipc_conn, without
>   first checking if this is valid or not. This will sometimes
>   lead to crashes, as the next call of tipc_conn_recvmsg() will
>   access the now deleted item.
>   We fix this by making the usage of this pointer conditional on
>   whether the connection is active or not. I.e., we check the condition
>   test_bit(CF_CONNECTED) before calling tipc_conn_recvmsg().
>
> - Since the two functions may be running on different cores, the
>   condition test described above is not enough. tipc_close_conn()
>   may come in between and delete the subscriber item after the condition
>   test is done, but before tipc_conn_recvmsg() is finished. This
>   happens less frequently than the problem described above, but leads
>   to the same symptoms.
>
>   We fix this by using the existing sk_callback_lock for mutual
>   exclusion in the two functions. In addition, we have to move
>   a call to tipc_conn_terminate() outside the mentioned lock to
>   avoid deadlock.
>
> Acked-by: Ying Xue
> Signed-off-by: Jon Maloy

Applied, thanks Jon.