From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: [net 1/1] tipc: eliminate message disordering during binding table update
Date: Mon, 22 Oct 2018 19:29:42 -0700 (PDT)
Message-ID: <20181022.192942.1130906428214974308.davem@davemloft.net>
References: <1539971740-23060-1-git-send-email-jon.maloy@ericsson.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, gordan.mihaljevic@dektech.com.au,
 tung.q.nguyen@dektech.com.au, hoang.h.le@dektech.com.au,
 canh.d.luu@dektech.com.au, ying.xue@windriver.com,
 tipc-discussion@lists.sourceforge.net
To: jon.maloy@ericsson.com
Return-path:
Received: from shards.monkeyblade.net ([23.128.96.9]:39822 "EHLO
 shards.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1725768AbeJWKu4 (ORCPT );
 Tue, 23 Oct 2018 06:50:56 -0400
In-Reply-To: <1539971740-23060-1-git-send-email-jon.maloy@ericsson.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

From: Jon Maloy
Date: Fri, 19 Oct 2018 19:55:40 +0200

> We have seen the following race scenario:
> 1) named_distribute() builds a "bulk" message, containing a PUBLISH
>    item for a certain publication. This is based on the contents of
>    the binding table's 'cluster_scope' list.
> 2) tipc_named_withdraw() removes the same publication from the list,
>    builds a WITHDRAW message and distributes it to all cluster nodes.
> 3) tipc_named_node_up(), which was calling named_distribute(), sends
>    out the bulk message built under 1).
> 4) The WITHDRAW message arrives at the just detected node, finds
>    no corresponding publication, and is dropped.
> 5) The PUBLISH item arrives at the same node, is added to its binding
>    table, and remains there forever.
>
> This arrival disordering was earlier taken care of by the backlog queue,
> originally added for a different purpose, which was removed in the
> commit referred to below, but we now need a different solution.
>
> In this commit, we replace the rcu lock protecting the 'cluster_scope'
> list with a regular RW lock which also covers the sending of the
> bulk message. This guarantees both the list integrity and the
> message sending order. We will later add a commit which cleans up
> this code further.
>
> Note that this commit needs recently added commit d3092b2efca1 ("tipc:
> fix unsafe rcu locking when accessing publication list") to apply
> cleanly.
>
> Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table")
> Reported-by: Tuong Lien Tong
> Acked-by: Ying Xue
> Signed-off-by: Jon Maloy

Applied.