From: Sowmini Varadhan
Subject: [PATCH net-next 0/2] rds: use RCU between work-enqueue and connection teardown
Date: Thu, 4 Jan 2018 06:52:58 -0800
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, rds-devel@oss.oracle.com, sowmini.varadhan@oracle.com, santosh.shilimkar@oracle.com

This patchset follows up on the root-cause mentioned in
https://www.spinics.net/lists/netdev/msg472849.html

Patch1 implements some code refactoring that was suggested
as an enhancement in http://patchwork.ozlabs.org/patch/843157/
It replaces the c_destroy_in_prog bit in rds_connection with
an atomically managed flag in rds_conn_path.

Patch2 builds on Patch1 and uses RCU to make sure that work is only
enqueued if the connection destroy is not already in progress:
the test-flag-and-enqueue is done under rcu_read_lock, while
destroy first sets the flag, uses synchronize_rcu to wait for
existing reader threads to complete, and then starts all the
work-cancellation.

Since I have not been able to reproduce the original stack traces
reported by syzbot, and these are fixes for a race condition that
are based on code inspection, I am not marking these as reported-by
at this time.

Sowmini Varadhan (2):
  rds: Use atomic flag to track connections being destroyed
  rds: Ensure that send/recv/reconnect work cannot be requeued from
    softirq or proc context

 net/rds/cong.c        | 10 +++++++---
 net/rds/connection.c  | 24 +++++++++++++++++++-----
 net/rds/rds.h         |  4 ++--
 net/rds/send.c        | 37 ++++++++++++++++++++++++++++++++-----
 net/rds/tcp_connect.c |  2 +-
 net/rds/tcp_recv.c    |  8 ++++++--
 net/rds/tcp_send.c    |  5 ++++-
 net/rds/threads.c     | 20 +++++++++++++++-----
 8 files changed, 86 insertions(+), 24 deletions(-)