From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754161Ab1GLRPg (ORCPT ); Tue, 12 Jul 2011 13:15:36 -0400
Received: from mail.candelatech.com ([208.74.158.172]:60136 "EHLO
	ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751520Ab1GLRPe (ORCPT );
	Tue, 12 Jul 2011 13:15:34 -0400
Message-ID: <4E1C8108.3020006@candelatech.com>
Date: Tue, 12 Jul 2011 10:14:48 -0700
From: Ben Greear
Organization: Candela Technologies
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-2.fc11 Thunderbird/3.0.4
MIME-Version: 1.0
To: "Myklebust, Trond"
CC: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] sunrpc: Fix race between work-queue and rpc_killall_tasks.
References: <1309992581-25199-1-git-send-email-greearb@candelatech.com>
	<1309995932.5447.6.camel@lade.trondhjem.org>
	<4E173BE6.9000005@candelatech.com>
	<2E1EB2CF9ED1CB4AA966F0EB76EAB4430A198C7C@SACMVEXC2-PRD.hq.netapp.com>
	<4E177EA8.60109@candelatech.com>
	<2E1EB2CF9ED1CB4AA966F0EB76EAB4430A198E69@SACMVEXC2-PRD.hq.netapp.com>
In-Reply-To: <2E1EB2CF9ED1CB4AA966F0EB76EAB4430A198E69@SACMVEXC2-PRD.hq.netapp.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/08/2011 03:14 PM, Myklebust, Trond wrote:
>> [] print_trailer+0x131/0x13a
>> [] object_err+0x35/0x3e
>> [] verify_mem_not_deleted+0x7a/0xb7
>> [] rpcb_getport_done+0x23/0x126 [sunrpc]
>> [] rpc_exit_task+0x3f/0x6d [sunrpc]
>> [] __rpc_execute+0x80/0x253 [sunrpc]
>> [] ? rpc_execute+0x42/0x42 [sunrpc]
>> [] rpc_async_schedule+0x10/0x12 [sunrpc]
>> [] process_one_work+0x230/0x41d
>> [] ? process_one_work+0x17b/0x41d
>> [] worker_thread+0x133/0x217
>> [] ? manage_workers+0x191/0x191
>> [] kthread+0x7d/0x85
>> [] kernel_thread_helper+0x4/0x10
>> [] ? retint_restore_args+0x13/0x13
>> [] ? __init_kthread_worker+0x56/0x56
>> [] ? gs_change+0x13/0x13
>
> The calldata gets freed in the rpc_final_put_task() which shouldn't ever be run while the task is still referenced in __rpc_execute
>
> IOW: it should be impossible to call rpc_exit_task() after rpc_final_put_task

I added lots of locking around the calldata, work-queue logic, and such, and still the problem persists w/out hitting any of the debug warnings or poisoned values I put in. It almost seems like tk_calldata is just assigned to two different tasks.

While poking through the code, I noticed that 'map' is static in rpcb_getport_async. That would seem to cause problems if two threads called this method at the same time, possibly causing tk_calldata to be assigned to two different tasks??? Any idea why it is static?

I'm going to start another test run with this non-static to see if that resolves things...

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
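
To make the concern about the static 'map' concrete, here is a minimal user-space sketch of the suspected interleaving. It is not the sunrpc code: struct calldata, struct task, publish(), attach() and both test functions are hypothetical names made up for illustration, and the two "callers" are replayed sequentially in one thread rather than run concurrently. It only shows how a pointer with static storage, written by one caller and read back later, could hand the same calldata to two tasks, and says nothing about whether rpcb_getport_async actually does this.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the sunrpc structures. */
struct calldata { int port; };
struct task { struct calldata *tk_calldata; };

/* One slot shared by every caller, as a function-level `static`
 * pointer would be. */
static struct calldata *shared_map;

/* First half of a call: remember the freshly prepared calldata. */
static void publish(struct calldata *fresh) { shared_map = fresh; }

/* Second half of a call: give the task whatever the slot holds now. */
static void attach(struct task *t) { t->tk_calldata = shared_map; }

/* Replays the order A-publish, B-publish, A-attach, B-attach.
 * Returns 1 when both tasks end up with the same calldata pointer. */
int static_slot_aliases(void)
{
    struct calldata data_a = {0}, data_b = {0};
    struct task task_a = {NULL}, task_b = {NULL};
    int aliased;

    publish(&data_a);
    publish(&data_b);   /* overwrites A's pointer before A has read it */
    attach(&task_a);    /* A picks up B's calldata */
    attach(&task_b);

    aliased = (task_a.tk_calldata == task_b.tk_calldata);
    shared_map = NULL;  /* do not leave a pointer to a dead stack frame */
    return aliased;
}

/* Same two callers, but each keeps its pointer in its own automatic
 * variable, so there is nothing for the other caller to overwrite. */
int local_pointer_aliases(void)
{
    struct calldata data_a = {0}, data_b = {0};
    struct task task_a = {NULL}, task_b = {NULL};
    struct calldata *map_a = &data_a;   /* caller A's private pointer */
    struct calldata *map_b = &data_b;   /* caller B's private pointer */

    task_a.tk_calldata = map_a;
    task_b.tk_calldata = map_b;

    return task_a.tk_calldata == task_b.tk_calldata;
}
```

Calling static_slot_aliases() reports the aliasing and local_pointer_aliases() does not. If one of two such tasks then freed its calldata on completion, the other would be left pointing at freed memory, which is the kind of symptom verify_mem_not_deleted is complaining about in the trace above.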