From mboxrd@z Thu Jan 1 00:00:00 1970 From: Xiaowei.hu Date: Wed, 22 Feb 2012 08:58:26 +0800 Subject: [Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock. In-Reply-To: <4F443AC6.1070909@oracle.com> References: <1329804728-6146-1-git-send-email-xiaowei.hu@oracle.com> <4F43D90B.8040802@oracle.com> <4F44387E.7040502@oracle.com> <4F443AC6.1070909@oracle.com> Message-ID: <4F443DB2.5060207@oracle.com> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: ocfs2-devel@oss.oracle.com here is the whole lockres and backtrace: KERNEL: vmlinux DUMPFILE: vmcore1 CPUS: 16 DATE: Tue Jan 17 20:48:22 2012 UPTIME: 100 days, 21:11:24 LOAD AVERAGE: 7.11, 7.46, 7.85 TASKS: 1210 NODENAME: sgi-not-efped05 RELEASE: 2.6.18-92.el5 VERSION: #1 SMP Tue Apr 29 13:16:15 EDT 2008 MACHINE: x86_64 (2400 Mhz) MEMORY: 63.1 GB PANIC: "" PID: 6252 COMMAND: "ocfs2dc" TASK: ffff810c7de5b860 [THREAD_INFO: ffff810c53928000] CPU: 6 STATE: TASK_RUNNING (PANIC) crash64> bt PID: 6252 TASK: ffff810c7de5b860 CPU: 6 COMMAND: "ocfs2dc" #0 [ffff810c53929bb0] crash_kexec at ffffffff800aa977 #1 [ffff810c53929c70] __die at ffffffff800650af #2 [ffff810c53929cb0] die at ffffffff8006b6ae #3 [ffff810c53929ce0] do_invalid_op at ffffffff8006bc6e #4 [ffff810c53929da0] error_exit at ffffffff8005dde9 [exception RIP: ocfs2_prepare_cancel_convert+263] RIP: ffffffff8840649f RSP: ffff810c53929e50 RFLAGS: 00010082 RAX: 00000000ffffffff RBX: ffff8105d2a28280 RCX: ffff8105d2a28298 RDX: ffff810c3f1526a8 RSI: ffff8105d2a28280 RDI: ffff810c3f152000 RBP: ffff8105d2a28290 R8: ffff810c53928000 R9: 000000000000003a R10: ffff81102fd84038 R11: ffff81102fb0c860 R12: 0000000000000246 R13: ffff810c3f152000 R14: 0000000000000000 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffff810c53929ee8] kthread at ffffffff8003253d #6 [ffff810c53929f48] kernel_thread at ffffffff8005dfb1 crash64> mod -s ocfs2 /home/xiaowhu/usr/lib/debug/lib/modules/2.6.18-92.el5/kernel/fs/ocfs2/ocfs2.ko.debug MODULE NAME SIZE OBJECT FILE ffffffff8845bb80 ocfs2 492072 /home/xiaowhu/usr/lib/debug/lib/modules/2.6.18-92.el5/kernel/fs/ocfs2/ocfs2.ko.debug crash64> struct ocfs2_lock_res ffff8105d2a28280 struct ocfs2_lock_res { l_priv = 0xffff8105d2a285a0, l_ops = 0xffffffff8845b360, l_lock = { raw_lock = { slock = 4294967295 } }, l_blocked_list = { next = 0xffff8105d2a28298, prev = 0xffff8105d2a28298 }, l_mask_waiters = { next = 0xffff8104b2979a98, prev = 0xffff8104b2979a98 }, l_type = OCFS2_LOCK_TYPE_META, l_flags = 326, l_name = "M0000000000000000a1faac00000000", l_level = -1, l_ro_holders = 0, l_ex_holders = 0, l_lksb = { status = DLM_NORMAL, flags = 0, lockid = 0xffff810a606c0740, lvb = "\005\000\000\001\000\000\000\000\000\000)\v\000\000\000e\023?\253\224\207i\023\022\333i\204\312\001@\277\021f\373\316\000\000\000\000\000\000\000\000\000\000\017\070A\355\000\002\000\000\000\000\257L\302\331\000\000\000" }, l_action = OCFS2_AST_ATTACH, l_unlock_action = OCFS2_UNLOCK_INVALID, l_requested = 3, l_blocking = 5, l_event = { lock = { raw_lock = { slock = 1 } }, task_list = { next = 0xffff8105d2a28360, prev = 0xffff8105d2a28360 } }, l_debug_list = { next = 0xffff81033cabad08, prev = 0xffff8105d2a284a8 }, l_lock_num_prmode = 0, l_lock_num_exmode = 0, l_lock_num_prmode_failed = 0, l_lock_num_exmode_failed = 0, l_lock_total_prmode = 0, l_lock_total_exmode = 0, l_lock_max_prmode = 0, l_lock_max_exmode = 0, l_lock_refresh = 0 } On 02/22/2012 08:45 AM, Sunil Mushran wrote: > Both AST and BAST can only be sent by the master. And we ensure the > master sends the ASTs before BAST. > > Do you have the full lockres dump? > > On 02/21/2012 04:36 PM, Xiaowei.hu wrote: >> Hi Sunil, >> >> I mean it execute in this way: >> >> nodeA ocfs2_dlm_lock() and released the res spin lock,here A doesn't >> hold spin locks, >> then it start to execute the proxy ast handler , process bast request >> from nodeB, >> then dlmthread flushed the bast, after this node A start to queue its >> ast in ocfs2_dlm_lock() function. >> >> Thanks, >> Xiaowei >> On 02/22/2012 01:48 AM, Sunil Mushran wrote: >>> > bast queued and flushed,before the ast was queued >>> >>> Unlikely with o2dlm. dlmthread always sends ASTs before BASTs. >>> >>> Can you recreate the entire lockres? A full dump may yield more >>> information. >>> >>> Sunil >>> >>> On 02/20/2012 10:12 PM, xiaowei.hu at oracle.com wrote: >>>> I am trying to fix bug13611997,CT's machine run into BUG in ocfs2dc >>>> thread, BUG_ON(lockres->l_action != OCFS2_AST_CONVERT&& >>>> lockres->l_action != OCFS2_AST_DOWNCONVERT); I analysized the vmcore >>>> , the lockres->l_action = OCFS2_AST_ATTACH and l_flags=326(which >>>> means >>>> OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED|OCFS2_LOCK_INITIALIZED|OCFS2_LOCK_QUEUED), >>>> >>>> after compared with the code , this status could be only possible >>>> during ocfs2_cluster_lock,here is the race situation: >>>> >>>> NodeA NodeB >>>> ocfs2_cluster_lock on a new lockres M >>>> spin_lock_irqsave(&lockres->l_lock, flags); >>>> gen = lockres_set_pending(lockres); >>>> lockres->l_action = OCFS2_AST_ATTACH; >>>> lockres_or_flags(lockres, OCFS2_LOCK_BUSY); >>>> spin_unlock_irqrestore(&lockres->l_lock, flags); >>>> >>>> ocfs2_dlm_lock() finished and returned. >>>> **and lockres_clear_pending(lockres, gen, osb); >>>> request a lock on the same lockres M >>>> It's blocked by nodeA, and a ast proxy was send to A >>>> >>>> bast queued and flushed,before the ast was queued >>>> then the ocfs2dc was scheduled >>>> there is a chance to execute this code path: >>>> ocfs2_downconvert_thread() >>>> ocfs2_downconvert_thread_do_work() >>>> ocfs2_blocking_ast() >>>> ocfs2_process_blocked_lock() >>>> ocfs2_unblock_lock() >>>> spin_lock_irqsave(&lockres->l_lock, flags); >>>> if (lockres->l_flags& OCFS2_LOCK_BUSY) >>>> ret = ocfs2_prepare_cancel_convert(osb, lockres); >>>> BUG_ON(lockres->l_action != OCFS2_AST_CONVERT&& >>>> lockres->l_action != OCFS2_AST_DOWNCONVERT); >>>> here trigger the BUG() >>>> >>>> Solution: >>>> One possible solution for this is to remove the lockres_clear_pending >>>> marked by 2 stars, and left this clear work to the ast function.In >>>> this way could make sure the bast function wait for ast , let it >>>> clear OCFS2_LOCK_BUSY and set OCFS2_LOCK_ATTACHED first, before enter >>>> downconvert process. >>>> >>>> >>> >> >