From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753552Ab1HDTnk (ORCPT ); Thu, 4 Aug 2011 15:43:40 -0400 Received: from mx1.fusionio.com ([66.114.96.30]:52363 "EHLO mx1.fusionio.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753288Ab1HDTnd (ORCPT ); Thu, 4 Aug 2011 15:43:33 -0400 X-ASG-Debug-ID: 1312487012-03d6a510a6251740001-xx1T2L X-Barracuda-Envelope-From: JAxboe@fusionio.com Message-ID: <4E3AF65F.3080103@fusionio.com> Date: Thu, 4 Aug 2011 21:43:27 +0200 From: Jens Axboe MIME-Version: 1.0 To: Vivek Goyal CC: "Paul E. McKenney" , Paul Bolle , linux kernel mailing list , "michal@harddata.com" Subject: Re: Mysterious CFQ crash and RCU References: <20110519222404.GG12600@redhat.com> <20110521210013.GJ2271@linux.vnet.ibm.com> <20110523152141.GB4019@redhat.com> <20110523153848.GC2310@linux.vnet.ibm.com> <1306401337.27271.3.camel@t41.thuisdomein> <20110603050724.GB2304@linux.vnet.ibm.com> <1307191830.23387.24.camel@t41.thuisdomein> <20110604160326.GA6093@linux.vnet.ibm.com> <1307227686.28359.23.camel@t41.thuisdomein> <20110604230633.GD6093@linux.vnet.ibm.com> <20110804150504.GB429@redhat.com> X-ASG-Orig-Subj: Re: Mysterious CFQ crash and RCU In-Reply-To: <20110804150504.GB429@redhat.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Barracuda-Connect: mail1.int.fusionio.com[10.101.1.21] X-Barracuda-Start-Time: 1312487012 X-Barracuda-URL: http://10.101.1.180:8000/cgi-mod/mark.cgi X-Barracuda-Spam-Score: 0.00 X-Barracuda-Spam-Status: No, SCORE=0.00 using per-user scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.2.70907 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2011-08-04 17:05, Vivek Goyal wrote: > On Sat, Jun 04, 2011 at 04:06:33PM -0700, Paul E. McKenney wrote: >> On Sun, Jun 05, 2011 at 12:48:05AM +0200, Paul Bolle wrote: >>> On Sat, 2011-06-04 at 09:03 -0700, Paul E. McKenney wrote: >>>> More like "based on these diagnostics, I see no evidence of the RCU >>>> implementation misbehaving." Which is of course different than "I can >>>> prove that the RCU implementation is not misbehaving". That said, the >>>> fact that you are running on a single CPU makes it hard for me to see >>>> any latitude for RCU-implementation misbehavior. >>>> >>>> Clearly something is wrong somewhere. >>> >>> Yes. >>> >>>> Given the fact that on a single-CPU >>>> system, synchronize_rcu() is a no-op, and given that you weren't able >>>> to reproduce with CONFIG_TREE_PREEMPT_RCU=y, my guess is that there is >>>> a synchronize_rcu() that occasionally (illegally) gets executed within >>>> an RCU read-side critical section. >>> >>> I think I finally found it! >> >> Very cool!!! >> >>> The culprit seems to be io_context.ioc_data (not the most clear of >>> names!). It seems to be a single entry "last-hit cache" of an hlist >>> called cic_list. (There are three, subtly different, cic_lists in the >>> CFQ code!) It is not entirely clear, but that last-hit cache can get out >>> of sync with the hlist it is supposed to cache. My guess it that every >>> now and then a member of the hlist gets deleted while it's still in that >>> (single entry) cache. If it then gets retrieved from that cache it >>> already points to poisoned memory. For some strange reason this only >>> results in an Oops if one or more debugging options are set (as are set >>> in the Fedora Rawhide non-stable kernels that I ran into this). I have >>> no clue whatsoever, why that is ... >> >> Oooh... If I understand you correctly, this is exactly analogous to my >> first-ever RCU bug way back in the early 90s. This was a hashed version >> of a BSD-style routing table, where each hash bucket maintained a pointer >> to the last routing entry used. When I deleted a routing entry, I forgot >> to check the last-entry pointer. >> >> The amazing thing is that the guy who found and fixed this bug generated >> a correct fix despite having absolutely no idea what RCU was or how it >> worked. All he knew was that if he waited for the sun to reach Oregon, >> his users were going to be way beyond upset. ;-) >> >>> Anyhow, after ripping out ioc_data this bug seems to have disappeared! >>> Jens, Vivek, could you please have a look at this? In the mean time I >>> hope to pinpoint this issue and draft a small patch to really solve it >>> (ie, not by simply ripping out ioc_data). >> >> Again, very cool!!! > > Hi Paul, > > Remember this CFQ issue where we suspected that we are freeing up an > object before rcu period is over. We thought we fixed it. Guess what, > it seems be to be back. > > ************************************************************************** > [21265.163144] BUG: unable to handle kernel paging request at 6b6b6b43 > [21265.164022] IP: [] cic_free_func+0x10/0x6c [cfq_iosched] > [21265.164022] *pde = 00000000 > [21265.164022] Oops: 0000 [#1] SMP > [21265.164022] Modules linked in: cfq_iosched tcp_lp fuse arc4 ppdev > snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device > microcode > snd_pcm joydev snd_timer ath5k pcspkr snd_page_alloc thinkpad_acpi > i2c_i801 > ath mac80211 snd soundcore cfg80211 rfkill parport_pc parport e1000 > iTCO_wd > t iTCO_vendor_support binfmt_misc uinput yenta_socket video radeon ttm > drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] > [21265.164022] > [21265.164022] Pid: 18514, comm: mandb Not tainted > 3.0.0-1.local2.fc17.i686 #1 > IBM / > [21265.164022] EIP: 0060:[] EFLAGS: 00010286 CPU: 0 > [21265.164022] EIP is at cic_free_func+0x10/0x6c [cfq_iosched] > [21265.164022] EAX: f242e400 EBX: 6b6b6b43 ECX: 00000246 EDX: 6b6b6b43 > [21265.164022] ESI: f242e400 EDI: f7d55fd0 EBP: df92df04 ESP: df92def0 > [21265.164022] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > [21265.164022] Process mandb (pid: 18514, ti=df92c000 task=ed1554c0 > task.ti=df92c000) > [21265.164022] Stack: > [21265.164022] c0468df7 c0af0cb0 6b6b6b6b f242e400 f7d55fd0 df92df18 > f7d54510 > f242e400 > [21265.164022] ed1554c0 ed155898 df92df20 f7d54530 df92df2c c05cc8ba > f242e400 > df92df40 > [21265.164022] c05cc94a ed1554c0 00000012 df92df74 df92df8c c04423e2 > eedf2b5c > eedf2b00 > [21265.164022] Call Trace: > [21265.164022] [] ? lock_is_held+0x73/0x7b > [21265.164022] [] ? cfqq_process_refs+0x18/0x18 [cfq_iosched] > [21265.164022] [] call_for_each_cic+0x3e/0x4c [cfq_iosched] > [21265.164022] [] cfq_free_io_context+0x12/0x14 [cfq_iosched] > [21265.164022] [] put_io_context+0x34/0x5c > [21265.164022] [] exit_io_context+0x68/0x6d > [21265.164022] [] do_exit+0x62f/0x66a > [21265.164022] [] do_group_exit+0x63/0x86 > [21265.164022] [] sys_exit_group+0x18/0x18 > [21265.164022] [] sysenter_do_call+0x12/0x38 > [21265.164022] Code: 5f 5d c3 55 89 e5 3e 8d 74 26 00 8b 50 3c 03 50 40 8b > 00 > 29 d0 79 02 0f 0b 5d c3 55 89 e5 57 56 53 83 ec 08 3e 8d 74 26 00 89 d3 <8 > b> 12 89 c6 f6 c2 01 75 02 0f 0b 8d 78 08 89 f8 89 55 ec e8 c8 > [21265.164022] EIP: [] cic_free_func+0x10/0x6c [cfq_iosched] > SS:ESP > 0068:df92def0 > [21265.164022] CR2: 000000006b6b6b43 > [21265.363749] ---[ end trace a779435da6d0a152 ]--- > > ************************************************************************** > Again details have been added to old bz. > > https://bugzilla.redhat.com/show_bug.cgi?id=577968 > > Now two users have run into this issue. PaulB, again can reliably > reproduce this issue even with 3.0 kernels. He just needs to run > "mandb -c" on this machine. > > He has again run the CFQ debug V2 patch which seems to be telling that > cfq_io_context object was freed while in rcu read section. > > ************************************************************************** > [ 562.598881] debug: Will exit io context cic=0xf0469230, Next pointer is > (null) > [ 562.598881] debug: cic=0xf0469230 exit io context done. Next pointer is > (null) > [ 562.598881] debug: Will decouple-free cic cic=0xf0469230, Next pointer > is (null) > [ 562.633403] debug: cic=0xf0469230 decouple-free cic done. Next pointer > is (null) > [ 562.644949] debug: freeing up cic=f0469230 > [ 562.652971] cic=f0469230 freed while in RCU read lock > > ***************************************************************************** > > I am hoping that I have written the debug patch right. As usual i am > running out of ideas on how to debug it further. Anything debug > information you would like to have a look at? Darn! Vivek, can you re-send the debug patch for inspection? -- Jens Axboe