From mboxrd@z Thu Jan 1 00:00:00 1970
From: Guoqing Jiang
Subject: Re: md-cluster Oops 4.9.13
Date: Wed, 12 Apr 2017 09:32:32 +0800
Message-ID: <58ED83B0.6080209@suse.com>
References: <58E45E0C.6030705@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Marc Smith
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 04/10/2017 09:25 PM, Marc Smith wrote:
> Hi,
>
> Sorry for the delay... I was hoping to cherry-pick this and test
> against 4.9.x, but it didn't apply cleanly, although it looks trivial
> to do it by hand. Is it recommended/okay to test this patch against
> 4.9.x? Will the fix eventually be merged into 4.9.x?

I think you can try the patch and see what happens. The better way is
to test with the latest code, though I know people don't like to keep
updating the kernel. From my understanding, the patch is not material
for stable 4.9.x.

Thanks,
Guoqing

>
>
> --Marc
>
> On Tue, Apr 4, 2017 at 11:01 PM, Guoqing Jiang wrote:
>>
>> On 04/04/2017 10:06 PM, Marc Smith wrote:
>>> Hi,
>>>
>>> I encountered an oops this morning when stopping a MD array
>>> (md-cluster)... there were 4 md-cluster array started, and they were
>>> in the middle of a rebuild. I stopped the first one and then stopped
>>> the second one immediately after and got the oops, here is a
>>> transcript of what was on my terminal session:
>>>
>>> [root@brimstone-1b ~]# mdadm --stop /dev/md/array1
>>> mdadm: stopped /dev/md/array1
>>> [root@brimstone-1b ~]# mdadm --stop /dev/md/array2
>>>
>>> Message from syslogd@brimstone-1b at Tue Apr 4 09:54:40 2017 ...
>>> brimstone-1b kernel: [649162.174685] BUG: unable to handle kernel NULL
>>> pointer dereference at 0000000000000098
>>>
>>> Using Linux 4.9.13 and here is the output from the kernel messages:
>>>
>>> --snip--
>>> [649158.014731] dlm: 5b3b8f94-7875-b323-5bb8-29fa6866f4a8: leaving the
>>> lockspace group...
>>> [649158.015233] dlm: 5b3b8f94-7875-b323-5bb8-29fa6866f4a8: group event
>>> done 0 0
>>> [649158.015303] dlm: 5b3b8f94-7875-b323-5bb8-29fa6866f4a8:
>>> release_lockspace final free
>>> [649158.015331] md: unbind
>>> [649158.042540] md: export_rdev(nvme0n1p1)
>>> [649158.042546] md: unbind
>>> [649158.048501] md: export_rdev(nvme1n1p1)
>>> [649161.759022] md127: detected capacity change from 1000068874240 to 0
>>> [649161.759025] md: md127 stopped.
>>> [649162.174685] BUG: unable to handle kernel NULL pointer dereference
>>> at 0000000000000098
>>> [649162.174727] IP: [] recv_daemon+0x1e9/0x373
>>
>> Looks like the recv_daemon is still running after stop array, commit
>> 48df498 "md: move bitmap_destroy to the beginning of __md_stop"
>> ensure it won't happen.
>>
>>
>> [snip]
>>
>>> Perhaps this is already fixed in later versions? Let me know if you
>>> need any additional information.
>>
>> Could you pls try with the latest version? Please let me know if you
>> still see it, thanks.
>>
>> Regards,
>> Guoqing
>>
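
The diagnosis quoted above (a receive thread still running after the array has been stopped) is an instance of a general teardown-ordering problem. The following is only a user-space analogy in Python, not the md or md-cluster code: the class Array, the field cluster_info, and the function demo are hypothetical names invented for illustration. It sketches the safe ordering, in which the worker is stopped and joined before the state it reads is released.

```python
import threading
import time


class Array:
    """Toy stand-in for an array with a background receive thread (hypothetical)."""

    def __init__(self):
        self.cluster_info = {"messages": 0}  # state the worker dereferences
        self.saw_freed_state = False         # set if the worker ever finds it gone
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._recv_loop)
        self._worker.start()

    def _recv_loop(self):
        while not self._stop.is_set():
            info = self.cluster_info
            if info is None:                 # analogue of the NULL dereference
                self.saw_freed_state = True
                return
            info["messages"] += 1
            time.sleep(0.001)

    def stop(self):
        # Step 1: ask the worker to exit and wait until it has really finished.
        self._stop.set()
        self._worker.join()
        # Step 2: only now release the state the worker was using.
        self.cluster_info = None


def demo():
    array = Array()
    time.sleep(0.01)                         # let the worker run for a moment
    array.stop()
    return array.saw_freed_state, array._worker.is_alive()
```

If the two steps in stop() were swapped, the worker could read cluster_info after it had been cleared, which is the user-space counterpart of the oops in the log. Whether the referenced commit achieves its ordering in exactly this way is not something this sketch claims.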