From: Xiao Ni <xni@redhat.com>
To: NeilBrown <neilb@suse.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
Date: Thu, 14 Sep 2017 00:55:15 -0400 (EDT)
Message-ID: <446747392.10694917.1505364915884.JavaMail.zimbra@redhat.com>
In-Reply-To: <87o9qe9p3j.fsf@notabene.neil.brown.name>
----- Original Message -----
> From: "NeilBrown" <neilb@suse.com>
> To: "Xiao Ni" <xni@redhat.com>
> Cc: linux-raid@vger.kernel.org
> Sent: Thursday, September 14, 2017 7:05:20 AM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
> On Wed, Sep 13 2017, Xiao Ni wrote:
> >
> > Hi Neil
> >
> > Sorry for the bad news. The test is still running and it's stuck again.
>
> Any details? Anything at all? Just a little hint maybe?
>
> Just saying "it's stuck again" is very nearly useless.
>
Hi Neil

/var/log/messages doesn't show any useful information. I also enabled
dynamic debug for raid5.c:

echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control

There aren't any messages from that either.
It looks like another problem.
[root@dell-pr1700-02 ~]# ps auxf | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 8381 0.0 0.0 0 0 ? D Sep13 0:00 \_ [kworker/u8:1]
root 8966 0.0 0.0 0 0 ? D Sep13 0:00 \_ [jbd2/md0-8]
root 824 0.0 0.1 216856 8492 ? Ss Sep03 0:06 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_del corruption list_add corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody cared IRQ handler type mismatch Machine Check Exception: Machine check events logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment not present: invalid opcode: alignment check: stack segment: fpu exception: simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
root 836 0.0 0.0 195052 3200 ? Ssl Sep03 0:00 /usr/sbin/gssproxy -D
root 1225 0.0 0.0 106008 7436 ? Ss Sep03 0:00 /usr/sbin/sshd -D
root 12411 0.0 0.0 112672 2264 pts/0 S+ 00:50 0:00 \_ grep --color=auto D
root 8987 0.0 0.0 109000 2728 pts/2 D+ Sep13 0:04 \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
root 8983 0.0 0.0 7116 2080 ? Ds Sep13 0:00 /usr/sbin/mdadm --grow --continue /dev/md0
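
Note that `grep D` also matches lines such as `sshd -D` and `gssproxy -D`, which are not blocked tasks. As a minimal sketch (the function name `d_state_rows` is hypothetical, and it assumes the 11-column `ps aux` layout shown in the header above), the rows can be filtered on the STAT column instead:

```python
def d_state_rows(ps_aux_text):
    """Return (pid, command) for rows whose STAT field starts with 'D'.

    Assumes `ps aux`-style columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    rows = []
    for line in ps_aux_text.splitlines():
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line, blank line, or malformed row
        if fields[7].startswith("D"):
            rows.append((int(fields[1]), fields[10]))
    return rows
```

Applied to the listing above, this keeps only kworker/u8:1, jbd2/md0-8, dd and mdadm. For each of those PIDs, reading /proc/<pid>/stack as root (where the kernel provides it), or running `echo w > /proc/sysrq-trigger` and checking dmesg, should show where the task is blocked.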
[root@dell-pr1700-02 ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[1] loop0[0]
2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
[>....................] reshape = 0.0% (1/509952) finish=1059.5min speed=7K/sec
unused devices: <none>
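
To confirm that the reshape position is not moving at all, rather than moving very slowly, two samples of /proc/mdstat taken some time apart can be compared. The following is only a sketch: the helper names `reshape_progress` and `is_stalled` are hypothetical, and the regular expression assumes the progress-line format printed above.

```python
import re

# Matches the progress part of a reshape line in /proc/mdstat, e.g.
#   [>....................] reshape = 0.0% (1/509952) finish=1059.5min speed=7K/sec
RESHAPE_RE = re.compile(
    r"reshape\s*=\s*(?P<pct>[\d.]+)%\s*\((?P<done>\d+)/(?P<total>\d+)\)"
)

def reshape_progress(mdstat_text):
    """Return (done, total) from the first reshape line, or None if absent."""
    m = RESHAPE_RE.search(mdstat_text)
    if m is None:
        return None
    return int(m.group("done")), int(m.group("total"))

def is_stalled(earlier_text, later_text):
    """True if both samples show a reshape and 'done' did not advance."""
    a = reshape_progress(earlier_text)
    b = reshape_progress(later_text)
    return a is not None and b is not None and b[0] <= a[0]
```

For example, reading /proc/mdstat twice with a pause in between and passing both strings to `is_stalled` would report whether the `(1/509952)` position has changed.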
It looks like the reshape hasn't started. This time I didn't add the code to check
mddev->suspended and active_stripes; I only applied the patches to the source.
Do you have any suggestions for what else to check?
Best Regards
Xiao