From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: adilger.kernel@dilger.ca, song@kernel.org,
linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-ext4@vger.kernel.org
Subject: Re: 6.5.0rc5 fs hang - ext4? raid?
Date: Tue, 15 Aug 2023 17:50:35 +0000
Message-ID: <ZNu668KGiNcwCSVe@gallifrey>
In-Reply-To: <ZNt11WbPn7LCXPvB@gallifrey>
* Dr. David Alan Gilbert (dave@treblig.org) wrote:
> * Theodore Ts'o (tytso@mit.edu) wrote:
> > On Mon, Aug 14, 2023 at 09:02:53PM +0000, Dr. David Alan Gilbert wrote:
> > > dg 29594 29592 0 18:40 pts/0 00:00:00 /usr/bin/ar --plugin /usr/libexec/gcc/x86_64-redhat-linux/13/liblto_plugin.so -csrDT src/intel/perf/libintel_perf.a src/intel/perf/libintel_perf.a.p/meson-generated_.._intel_perf_metrics.c.o src/intel/perf/libintel_perf.a.p/intel_perf.c.o src/intel/perf/libintel_perf.a.p/intel_perf_query.c.o src/intel/perf/libintel_perf.a.p/intel_perf_mdapi.c.o
> > >
> > > [root@dalek dg]# cat /proc/29594/stack
> > > [<0>] md_super_wait+0xa2/0xe0
> > > [<0>] md_bitmap_unplug+0xd2/0x120
> > > [<0>] flush_bio_list+0xf3/0x100 [raid1]
> > > [<0>] raid1_unplug+0x3b/0xb0 [raid1]
> > > [<0>] __blk_flush_plug+0xd7/0x150
> > > [<0>] blk_finish_plug+0x29/0x40
> > > [<0>] ext4_do_writepages+0x401/0xc90
> > > [<0>] ext4_writepages+0xad/0x180
> >
> > If you wait a few seconds and try grabbing cat /proc/29594/stack
> > again, does the stack trace stay consistent as above?
>
> I'll get back to that and retry it.
Yeh, the stack is consistent; this time around it's an 'ar' in a kernel
build:
[root@dalek dg]# cat /proc/17970/stack
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_unplug+0xad/0x120
[<0>] flush_bio_list+0xf3/0x100 [raid1]
[<0>] raid1_unplug+0x3b/0xb0 [raid1]
[<0>] __blk_flush_plug+0xd7/0x150
[<0>] blk_finish_plug+0x29/0x40
[<0>] ext4_do_writepages+0x401/0xc90
[<0>] ext4_writepages+0xad/0x180
[<0>] do_writepages+0xd2/0x1e0
[<0>] filemap_fdatawrite_wbc+0x63/0x90
[<0>] __filemap_fdatawrite_range+0x5c/0x80
[<0>] ext4_release_file+0x74/0xb0
[<0>] __fput+0xf5/0x2a0
[<0>] task_work_run+0x5d/0x90
[<0>] exit_to_user_mode_prepare+0x1e6/0x1f0
[<0>] syscall_exit_to_user_mode+0x1b/0x40
[<0>] do_syscall_64+0x6c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[root@dalek dg]# cat /proc/17970/stack
[<0>] md_super_wait+0xa2/0xe0
[<0>] md_bitmap_unplug+0xad/0x120
[<0>] flush_bio_list+0xf3/0x100 [raid1]
[<0>] raid1_unplug+0x3b/0xb0 [raid1]
[<0>] __blk_flush_plug+0xd7/0x150
[<0>] blk_finish_plug+0x29/0x40
[<0>] ext4_do_writepages+0x401/0xc90
[<0>] ext4_writepages+0xad/0x180
[<0>] do_writepages+0xd2/0x1e0
[<0>] filemap_fdatawrite_wbc+0x63/0x90
[<0>] __filemap_fdatawrite_range+0x5c/0x80
[<0>] ext4_release_file+0x74/0xb0
[<0>] __fput+0xf5/0x2a0
[<0>] task_work_run+0x5d/0x90
[<0>] exit_to_user_mode_prepare+0x1e6/0x1f0
[<0>] syscall_exit_to_user_mode+0x1b/0x40
[<0>] do_syscall_64+0x6c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
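Ted's check (re-read /proc/<pid>/stack after a few seconds and see whether it changes) can be scripted. Below is a minimal sketch in Python, not something used in this thread: it assumes the `[<0>] function+offset/size [module]` frame format seen in the dumps above, and the helper names (`parse_stack`, `sample_stack`, `looks_stuck`) are hypothetical. Reading another task's stack normally needs root, as in the session above.

```python
from __future__ import annotations

import re
import time
from pathlib import Path

# One frame of /proc/<pid>/stack, e.g. "[<0>] md_super_wait+0xa2/0xe0"
# or "[<0>] flush_bio_list+0xf3/0x100 [raid1]"; group 1 is the function name.
FRAME_RE = re.compile(r"^\[<[0-9a-fx]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_stack(text: str) -> list[str]:
    """Return the function names in a stack dump, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return frames


def sample_stack(pid: int, samples: int = 2, interval: float = 5.0) -> list[list[str]]:
    """Read /proc/<pid>/stack `samples` times, `interval` seconds apart (needs root)."""
    result = []
    for i in range(samples):
        if i:
            time.sleep(interval)
        result.append(parse_stack(Path(f"/proc/{pid}/stack").read_text()))
    return result


def looks_stuck(samples: list[list[str]]) -> bool:
    """True if there is at least one sample and all show the same non-empty call chain."""
    return bool(samples) and bool(samples[0]) and all(s == samples[0] for s in samples)
```

Usage would be something like `looks_stuck(sample_stack(17970))`. Comparing function names only, and ignoring the offsets, means a task looping slowly inside the same call chain would also be reported as stuck, so a True result is a hint rather than proof of a hang.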
> > Also, if you have iostat installed (usually part of the sysstat
> > package), does "iostat 1" show any I/O activity on the md device?
iostat is showing something odd: most devices are at 0, except for three
of the dm devices, which are stuck at 100% utilisation with apparently
no I/O going on:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.03   53.06    0.00   46.84
Device        r/s    rkB/s   rrqm/s    %rrqm  r_await rareq-sz      w/s    wkB/s   wrqm/s    %wrqm  w_await wareq-sz      d/s    dkB/s   drqm/s    %drqm  d_await dareq-sz      f/s  f_await   aqu-sz    %util
...
dm-16        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00   100.00
dm-17        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00   100.00
dm-18        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-19        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-20        0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00   100.00
....
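The symptom described above, %util pinned at 100 while every rate column is zero, can be picked out of a report in this layout mechanically. iostat computes %util from the time the device had I/O outstanding, so requests that are issued but never complete would plausibly look exactly like this. The sketch below is a hypothetical helper (`find_stuck_devices` is not part of sysstat) that assumes whitespace-separated columns with the header names shown above.

```python
from __future__ import annotations

# Rate columns from the extended device report shown above; if all of them
# are zero, the device completed no requests during the interval.
RATE_COLUMNS = ("r/s", "w/s", "d/s", "f/s")


def find_stuck_devices(report: str, util_threshold: float = 99.0) -> list[str]:
    """Return devices reporting ~100% utilisation while completing no requests."""
    header = None
    stuck = []
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # skip avg-cpu lines, "..." elisions and partial rows
        row = dict(zip(header, fields))
        try:
            idle = all(float(row[col]) == 0.0 for col in RATE_COLUMNS)
            util = float(row["%util"])
        except (KeyError, ValueError):
            continue  # header lacks one of the columns, or a value is not numeric
        if idle and util >= util_threshold:
            stuck.append(row["Device"])
    return stuck
```

Fed the report above, this would flag dm-16, dm-17 and dm-20 and skip the idle devices at 0% utilisation.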
dm-20 is /dev/mapper/main-more, the RAID on which the filesystem runs;
dm-16 and dm-17 are main-more_rmeta_0 and main-more_rimage_0, so
something screwy is going on there.
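The dm-N to name mapping above can be read straight from sysfs: each device-mapper device exposes its name in /sys/block/dm-N/dm/name and the devices it is stacked on in /sys/block/dm-N/slaves/; `lsblk` shows the same stacking. A small sketch with hypothetical helper names (`dm_names`, `dm_slaves`):

```python
from __future__ import annotations

from pathlib import Path


def dm_names(sys_block: Path = Path("/sys/block")) -> dict[str, str]:
    """Map kernel names like 'dm-20' to device-mapper names like 'main-more'."""
    names = {}
    for name_file in sorted(sys_block.glob("dm-*/dm/name")):
        kernel_name = name_file.parent.parent.name  # .../dm-20/dm/name -> dm-20
        names[kernel_name] = name_file.read_text().strip()
    return names


def dm_slaves(kernel_name: str, sys_block: Path = Path("/sys/block")) -> list[str]:
    """List the block devices a dm device sits on (its 'slaves' entries in sysfs)."""
    slaves = sys_block / kernel_name / "slaves"
    return sorted(p.name for p in slaves.iterdir()) if slaves.is_dir() else []
```

Running `dm_names()` on the affected machine would confirm which names dm-16, dm-17 and dm-20 correspond to, and `dm_slaves("dm-20")` would list what the RAID LV is stacked on.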
Dave
> > What about the underlying block devices used by the md device? If the
> > md device is attached to HDDs where you can see the activity light,
> > can you see (or hear) any disk activity?
>
> It's spinning rust, and I hear them go quiet when the hang happens.
>
> Dave
>
> > This sure seems like either the I/O driver isn't processing requests,
> > or some kind of hang in the md layer....
> >
> > - Ted
> --
> -----Open up your eyes, open up your mind, open up your code -------
> / Dr. David Alan Gilbert | Running GNU/Linux | Happy \
> \ dave @ treblig.org | | In Hex /
> \ _________________________|_____ http://www.treblig.org |_______/