public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [bug report] e2fsck: The process is deadlocked
@ 2022-11-09 10:40 zhanchengbin
  2022-11-09 15:43 ` Theodore Ts'o
  0 siblings, 1 reply; 3+ messages in thread
From: zhanchengbin @ 2022-11-09 10:40 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linfeilong, liuzhiqiang26

Hi Tytso,
e2fsck deadlocks when an I/O error occurs while the journal is being
replayed: the I/O error handler issues another I/O request, which
tries to take a mutex that is already held.
stack:
(gdb) bt
#0  0x0000ffffa740bc34 in ?? () from /usr/lib64/libc.so.6
#1  0x0000ffffa7412024 in pthread_mutex_lock () from /usr/lib64/libc.so.6
#2  0x0000ffffa7654e54 in mutex_lock (kind=CACHE_MTX, 
data=0xaaaaf5c98f30) at unix_io.c:151
#3  unix_write_blk64 (channel=0xaaaaf5c98e60, block=2, count=4, 
buf=0xaaaaf5c9d170) at unix_io.c:1092
#4  0x0000ffffa762e610 in ext2fs_flush2 (flags=0, fs=0xaaaaf5c98cc0) at 
closefs.c:401
#5  ext2fs_flush2 (fs=0xaaaaf5c98cc0, flags=0) at closefs.c:279
#6  0x0000ffffa762eb14 in ext2fs_close2 (fs=fs@entry=0xaaaaf5c98cc0, 
flags=flags@entry=0) at closefs.c:510
#7  0x0000ffffa762eba4 in ext2fs_close_free 
(fs_ptr=fs_ptr@entry=0xffffc8cbab30) at closefs.c:472
#8  0x0000aaaadcc39bd8 in preenhalt (ctx=ctx@entry=0xaaaaf5c98460) at 
util.c:365
#9  0x0000aaaadcc3bc5c in e2fsck_handle_write_error (channel=<optimized 
out>, block=262152, count=<optimized out>, data=<optimized out>, 
size=<optimized out>, actual=<optimized out>, error=5)
     at ehandler.c:114
#10 0x0000ffffa7655044 in reuse_cache (block=262206, 
cache=0xaaaaf5c98f80, data=0xaaaaf5c98f30, channel=0xaaaaf5c98e60) at 
unix_io.c:583
#11 unix_write_blk64 (channel=0xaaaaf5c98e60, block=262206, 
count=<optimized out>, buf=<optimized out>) at unix_io.c:1097
#12 0x0000aaaadcc3702c in ll_rw_block (rw=rw@entry=1, 
op_flags=op_flags@entry=0, nr=<optimized out>, nr@entry=1, 
bhp=0xffffc8cbac60, bhp@entry=0xffffc8cbac58) at journal.c:184
#13 0x0000aaaadcc375e8 in brelse (bh=<optimized out>, 
bh@entry=0xaaaaf5cac4a0) at journal.c:217
#14 0x0000aaaadcc3ebe0 in do_one_pass 
(journal=journal@entry=0xaaaaf5c9f590, info=info@entry=0xffffc8cbad60, 
pass=pass@entry=PASS_REPLAY) at recovery.c:693
#15 0x0000aaaadcc3ee74 in jbd2_journal_recover (journal=0xaaaaf5c9f590) 
at recovery.c:310
#16 0x0000aaaadcc386a8 in recover_ext3_journal (ctx=0xaaaaf5c98460) at 
journal.c:1653
#17 e2fsck_run_ext3_journal (ctx=0xaaaaf5c98460) at journal.c:1706
#18 0x0000aaaadcc207e0 in main (argc=<optimized out>, argv=<optimized 
out>) at unix.c:1791

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bug report] e2fsck: The process is deadlocked
  2022-11-09 10:40 [bug report] e2fsck: The process is deadlocked zhanchengbin
@ 2022-11-09 15:43 ` Theodore Ts'o
  2022-11-10  3:39   ` zhanchengbin
  0 siblings, 1 reply; 3+ messages in thread
From: Theodore Ts'o @ 2022-11-09 15:43 UTC (permalink / raw)
  To: zhanchengbin; +Cc: linux-ext4, linfeilong, liuzhiqiang26

On Wed, Nov 09, 2022 at 06:40:31PM +0800, zhanchengbin wrote:
> Hi Tytso,
> The process is deadlocked, and an I/O error occurs when logs
> are replayed. Because in the I/O error handling function, I/O
> is sent again and catch the mutexlock.

What version of e2fsprogs are you using, and do you have a reliable
reproducer?

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bug report] e2fsck: The process is deadlocked
  2022-11-09 15:43 ` Theodore Ts'o
@ 2022-11-10  3:39   ` zhanchengbin
  0 siblings, 0 replies; 3+ messages in thread
From: zhanchengbin @ 2022-11-10  3:39 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, linfeilong, liuzhiqiang26

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

The version is 1.46.4. I am considering whether ext2fs_close_free
should release the mutex locks it may still hold, such as CACHE_MTX,
BOUNCE_MTX, and STATS_MTX. But you would need to check that it is the
device being checked: I have looked at every place where
ext2fs_close_free is called, and besides the calls at program exit and
in exception branches, it is also called when the journal device is
closed.

Reliable reproducer is in attachment.

  -zhanchengbin.

On 2022/11/9 23:43, Theodore Ts'o wrote:
> On Wed, Nov 09, 2022 at 06:40:31PM +0800, zhanchengbin wrote:
>> Hi Tytso,
>> The process is deadlocked, and an I/O error occurs when logs
>> are replayed. Because in the I/O error handling function, I/O
>> is sent again and catch the mutexlock.
> 
> What version of e2fsprogs are you using, and do you have a reliable
> reproducer?
> 
> Thanks,
> 
> 					- Ted
> 
> .
> 

[-- Attachment #2: test.sh --]
[-- Type: text/plain, Size: 2072 bytes --]

#!/bin/bash
# Reproducer: inject random block-layer I/O failures under fsstress
# load, then run fsck repeatedly until it hangs.
disk="sdb"
dir=/mnt/${disk}
mkfs.ext4 -F /dev/$disk
[ -d $dir ] || mkdir $dir

# Configure fault injection (needs a kernel built with
# CONFIG_FAIL_MAKE_REQUEST and debugfs mounted).
echo 1 > /sys/kernel/debug/fail_make_request/verbose
echo 5 > /sys/kernel/debug/fail_make_request/probability
echo 10 > /sys/kernel/debug/fail_make_request/interval
echo 10000000 > /sys/kernel/debug/fail_make_request/times

# Randomly switch the I/O scheduler of disk $1 every $2 seconds.
function set_sys()
{
	local queue_dir=/sys/block/$1/queue
	interval=$2
	while true
	do
		sleep $interval
		let s_num=RANDOM%4
		case $s_num in
			0)
			scheduler=mq-deadline
			;;
			1)
			scheduler=bfq
			;;
			2)
			scheduler=kyber
			;;
			3)
			scheduler=none
			;;
		esac
		echo $scheduler > $queue_dir/scheduler
	done
}
set_sys $disk 120 &>/dev/null &

i=0
while true
do
	let flag=i%5
	if [ $flag -le 2 ]; then
		tune2fs -l /dev/$disk || exit 1
	fi

	mount -o errors=remount-ro /dev/$disk $dir || exit 1
	tune2fs -l /dev/$disk || exit 1
	fsstress -d $dir/fss -l 20 -n 500 -p 8 > /dev/null 2>&1 &
	sleep $((1 + RANDOM % 3))
	mount | grep $dir | grep '(ro' && exit 1

	echo 1 > /sys/block/$disk/make-it-fail
	sleep $((1 + RANDOM % 3))
	ps -e | grep -w fsstress > /dev/null 2>&1
	while [ $? -eq 0 ]
	do
		sleep 1
		mount | grep $dir | grep '(ro' && killall -9 fsstress
		ps -e | grep -w fsstress > /dev/null 2>&1
	done

	echo 0 > /sys/block/$disk/make-it-fail
	while true
	do
		umount $dir && break
		killall -9 fsstress > /dev/null 2>&1
		sleep 0.1
	done

	if [ $flag -le 1 ]; then
		tune2fs -l /dev/$disk || exit 1
	fi

	echo 1 > /sys/block/$disk/make-it-fail
	echo 10 > /sys/kernel/debug/fail_make_request/probability
	echo 1 > /sys/kernel/debug/fail_make_request/interval
	count=100
	while [ $count -ge 0 ]; do
		fsck -a /dev/$disk
		((count = count - 1))
	done
	echo 5 > /sys/kernel/debug/fail_make_request/probability
	echo 10 > /sys/kernel/debug/fail_make_request/interval
	echo 0 > /sys/block/$disk/make-it-fail

	fsck -a /dev/$disk &> fsck-${disk}.log
	ret=$?
	if [ $ret -ne 0 -a $ret -ne 1 ]; then
		exit 1
	fi

	fsck -fn /dev/$disk
	ret=$?
	if [ $ret -ne 0 ]; then
		exit 1
	fi
	((i=i+1))
done

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-11-10  3:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-09 10:40 [bug report] e2fsck: The process is deadlocked zhanchengbin
2022-11-09 15:43 ` Theodore Ts'o
2022-11-10  3:39   ` zhanchengbin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox