From: Zhihao Cheng
To: , , , "zhangyi (F)"
Subject: UBIFS: problem report: about lpt LEB scanning failed (no issue)
Message-ID: <97ca7fe4-4ad4-edd1-e97a-1d540aeabe2d@huawei.com>
Date: Wed, 5 Jun 2024 11:25:05 +0800

Problem description

Recently I was testing UBIFS with fsstress on a NOR flash (simulated by mtdram, 64M size, 16K PEB, which means big lpt mode for UBIFS). The fsstress process drove one CPU to 100% utilization and could not be killed. It was stuck in an endless loop in do_commit -> ubifs_lpt_start_commit:

	while (need_write_all(c)) {
		mutex_unlock(&c->lp_mutex);
		err = lpt_gc(c);
		if (err)
			return err;
		mutex_lock(&c->lp_mutex);
	}

I then found that lpt_gc_lnum() handles the same LEB (lnum 8) every time, and that c->ltab[i].dirty for LEB 8 is still not equal to c->leb_size after lpt_gc_lnum() returns.
After analyzing the lpt nodes on LEB 8, I found that lpt_gc_lnum() returns early, before it has scanned all lpt nodes. LEB 8 looks like this (partial):

[  104.740309] LEB 8:14383 len 13, nnode num 31,
[  104.740689] dirty 1
[  104.740905] LEB 8:14396 len 13, nnode num 7,
[  104.741277] dirty 1
[  104.741486] LEB 8:14409 len 13, nnode num 1,
[  104.741870] dirty 1
[  104.742078] LEB 8:14422 len 16, pnode num 745
[  104.742475] dirty 1
[  104.742682] B type 8 0
[  104.742925] LEB 8:14438, pad 2 bytes min_io_size 8
[  104.743301] LEB 8:14440, free 1368 bytes  // Actually, the remaining 1368 bytes are not 0xff; the scanning function (dump_lpt_leb) parses the lpt nodes in the wrong way
[  104.743674] (pid 1095) finish dumping LEB 8

The binary image of LEB 8 is (partial):

0x3840 = 14400

00003840: 6a e4 60 cf 91 b1 f3 82 03 17 59 11 40 ac b9 fc 99 11 83 c3 83 03 ff ff 90 6e c3 ec 04 f3 26 a1  j.`.......Y.@............n....&.
00003860: bf 09 41 a2 6f 94 15 09 58 ee 5f ce 97 7e 09 b8 86 a0 d8 2c 62 3b 47 37 62 e5 e8 59 86 be 82 fe  ..A.o...X._..~.....,b;G7b..Y....
00003880: 17 6d 63 95 ce 80 76 6e ad e6 44 af f6 43 06 ab 41 28 04 99 72 1f 31 91 cb 96 b1 ef 43 6e 22 2c  .mc...vn..D..C..A(..r.1.....Cn",
000038a0: 26 57 d0 9c b5 76 8b 08 1d fc 41 07 8c ba 26 3b 45 e1 7b 23 de d5 19 63 f3 6c e8 95 b7 02 5a 89  &W...v....A...&;E.{#...c.l....Z.
000038c0: 83 81 0e 72 7c 4b 59 a3 c4 c0 e1 e5 22 7c 27 8d 85 ad c2 93 25 ac 5b 32 c8 02 07 2f 24 f9 e0 f6  ...r|KY....."|'.....%.[2.../$...
000038e0: e3 87 f2 bb 62 23 d5 e4 2e b7 8c 41 61 43 2a a4 2f ce 92 4f 62 47 88 a2 11 a6 51 1f da 51 e7 a4  ....b#.....AaC*./..ObG....Q..Q..

Let's parse the data above the way lpt_gc_lnum() does. nnode(1) is at 8:14409~14421, and the corresponding data is '17 59 11 40 ac b9 fc 99 11 83 c3 83 03'. According to ubifs_pack_nnode(), the type field is the lower UBIFS_LPT_TYPE_BITS (4) bits of the byte '0x11'. The data looks good and can be parsed as an nnode.
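As a cross-check of the nnode parse above, here is a minimal standalone sketch (not kernel code). It assumes that the first two bytes of an lpt node hold a little-endian CRC-16, that the CRC is computed with polynomial 0x8005 (reflected form 0xA001) and an all-ones seed over the rest of the node, and that type value 1 means "nnode". The helper names lpt_stored_crc, lpt_node_type and lpt_crc_ok are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LPT_CRC_BYTES 2 /* assumption: the CRC occupies the first two bytes */
#define LPT_TYPE_BITS 4 /* UBIFS_LPT_TYPE_BITS, as quoted in the report */

/* Bitwise CRC-16, polynomial 0x8005 in reflected form (0xA001), no final XOR. */
static uint16_t crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001) : (uint16_t)(crc >> 1);
	}
	return crc;
}

/* CRC stored in the node: 16 bits, least significant byte first (assumption). */
static uint16_t lpt_stored_crc(const uint8_t *node)
{
	return (uint16_t)(node[0] | (node[1] << 8));
}

/* Node type: the low LPT_TYPE_BITS bits of the byte that follows the CRC. */
static unsigned int lpt_node_type(const uint8_t *node)
{
	return node[LPT_CRC_BYTES] & ((1u << LPT_TYPE_BITS) - 1);
}

/* Returns 1 if the CRC over everything after the CRC field matches the stored CRC. */
static int lpt_crc_ok(const uint8_t *node, size_t node_len)
{
	uint16_t calc = crc16(0xFFFF, node + LPT_CRC_BYTES, node_len - LPT_CRC_BYTES);

	return calc == lpt_stored_crc(node);
}

/* nnode(1) from the hex dump, LEB 8 offsets 14409..14421 (13 bytes). */
static const uint8_t nnode_1[13] = {
	0x17, 0x59, 0x11, 0x40, 0xac, 0xb9, 0xfc,
	0x99, 0x11, 0x83, 0xc3, 0x83, 0x03,
};
```

Called from a small main(), lpt_node_type(nnode_1) yields 1 and lpt_crc_ok(nnode_1, sizeof(nnode_1)) yields 1: the stored CRC 0x5917 matches the CRC computed over the remaining 11 bytes, which agrees with the statement that this range parses as an nnode.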
The next 2 bytes (8:14422~14423) are 0xff, which means that lpt data is written to flash with an alignment of 8 bytes (see write_cnodes). After I modified lpt_gc_lnum() to skip these 2 bytes (0xff), UBIFS could parse all lpt nodes in LEB 8. In fact, however, UBIFS parses these 2 bytes (0xff) as the crc field of a pnode (8:14422~14437), and the crc16 result for that pnode happens to be exactly 0xffff. So the range 8:14422~14437 is parsed as a pnode, and the remaining lpt nodes cannot be parsed because the parsing offset is now wrong.

Why can it happen?

The root cause is that the on-disk layout of the lpt area is very simple; it would be better if each LPT node had a length field. Without one, it is possible for the crc16 result to be correct both for offset_A~offset_B (node X) and for offset_A+2~offset_C (node Y).

Will it happen on a NAND flash?

In theory, I would say 'yes'. But I never hit it in a whole day of testing. My guess is that min_io_size for NAND is at least 512, so the padding (0xff) is rarely shorter than 3 bytes, which makes it hard to reproduce a case where the crc16 result is correct both for offset_A~offset_B (node X) and for offset_A+2~offset_C (node Y).

How to reproduce it?

You can generate a problem image with the script test.sh below (when you see a hung task warning, or the utilization of one CPU reaches 100%, the problem has occurred).

#!/bin/sh

DEV=/dev/ubi0_0
KEY_FILE=/tmp/key
MNT=/root/temp
mtdram_patt="mtdram test device"

function fatal()
{
	echo "Error: $1" 1>&2
	exit 1
}

function find_mtd_device()
{
	printf "%s" "$(grep "$1" /proc/mtd | sed -e "s/^mtd\([0-9]\+\):.*$/\1/")"
}

# Load mtdram with specified size and PEB size
# Usage: load_mtdram
# 1. Flash size is specified in MiB
# 2. PEB size is specified in KiB
function load_mtdram()
{
	local size="$1"; shift
	local peb_size="$1"; shift

	size="$(($size * 1024))"
	modprobe mtdram total_size="$size" erase_size="$peb_size"
}

function run_test()
{
	local size="$1";
	local peb_size="$2";
	local page_size="$3";

	echo "======================================================================"
	printf "%s" "MTDRAM ${size}MiB PEB size ${peb_size}KiB"
	echo ""

	load_mtdram "$size" "$peb_size" || echo "cannot load mtdram"
	mtdnum="$(find_mtd_device "$mtdram_patt")"
	flash_eraseall /dev/mtd$mtdnum

	modprobe ubi mtd="$mtdnum,$page_size" || fatal "modprobe ubi fail"
	ubimkvol -N vol_test -m -n 0 /dev/ubi0 || fatal "mkvol fail"
	modprobe ubifs || fatal "modprobe ubifs fail"
	mount -t ubifs $DEV $MNT || fatal "mount ubifs fail"

	fsstress -d $MNT -l0 -p4 -n10000 &
	sleep $((RANDOM % 120))

	# Kill fsstress and wait until every instance is gone
	ps -e | grep -w fsstress > /dev/null 2>&1
	while [ $? -eq 0 ]
	do
		killall -9 fsstress > /dev/null 2>&1
		sleep 1
		ps -e | grep -w fsstress > /dev/null 2>&1
	done

	# Retry umount until the mount point disappears
	while true
	do
		res=`mount | grep "$MNT"`
		if [[ "$res" == "" ]]
		then
			break;
		fi
		umount $MNT
		sleep 0.1
	done

	modprobe -r ubifs
	modprobe -r ubi
	modprobe -r mtdram

	echo "----------------------------------------------------------------------"
}

while true
do
	run_test "64" "16" "512"
done

https://bugzilla.kernel.org/show_bug.cgi?id=218935

Or you can mount the problem image (disk.tar.gz) directly with the following script:

#!/bin/sh

DEV=/dev/ubi0_0
KEY_FILE=/tmp/key
MNT=/root/temp
mtdram_patt="mtdram test device"

function fatal()
{
	echo "Error: $1" 1>&2
	exit 1
}

function find_mtd_device()
{
	printf "%s" "$(grep "$1" /proc/mtd | sed -e "s/^mtd\([0-9]\+\):.*$/\1/")"
}

# Load mtdram with specified size and PEB size
# Usage: load_mtdram
# 1. Flash size is specified in MiB
# 2. PEB size is specified in KiB
function load_mtdram()
{
	local size="$1"; shift
	local peb_size="$1"; shift

	size="$(($size * 1024))"
	modprobe mtdram total_size="$size" erase_size="$peb_size"
}

function run_test()
{
	local size="$1";
	local peb_size="$2";
	local page_size="$3";

	echo "======================================================================"
	printf "%s" "MTDRAM ${size}MiB PEB size ${peb_size}KiB"
	echo ""

	load_mtdram "$size" "$peb_size" || echo "cannot load mtdram"
	mtdnum="$(find_mtd_device "$mtdram_patt")"
	flash_eraseall /dev/mtd$mtdnum

	# Write the saved image to the flash, then attach and mount it
	tar xvzf disk.tar.gz
	dd if=disk of=/dev/mtd0 bs=1M
	modprobe ubi mtd=0,512
	mount /dev/ubi0_0 /root/temp
}

run_test "64" "16" "512"

PS: I am reporting this problem as 'no issue' because I don't think we can fix it without modifying the on-disk layout. I think it is just a design nit that does not need to be fixed. I just want people to know about the problem in case someone hits it one day.

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
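The collision described under "Why it can happen?" can be checked against the bytes in the hex dump. The following is a minimal standalone sketch, not the kernel's scanner. The names pad_len and looks_like_node are hypothetical stand-ins. The sketch assumes a CRC-16 with polynomial 0x8005 (reflected form 0xA001) seeded with 0xFFFF, a little-endian CRC in the first two bytes of a node, and a type value of 0xf meaning "not a node".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * LEB 8, offsets 14422..14437 from the hex dump: two 0xff padding bytes
 * followed by the first 14 bytes stored at the 8-byte aligned offset 14424.
 */
static const uint8_t at_14422[16] = {
	0xff, 0xff, 0x90, 0x6e, 0xc3, 0xec, 0x04, 0xf3,
	0x26, 0xa1, 0xbf, 0x09, 0x41, 0xa2, 0x6f, 0x94,
};

/* Bitwise CRC-16, polynomial 0x8005 in reflected form (0xA001), no final XOR. */
static uint16_t crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001) : (uint16_t)(crc >> 1);
	}
	return crc;
}

/* Number of padding bytes needed to reach the next multiple of min_io_size. */
static int pad_len(int offs, int min_io_size)
{
	return (min_io_size - offs % min_io_size) % min_io_size;
}

/*
 * Hypothetical "is this a node?" test: the type nibble must not be 0xf
 * (assumed to mean "not a node"), and the stored CRC must match the CRC
 * computed over the bytes that follow it.
 */
static int looks_like_node(const uint8_t *buf, size_t node_len)
{
	unsigned int type = buf[2] & 0x0f;
	uint16_t stored = (uint16_t)(buf[0] | (buf[1] << 8));

	if (type == 0x0f)
		return 0;
	return crc16(0xFFFF, buf + 2, node_len - 2) == stored;
}
```

With these definitions, pad_len(14422, 8) is 2, so offset 14422 is padding under 8-byte alignment. Yet looks_like_node(at_14422, 16) returns 1, because the CRC over the 14 bytes starting at 14424 comes out as 0xffff, the same value as the two padding bytes read as a CRC field. A scanner that tests for a node before skipping padding therefore accepts a 16-byte pnode at 14422 and continues from 14438 instead of 14424.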