From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Thommandra Gowtham <trgowtham123@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS suddenly moving to read-only
Date: Tue, 11 Aug 2020 18:38:28 +0800
Message-ID: <f8742974-69b2-a0e9-ff99-4c61dc4f9ff0@gmx.com>
In-Reply-To: <CA+XNQ=iupWN6ck5M0hUQ-+470F9PKdoKKUUt+tmQOWoC=zterg@mail.gmail.com>
On 2020/8/11 5:39 PM, Thommandra Gowtham wrote:
> Hi,
>
> Need some help to understand if there are any issues in BTRFS/Linux Kernel.
>
> Running BTRFS as root filesystem and we see that suddenly the entire
> disk is moved to read-only due to errors.
>
> Did the SSD run out of life? If yes, then
> - What are the best BTRFS options for frequent small amount of
> writes(log files) on low quality SSD? If we want to increase the life
> of the Disk.
If you are using systemd, you can configure journald to keep the journal
in memory only, which should extend the life of the SSD.
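As a minimal sketch of the memory-only journal setup: journald's
`Storage=volatile` setting keeps the journal under /run/log/journal, so
log writes never reach the SSD, at the cost of losing the logs on reboot.
The drop-in file name `volatile.conf`, the helper name
`write_volatile_journal_conf` and the 64M cap are assumptions for
illustration, not something taken from this thread.

```shell
# Hypothetical helper: write a journald drop-in that keeps logs in RAM only.
# $1 is the drop-in directory; the default is the standard journald.conf.d path.
write_volatile_journal_conf() {
    conf_dir="${1:-/etc/systemd/journald.conf.d}"
    mkdir -p "$conf_dir" || return 1
    # Storage=volatile: journal lives under /run/log/journal and is lost on reboot.
    # RuntimeMaxUse caps the RAM used by the journal (64M is an arbitrary example).
    cat > "$conf_dir/volatile.conf" <<'EOF'
[Journal]
Storage=volatile
RuntimeMaxUse=64M
EOF
}
```

Run the function as root, then restart journald with
`systemctl restart systemd-journald` so the setting takes effect.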
> - How do we determine the Disk health apart from SMART attributes? Can
> we do a Disk write/read test to figure it out?
AFAIK SMART is the only thing we can rely on now.
>
> mount options used:
> rw,noatime,compress=lzo,ssd,space_cache,commit=60,subvolid=263
>
> # btrfs --version
> btrfs-progs v4.4
>
> Ubuntu 16.04: 4.15.0-36-generic #1 SMP Mon Oct 22 21:20:30 PDT 2018
> x86_64 x86_64 x86_64 GNU/Linux
>
> mkstemp: Read-only file system
> [35816007.175210] print_req_error: I/O error, dev sda, sector 4472632
> [35816007.182192] BTRFS error (device sda4): bdev /dev/sda4 errs: wr 66, rd 725, flush 0, corrupt 0, gen 0
This means read errors happened (the rd counter is incrementing).
Do you have more log context from before these lines?
> [35816007.192913] print_req_error: I/O error, dev sda, sector 4472632
> [35816007.199855] BTRFS error (device sda4): bdev /dev/sda4 errs: wr 66, rd 726, flush 0, corrupt 0, gen 0
> [35816007.210675] print_req_error: I/O error, dev sda, sector 10180680
> [35816007.217748] BTRFS error (device sda4): bdev /dev/sda4 errs: wr 66, rd 727, flush 0, corrupt 0, gen 0
> [35816007.461941] print_req_error: I/O error, dev sda, sector 4472048
> [35816007.468903] BTRFS error (device sda4): bdev /dev/sda4 errs: wr 66, rd 728, flush 0, corrupt 0, gen 0
> [35816007.479611] systemd[7035]: serial-getty@ttyS0.service: Failed at step EXEC spawning /sbin/agetty: Input/output error
> [35816007.712006] print_req_error: I/O error, dev sda, sector 4472048
This means we failed to read some data from sda.
It is not an error from btrfs checksum verification, but a read error
reported directly by the device driver.
So agetty can't be executed because we failed to read the contents of
that executable file.
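The distinction shows up in the counters of the quoted log lines: `rd`
climbs from 725 to 728 while `corrupt` stays at 0. A small sketch to pull
out those two fields, assuming each kernel message is on a single
unwrapped line; the function name `btrfs_err_counters` is made up for
illustration.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the rd and
# corrupt counters from every btrfs "errs:" message; other lines are ignored.
btrfs_err_counters() {
    sed -n 's/.*errs: wr [0-9]*, rd \([0-9]*\), flush [0-9]*, corrupt \([0-9]*\), gen [0-9]*.*/rd=\1 corrupt=\2/p'
}
```

Typical use would be `dmesg | btrfs_err_counters`. The same counters are
also reported per device by `btrfs device stats <mountpoint>`.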
>
> # dmesg | tail
> bash: /bin/dmesg: Input/output error
>
> Doesn't Input/output error mean the disk is inaccessible?
This means we can't even read the /bin/dmesg file itself.
>
> # btrfs fi show
> Label: 'rpool' uuid: 42d39990-e4eb-414b-8b17-0c4a2f76cc76
> Total devices 1 FS bytes used 11.80GiB
> devid 1 size 27.20GiB used 19.01GiB path /dev/sda4
>
> # smartctl -a /dev/sda
> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.15.0-36-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>
> Short INQUIRY response, skip product id
> A mandatory SMART command failed: exiting. To continue, add one or
> more '-T permissive' options.
>
>
> We were able to get smartctl o/p after a power-cycle
Could you run dmesg/agetty after the power-cycle?
Or does it still trigger the same -EIO error?
BTW, if smartctl doesn't record the read errors above as errors, maybe
an unstable cable is causing temporary errors?
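One way to check the cable theory is the `UDMA_CRC_Error_Count`
attribute (ID 199), which on many drives counts CRC errors on the SATA
link. Whether this particular device updates it reliably is an
assumption, since it is not in the smartctl database. A minimal sketch
that prints an attribute's raw value from `smartctl -A` output, assuming
the usual one-line-per-attribute layout with RAW_VALUE in the tenth
column (the helper name `smart_raw_value` is hypothetical):

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# tenth column (start of RAW_VALUE) for the attribute named in $1.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

Example: `smartctl -A /dev/sda | smart_raw_value UDMA_CRC_Error_Count`.
A value that grows between two runs would point at the link or cable
rather than at the flash itself.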
Thanks,
Qu
>
> # smartctl -a /dev/sda
> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.15.0-36-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Device Model: FS032GM242I-AC
> Serial Number: AA010520170000000489
> Firmware Version: O1026A
> User Capacity: 31,488,000,000 bytes [31.4 GB]
> Sector Size: 512 bytes logical/physical
> Rotation Rate: Solid State Device
> Device is: Not in smartctl database [for details use: -P showall]
> ATA Version is: ACS-2 (minor revision not indicated)
> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is: Sun Aug 9 04:26:10 2020 EDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> ...
> SMART Attributes Data Structure revision number: 1
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x0000 100   100   000    Old_age Offline -           0
>   5 Reallocated_Sector_Ct   0x0000 100   100   000    Old_age Offline -           0
>   9 Power_On_Hours          0x0000 100   100   000    Old_age Offline -           735
>  12 Power_Cycle_Count       0x0000 100   100   000    Old_age Offline -           20
> 160 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           0
> 161 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           58
> 163 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           2
> 164 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           1045371
> 165 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           1075
> 166 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           972
> 167 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           1030
> 168 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           3000
> 169 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           66
> 175 Program_Fail_Count_Chip 0x0000 100   100   000    Old_age Offline -           0
> 176 Erase_Fail_Count_Chip   0x0000 100   100   000    Old_age Offline -           0
> 177 Wear_Leveling_Count     0x0000 100   100   050    Old_age Offline -           3733
> 178 Used_Rsvd_Blk_Cnt_Chip  0x0000 100   100   000    Old_age Offline -           0
> 181 Program_Fail_Cnt_Total  0x0000 100   100   000    Old_age Offline -           0
> 182 Erase_Fail_Count_Total  0x0000 100   100   000    Old_age Offline -           0
> 192 Power-Off_Retract_Count 0x0000 100   100   000    Old_age Offline -           5
> 194 Temperature_Celsius     0x0000 100   100   000    Old_age Offline -           40
> 195 Hardware_ECC_Recovered  0x0000 100   100   000    Old_age Offline -           0
> 196 Reallocated_Event_Count 0x0000 100   100   016    Old_age Offline -           0
> 197 Current_Pending_Sector  0x0000 100   100   000    Old_age Offline -           0
> 198 Offline_Uncorrectable   0x0000 100   100   000    Old_age Offline -           0
> 199 UDMA_CRC_Error_Count    0x0000 100   100   050    Old_age Offline -           0
> 232 Available_Reservd_Space 0x0000 100   100   000    Old_age Offline -           100
> 241 Total_LBAs_Written      0x0000 100   100   000    Old_age Offline -           766189
> 242 Total_LBAs_Read         0x0000 100   100   000    Old_age Offline -           11847
> 245 Unknown_Attribute       0x0000 100   100   000    Old_age Offline -           1045371
>
> Regards,
> Gowtham
>