From: Marc Lehmann <schmorp@schmorp.de>
To: linux-f2fs-devel@lists.sourceforge.net
Subject: sync/umount hang on 3.18.21, 1.4TB gone after crash
Date: Wed, 23 Sep 2015 23:58:51 +0200
Message-ID: <20150923215850.GC2360@schmorp.de>
Hi!
I moved one of the SMR disks to another box with a 3.18.21 kernel.
I formatted and mounted like this:
   /opt/f2fs-tools/sbin/mkfs.f2fs -lTEST -s90 -t0 -a0 /dev/vg_test/test
   mount -t f2fs -onoatime,flush_merge,no_heap /dev/vg_test/test /mnt
I then copied (tar | tar) 2.1TB of data to the disk. This took about 6
hours, which is roughly the read speed of this data set, so the write speed
was very good.
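For readers unfamiliar with the "tar | tar" idiom, here is a minimal sketch
of such a copy. The exact tar options used for the copy above are not stated
in this mail, so the flags and the copy_tree helper name are assumptions
chosen for illustration only.

```shell
# Hypothetical helper: copy a directory tree through a tar pipe.
# The options used for the original 2.1TB copy are not known; these are
# a common choice, not a record of what was actually run.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # The first tar writes an archive of $src to stdout, the second one
    # unpacks it inside $dst, preserving permissions (-p).
    tar -C "$src" -cf - . | tar -C "$dst" -xpf -
}
```

It would be called as something like: copy_tree /source/dir /mnt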
When I came back after ~10 hours, I found a number of hung task messages
in syslog, and when I entered sync, sync was consuming 100% system time.
I took a snapshot of /sys/kernel/debug/f2fs/status before running sync, and
the values are "frozen", i.e. they have not changed since.
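A before/after comparison of this kind can be scripted roughly as follows.
This is only a sketch: the status_changed helper name and the 10 second
default interval are assumptions, and it presumes debugfs is mounted at
/sys/kernel/debug.

```shell
# Hypothetical helper: report whether a file's contents change over an
# interval. Defaults to the f2fs debugfs status file mentioned above.
status_changed() {
    file=${1:-/sys/kernel/debug/f2fs/status}
    interval=${2:-10}
    before=$(cat "$file") || return 2
    sleep "$interval"
    after=$(cat "$file") || return 2
    if [ "$before" = "$after" ]; then
        echo "frozen: $file did not change in ${interval}s"
        return 1
    fi
    echo "changing: $file differs after ${interval}s"
    return 0
}
```

The return code is 0 when the contents changed, 1 when they stayed the same,
and 2 when the file could not be read.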
I was able to read from the mounted filesystem normally, and I was able to
read and write the block device itself, so the disk is responsive.
After ~1h in this state, I tried to umount, which made the filesystem
mountpoint go away, but umount hung, and /sys/kernel/debug/f2fs/status still
didn't change.
This is the output of /sys/kernel/debug/f2fs/status:
http://ue.tst.eu/d88ce0e21a7ca0fb74b1ecadfa475df0.txt
I then tried to delete the device, but the
echo 1 >/sys/block/sde/device/delete also hung.
Here are /proc/.../stack outputs of sync, umount and bash(echo):
   sync:
   [<ffffffffffffffff>] 0xffffffffffffffff
   umount:
   [<ffffffff8139ba03>] call_rwsem_down_write_failed+0x13/0x20
   [<ffffffff811e7ee6>] deactivate_super+0x46/0x70
   [<ffffffff81204733>] cleanup_mnt+0x43/0x90
   [<ffffffff812047d2>] __cleanup_mnt+0x12/0x20
   [<ffffffff8108e8a4>] task_work_run+0xc4/0xe0
   [<ffffffff81012fa7>] do_notify_resume+0x97/0xb0
   [<ffffffff8178896f>] int_signal+0x12/0x17
   [<ffffffffffffffff>] 0xffffffffffffffff
   bash (delete):
   [<ffffffff810d8917>] msleep+0x37/0x50
   [<ffffffff8135d686>] __blk_drain_queue+0xa6/0x1a0
   [<ffffffff8135da05>] blk_cleanup_queue+0x1b5/0x1c0
   [<ffffffff8152082a>] __scsi_remove_device+0x5a/0xe0
   [<ffffffff815208d6>] scsi_remove_device+0x26/0x40
   [<ffffffff81520917>] sdev_store_delete+0x27/0x30
   [<ffffffff814bf748>] dev_attr_store+0x18/0x30
   [<ffffffff8125bc4d>] sysfs_kf_write+0x3d/0x50
   [<ffffffff8125b154>] kernfs_fop_write+0xe4/0x160
   [<ffffffff811e51a7>] vfs_write+0xb7/0x1f0
   [<ffffffff811e5c26>] SyS_write+0x46/0xb0
   [<ffffffff817886cd>] system_call_fastpath+0x16/0x1b
   [<ffffffffffffffff>] 0xffffffffffffffff
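Stack dumps like the ones above can be collected with a small loop over
/proc. The following is a sketch under assumptions: the dump_stacks helper
is hypothetical, it matches processes by their command name in
/proc/<pid>/comm, and reading /proc/<pid>/stack normally requires root.

```shell
# Hypothetical helper: print the kernel stack of every process whose
# command name equals one of the arguments.
dump_stacks() {
    for name in "$@"; do
        for comm in /proc/[0-9]*/comm; do
            # Processes may exit while we scan, so ignore read errors.
            [ "$(cat "$comm" 2>/dev/null)" = "$name" ] || continue
            pid=${comm#/proc/}
            pid=${pid%/comm}
            echo "$name (pid $pid):"
            cat "/proc/$pid/stack" 2>/dev/null || echo "  (stack not readable)"
        done
    done
}
```

For the three processes in this report the call would be:
dump_stacks sync umount bash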
After a forced reboot, I ran fsck and got the output below, which looks good
except for the "Wrong segment type" message, which is hopefully harmless:
http://ue.tst.eu/4c750d2301a581cb07249d607aa0e6d0.txt
After mounting, status was this (and was changing):
http://ue.tst.eu/6462606ac3aa85bde0d6674365c86318.txt
Note that 1.4TB of data are missing(!)
This large amount of missing data was certainly unexpected. I assume f2fs
stopped checkpointing earlier, and that data is only safe after a
checkpoint, but being able to write 1.4TB of data without it ever reaching
the disk is very unexpected behaviour for a filesystem (which normally loses
about half a minute of data at most).
Minor question, since the disk actually has 4K physical sectors, and fsck
says sector size = 512, is there a way to teach f2fs that the physical
sector size is actually 4K, or does this not matter because f2fs will do
page-sized writes anyways?
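For reference, the sector sizes the kernel reports for a disk can be read
from sysfs. This sketch only shows what the block layer advertises; it does
not answer whether f2fs takes the physical size into account. The
sector_sizes helper and its optional sysroot argument are assumptions made
for illustration.

```shell
# Hypothetical helper: print the logical and physical sector size that
# the kernel reports for a block device such as sde.
# The second argument only exists so the function can be tried against
# a fake directory tree; it defaults to the real /sys.
sector_sizes() {
    dev=$1
    sysroot=${2:-/sys}
    q="$sysroot/block/$dev/queue"
    logical=$(cat "$q/logical_block_size") || return 1
    physical=$(cat "$q/physical_block_size") || return 1
    echo "$dev: logical=$logical physical=$physical"
}
```

Usage would be: sector_sizes sde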
In any case, any insights would be appreciated. I will attempt to upgrade
this box to Linux 4.2.1 to see if that helps, but 3.18.x is the only
kernel known to work with SMR drives without any issues.
-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\