linux-f2fs-devel.lists.sourceforge.net archive mirror
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Marc Lehmann <schmorp@schmorp.de>
Cc: linux-f2fs-devel@lists.sourceforge.net
Subject: Re: sync/umount hang on 3.18.21, 1.4TB gone after crash
Date: Fri, 25 Sep 2015 11:42:02 -0700
Message-ID: <20150925184202.GD6998@jaegeuk-mac02>
In-Reply-To: <20150925060018.GC486@schmorp.de>

On Fri, Sep 25, 2015 at 08:00:19AM +0200, Marc Lehmann wrote:
> On Thu, Sep 24, 2015 at 11:50:23AM -0700, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > When I came back after ~10 hours, I found a number of hung task messages
> > > in syslog, and when I entered sync, sync was consuming 100% system time.
> > 
> > Hmm, at this time, it would be good to check what process is stuck through
> > sysrq.
> 
> It was only intermittently, but here they are. The first one is almost
> certainly the sync that I originally didn't have a backtrace for, the
> second one is one that came up frequently during the f2fs test.
> 
>    INFO: task sync:10577 blocked for more than 120 seconds.
>          Tainted: G        W  OE   4.2.1-040201-generic #201509211431
>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    sync            D ffff88082ec964c0     0 10577  10549 0x00000000
>     ffff88000210fdc8 0000000000000082 ffff88062ef2a940 ffff88010337e040
>     0000000000000246 ffff880002110000 ffff8806294915f8 ffff8805c939b800
>     ffff88000210fe54 ffffffff8121a910 ffff88000210fde8 ffffffff817a5a37
>    Call Trace:
>     [<ffffffff8121a910>] ? SyS_tee+0x360/0x360
>     [<ffffffff817a5a37>] schedule+0x37/0x80
>     [<ffffffff81211f09>] wb_wait_for_completion+0x49/0x80
>     [<ffffffff810b6f90>] ? prepare_to_wait_event+0xf0/0xf0
>     [<ffffffff81213134>] sync_inodes_sb+0x94/0x1b0
>     [<ffffffff8121a910>] ? SyS_tee+0x360/0x360
>     [<ffffffff8121a925>] sync_inodes_one_sb+0x15/0x20
>     [<ffffffff811ed1b9>] iterate_supers+0xb9/0x110
>     [<ffffffff8121ac65>] sys_sync+0x35/0x90
>     [<ffffffff817a9272>] entry_SYSCALL_64_fastpath+0x16/0x75
> 
>    INFO: task watchdog/1:14743 blocked for more than 120 seconds.
>          Tainted: P           OE  3.18.21-031821-generic #201509020527
>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    watchdog/1      D ffff88082ec93300     0 14743      2 0x00000000
>     ffff8801a2383c48 0000000000000046 ffff880273a50000 0000000000013300
>     ffff8801a2383fd8 0000000000013300 ffff8802e642a800 ffff880273a50000
>     0000000000001000 ffffffff81c23d80 ffffffff81c23d84 ffff880273a50000
>    Call Trace:
>     [<ffffffff817847f9>] schedule_preempt_disabled+0x29/0x70
>     [<ffffffff81786435>] __mutex_lock_slowpath+0x95/0x100
>     [<ffffffff810a8ac9>] ? enqueue_entity+0x289/0xb20
>     [<ffffffff817864c3>] mutex_lock+0x23/0x37
>     [<ffffffff81029823>] x86_pmu_event_init+0x343/0x430
>     [<ffffffff811680db>] perf_init_event+0xcb/0x130
>     [<ffffffff811684d8>] perf_event_alloc+0x398/0x440
>     [<ffffffff810a8431>] ? put_prev_entity+0x31/0x3f0
>     [<ffffffff811249b0>] ? restart_watchdog_hrtimer+0x60/0x60
>     [<ffffffff81169156>] perf_event_create_kernel_counter+0x26/0x100
>     [<ffffffff8112477d>] watchdog_nmi_enable+0xcd/0x170
>     [<ffffffff81124865>] watchdog_enable+0x45/0xa0
>     [<ffffffff81093f09>] smpboot_thread_fn+0xb9/0x1a0
>     [<ffffffff8108ff9c>] ? __kthread_parkme+0x4c/0x80
>     [<ffffffff81093e50>] ? SyS_setgroups+0x180/0x180
>     [<ffffffff81090219>] kthread+0xc9/0xe0
>     [<ffffffff81090150>] ? kthread_create_on_node+0x180/0x180
>     [<ffffffff81788618>] ret_from_fork+0x58/0x90
>     [<ffffffff81090150>] ? kthread_create_on_node+0x180/0x180
> 
> The watchdog might or might not be unrelated, but it is either a 4.2.1
> thing (new kernel) or f2fs related. I only had them during the f2fs test,
> and often, not before or after.
> 
> (I don't know what that kernel thread does, but the system was somewhat
> sluggish during the test, and other, unrelated services, were negatively
> affected).
> 
> > It seems there was no fsync after sync at all. That's why f2fs recovered back to
> > the latest checkpoint. Anyway, I'm thinking that it's worth adding some kind of
> > periodic checkpoint.
> 
> Well, would it sync more often if this problem hadn't occurred? Most
> filesystems (or rather, the filesystems I use, btrfs, xfs, ext* and zfs)
> seem to have their own regular commit interval, or otherwise commit
> frequently if it is cheap enough.

AFAIK, *commit* there means syncing metadata, not user data, doesn't it?
So even if you saw no data loss, the filesystem doesn't guarantee that all the
data was completely recovered, since neither sync nor fsync was called for that
file.
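
To illustrate the point about fsync, here is a minimal sketch of how an
application can ask for its own data to be made durable instead of relying on
the filesystem's commit interval. It is a general POSIX-style pattern, not
anything specific to f2fs; the function name write_durably is made up for this
example, and the directory fsync at the end is an assumption about what a
careful application would also want for newly created files.

```python
import os


def write_durably(path: str, payload: bytes) -> None:
    """Write payload to path, then fsync the file and its parent directory.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push this file's data and metadata to storage.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also fsync the containing directory so the directory entry of a newly
    # created file is persisted (assumption: the caller wants this as well).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without a call like os.fsync (or a completed sync), written data may still sit
in the page cache when the machine crashes, whichever filesystem is in use.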

I think you need to tune the system-wide flusher parameters that Chao mentioned
to suit your workloads.
And we should expect periodic checkpoints to be able to recover the previously
flushed data.
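
As a starting point for that tuning, the current writeback settings can be
inspected first. The sketch below assumes the parameters in question are the
usual vm.dirty_* sysctls under /proc/sys/vm; this message does not list them,
so treat the selection as a guess. The function name read_writeback_knobs is
made up for this example.

```python
from pathlib import Path

# Assumed set of flusher-related sysctls; not taken from this thread.
WRITEBACK_KNOBS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_writeback_knobs(base: str = "/proc/sys/vm") -> dict:
    """Return the integer value of each knob that exists under base."""
    values = {}
    for name in WRITEBACK_KNOBS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Skip knobs that are missing, unreadable, or not plain integers.
            continue
    return values
```

Calling read_writeback_knobs() on a Linux machine returns a dict mapping each
available knob to its current value; changing the values requires root and is
deliberately left out of the sketch.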

Thanks,

> 
> -- 
>                 The choice of a       Deliantra, the free code+content MORPG
>       -----==-     _GNU_              http://www.deliantra.net
>       ----==-- _       generation
>       ---==---(_)__  __ ____  __      Marc Lehmann
>       --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
>       -=====/_/_//_/\_,_/ /_/\_\
