* f2fs stability problems keep me from testing
@ 2015-11-17 17:24 Marc Lehmann
  2015-11-18 10:00 ` Chao Yu
  2015-11-19  8:04 ` Chao Yu
  0 siblings, 2 replies; 10+ messages in thread
From: Marc Lehmann @ 2015-11-17 17:24 UTC (permalink / raw)
  To: linux-f2fs-devel

Hi!

I have trouble executing the tests I wanted to run with the current 3.18
checkout. This morning, the box was completely unresponsive - I had to
reboot, not knowing the cause (the only difference is that f2fs has been in
more or less production use for a few days).

An hour ago, I was awake when similar problems started - interactive
login was impossible, but I was able to execute a few commands in an open
shell, which made me suspect f2fs to be the culprit. Both times, the
f2fs filesystem was streaming video at low speed (<1 MB/s) with no other
activity.

Anyway, here are the four experiments I did, after finding out that the
problem seems to be the f2fs fs (the other 4 filesystems on the box were
responsive, as was the underlying disk itself).

1. ls /cold1, find /cold1 (/cold1 is the f2fs mountpoint) gave empty results
   here is an strace of find /cold1:

   openat(AT_FDCWD, "/cold1", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
   fchdir(5)                               = 0
   getdents(5, /* 0 entries */, 32768)     = 0
   close(5)                                = 0

   so /cold1 is an empty directory. not good.

2. so no files in /cold1, let's see what happens when I list /cold1/var, a
   directory known to exist:

   openat(AT_FDCWD, "/cold1/var", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
   fchdir(5)                               = 0
   getdents(5, /* 0 entries */, 32768)     = 0
   close(5)                                = 0

   so f2fs knows that /cold1/var exists, but readdir gives no results. very
   troubling.

3. "sync&" - this did hang, with no apparent activity

4. cat /proc/<sync-pid>/task/*stack:

   [<ffffffff8121d4d8>] sync_inodes_sb+0xa8/0x1c0
   [<ffffffff81224249>] sync_inodes_one_sb+0x19/0x20
   [<ffffffff811f6192>] iterate_supers+0xb2/0x110
   [<ffffffff812244d5>] sys_sync+0x35/0x90
   [<ffffffff817a684d>] system_call_fastpath+0x16/0x1b
   [<ffffffffffffffff>] 0xffffffffffffffff

5. dmesg showed no related messages whatsoever - it still had the kernel
   messages generated from boot, and nothing else.

6. at this point I lost my shell and control over the box completely, and it had to be rebooted
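The stack dump in step 4 can also be captured programmatically. A minimal C sketch, assuming Linux with /proc/<pid>/task/<tid>/stack available (reading it typically requires root); dump_task_stacks() is a hypothetical helper, not part of any existing tool:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Prints the kernel stack of every thread of `pid` to `out`, one
 * /proc/<pid>/task/<tid>/stack file after another. Returns the number of
 * stack files that could be opened (0 if none matched), or -1 if glob()
 * itself failed.
 */
int dump_task_stacks(pid_t pid, FILE *out)
{
    assert(out != NULL);

    char pattern[64];
    snprintf(pattern, sizeof pattern, "/proc/%ld/task/*/stack", (long)pid);

    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);
    if (rc != 0) {
        globfree(&g);                   /* release anything glob() allocated */
        return rc == GLOB_NOMATCH ? 0 : -1;
    }

    int opened = 0;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        if (f == NULL) {
            perror(g.gl_pathv[i]);      /* e.g. thread exited meanwhile */
            continue;
        }
        opened++;
        fprintf(out, "== %s\n", g.gl_pathv[i]);
        char line[256];
        /* Without sufficient privileges the read may fail; fgets() then
         * simply returns NULL and the file is printed empty. */
        while (fgets(line, sizeof line, f) != NULL)
            fputs(line, out);
        fclose(f);
    }
    globfree(&g);
    return opened;
}
```

For example, dump_task_stacks(sync_pid, stderr), where sync_pid is the PID of the hung sync, prints the same frames as the cat command in step 4.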

So something in the current f2fs tree (I checked that
/sys/fs/f2fs/dm-17/ra_nid_pages exists, so it is a more or less current
snapshot) is still locking up and/or returning corrupt data. If it were
a simple locking failure, though, I would expect readdir and other
operations to block as well, not to return bad data.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\

------------------------------------------------------------------------------


* Re: f2fs stability problems keep me from testing
  2015-11-17 17:24 f2fs stability problems keep me from testing Marc Lehmann
@ 2015-11-18 10:00 ` Chao Yu
  2015-11-19  0:38   ` Marc Lehmann
  2015-11-19  1:42   ` Marc Lehmann
  2015-11-19  8:04 ` Chao Yu
  1 sibling, 2 replies; 10+ messages in thread
From: Chao Yu @ 2015-11-18 10:00 UTC (permalink / raw)
  To: 'Marc Lehmann', linux-f2fs-devel

Hi,

> -----Original Message-----
> From: Marc Lehmann [mailto:schmorp@schmorp.de]
> Sent: Wednesday, November 18, 2015 1:25 AM
> To: linux-f2fs-devel@lists.sourceforge.net
> Subject: [f2fs-dev] f2fs stability problems keep me from testing
> 
> Hi!
> 
> I have trouble executing the tests I wanted to run with the current 3.18
> checkout. This morning, the box was completely unresponsive - I had to
> reboot, not knowing the cause (the only difference is that f2fs has been in
> more or less production use for a few days).
> 
> An hour ago, I was awake when similar problems started - interactive
> login was impossible, but I was able to execute a few commands in an open
> shell, which made me suspect f2fs to be the culprit. Both times, the
> f2fs filesystem was streaming video at low speed (<1 MB/s) with no other
> activity.
> 
> Anyway, here are the four experiments I did, after finding out that the
> problem seems to be the f2fs fs (the other 4 filesystems on the box were
> responsive, as was the underlying disk itself).
> 
> 1. ls /cold1, find /cold1 (/cold1 is the f2fs mountpoint) gave empty results
>    here is an strace of find /cold1:
> 
>    openat(AT_FDCWD, "/cold1", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>    fchdir(5)                               = 0
>    getdents(5, /* 0 entries */, 32768)     = 0
>    close(5)                                = 0
> 
>    so /cold1 is an empty directory. not good.
> 
> 2. so no files in /cold1, let's see what happens when I list /cold1/var, a
>    directory known to exist:
> 
>    openat(AT_FDCWD, "/cold1/var", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>    fchdir(5)                               = 0
>    getdents(5, /* 0 entries */, 32768)     = 0
>    close(5)                                = 0
> 
>    so f2fs knows that /cold1/var exists, but readdir gives no results. very
>    troubling.

Could you share more info about this issue, like the mount options, fsck
output, and stat info for the two directories if they still exist?

Thanks,

> 
> 3. "sync&" - this did hang, with no apparent activity
> 
> 4. cat /proc/<sync-pid>/task/*stack:
> 
>    [<ffffffff8121d4d8>] sync_inodes_sb+0xa8/0x1c0
>    [<ffffffff81224249>] sync_inodes_one_sb+0x19/0x20
>    [<ffffffff811f6192>] iterate_supers+0xb2/0x110
>    [<ffffffff812244d5>] sys_sync+0x35/0x90
>    [<ffffffff817a684d>] system_call_fastpath+0x16/0x1b
>    [<ffffffffffffffff>] 0xffffffffffffffff
> 
> 5. dmesg showed no related messages whatsoever - it still had the kernel
>    messages generated from boot, and nothing else.
> 
> 6. at this point I lost my shell and control over the box completely, and it had to be rebooted
> 
> So something in the current f2fs tree (I checked that
> /sys/fs/f2fs/dm-17/ra_nid_pages exists, so it is a more or less current
> snapshot) is still locking up and/or returning corrupt data. If it were
> a simple locking failure, though, I would expect readdir and other
> operations to block as well, not to return bad data.
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel



* Re: f2fs stability problems keep me from testing
  2015-11-18 10:00 ` Chao Yu
@ 2015-11-19  0:38   ` Marc Lehmann
  2015-11-19  1:29     ` Chao Yu
  2015-11-19  1:42   ` Marc Lehmann
  1 sibling, 1 reply; 10+ messages in thread
From: Marc Lehmann @ 2015-11-19  0:38 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel

On Wed, Nov 18, 2015 at 06:00:40PM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> Could you share more info about this issue, like the mount options, fsck
> output, and stat info for the two directories if they still exist?

Sure, the mount options are as follows (the filesystem was made with mkfs.f2fs -s64 -t0 -a0, btw):

   -onoinline_data,noatime,flush_merge,no_heap

I ran fsck, and it gave all "Ok" messages. I can run it again if it is of
any interest, but would rather avoid it if all you want to know is whether
fsck shows a problem (the disk is in near-constant use).

The two directories have not changed since then:

   # stat /cold1
     File: ‘/cold1’
     Size: 4096            Blocks: 16         IO Block: 4096   directory
   Device: fc11h/64529d    Inode: 3           Links: 3
   Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
   Access: 2015-11-14 02:44:42.000000000 +0100
   Modify: 2015-11-17 02:36:52.480382105 +0100
   Change: 2015-11-17 02:36:52.480382105 +0100
   # stat /cold1/var
     File: ‘/cold1/var’
     Size: 4096            Blocks: 16         IO Block: 4096   directory
   Device: fc11h/64529d    Inode: 956865      Links: 3
   Access: (0711/drwx--x--x)  Uid: (    0/    root)   Gid: (    0/    root)
   Access: 2015-11-14 20:33:56.014926084 +0100
   Modify: 2015-11-14 20:33:56.014926084 +0100
   Change: 2015-11-14 20:33:58.286917162 +0100
    Birth: -

Nor have their contents:

   /cold1:
   total 8
   drwx--x--x 3 root root 4096 Nov 14 20:33 var

   /cold1/var:
   total 8
   drwx--x--x 3 root root 4096 Nov 14 20:33 lib


* Re: f2fs stability problems keep me from testing
  2015-11-19  0:38   ` Marc Lehmann
@ 2015-11-19  1:29     ` Chao Yu
  2015-11-19  2:23       ` Marc Lehmann
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2015-11-19  1:29 UTC (permalink / raw)
  To: 'Marc Lehmann'; +Cc: linux-f2fs-devel

Hi,

> -----Original Message-----
> From: Marc Lehmann [mailto:schmorp@schmorp.de]
> Sent: Thursday, November 19, 2015 8:39 AM
> To: Chao Yu
> Cc: linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] f2fs stability problems keep me from testing
> 
> On Wed, Nov 18, 2015 at 06:00:40PM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> > Could you share more info about this issue, like the mount options, fsck
> > output, and stat info for the two directories if they still exist?
> 
> Sure, the mount options are as follows (the filesystem was made with mkfs.f2fs -s64 -t0 -a0, btw):
> 
>    -onoinline_data,noatime,flush_merge,no_heap
> 
> I ran fsck, and it gave all "Ok" messages. I can run it again if it is of
> any interest, but would rather avoid it if all you want to know is whether
> fsck shows a problem (the disk is in near-constant use).

Thanks for sharing the info.

If fsck shows no problem on the disk, can you still not see '/cold1/var' in
directory '/cold1' now?

Thanks,

> 
> The two directories have not changed since then:
> 
>    # stat /cold1
>      File: ‘/cold1’
>      Size: 4096            Blocks: 16         IO Block: 4096   directory
>    Device: fc11h/64529d    Inode: 3           Links: 3
>    Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
>    Access: 2015-11-14 02:44:42.000000000 +0100
>    Modify: 2015-11-17 02:36:52.480382105 +0100
>    Change: 2015-11-17 02:36:52.480382105 +0100
>    # stat /cold1/var
>      File: ‘/cold1/var’
>      Size: 4096            Blocks: 16         IO Block: 4096   directory
>    Device: fc11h/64529d    Inode: 956865      Links: 3
>    Access: (0711/drwx--x--x)  Uid: (    0/    root)   Gid: (    0/    root)
>    Access: 2015-11-14 20:33:56.014926084 +0100
>    Modify: 2015-11-14 20:33:56.014926084 +0100
>    Change: 2015-11-14 20:33:58.286917162 +0100
>     Birth: -
> 
> Nor have their contents:
> 
>    /cold1:
>    total 8
>    drwx--x--x 3 root root 4096 Nov 14 20:33 var
> 
>    /cold1/var:
>    total 8
>    drwx--x--x 3 root root 4096 Nov 14 20:33 lib

* Re: f2fs stability problems keep me from testing
  2015-11-18 10:00 ` Chao Yu
  2015-11-19  0:38   ` Marc Lehmann
@ 2015-11-19  1:42   ` Marc Lehmann
  1 sibling, 0 replies; 10+ messages in thread
From: Marc Lehmann @ 2015-11-19  1:42 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel

On Wed, Nov 18, 2015 at 06:00:40PM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> I ran fsck, and it gave all "Ok" messages. I can run it again if it is of

Anyway, here is the fsck output. It was made after the reboot, a fsck, and,
I think, two mounts (also, the kernel was upgraded).

PS: there is a typo in the fsck output ("matcing"; I don't know if it's still in git).

   Info: sector size = 512
   Info: total sectors = 15628050432 (7630884 MB)
   Info: MKFS version
     "Linux version 3.18.21-031821-generic (kernel@gloin) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) ) #201509020527 SMP Wed Sep 2 05:29:06 UTC 2015"
   Info: FSCK version
     from "Linux version 3.18.24-031824-generic (kernel@gomeisa) (gcc version 4.9.2 (Ubuntu 4.9.2-10ubuntu13) ) #201511031331 SMP Tue Nov 3 18:33:52 UTC 2015"
       to "Linux version 3.18.24-031824-generic (kernel@gomeisa) (gcc version 4.9.2 (Ubuntu 4.9.2-10ubuntu13) ) #201511031331 SMP Tue Nov 3 18:33:52 UTC 2015"
   Info: superblock features = 0 : 
   Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
   Info: total FS sectors = 15628050432 (7630884 MB)
   Info: CKPT version = 14c2
   Info: checkpoint state = 5 :  compacted_summary unmount

   [FSCK] Unreachable nat entries                        [Ok..] [0x0]
   [FSCK] SIT valid block bitmap checking                [Ok..]
   [FSCK] Hard link checking for regular file            [Ok..] [0x0]
   [FSCK] valid_block_count matching with CP             [Ok..] [0x24dad5a5]
   [FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0x9773e]
   [FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0x9773e]
   [FSCK] valid_inode_count matched with CP              [Ok..] [0x21bc]
   [FSCK] free segment_count matched with CP             [Ok..] [0x27ab81]
   [FSCK] next block offset is free                      [Ok..]
   [FSCK] fixing SIT types
   [FSCK] other corrupted bugs                           [Ok..]

   Done.


* Re: f2fs stability problems keep me from testing
  2015-11-19  1:29     ` Chao Yu
@ 2015-11-19  2:23       ` Marc Lehmann
  2015-11-19 20:56         ` Jaegeuk Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Lehmann @ 2015-11-19  2:23 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel

On Thu, Nov 19, 2015 at 09:29:29AM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> Thanks for sharing the info.
> 
> If fsck shows no problem on the disk, can you still not see '/cold1/var' in
> directory '/cold1' now?

Ah, sorry if that was not clear - after a reboot, fsck showed
no errors and all files and directories (apparently) were there, i.e. it
looks like a transient problem, not a problem of the on-disk structure.


* Re: f2fs stability problems keep me from testing
  2015-11-17 17:24 f2fs stability problems keep me from testing Marc Lehmann
  2015-11-18 10:00 ` Chao Yu
@ 2015-11-19  8:04 ` Chao Yu
  2015-11-19 21:01   ` Marc Lehmann
  1 sibling, 1 reply; 10+ messages in thread
From: Chao Yu @ 2015-11-19  8:04 UTC (permalink / raw)
  To: 'Marc Lehmann', linux-f2fs-devel

Hi,

> 1. ls /cold1, find /cold1 (/cold1 is the f2fs mountpoint) gave empty results
>    here is an strace of find /cold1:
> 
>    openat(AT_FDCWD, "/cold1", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>    fchdir(5)                               = 0
>    getdents(5, /* 0 entries */, 32768)     = 0

I found an issue here.

Even if the dir entry of '/cold1/var' had been removed from the dentry page of
'/cold1', getdents should at least have returned the '.' and '..' entries,
but it didn't. So it looks like we silently ignore some kind of error (e.g.
ENOMEM/EIO...) when grabbing and updating the dentry page of '/cold1';
obviously it's better to report such an error to the user than to ignore it.
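The point that even an emptied directory should still list '.' and '..' suggests a cheap userspace probe for the symptom. A minimal C sketch, assuming a POSIX libc and a filesystem that returns '.' and '..' from readdir (as stated above for f2fs); dir_has_dot_entries() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Returns 1 if the listing of `path` contains both "." and "..",
 * 0 if either is missing (the symptom seen on /cold1), and -1 if the
 * directory could not be opened or read (errno is left set).
 */
int dir_has_dot_entries(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int seen_dot = 0, seen_dotdot = 0;
    for (;;) {
        errno = 0;                  /* readdir() reports errors only via errno */
        struct dirent *ent = readdir(dir);
        if (ent == NULL)
            break;                  /* end of directory, or an error */
        if (strcmp(ent->d_name, ".") == 0)
            seen_dot = 1;
        else if (strcmp(ent->d_name, "..") == 0)
            seen_dotdot = 1;
    }
    int saved_errno = errno;        /* nonzero only if readdir() failed */
    closedir(dir);

    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }
    return seen_dot && seen_dotdot;
}
```

A result of 0 for /cold1 would match the empty getdents in the strace, while -1 would mean the error actually reached userspace.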

I'd like to send a patch to fix this issue. Could you include the following
patch in your f2fs module, so that when the problem reproduces we can catch
more details about it?

Thanks,



* Re: f2fs stability problems keep me from testing
  2015-11-19  2:23       ` Marc Lehmann
@ 2015-11-19 20:56         ` Jaegeuk Kim
  0 siblings, 0 replies; 10+ messages in thread
From: Jaegeuk Kim @ 2015-11-19 20:56 UTC (permalink / raw)
  To: Marc Lehmann; +Cc: linux-f2fs-devel

On Thu, Nov 19, 2015 at 03:23:17AM +0100, Marc Lehmann wrote:
> On Thu, Nov 19, 2015 at 09:29:29AM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> > Thanks for sharing the info.
> > 
> > If fsck shows no problem on the disk, can you still not see '/cold1/var' in
> > directory '/cold1' now?
> 
> Ah, sorry if that was not clear - after a reboot, fsck showed
> no errors and all files and directories (apparently) were there, i.e. it
> looks like a transient problem, not a problem of the on-disk structure.

So, I guess the system suffers from significant memory pressure, which
can cause ENOMEM.
Can you catch /sys/kernel/debug/f2fs/status at that moment?
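One way to have the requested status available is to copy it out periodically, before the box stops responding. A minimal C sketch, assuming debugfs is mounted at /sys/kernel/debug (reading it usually requires root); snapshot_file() is a hypothetical helper, and its output is best written to a filesystem other than the f2fs one under test:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Appends a timestamped copy of `src` (e.g. /sys/kernel/debug/f2fs/status)
 * to `out`. Returns the number of content bytes copied, or -1 if `src`
 * could not be opened.
 */
long snapshot_file(const char *src, FILE *out)
{
    assert(src != NULL && out != NULL);

    FILE *in = fopen(src, "r");
    if (in == NULL)
        return -1;

    char stamp[64];
    time_t now = time(NULL);
    struct tm tm;
    if (localtime_r(&now, &tm) == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S %z", &tm) == 0)
        snprintf(stamp, sizeof stamp, "(unknown time)");
    fprintf(out, "=== %s at %s\n", src, stamp);

    char buf[4096];
    size_t n;
    long total = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        fwrite(buf, 1, n, out);
        total += (long)n;
    }
    fclose(in);
    fflush(out);    /* hand the data to the kernel right away */
    return total;
}
```

Called every few seconds, for example, this leaves a recent status snapshot behind even if the next attempt hangs.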

Thanks,


* Re: f2fs stability problems keep me from testing
  2015-11-19  8:04 ` Chao Yu
@ 2015-11-19 21:01   ` Marc Lehmann
  2015-11-20  1:48     ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Lehmann @ 2015-11-19 21:01 UTC (permalink / raw)
  To: Chao Yu; +Cc: linux-f2fs-devel

On Thu, Nov 19, 2015 at 04:04:21PM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> >    openat(AT_FDCWD, "/cold1", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> >    fchdir(5)                               = 0
> >    getdents(5, /* 0 entries */, 32768)     = 0
> 
> I found an issue here.
> 
> > Even if the dir entry of '/cold1/var' had been removed from the dentry page of
> > '/cold1', getdents should at least have returned the '.' and '..' entries,
> > but it didn't. So it looks like we silently ignore some kind of error (e.g.
> > ENOMEM/EIO...) when grabbing and updating the dentry page of '/cold1';
> > obviously it's better to report such an error to the user than to ignore it.

As for EIO, I would expect the kernel to log something. As for ENOMEM,
unless f2fs tries higher-order or zone-specific allocations, that's
unlikely, because the box had over 20GB of free memory (according to top).
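This distinction can be checked from /proc/buddyinfo, which lists, per zone, how many free blocks of each order (2^order contiguous pages) exist; plenty of free memory can coexist with no free high-order blocks. A minimal C sketch that finds the highest order with a free block in one buddyinfo line; highest_free_order() is a hypothetical name, and whether f2fs makes higher-order allocations on this path is not established in the thread:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parses one line of /proc/buddyinfo, e.g.
 *   "Node 0, zone   Normal   4   3   2   1   0 ..."
 * The counts after the zone name are free blocks of order 0, 1, 2, ...
 * Returns the highest order with at least one free block, or -1 if the
 * zone has no free blocks or the line does not look like buddyinfo.
 */
int highest_free_order(const char *line)
{
    assert(line != NULL);

    const char *p = strstr(line, "zone");
    if (p == NULL)
        return -1;
    p += strlen("zone");

    /* Skip the zone name (e.g. "Normal", "DMA32"). */
    while (*p == ' ' || *p == '\t')
        p++;
    while (*p != '\0' && *p != ' ' && *p != '\t')
        p++;

    int order = 0, highest = -1;
    for (;;) {
        char *end;
        unsigned long count = strtoul(p, &end, 10);  /* skips leading blanks */
        if (end == p)
            break;                                   /* no more numbers */
        if (count > 0)
            highest = order;
        order++;
        p = end;
    }
    return highest;
}
```

Running it over every line of /proc/buddyinfo while the problem occurs would show whether high-order free blocks had run out despite the free memory reported by top.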

As for other errors, sure, could be.

Would that explain the sync lockup though?

> I'd like to send a patch to fix this issue. Could you include the following
> patch in your f2fs module, so that when the problem reproduces we can catch
> more details about it?

I can, but I guess you forgot to attach the patch.

On Fri, Nov 20, 2015 at 04:56:14AM +0800, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > Ah, sorry if that was not clear - after a reboot, fsck showed
> > no errors and all files and directories (apparently) were there, i.e. it
> > looks like a transient problem, not a problem of the on-disk structure.
> 
> So, I guess the system suffers from significant memory pressure, which
> can cause ENOMEM.
> Can you catch /sys/kernel/debug/f2fs/status at that moment?

If I can, I will - I haven't stressed the filesystem since then, but
will presumably start testing again next week.


* Re: f2fs stability problems keep me from testing
  2015-11-19 21:01   ` Marc Lehmann
@ 2015-11-20  1:48     ` Chao Yu
  0 siblings, 0 replies; 10+ messages in thread
From: Chao Yu @ 2015-11-20  1:48 UTC (permalink / raw)
  To: 'Marc Lehmann'; +Cc: linux-f2fs-devel

Hi,

> -----Original Message-----
> From: Marc Lehmann [mailto:schmorp@schmorp.de]
> Sent: Friday, November 20, 2015 5:02 AM
> To: Chao Yu
> Cc: linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] f2fs stability problems keep me from testing
> 
> On Thu, Nov 19, 2015 at 04:04:21PM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> > >    openat(AT_FDCWD, "/cold1", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> > >    fchdir(5)                               = 0
> > >    getdents(5, /* 0 entries */, 32768)     = 0
> >
> > I found an issue here.
> >
> > Even if the dir entry of '/cold1/var' had been removed from the dentry page of
> > '/cold1', getdents should at least have returned the '.' and '..' entries,
> > but it didn't. So it looks like we silently ignore some kind of error (e.g.
> > ENOMEM/EIO...) when grabbing and updating the dentry page of '/cold1';
> > obviously it's better to report such an error to the user than to ignore it.
> 
> As for EIO, I would expect the kernel to log something. As for ENOMEM,
> unless f2fs tries higher-order or zone-specific allocations, that's
> unlikely, because the box had over 20GB of free memory (according to top).
> 
> As for other errors, sure, could be.
> 
> Would that explain the sync lockup though?

I don't think so, but I hope such an error code can supply some kind of clue.

> 
> > I'd like to send a patch to fix this issue. Could you include the following
> > patch in your f2fs module, so that when the problem reproduces we can catch
> > more details about it?
> 
> I can, but I guess you forgot to attach the patch.

Oh, I just sent it to the mailing list; I thought you had received it.

From ceedb2996a666dddd35834f30bc14bec9ccaad02 Mon Sep 17 00:00:00 2001
From: Chao Yu <chao2.yu@samsung.com>
Date: Thu, 19 Nov 2015 15:36:11 +0800
Subject: [PATCH] f2fs: fix to report error in f2fs_readdir

get_lock_data_page in f2fs_readdir can fail for many reasons (e.g.
no memory or an IO error...); it's better to report this kind of error to
the user than to ignore it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
---
 fs/f2fs/dir.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index c2b330f..f1de7ee 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -855,8 +855,13 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx)
 
 	for (; n < npages; n++) {
 		dentry_page = get_lock_data_page(inode, n, false);
-		if (IS_ERR(dentry_page))
-			continue;
+		if (IS_ERR(dentry_page)) {
+			err = PTR_ERR(dentry_page);
+			if (err == -ENOENT)
+				continue;
+			else
+				goto out;
+		}
 
 		dentry_blk = kmap(dentry_page);
 
-- 
2.6.3
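The control flow introduced by the patch can be modeled in userspace. A minimal C sketch: the page_err array is a hypothetical stand-in for the results of get_lock_data_page(), -ENOENT is assumed to mean a page with no data that is safe to skip (as the patch treats it), and any other error is returned instead of being ignored:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Walks pages [0, npages). page_err[n] is 0 if page n could be read,
 * otherwise a negative errno (hypothetical stand-in for the lookup).
 * -ENOENT pages are skipped, as before the patch; any other error stops
 * the walk and is returned to the caller. *visited counts pages that
 * were actually processed.
 */
int walk_dentry_pages(const int *page_err, size_t npages, size_t *visited)
{
    assert(page_err != NULL || npages == 0);
    assert(visited != NULL);
    *visited = 0;

    for (size_t n = 0; n < npages; n++) {
        int err = page_err[n];
        if (err != 0) {
            if (err == -ENOENT)
                continue;       /* nothing stored here: skip it */
            return err;         /* e.g. -ENOMEM or -EIO: report it */
        }
        (*visited)++;           /* the real code emits dentries here */
    }
    return 0;
}
```

Before the patch, both cases took the continue branch, so an -EIO or -ENOMEM page produced a silently shortened (here, possibly empty) listing instead of an error.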

Thanks,

> 
> On Fri, Nov 20, 2015 at 04:56:14AM +0800, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > > Ah, sorry if that was not clear - after a reboot, fsck showed
> > > no errors and all files and directories (apparently) were there, i.e. it
> > > looks like a transient problem, not a problem of the on-disk structure.
> >
> > So, I guess the system suffers from significant memory pressure, which
> > can cause ENOMEM.
> > Can you catch /sys/kernel/debug/f2fs/status at that moment?
> 
> If I can, I will - I haven't stressed the filesystem since then, but
> will presumably start testing again next week.

end of thread, other threads:[~2015-11-20  1:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-17 17:24 f2fs stability problems keep me from testing Marc Lehmann
2015-11-18 10:00 ` Chao Yu
2015-11-19  0:38   ` Marc Lehmann
2015-11-19  1:29     ` Chao Yu
2015-11-19  2:23       ` Marc Lehmann
2015-11-19 20:56         ` Jaegeuk Kim
2015-11-19  1:42   ` Marc Lehmann
2015-11-19  8:04 ` Chao Yu
2015-11-19 21:01   ` Marc Lehmann
2015-11-20  1:48     ` Chao Yu
