* f2fs for SMR drives
@ 2015-08-08 13:51 Marc Lehmann
2015-08-10 10:20 ` Chao Yu
From: Marc Lehmann @ 2015-08-08 13:51 UTC (permalink / raw)
To: linux-f2fs-devel
Hi!
Sorry if this is the wrong address to ask about "user problems".
I am currently investigating various filesystems for use on drive-managed SMR
drives (e.g. the Seagate 8TB disks). These drives have characteristics not
unlike flash (they want to be written in large batches), but are, of course,
still quite different.
I initially tried btrfs, ext4 and xfs, which, unsurprisingly, failed rather
miserably after a few hundred GB, dropping to ~30MB/s (or 20 in the case of btrfs).
I also tried nilfs, which should be an almost perfect match for this
technology, but it performed even worse (I have no clue why, maybe nilfs
skips sectors when writing, which would explain it).
As a last resort, I tried f2fs, which initially performed absolutely great
(average write speed ~130MB/s over multiple terabytes).
However, I am running into a number of problems, and wonder if f2fs can
somehow be configured to work right.
First of all, I did most of my tests on linux-3.18.14, and recently
switched to 4.1.4. The filesystems were formatted with "-s7", the idea
being that writes always occur in 256MB blocks as much as possible, and
most importantly, are freed in 256MB blocks, to keep fragmentation low.
Mount options included noatime or noatime,inline_xattr,inline_data,inline_dentry,flush_merge,extent_cache
(I suspect 4.1.4 doesn't implement flush_merge yet?).
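For reference, a full mount invocation with the larger option set would look
something like this (the device and mountpoint are only placeholders):
  mount -t f2fs -o noatime,inline_xattr,inline_data,inline_dentry,flush_merge,extent_cache /dev/sdX1 /mnt/f2fs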
My first problem concerns ENOSPC handling - I was happily able to write to a
100% utilized filesystem, with cp and rsync continuing to write, receiving no
error, but with no write activity occurring (and the files never ending up on
the filesystem). Is this a known bug?
My second, much bigger problem concerns fragmentation. For testing,
I created a 128GB partition and kept writing an assortment of 200KB to
multi-megabyte files to it. To stress test it, I kept deleting random
files to create holes. After a while (around 84% utilisation), write
performance went down to less than 1MB/s, and has stayed at that level
ever since for this filesystem.
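To give an idea, the write/delete pattern was roughly the following (the path,
sizes and delete frequency are only illustrative):
  while true; do
    # write a file somewhere between ~200KB and a few MB
    dd if=/dev/urandom of=/mnt/f2fs/file.$RANDOM bs=4k \
       count=$((RANDOM % 1000 + 50)) 2>/dev/null
    # occasionally delete a random existing file to punch holes
    if [ $((RANDOM % 4)) -eq 0 ]; then
      rm -f "/mnt/f2fs/$(ls /mnt/f2fs | shuf -n 1)"
    fi
  done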
I kept the filesystem idle for a night to hope for defragmentation, but
nothing happened. Suspecting in-place-updates to be the culprit, I tried
various configurations in the hope of disabling them (such as setting
ipu_policy to 4 or 8, and/or setting min_ipu_util to 0 or 100), but that
also doesn't seem to have any effect whatsoever.
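For reference, these are the sysfs knobs I was setting; something along the
lines of (the directory is named after the device, dm-9 in the status output
below):
  echo 4 > /sys/fs/f2fs/dm-9/ipu_policy
  echo 0 > /sys/fs/f2fs/dm-9/min_ipu_util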
From the description of f2fs, it seems to be quite close to ideal for these
drives, as it should be possible to write mostly linearly, and to keep
fragmentation low by freeing big sequential sections of data.
Pity that it comes so close and then fails so miserably after performing so
admirably initially - can anything be done about this by way of
configuration, or is my understanding of how f2fs writes and garbage collects
flawed?
Here is the output of /sys/kernel/debug/f2fs/status for the filesystem in
question. This was after keeping it idle for a night, then unmounting and
remounting the volume. Before the unmount, it had very high values in
the GC calls section, but no reads were observed during the night,
just writes (using dstat -Dsdx).
=====[ partition info(dm-9). #1 ]=====
[SB: 1] [CP: 2] [SIT: 6] [NAT: 114] [SSA: 130] [MAIN: 65275(OverProv:2094 Resv:1456)]
Utilization: 84% (27320244 valid blocks)
- Node: 31936 (Inode: 5027, Other: 26909)
- Data: 27288308
- Inline_data Inode: 0
- Inline_dentry Inode: 0
Main area: 65275 segs, 9325 secs 9325 zones
- COLD data: 12063, 1723, 1723
- WARM data: 12075, 1725, 1725
- HOT data: 65249, 9321, 9321
- Dir dnode: 65269, 9324, 9324
- File dnode: 24455, 3493, 3493
- Indir nodes: 65260, 9322, 9322
- Valid: 52278
- Dirty: 9
- Prefree: 0
- Free: 12988 (126)
CP calls: 10843
GC calls: 91 (BG: 11)
- data segments : 21 (0)
- node segments : 70 (0)
Try to move 30355 blocks (BG: 0)
- data blocks : 7360 (0)
- node blocks : 22995 (0)
Extent Hit Ratio: 8267 / 24892
Extent Tree Count: 3130
Extent Node Count: 3138
Balancing F2FS Async:
- inmem: 0, wb: 0
- nodes: 0 in 5672
- dents: 0 in dirs: 0
- meta: 0 in 3567
- NATs: 0/ 9757
- SITs: 0/ 65275
- free_nids: 868
Distribution of User Blocks: [ valid | invalid | free ]
[------------------------------------------||--------]
IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 49114 blocks in 95 segments
BDF: 64, avg. vblocks: 1254
Memory: 48948 KB
- static: 11373 KB
- cached: 619 KB
- paged : 36956 KB
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\
* Re: f2fs for SMR drives
2015-08-08 13:51 f2fs for SMR drives Marc Lehmann
@ 2015-08-10 10:20 ` Chao Yu
2015-08-10 13:05 ` Marc Lehmann
From: Chao Yu @ 2015-08-10 10:20 UTC (permalink / raw)
To: 'Marc Lehmann'; +Cc: Jaegeuk Kim, linux-f2fs-devel
Hi Marc,
> -----Original Message-----
> From: Marc Lehmann [mailto:schmorp@schmorp.de]
> Sent: Saturday, August 08, 2015 9:51 PM
> To: linux-f2fs-devel@lists.sourceforge.net
> Subject: [f2fs-dev] f2fs for SMR drives
>
> Hi!
>
> Sorry if this is the wrong address to ask about "user problems".
>
> I am currently investigating various filesystems for use on drive-managed SMR
> drives (e.g. the seagate 8TB disks). These drives have characteristics not
> unlike flash (they want to be written in large batches), but are, of course,
> still quite different.
>
> I initially tried btrfs, ext4, xfs which, not unsurprisingly, failed rather
> miserably after a few hundred GB, down to ~30mb/s (or 20 in case of btrfs).
>
> I also tried nilfs, which should be an almost perfect match for this
> technology, but it performed even worse (I have no clue why, maybe nilfs
> skips sectors when writing, which would explain it).
>
> As a last resort, I tried f2fs, which initially performed absolutely great
> (average write speed ~130mb/s over multiple terabytes).
>
> However, I am running into a number of problems, and wonder if f2fs can
> somehow be configured to work right.
>
> First of all, I did most of my tests on linux-3.18.14, and recently
> switched to 4.1.4. The filesystems were formatted with "-s7", the idea
'-s7' means that we configure seg_per_sec to 7, so our section size will
be 7 * 2M (segment size) = 14M. So no matter how we configure '-z' (sections
per zone), our allocation unit will not be aligned to 256MB, and both the
allocation and release units in f2fs may cross zone boundaries on the SMR
drive, which may cause low performance. Is that right?
> being that writes always occur in 256MB blocks as much as possible, and
> most importantly, are freed in 256MB blocks, to keep fragmentation low.
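Side note: since the segment size is 2MB and '-s' is the number of segments
per section, 256MB sections would need seg_per_sec = 128, i.e. something
along the lines of (the device path is only an example):
  mkfs.f2fs -s 128 /dev/sdX1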
>
> Mount options included noatime or
> noatime,inline_xattr,inline_data,inline_dentry,flush_merge,extent_cache
> (I suspect 4.1.4 doesn't implement flush_merge yet?).
>
> My first problem concerns ENOSPC handling - I was happily able to write to a
> 100% utilized filesystem, with cp and rsync continuing to write, receiving no
> error, but with no write activity occurring (and the files never ending up on
> the filesystem). Is this a known bug?
I have no SMR device, so I have to use a regular hard disk for testing, and I
can't reproduce this issue with cp on such a device. But for rsync, one thing
I noticed is this:
I used rsync to copy a 32g local file to an f2fs partition; the partition had
100% utilized space with no available blocks for further allocation. The copy
took a very long time before finally reporting that there is no space.
I did the same test with an ext4 filesystem, and it took a very short time to
report ENOSPC.
From my investigation, the main copy flow used by rsync is:
1. open src file
2. create tmp file in dst partition
3. copy data from src file to tmp file
4. rename tmp file to dst
a) In ext4, we reserve space separately for data blocks and inodes. If the data
block resource is exhausted (which makes df show 100% utilization), we can't
write new data in this partition, but we can still create files, since creating
a file only consumes inode space in ext4, not block space. So rsync fails in
step 3 and returns an error immediately.
b) In f2fs, inode and data block space is shared, so when the number of free
data blocks is zero, we can't create any file in f2fs. This makes rsync fail in
step 2 and run into its discard_receive_data function, which still receives the
whole src file. That is why the rsync process keeps writing but generates no IO
on the f2fs filesystem.
Finally, I freed one block in f2fs by removing a file; this lets f2fs pass
step 2 and return an error immediately in step 3, like ext4.
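If it helps, here is a minimal probe that mimics steps 2 and 3 separately, to
see which of them fails on the full partition (the path is just an example;
each command prints an error if the corresponding step fails):
  touch /mnt/f2fs/.enospc_probe                                        # step 2: create a new file
  dd if=/dev/zero of=/mnt/f2fs/.enospc_probe bs=4k count=1 conv=notrunc  # step 3: write data to it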
Can you please help check whether, in your environment, the reason rsync does
not return ENOSPC is the same as above?
If it is not, can you share more details about the test steps, io info, and the
f2fs status info in debugfs (/sys/kernel/debug/f2fs/status)?
>
> My second, much bigger problem concerns fragmentation. For testing,
> I created a 128GB partition and kept writing an assortment of 200KB to
> multi-megabyte files to it. To stress test it, I kept deleting random
> files to create holes. After a while (around 84% utilisation), write
> performance went down to less than 1MB/s, and has stayed at that level
> ever since for this filesystem.
IMO, the rate at which the stat values below increase in real time may be
helpful for investigating the degradation issue. Can you share them with us?
CP calls:
GC calls: (BG:)
- data segments :
- node segments :
Try to move blocks (BG:)
- data blocks :
- node blocks :
IPU: blocks
SSR: blocks in segments
LFS: blocks in segments
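Something like the following could capture them periodically during the test
(the interval and the field selection are arbitrary):
  while sleep 10; do
    date
    grep -E 'CP calls|GC calls|segments|blocks' /sys/kernel/debug/f2fs/status
  done >> f2fs-stats.log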
Thanks,
>
> I kept the filesystem idle for a night to hope for defragmentation, but
> nothing happened. Suspecting in-place-updates to be the culprit, I tried
> various configurations in the hope of disabling them (such as setting
> ipu_policy to 4 or 8, and/or setting min_ipu_util to 0 or 100), but that
> also doesn't seem to have any effect whatsoever.
>
> From the description of f2fs, it seems to be quite close to ideal for these
> drives, as it should be possible to write mostly linearly, and to keep
> fragmentation low by freeing big sequential sections of data.
>
> Pity that it's so close and then fails so miserably after performing so
> admirably initially - can anything be done about this, in way of
> configuration, or is my understanding of how f2fs writes and garbage collects
> flawed?
>
> Here is the output of /sys/kernel/debug/f2fs/status for the filesystem in
> question. This was after keeping it idle for a night, then unmounting and
> remounting the volume. Before the unmount, it had very high values in
> the GC calls section, but no reads were observed during the night,
> just writes (using dstat -Dsdx).
>
> =====[ partition info(dm-9). #1 ]=====
> [SB: 1] [CP: 2] [SIT: 6] [NAT: 114] [SSA: 130] [MAIN: 65275(OverProv:2094 Resv:1456)]
>
> Utilization: 84% (27320244 valid blocks)
> - Node: 31936 (Inode: 5027, Other: 26909)
> - Data: 27288308
> - Inline_data Inode: 0
> - Inline_dentry Inode: 0
>
> Main area: 65275 segs, 9325 secs 9325 zones
> - COLD data: 12063, 1723, 1723
> - WARM data: 12075, 1725, 1725
> - HOT data: 65249, 9321, 9321
> - Dir dnode: 65269, 9324, 9324
> - File dnode: 24455, 3493, 3493
> - Indir nodes: 65260, 9322, 9322
>
> - Valid: 52278
> - Dirty: 9
> - Prefree: 0
> - Free: 12988 (126)
>
> CP calls: 10843
> GC calls: 91 (BG: 11)
> - data segments : 21 (0)
> - node segments : 70 (0)
> Try to move 30355 blocks (BG: 0)
> - data blocks : 7360 (0)
> - node blocks : 22995 (0)
>
> Extent Hit Ratio: 8267 / 24892
>
> Extent Tree Count: 3130
>
> Extent Node Count: 3138
>
> Balancing F2FS Async:
> - inmem: 0, wb: 0
> - nodes: 0 in 5672
> - dents: 0 in dirs: 0
> - meta: 0 in 3567
> - NATs: 0/ 9757
> - SITs: 0/ 65275
> - free_nids: 868
>
> Distribution of User Blocks: [ valid | invalid | free ]
> [------------------------------------------||--------]
>
> IPU: 0 blocks
> SSR: 0 blocks in 0 segments
> LFS: 49114 blocks in 95 segments
>
> BDF: 64, avg. vblocks: 1254
>
> Memory: 48948 KB
> - static: 11373 KB
> - cached: 619 KB
> - paged : 36956 KB
>
> --
> The choice of a Deliantra, the free code+content MORPG
> -----==- _GNU_ http://www.deliantra.net
> ----==-- _ generation
> ---==---(_)__ __ ____ __ Marc Lehmann
> --==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
> -=====/_/_//_/\_,_/ /_/\_\
>
* Re: f2fs for SMR drives
2015-08-10 10:20 ` Chao Yu
@ 2015-08-10 13:05 ` Marc Lehmann
2015-08-11 10:40 ` Chao Yu
From: Marc Lehmann @ 2015-08-10 13:05 UTC (permalink / raw)
To: Chao Yu; +Cc: Jaegeuk Kim, linux-f2fs-devel
On Mon, Aug 10, 2015 at 06:20:40PM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> '-s7' means that we configure seg_per_sec into 7, so our section size will
Ah, I see, I am a victim of a documentation bug then: According to the
mkfs.f2fs (1.4.0) documentation, -s7 means 256MB (2 * 2**7), so that
explains it.
Good news, I will reset ASAP!
> which may cause low performance, is that right?
Yes, if the documentation is wrong, that would explain the bad performance
of defragmented sections.
> I have no SMR device, so I have to use hard disk for testing, I can't reproduce
> this issue with cp in such device. But for rsync, one thing I note is that:
>
> I use rsync to copy 32g local file to f2fs partition, the partition is with
> 100% utilized space and with no available block for further allocation. It
> took very long time for 'the copy', finally it reported us there is no space.
Strange. For me, in 3.18.14, I could cp and rsync to a 100% utilized
disk at full (read) speed, but it didn't do any I/O (and the files never
arrived).
That was the same partition that later had the link count mismatches.
> b) In f2fs, we use inode/data block space mixedly, so when data block number
> is zero, we can't create any file in f2fs. It makes rsync failing in step 2,
> and leads it runs into discard_receive_data function which will still
> receiving the whole src file. This makes rsync process keeping writing but
> generating no IO in f2fs filesystem.
I am sorry, that cannot be true - if file creation failed, then rsync
simply would be unable to write anything; it wouldn't have a valid fd to
write to. I also strace'd it, and it successfully open()ed and "write()ed"
AND close()ed the file.
It can only be explained by f2fs neither creating nor writing the file,
without giving an error.
In any case, instead of discarding data, the filesystem should of course
return ENOSPC, as anything else causes data loss.
> Can you please help to check that in your environment the reason of rsync
> without returning ENOSPC is the same as above?
I can already rule it out on API grounds: if file creation fails
(e.g. with ENOSPC), then rsync couldn't have an fd to write data to.
Something else must be going on.
The only way for this behaviour to happen is if file creation succeeds
(and writing and closing, too - silent data loss).
> If it is not, can you share more details about test steps, io info, and f2fs
> status info in debugfs (/sys/kernel/debug/f2fs/status).
I mounted the partition with -onoatime and no other flags, used cp -Rp to
copy a large tree until the disk utilization was 100% for maybe 20 seconds
according to /sys/kernel/debug/f2fs/status. A bit puzzled, I ^C'd cp,
and tried "rsync -avP --append", which took a bit to scan the directory
information, then proceeded to write.
I also don't think rsync --append goes via the temporary file route, but in
any case, I also used rsync -avP, which does.
After writing a few dozen gigabytes (as measured by read data throughput),
I stopped both.
I don't know what you mean with "io info".
Since fsck.f2fs completely destroyed the filesystem, I cannot provide any
more f2fs debug info about it.
> IMO, real-timely increasing ratio of below stat value may be helpful to
> investigate the degression issue. Can you share us them?
I lost this filesystem to corruption as well. I will certainly retry this
test though, and will record these values.
Anyways, thanks a lot for your input so far!
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\
* Re: f2fs for SMR drives
2015-08-10 13:05 ` Marc Lehmann
@ 2015-08-11 10:40 ` Chao Yu
From: Chao Yu @ 2015-08-11 10:40 UTC (permalink / raw)
To: 'Marc Lehmann'; +Cc: 'Jaegeuk Kim', linux-f2fs-devel
> -----Original Message-----
> From: Marc Lehmann [mailto:schmorp@schmorp.de]
> Sent: Monday, August 10, 2015 9:06 PM
> To: Chao Yu
> Cc: Jaegeuk Kim; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] f2fs for SMR drives
>
> On Mon, Aug 10, 2015 at 06:20:40PM +0800, Chao Yu <chao2.yu@samsung.com> wrote:
> > '-s7' means that we configure seg_per_sec into 7, so our section size will
>
> Ah, I see, I am a victim of a documentation bug then: According to the
Sorry about this, it's out of date; Jaegeuk has already sent a patch
updating the manual.
> > b) In f2fs, we use inode/data block space mixedly, so when data block number
> > is zero, we can't create any file in f2fs. It makes rsync failing in step 2,
> > and leads it runs into discard_receive_data function which will still
> > receiving the whole src file. This makes rsync process keeping writing but
> > generating no IO in f2fs filesystem.
>
> I am sorry, that cannot be true - if file creation would fail, then rsync
> simply would be unable to write anything, it wouldn't have a valid fd to
> write. I also strace'd it, and it successfully open()ed and "write()ed"
> AND close()ed the file.
[test in f2fs]
I tried to use rsync to write a 32g file located on an ext4 partition into an
f2fs partition (the 32g file does not exist on the f2fs partition); the f2fs
partition is 100% utilized with no available blocks.
Command: strace rsync -avP --append 32g /mnt/f2fs/32g
This is the strace output when reproducing this issue:
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144) = 262144
select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999998})
write(4, "\374\17\0\7", 4) = 4
select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999999})
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4092) = 4092
select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999999})
write(4, "\374\17\0\7", 4) = 4
select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {59, 999999})
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4092) = 4092
This is just part of the output; the whole log is almost entirely filled with
the [read, select, write] combination. From this information we can see that
the rsync program reads from the file with fd:3 and then writes to the file with fd:4.
root@fs:/# ps -ef|grep rsync
root 16912 3246 73 13:20 pts/3 00:00:05 rsync -avP --append 32g /mnt/ext4/32g
root 16913 16912 0 13:20 pts/3 00:00:00 rsync -avP --append 32g /mnt/ext4/32g
root 16914 16913 66 13:20 pts/3 00:00:04 rsync -avP --append 32g /mnt/ext4/32g
root 16932 32494 0 13:21 pts/1 00:00:00 grep --color=auto rsync
root@fs:/# ll -l /proc/16912/fd/
total 0
dr-x------ 2 root root 0 Aug 11 13:21 ./
dr-xr-xr-x 9 root root 0 Aug 11 13:21 ../
lrwx------ 1 root root 64 Aug 11 13:21 0 -> /dev/pts/3
lrwx------ 1 root root 64 Aug 11 13:21 1 -> /dev/pts/3
lrwx------ 1 root root 64 Aug 11 13:21 2 -> /dev/pts/3
lr-x------ 1 root root 64 Aug 11 13:21 3 -> /home/yuchao/32g
lrwx------ 1 root root 64 Aug 11 13:21 4 -> socket:[3190839]
lrwx------ 1 root root 64 Aug 11 13:21 5 -> socket:[3190840]
Here, the information indicates that the file with fd:3 is the source file,
whose path is /home/yuchao/32g, and the file with fd:4 is a socket.
I guess rsync is designed with a client/server structure: the client part
writes out the data of the source file through the socket, the server part
receives this data, and finally the server part writes the data to the
destination file.
This explains why we see so many 'write' ops in the strace output.
root@fs:/# ll -l /proc/16913/fd/
total 0
dr-x------ 2 root root 0 Aug 11 13:21 ./
dr-xr-xr-x 9 root root 0 Aug 11 13:21 ../
lrwx------ 1 root root 64 Aug 11 13:21 1 -> socket:[3190841]
lrwx------ 1 root root 64 Aug 11 13:21 2 -> /dev/pts/3
lrwx------ 1 root root 64 Aug 11 13:21 3 -> socket:[3191842]
root@fs:/# ll -l /proc/16914/fd/
total 0
dr-x------ 2 root root 0 Aug 11 13:21 ./
dr-xr-xr-x 9 root root 0 Aug 11 13:21 ../
lrwx------ 1 root root 64 Aug 11 13:21 0 -> socket:[3190838]
lrwx------ 1 root root 64 Aug 11 13:21 2 -> /dev/pts/3
lrwx------ 1 root root 64 Aug 11 13:21 4 -> socket:[3191843]
[test in ext4]
Then I tried to reproduce the issue on ext4 for verification, since I
think that if the issue is due to rsync's copy model, ext4 may show the
same behaviour.
I made an ext4 partition with no available blocks and no free inodes:
root@fs:/home/yuchao/mycode# stat -f /mnt/ext4
File: "/mnt/ext4"
ID: b391327e3c675398 Namelen: 255 Type: ext2/ext3
Block size: 1024 Fundamental block size: 1024
Blocks: Total: 122835 Free: 7709 Available: 0
Inodes: Total: 32768 Free: 0
I ran rsync both with no options and with '-avP --append'; the results
are below:
root@fs:/home/yuchao # time rsync 32g /mnt/ext4/32g
rsync: mkstemp "/mnt/ext4/.32g.MGVSYA" failed: No space left on device (28)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9]
real 2m50.020s
user 2m3.820s
sys 0m45.496s
root@fs:/home/yuchao # time rsync -avP --append 32g /mnt/ext4/32g
sending incremental file list
32g
34359738368 100% 192.13MB/s 0:02:50 (xfer#1, to-check=0/1)
rsync: open "/mnt/ext4/32g" failed: No space left on device (28)
sent 34363932741 bytes received 31 bytes 200372785.84 bytes/sec
total size is 34359738368 speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9]
real 2m50.597s
user 2m4.728s
sys 0m44.796s
So the results show that:
a) as with f2fs, on ext4 it also takes rsync a long time to finish the copy;
it looks like we can keep writing to a full fs without ENOSPC being reported;
b) when we fail to create the target file in the fs, rsync does not report the
"No space left on device (28)" message as soon as possible, but only reports
it at the end.
>
> It can only be explained by f2fs neither creating nor writing the file,
> without giving an error.
I added printks in f2fs_create for tracing, and I have actually seen that f2fs
returned ENOSPC to user space. I also enabled tracing in f2fs, and I didn't
see any writes triggered for regular files in f2fs.
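For reference, the f2fs tracing can be enabled through the generic tracepoint
interface, e.g. something like (the debugfs mount point may differ on your
system):
  echo 1 > /sys/kernel/debug/tracing/events/f2fs/enable
  cat /sys/kernel/debug/tracing/trace_pipe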
>
> In any case, instead of discarding data, the filesystem should of course
> return ENOSPC, as anything else causes data loss.
>
> > Can you please help to check that in your environment the reason of rsync
> > without returning ENOSPC is the same as above?
>
> I can already rule it out on API grounds: if file creation fails
> (e.g. with ENOSPC), then rsync couldn't have an fd to write data to.
> Something else must be going on.
>
> The only way for this behaviour to happen is if file creation succeeds
> (and writing and closing, too - silent data loss).
>
> > If it is not, can you share more details about test steps, io info, and f2fs
> > status info in debugfs (/sys/kernel/debug/f2fs/status).
>
> I mounted the partition with -onoatime and no other flags, used cp -Rp to
> copy a large tree until the disk utilization was 100% for maybe 20 seconds
> according to /sys/kernel/debug/f2fs/status. A bit puzzled, I ^C's cp,
> and tried "rsync -avP --append", which took a bit to scan the directory
> information, then proceeded to write.
>
> I also don't think rsync --append goes via the temporary file route, but in
> any case, I also used rsync -avP, which does.
>
> After writing a few dozen gigabytes (as measured by read data throughput),
> I stopped both.
Thank you for the detailed description of the test, it really helps. :)
IMO, it's better to wait for the end of the test instead of stopping it,
so that we can perhaps see rsync's final message, which may indicate
why our target file was not generated.
>
> I don't know what you mean with "io info".
Sorry, I meant iostat information.
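For example, something like the following captured while the test runs (the
interval in seconds is arbitrary):
  iostat -dx 5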
Thanks,
>
> Since fsck.f2fs completely destroyed the filesystem, I cannot provide any
> more f2fs debug info about it.
>
> > IMO, real-timely increasing ratio of below stat value may be helpful to
> > investigate the degression issue. Can you share us them?
>
> I lost this filesystem to corruption as well. I will certainly retry this
> test though, and will record these values.
>
> Anyways, thanks a lot for your input so far!
>
> --
> The choice of a Deliantra, the free code+content MORPG
> -----==- _GNU_ http://www.deliantra.net
> ----==-- _ generation
> ---==---(_)__ __ ____ __ Marc Lehmann
> --==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
> -=====/_/_//_/\_,_/ /_/\_\