Data lost in Android app for not write new checkpoint

linux-f2fs-devel.lists.sourceforge.net archive mirror

* Data lost in Android app for not write new checkpoint
@ 2015-07-31  2:28 He YunLei
  2015-07-31  6:18 ` Chao Yu
  2015-07-31 10:49 ` Chao Yu
  0 siblings, 2 replies; 10+ messages in thread
From: He YunLei @ 2015-07-31  2:28 UTC
  To: linux-f2fs-devel, Jaegeuk Kim

Hi all,
	Recently I ran some tests with f2fs on my Android phone and found a problem
that I don't know how to tackle.
	The /data partition on my phone is formatted with mkfs.f2fs. Right after the
phone booted, I checked the f2fs status by reading /sys/kernel/debug/f2fs/status
in debugfs:
	
CP calls: 10
GC calls: 19 (BG: 19)
   - data segments : 19 (19)
   - node segments : 0 (0)

	So the /data partition has done write_checkpoint 10 times since f2fs was mounted,
and background GC has been triggered 19 times.

******

Then I took several photos in a row and checked /sys/kernel/debug/f2fs/status again:

******

CP calls: 10
GC calls: 20 (BG: 20)
   - data segments : 20 (20)
   - node segments : 0 (0)
	
	The CP calls count did not change, and background GC did not write a new checkpoint
either. If a sudden power failure or system crash occurs at this point, the photos are lost
after the phone restarts; running sync before the crash avoids the data loss.
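To make this before/after comparison repeatable, the counters can be read programmatically. The C below is a minimal sketch, assuming the status file contains lines of the form "CP calls: N" exactly as shown above; read_text_file(), f2fs_status_value() and cp_delta() are hypothetical helper names. If several f2fs partitions are mounted, strstr() only picks the first matching block.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read up to size-1 bytes of `path` (e.g. /sys/kernel/debug/f2fs/status)
 * into buf as a NUL-terminated string. Returns 0 on success. */
static int read_text_file(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}

/* Return the number that follows `key` (e.g. "CP calls:") in a status
 * snapshot, or -1 if the key is absent. strtol() skips the blank. */
static long f2fs_status_value(const char *status, const char *key)
{
    const char *p = strstr(status, key);
    if (p == NULL)
        return -1;
    return strtol(p + strlen(key), NULL, 10);
}

/* Checkpoints written between two snapshots; 0 means no new checkpoint. */
static long cp_delta(const char *before, const char *after)
{
    long b = f2fs_status_value(before, "CP calls:");
    long a = f2fs_status_value(after, "CP calls:");
    return (a < 0 || b < 0) ? -1 : a - b;
}
```

With the two snapshots above, cp_delta() returns 0 while "GC calls:" goes from 19 to 20, which is exactly the situation described.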
	I think this problem hurts the user experience of Android phones that use f2fs.
How should we deal with this situation? I hope you and the other developers on this list
can point me in the right direction.

Thanks,
He



------------------------------------------------------------------------------


* Re: Data lost in Android app for not write new checkpoint
  2015-07-31  2:28 Data lost in Android app for not write new checkpoint He YunLei
@ 2015-07-31  6:18 ` Chao Yu
  2015-07-31 10:49 ` Chao Yu
  1 sibling, 0 replies; 10+ messages in thread
From: Chao Yu @ 2015-07-31  6:18 UTC
  To: 'He YunLei', 'Jaegeuk Kim'; +Cc: linux-f2fs-devel

Hi Yunlei,

> -----Original Message-----
> From: He YunLei [mailto:heyunlei@huawei.com]
> Sent: Friday, July 31, 2015 10:29 AM
> To: linux-f2fs-devel@lists.sourceforge.net; Jaegeuk Kim
> Cc: Chao Yu; cm224.lee@samsung.com; Bintian
> Subject: [f2fs-dev] Data lost in Android app for not write new checkpoint
> 
> 	there is no change in CP calls number and background GC doesn't write new checkpoint.
> if then a sudden power failure or system crash occur, the photos will be lost when the phone
> restart, and a sync before crash will avoid the data lost.

One way to keep data consistent is to call fsync() after writing; after that, the data is
recoverable even if an abnormal power cut or kernel crash occurs.

So could you check whether the camera app *fsyncs* the photo file to storage for persistence?
If it does not, the photos may be lost; if it does, there may be a bug in f2fs.

BTW, a checkpoint only persists FS metadata (data in the META/NODE inodes), the metadata and
data of directory inodes, and the metadata of other inodes; it does not persist the in-memory
data of regular/symlink inodes. So simply triggering a checkpoint doesn't seem to help.
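A minimal C sketch of the fsync() pattern described above, assuming a POSIX system. write_durably() is a hypothetical helper, not code from any camera app or from f2fs: it writes the file, calls fsync() on it, and then also fsyncs the parent directory. Syncing the directory is the usual portable way to make a newly created name durable; some filesystems may already cover the directory entry with the file's fsync, but POSIX does not guarantee that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to dir/name, then make both the file
 * contents and its directory entry durable. Returns 0 on success, -1 on error
 * with errno set by the call that failed. */
static int write_durably(const char *dir, const char *name,
                         const void *data, size_t len)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        goto fail;

    const char *p = data;
    while (len > 0) {                  /* write() may be partial */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) < 0)                 /* flush file data and inode */
        goto fail;
    int rc = close(fd);
    fd = -1;
    if (rc < 0)
        goto fail;

    if (fsync(dfd) < 0)                /* flush the new directory entry */
        goto fail;
    return close(dfd);

fail: {
        int saved = errno;
        if (fd >= 0)
            close(fd);
        close(dfd);
        errno = saved;
        return -1;
    }
}
```

For a Java-based Android app, the corresponding call on the photo's output stream would be FileOutputStream.getFD().sync().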

Thanks,




------------------------------------------------------------------------------


* Re: Data lost in Android app for not write new checkpoint
  2015-07-31  2:28 Data lost in Android app for not write new checkpoint He YunLei
  2015-07-31  6:18 ` Chao Yu
@ 2015-07-31 10:49 ` Chao Yu
  2015-07-31 12:00   ` Bintian
  2015-08-04 13:16   ` He YunLei
  1 sibling, 2 replies; 10+ messages in thread
From: Chao Yu @ 2015-07-31 10:49 UTC
  To: 'Bintian', 'He YunLei', 'Jaegeuk Kim'
  Cc: linux-f2fs-devel

Hi Bintian,

> 	I think this problem is bad for user experience of using Android phone with f2fs.
> How do we deal with such situation? I wish you and other developers in this list could help
> me in a correct way.

IMO, we should first figure out whether or not this is an f2fs bug.

You can enable some f2fs tracepoints to see whether fsync is called:

enable trace by:
echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_sync_file_enter/enable
echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_sync_file_exit/enable
print trace by:
cat /sys/kernel/debug/tracing/trace
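Rather than reading the whole trace by eye, the f2fs_sync_file_enter events can be counted. The C below is a minimal sketch, assuming the usual ftrace text layout in which each event line contains "<event_name>: <fields>" and header lines start with '#'; count_trace_events() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Count lines of a tracing buffer snapshot (the text of
 * /sys/kernel/debug/tracing/trace) that report `event`, e.g.
 * "f2fs_sync_file_enter". Header lines starting with '#' are skipped, and
 * the event name must be followed directly by ':' to count, so
 * "f2fs_sync_file" does not match "f2fs_sync_file_enter:". */
static int count_trace_events(const char *trace, const char *event)
{
    size_t elen = strlen(event);
    int count = 0;
    const char *line = trace;

    if (elen == 0)
        return 0;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        const char *stop = end ? end : line + strlen(line);

        if (line[0] != '#') {
            /* Look for the name inside this line only. */
            for (const char *p = strstr(line, event); p != NULL && p < stop;
                 p = strstr(p + elen, event)) {
                if (p + elen < stop && p[elen] == ':') {
                    count++;
                    break;
                }
            }
        }
        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}
```

If the count for f2fs_sync_file_enter stays at zero while photos are being taken, the camera app is not calling fsync() on files in /data.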

If fsync is not called, I think ext4 must have the same problem, but I guess its
journal commit thread fortunately saves the data, since it commits a transaction
every 5 seconds by default. You can try setting a larger value with the commit=nrsec
mount option to verify the issue on ext4.

As a quick thought, maybe we can add a commit data thread that periodically writes
back user data written earlier and then does a checkpoint for persistence.

This way, at most we lose the data written during the last configured commit
period.
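The proposed thread would live in the kernel, but its timing behaviour can be sketched in user space. The C below is a hypothetical illustration, not f2fs code: a pthread wakes every interval_ms milliseconds and calls a commit callback. With an interval of T, data written just after one commit waits roughly T before the next one, which matches the "lose at most one commit period" bound above. fsync_commit() is only an example callback; a real program could instead call Linux's syncfs() on a descriptor in /data, which flushes the whole filesystem containing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical user-space "commit thread": every interval_ms it calls
 * commit(arg), so roughly one interval of writes is left unflushed. */
struct committer {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool stop;
    long interval_ms;
    int (*commit)(void *arg);
    void *arg;
    unsigned long rounds;            /* completed commit calls */
};

/* Example callback: flush one open file descriptor. */
static int fsync_commit(void *arg)
{
    return fsync(*(int *)arg);
}

static void *committer_main(void *p)
{
    struct committer *c = p;
    pthread_mutex_lock(&c->lock);
    while (!c->stop) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += c->interval_ms / 1000;
        deadline.tv_nsec += (c->interval_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        /* Sleep until the period expires or committer_stop() signals us;
         * a return of 0 without stop set is a spurious wakeup. */
        while (!c->stop &&
               pthread_cond_timedwait(&c->wake, &c->lock, &deadline) == 0)
            ;
        if (c->stop)
            break;
        pthread_mutex_unlock(&c->lock);
        c->commit(c->arg);           /* write back outside the lock */
        pthread_mutex_lock(&c->lock);
        c->rounds++;
    }
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

static int committer_start(struct committer *c, long interval_ms,
                           int (*commit)(void *), void *arg)
{
    c->stop = false;
    c->interval_ms = interval_ms;
    c->commit = commit;
    c->arg = arg;
    c->rounds = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wake, NULL);
    return pthread_create(&c->thread, NULL, committer_main, c);
}

static void committer_stop(struct committer *c)
{
    pthread_mutex_lock(&c->lock);
    c->stop = true;
    pthread_cond_signal(&c->wake);
    pthread_mutex_unlock(&c->lock);
    pthread_join(c->thread, NULL);
    pthread_mutex_destroy(&c->lock);
    pthread_cond_destroy(&c->wake);
}
```

The commit callback runs without the lock held, so committer_stop() never waits behind a slow flush to set the flag. It does still wait for an in-progress commit to finish before pthread_join() returns.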

Thanks,




------------------------------------------------------------------------------


* Re: Data lost in Android app for not write new checkpoint
  2015-07-31 10:49 ` Chao Yu
@ 2015-07-31 12:00   ` Bintian
  2015-08-04 13:16   ` He YunLei
  1 sibling, 0 replies; 10+ messages in thread
From: Bintian @ 2015-07-31 12:00 UTC
  To: Chao Yu, 'He YunLei', 'Jaegeuk Kim'; +Cc: linux-f2fs-devel

Thank you Chao, I will do the test based on your suggestion.

Thanks,

Bintian



------------------------------------------------------------------------------


* Re: Data lost in Android app for not write new checkpoint
  2015-07-31 10:49 ` Chao Yu
  2015-07-31 12:00   ` Bintian
@ 2015-08-04 13:16   ` He YunLei
  2015-08-04 18:29     ` Jaegeuk Kim
  2015-08-06 10:17     ` Chao Yu
  1 sibling, 2 replies; 10+ messages in thread
From: He YunLei @ 2015-08-04 13:16 UTC
  To: Chao Yu; +Cc: 'Jaegeuk Kim', linux-f2fs-devel

On 2015/7/31 18:49, Chao Yu wrote:
> IMO, it's better to figure out whether this is a bug of f2fs first or not.
>
> You can enable some traces in f2fs to see whether fsync is called or not.
>
> enable trace by:
> echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_sync_file_enter/enable
> echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_sync_file_exit/enable
> print trace by:
> cat /sys/kernel/debug/tracing/trace
>
> If fsync is not be called, I think in ext4 there must be the same problem,
> but I guess fortunately journal commit thread save its data since it commit
> transaction per 5 second by default. You can try to configure (commit=nrsec)
> it with larger value for verification the issue with ext4 filesystem.
>

I enabled the xxx_sync_file_enter events in both f2fs and ext4 and found that neither
was triggered for the photo files.

Then I traced f2fs_writepages and ext4_da_write_pages:

    ino     file_name

    65573   IMG_20150804_031619.jpg
    65575   IMG_20150804_031619_1.jpg
    65576   IMG_20150804_031620.jpg
    65577   IMG_20150804_031620_1.jpg

  ext4_da_write_pages: dev 259,0 ino 65573 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
  ext4_da_write_pages: dev 259,0 ino 65575 b_blocknr 0 b_size 2408448 b_state 0x0221 first_page 0 io_done 1 pages_written 588 sync_mode 0
  ext4_da_write_pages: dev 259,0 ino 65575 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
  ext4_da_write_pages: dev 259,0 ino 65576 b_blocknr 0 b_size 2428928 b_state 0x0221 first_page 0 io_done 1 pages_written 593 sync_mode 0
  ext4_da_write_pages: dev 259,0 ino 65576 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
  ext4_da_write_pages: dev 259,0 ino 65577 b_blocknr 0 b_size 2383872 b_state 0x0221 first_page 0 io_done 1 pages_written 582 sync_mode 0
  ext4_da_write_pages: dev 259,0 ino 65577 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0

f2fs_writepages did not appear in the f2fs test.
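The non-empty ext4_da_write_pages lines above are internally consistent. In each one, b_size = pages_written × 4096 (588 × 4096 = 2408448, 593 × 4096 = 2428928, 582 × 4096 = 2383872), which corresponds to 4 KiB pages. Every line also has sync_mode 0, the value of WB_SYNC_NONE, which indicates non-integrity writeback of the kind done by background flushing rather than by fsync. The C below is a minimal sketch that extracts these fields from one line, assuming the "key value" layout shown above; struct da_write, trace_field() and parse_da_write() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of interest from one ext4_da_write_pages trace line. */
struct da_write {
    unsigned long long ino, b_size, pages_written, sync_mode;
};

/* Find " <key> <number>" in `line` and parse the number. Returns 0 on success. */
static int trace_field(const char *line, const char *key, unsigned long long *out)
{
    char pat[64];
    snprintf(pat, sizeof pat, " %s ", key);
    const char *p = strstr(line, pat);
    if (p == NULL)
        return -1;
    return sscanf(p + strlen(pat), "%llu", out) == 1 ? 0 : -1;
}

/* Fill *w from one trace line; returns -1 if any field is missing. */
static int parse_da_write(const char *line, struct da_write *w)
{
    if (trace_field(line, "ino", &w->ino) ||
        trace_field(line, "b_size", &w->b_size) ||
        trace_field(line, "pages_written", &w->pages_written) ||
        trace_field(line, "sync_mode", &w->sync_mode))
        return -1;
    return 0;
}
```

Summing pages_written per ino over a longer trace would show how much photo data ext4 flushed in the background without any fsync.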

I also tried setting commit=300 (default 5), but it made no difference. Maybe
ext4_da_write_pages is launched from somewhere else in ext4.

Finally, I mounted f2fs with disable_roll_forward. After the system rebooted, f2fs was
inconsistent, and fsck reported several failed check items.

Thanks,
He



------------------------------------------------------------------------------


* Re: Data lost in Android app for not write new checkpoint
  2015-08-04 13:16   ` He YunLei
@ 2015-08-04 18:29     ` Jaegeuk Kim
  2015-08-06 10:17     ` Chao Yu
  1 sibling, 0 replies; 10+ messages in thread
From: Jaegeuk Kim @ 2015-08-04 18:29 UTC
  To: He YunLei; +Cc: linux-f2fs-devel

Hi He,

On Tue, Aug 04, 2015 at 09:16:21PM +0800, He YunLei wrote:
> I enable the event xxx_sync_file_enter both in f2fs and ext4, and find neither of
> them was triggered by photo files.
> 
> Then I try f2fs_writepages and ext4_da_write_pages:
> 
>     ino     file_name
> 				
>     65573   IMG_20150804_031619.jpg
>     65575   IMG_20150804_031619_1.jpg
>     65576   IMG_20150804_031620.jpg
>     65577   IMG_20150804_031620_1.jpg
> 
>   ext4_da_write_pages: dev 259,0 ino 65573 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
>   ext4_da_write_pages: dev 259,0 ino 65575 b_blocknr 0 b_size 2408448 b_state 0x0221 first_page 0 io_done 1 pages_written 588 sync_mode 0
>   ext4_da_write_pages: dev 259,0 ino 65575 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
>   ext4_da_write_pages: dev 259,0 ino 65576 b_blocknr 0 b_size 2428928 b_state 0x0221 first_page 0 io_done 1 pages_written 593 sync_mode 0
>   ext4_da_write_pages: dev 259,0 ino 65576 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
>   ext4_da_write_pages: dev 259,0 ino 65577 b_blocknr 0 b_size 2383872 b_state 0x0221 first_page 0 io_done 1 pages_written 582 sync_mode 0
>   ext4_da_write_pages: dev 259,0 ino 65577 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
> 
> f2fs_writepages doesn't appear in the test of f2fs

Could you check the IOs submitted at the block layer?

> 
> I also try modify commit=300(default 5), but it doesn't work. Maybe somewhere else in ext4
> launch the ext4_da_write_pages operation.
> 
> At the end, I try to mount f2fs with disable_roll_forward, when system reboot, the f2fs is inconsistent,
> there are several failed check items in fsck.

The disable_roll_forward option should not cause any inconsistency.
Could you share the fsck log?

Thanks,

> ------------------------------------------------------------------------------
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

------------------------------------------------------------------------------


* Re: Data lost in Android app for not write new checkpoint
  2015-08-04 13:16   ` He YunLei
  2015-08-04 18:29     ` Jaegeuk Kim
@ 2015-08-06 10:17     ` Chao Yu
  2015-08-07  6:26       ` He YunLei
  1 sibling, 1 reply; 10+ messages in thread
From: Chao Yu @ 2015-08-06 10:17 UTC
  To: 'He YunLei'; +Cc: 'Jaegeuk Kim', linux-f2fs-devel

> -----Original Message-----
> From: He YunLei [mailto:heyunlei@huawei.com]
> Sent: Tuesday, August 04, 2015 9:16 PM
> To: Chao Yu
> Cc: 'Bintian'; 'Jaegeuk Kim'; cm224.lee@samsung.com; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] Data lost in Android app for not write new checkpoint
> 
> I enable the event xxx_sync_file_enter both in f2fs and ext4, and find neither of
> them was triggered by photo files.
> 
> f2fs_writepages doesn't appear in the test of f2fs

Weird. Was the IO triggered from the DIO or reclaim path? As Jaegeuk said, it's better
to check the IOs at the block layer.

> 
> I also try modify commit=300(default 5), but it doesn't work. Maybe somewhere else in ext4
> launch the ext4_da_write_pages operation.

Maybe it's triggered by the bdi flusher. Can you try tuning the parameters under
/proc/sys/vm/, e.g. dirty_writeback_centisecs and dirty_background_ratio, to delay
->writepages in ext4?
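Both knobs are plain text files that each hold one integer. dirty_writeback_centisecs is expressed in hundredths of a second, so a value of 500 means the flusher wakes every 5 seconds. The C below is a minimal sketch for reading and changing such a knob, assuming root privileges for the write; read_sysctl_long(), write_sysctl_long() and centisecs_to_ms() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Read the single integer stored in a knob such as
 * /proc/sys/vm/dirty_writeback_centisecs. Returns 0 on success. */
static int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Overwrite the knob with `value` (needs root for real /proc/sys files). */
static int write_sysctl_long(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)     /* errors on /proc/sys writes surface at flush */
        ok = 0;
    return ok ? 0 : -1;
}

/* dirty_writeback_centisecs is in hundredths of a second. */
static long centisecs_to_ms(long centisecs)
{
    return centisecs * 10;
}
```

If the bdi flusher is indeed what saves the photos on ext4, raising dirty_writeback_centisecs and dirty_background_ratio on the ext4 test phone should noticeably delay the ext4_da_write_pages events.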

> 
> At the end, I try to mount f2fs with disable_roll_forward, when system reboot, the f2fs is
> inconsistent,
> there are several failed check items in fsck.

Can you share the log?

Thanks,



------------------------------------------------------------------------------


* Re: Data lost in Android app for not write new checkpoint
  2015-08-06 10:17     ` Chao Yu
@ 2015-08-07  6:26       ` He YunLei
  2015-08-07  9:18         ` Chao Yu
  2015-08-07  9:50         ` Chao Yu
  0 siblings, 2 replies; 10+ messages in thread
From: He YunLei @ 2015-08-07  6:26 UTC
  To: Chao Yu; +Cc: 'Jaegeuk Kim', linux-f2fs-devel

On 2015/8/6 18:17, Chao Yu wrote:
>> f2fs_writepages doesn't appear in the test of f2fs
>
> Weird, was IO triggered from DIO/reclaim path? As Jaegeuk said, it's better
> to check the IOs in block layer.
>
   I am sorry, I left out the f2fs_writepages messages because the trace log was huge. I repeated
the test several times and can now confirm that f2fs_writepages is triggered, but far less often
than in ext4.

   Another question: can roll-forward recovery only restore files that users fsynced, and not
files whose pages were written back by the bdi flusher?

>>
>> I also try modify commit=300(default 5), but it doesn't work. Maybe somewhere else in ext4
>> launch the ext4_da_write_pages operation.
>
> Maybe it's triggered by bdi flusher, can you try to configure parameters
> under /proc/sys/vm/ e.g. dirty_writeback_centisecs/dirty_background_ratio
> for delaying ->writepages in ext4?
>
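Before changing the flusher parameters mentioned above, it may help to record their current values. A minimal Python sketch that reads a few files under /proc/sys/vm/; the root parameter is an illustrative addition so the function can also be pointed at a copy of that directory, and the choice of knobs is an assumption taken from the suggestion above.

```python
from pathlib import Path

# Knobs mentioned in this thread. dirty_writeback_centisecs is in 1/100 s units;
# dirty_background_ratio is a percentage of available memory.
KNOBS = ("dirty_writeback_centisecs", "dirty_background_ratio")

def read_vm_knobs(root="/proc/sys/vm", names=KNOBS):
    """Return {name: int value} for each knob file that exists under root."""
    values = {}
    for name in names:
        path = Path(root) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values

if __name__ == "__main__":
    knobs = read_vm_knobs()
    for name, value in knobs.items():
        print(f"{name} = {value}")
    if "dirty_writeback_centisecs" in knobs:
        # Convert centiseconds to seconds for readability.
        print("flusher wakeup interval:", knobs["dirty_writeback_centisecs"] / 100, "s")
```

Raising these values (as root, for example with sysctl -w) would only delay periodic writeback so that ext4 behaves more like the f2fs run; it does not make any data more durable.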
>>
>> At the end, I try to mount f2fs with disable_roll_forward, when system reboot, the f2fs is
>> inconsistent, there are several failed check items in fsck.
>
> Can you share the log?

The log is below:

[FSCK] Unreachable nat entries                        [Fail] [0x64b]
[FSCK] SIT valid block bitmap checking                [Fail]
[FSCK] Hard link checking for regular file            [Ok..] [0x0]
[FSCK] valid_block_count matching with CP             [Ok..] [0x579b6]
[FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0x7b0]
[FSCK] valid_node_count matcing with CP (nat lookup)  [Fail] [0xdfb]
[FSCK] valid_inode_count matched with CP              [Ok..] [0x664]
[FSCK] free segment_count matched with CP             [Ok..] [0x238]
[FSCK] next block offset is free                      [Ok..]
[FSCK] other corrupted bugs                           [Fail]
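When the test is repeated several times, the failed items can be pulled out of each fsck summary automatically. A small Python sketch, assuming the "[FSCK] <description> [Ok..|Fail] [hex value]" layout shown in the log above; the helper name failed_checks is hypothetical.

```python
import re

# One summary line: "[FSCK] <description>  [Ok..|Fail] [0x...]" (hex value optional).
LINE_RE = re.compile(r"^\[FSCK\]\s+(.*?)\s+\[(Ok\.\.|Fail)\](?:\s+\[(0x[0-9a-fA-F]+)\])?\s*$")

def failed_checks(log_text):
    """Return (description, value-or-None) for every check reported as Fail."""
    failures = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == "Fail":
            value = int(m.group(3), 16) if m.group(3) else None
            failures.append((m.group(1), value))
    return failures

log = """\
[FSCK] Unreachable nat entries                        [Fail] [0x64b]
[FSCK] SIT valid block bitmap checking                [Fail]
[FSCK] Hard link checking for regular file            [Ok..] [0x0]
"""
for desc, value in failed_checks(log):
    print(desc, value)
```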

I repeated the test about 5 times, and fsck failed only once.
When I use the disable_roll_forward mount option, I find that occasionally some photos are not lost.
There are also some incomplete photo files left on my phone. Does roll_forward recovery
consider pages written back by the bdi flusher unreliable, and clean them up?

Thanks,
He

>
> Thanks,
>
>>
>> Thanks,
>> He
>>
>>> As a quick thought, maybe we can add one commit data thread, periodically
>>> writebacking user data written by user previously, then do checkpoint for
>>> persistence.
>>>
>>> So by this way, at most, we just lose our data for last configured time of
>>> commit period.
>>>
>>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>> He
>>>
>>>
>>>
>>> .
>>>
>
>
> .
>
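The periodic commit-thread idea quoted above bounds the loss window by the commit period. A toy Python model of that bound, assuming a checkpoint completes exactly every period seconds from mount and that anything written after the last completed checkpoint is lost on a power cut. It ignores fsync and roll-forward recovery entirely, the names are hypothetical, and it is not f2fs code.

```python
def lost_writes(write_times, period, crash_time):
    """Return write timestamps (seconds since mount) not covered by a checkpoint.

    Toy model: checkpoints complete at 0, period, 2*period, ...; a write is
    persistent if a checkpoint completed at or after it, before the crash.
    """
    last_cp = (crash_time // period) * period  # most recent checkpoint before the crash
    return [t for t in write_times if last_cp < t <= crash_time]

writes = [1.0, 4.5, 6.0, 9.9, 12.3]
lost = lost_writes(writes, period=5, crash_time=13.0)
print(lost)  # [12.3]
```

Every lost write lies less than one period before the crash. A real commit thread would also need time to write back data and finish the checkpoint, so the actual window would be somewhat longer than this model suggests.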


------------------------------------------------------------------------------

* Re: Data lost in Android app for not write new checkpoint
  2015-08-07  6:26       ` He YunLei
@ 2015-08-07  9:18         ` Chao Yu
  2015-08-07  9:50         ` Chao Yu
  1 sibling, 0 replies; 10+ messages in thread
From: Chao Yu @ 2015-08-07  9:18 UTC (permalink / raw)
  To: 'He YunLei'; +Cc: 'Jaegeuk Kim', linux-f2fs-devel

> -----Original Message-----
> From: He YunLei [mailto:heyunlei@huawei.com]
> Sent: Friday, August 07, 2015 2:26 PM
> To: Chao Yu
> Cc: 'Bintian'; 'Jaegeuk Kim'; cm224.lee@samsung.com; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] Data lost in Android app for not write new checkpoint
> 
> On 2015/8/6 18:17, Chao Yu wrote:
> >> -----Original Message-----
> >> From: He YunLei [mailto:heyunlei@huawei.com]
> >> Sent: Tuesday, August 04, 2015 9:16 PM
> >> To: Chao Yu
> >> Cc: 'Bintian'; 'Jaegeuk Kim'; cm224.lee@samsung.com; linux-f2fs-devel@lists.sourceforge.net
> >> Subject: Re: [f2fs-dev] Data lost in Android app for not write new checkpoint
> >>
> >> On 2015/7/31 18:49, Chao Yu wrote:
> >>> Hi Bintian,
> >>>
> >>>> -----Original Message-----
> >>>> From: He YunLei [mailto:heyunlei@huawei.com]
> >>>> Sent: Friday, July 31, 2015 10:29 AM
> >>>> To: linux-f2fs-devel@lists.sourceforge.net; Jaegeuk Kim
> >>>> Cc: Chao Yu; cm224.lee@samsung.com; Bintian
> >>>> Subject: [f2fs-dev] Data lost in Android app for not write new checkpoint
> >>>>
> >>>> Hi all,
> >>>> 	Recently I did some test with f2fs on my Android phone, and found a problem
> >>>> which I didn't know how to tackle it.
> >>>> 	I use my Android phone with /data partition formatted  by mkfs.f2fs. When the
> >>>> phone just started, I check the f2fs status by reading the file /sys/kernel/debug/f2fs/status
> >>>> in debugfs.
> >>>>
> >>>> CP calls: 10
> >>>> GC calls: 19 (BG: 19)
> >>>>      - data segments : 19 (19)
> >>>>      - node segments : 0 (0)
> >>>>
> >>>> 	We can see /data partition has done 10 times write_checkpoint since f2fs is mounted
> >>>> on the phone, it also has triggered 19 times background GC.
> >>>>
> >>>> ******
> >>>>
> >>>> Here I took some photos consecutively, and check the file /sys/kernel/debug/f2fs/status again
> >>>>
> >>>> ******
> >>>>
> >>>> CP calls: 10
> >>>> GC calls: 20 (BG: 20)
> >>>>      - data segments : 20 (20)
> >>>>      - node segments : 0 (0)
> >>>>
> >>>> 	there is no change in CP calls number and background GC doesn't write new checkpoint.
> >>>> if then a sudden power failure or system crash occur, the photos will be lost when the phone
> >>>> restart, and a sync before crash will avoid the data lost.
> >>>> 	I think this problem is bad for user experience of using Android phone with f2fs.
> >>>> How do we deal with such situation? I wish you and other developers in this list could help
> >>>> me in a correct way.
> >>>
> >>> IMO, it's better to figure out whether this is a bug of f2fs first or not.
> >>>
> >>> You can enable some traces in f2fs to see whether fsync is called or not.
> >>>
> >>> enable trace by:
> >>> echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_sync_file_enter/enable
> >>> echo 1 > /sys/kernel/debug/tracing/events/f2fs/f2fs_sync_file_exit/enable
> >>> print trace by:
> >>> cat /sys/kernel/debug/tracing/trace
> >>>
> >>> If fsync is not be called, I think in ext4 there must be the same problem,
> >>> but I guess fortunately journal commit thread save its data since it commit
> >>> transaction per 5 second by default. You can try to configure (commit=nrsec)
> >>> it with larger value for verification the issue with ext4 filesystem.
> >>>
> >>
> >> I enable the event xxx_sync_file_enter both in f2fs and ext4, and find neither of
> >> them was triggered by photo files.
> >>
> >> Then I try f2fs_writepages and ext4_da_write_pages:
> >>
> >>      ino     file_name
> >>
> >>      65573   IMG_20150804_031619.jpg
> >>      65575   IMG_20150804_031619_1.jpg
> >>      65576   IMG_20150804_031620.jpg
> >>      65577   IMG_20150804_031620_1.jpg
> >>
> >>    ext4_da_write_pages: dev 259,0 ino 65573 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
> >>    ext4_da_write_pages: dev 259,0 ino 65575 b_blocknr 0 b_size 2408448 b_state 0x0221 first_page 0 io_done 1 pages_written 588 sync_mode 0
> >>    ext4_da_write_pages: dev 259,0 ino 65575 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
> >>    ext4_da_write_pages: dev 259,0 ino 65576 b_blocknr 0 b_size 2428928 b_state 0x0221 first_page 0 io_done 1 pages_written 593 sync_mode 0
> >>    ext4_da_write_pages: dev 259,0 ino 65576 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
> >>    ext4_da_write_pages: dev 259,0 ino 65577 b_blocknr 0 b_size 2383872 b_state 0x0221 first_page 0 io_done 1 pages_written 582 sync_mode 0
> >>    ext4_da_write_pages: dev 259,0 ino 65577 b_blocknr 0 b_size 0 b_state 0x0000 first_page 0 io_done 0 pages_written 0 sync_mode 0
> >>
> >> f2fs_writepages doesn't appear in the test of f2fs
> >
> > Weird, was IO triggered from DIO/reclaim path? As Jaegeuk said, it's better
> > to check the IOs in block layer.
> >
>    Sorry, I left out the f2fs_writepages messages because the trace log was huge. I repeated the test
> several times and can now confirm that f2fs_writepages is triggered, but far less often than in ext4.
> 
>    Another question: does roll_forward recovery only restore files that users have fsynced, and not
> files whose pages were written back by the bdi flusher?

Yes, f2fs only recovers files that were fsynced, not those that were merely flushed.
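So an application that needs a photo to survive a power cut has to fsync it itself. A minimal Python sketch of that pattern, assuming a POSIX filesystem on Linux; save_durably is a hypothetical helper, and fsyncing the parent directory too is a common precaution so that the entry of a newly created file is also persisted.

```python
import os

def save_durably(path, data):
    """Write bytes to path, then fsync the file and its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # os.write may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # flush file data and inode to stable storage
    finally:
        os.close(fd)
    # Persist the directory entry of a newly created file as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```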

> 
> >>
> >> I also try modify commit=300(default 5), but it doesn't work. Maybe somewhere else in ext4
> >> launch the ext4_da_write_pages operation.
> >
> > Maybe it's triggered by bdi flusher, can you try to configure parameters
> > under /proc/sys/vm/ e.g. dirty_writeback_centisecs/dirty_background_ratio
> > for delaying ->writepages in ext4?
> >
> >>
> >> At the end, I try to mount f2fs with disable_roll_forward, when system reboot, the f2fs is
> >> inconsistent, there are several failed check items in fsck.
> >
> > Can you share the log?
> 
> The log is below:
> 
> [FSCK] Unreachable nat entries                        [Fail] [0x64b]
> [FSCK] SIT valid block bitmap checking                [Fail]
> [FSCK] Hard link checking for regular file            [Ok..] [0x0]
> [FSCK] valid_block_count matching with CP             [Ok..] [0x579b6]
> [FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0x7b0]
> [FSCK] valid_node_count matcing with CP (nat lookup)  [Fail] [0xdfb]
> [FSCK] valid_inode_count matched with CP              [Ok..] [0x664]
> [FSCK] free segment_count matched with CP             [Ok..] [0x238]
> [FSCK] next block offset is free                      [Ok..]
> [FSCK] other corrupted bugs                           [Fail]

Could you share more details from the fsck fixing log? That would be
helpful. :)

> 
> I repeated the test about 5 times, and fsck failed only once.
> When I use the disable_roll_forward mount option, I find that occasionally some photos are not lost.
> There are also some incomplete photo files left on my phone. Does roll_forward recovery
> consider pages written back by the bdi flusher unreliable, and clean them up?

Since we don't know when the data will be flushed and also what part of
file will be flushed, at the time of abnormal pow-cut, the data/metadata
of flushed file can be partial in device. So I don't think it's not
possible for us to recover this kind of file with current fsync/recovery
policy of f2fs.
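Because a file that was only written back by the flusher may be left partially on the device after a power cut, applications commonly write to a temporary name, fsync it, and then rename it into place, so that the final name is intended to refer either to the old contents or to complete new ones. A minimal sketch of that generic POSIX pattern (it is not an f2fs feature; the ".tmp" suffix and the helper name write_atomically are assumptions):

```python
import os

def write_atomically(path, data):
    """Write data to path + '.tmp', fsync it, rename it over path, fsync the directory."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer into the kernel
        os.fsync(f.fileno())   # make the temporary file's contents durable
    os.replace(tmp_path, path)  # atomic rename within one filesystem
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)       # persist the rename itself
    finally:
        os.close(dir_fd)
```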

Thanks,

> 
> Thanks,
> He
> 
> >
> > Thanks,
> >
> >>
> >> Thanks,
> >> He
> >>
> >>> As a quick thought, maybe we can add one commit data thread, periodically
> >>> writebacking user data written by user previously, then do checkpoint for
> >>> persistence.
> >>>
> >>> So by this way, at most, we just lose our data for last configured time of
> >>> commit period.
> >>>
> >>> Thanks,
> >>>
> >>>>
> >>>> Thanks,
> >>>> He
> >>>
> >>>
> >>>
> >>> .
> >>>
> >
> >
> > .
> >


------------------------------------------------------------------------------

* Re: Data lost in Android app for not write new checkpoint
  2015-08-07  6:26       ` He YunLei
  2015-08-07  9:18         ` Chao Yu
@ 2015-08-07  9:50         ` Chao Yu
  1 sibling, 0 replies; 10+ messages in thread
From: Chao Yu @ 2015-08-07  9:50 UTC (permalink / raw)
  To: 'He YunLei'; +Cc: 'Jaegeuk Kim', linux-f2fs-devel

> Since we don't know when the data will be flushed and also what part of
> file will be flushed, at the time of abnormal pow-cut, the data/metadata
> of flushed file can be partial in device. So I don't think it's not

Should be:

of flushed file can be partial in device. So I don't think it's

My bad, sorry about that.

> possible for us to recover this kind of file with current fsync/recovery
> policy of f2fs.



------------------------------------------------------------------------------

Thread overview: 10+ messages
2015-07-31  2:28 Data lost in Android app for not write new checkpoint He YunLei
2015-07-31  6:18 ` Chao Yu
2015-07-31 10:49 ` Chao Yu
2015-07-31 12:00   ` Bintian
2015-08-04 13:16   ` He YunLei
2015-08-04 18:29     ` Jaegeuk Kim
2015-08-06 10:17     ` Chao Yu
2015-08-07  6:26       ` He YunLei
2015-08-07  9:18         ` Chao Yu
2015-08-07  9:50         ` Chao Yu
