From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Chao Yu <yuchao0@huawei.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH 2/3] f2fs: add bitmaps for empty or full NAT blocks
Date: Thu, 23 Feb 2017 14:54:49 -0800
Message-ID: <20170223225449.GG2026@jaegeuk.local>
In-Reply-To: <1ecf0acf-2ae7-c547-6d7b-350bd356d48c@huawei.com>
On 02/23, Chao Yu wrote:
> On 2017/2/14 10:06, Jaegeuk Kim wrote:
> > This patch adds bitmaps to represent empty or full NAT blocks containing
> > free nid entries.
> >
> > If we can find valid crc|cp_ver in the last block of checkpoint pack, we'll
> > use these bitmaps when building free nids. In order to avoid checkpointing
> > burden, up-to-date bitmaps will be flushed only during umount time. So,
> > normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs
> > which recovers this bitmap again.
> >
> > After this patch, we build free nids from nid #0 at mount time to make more
> > full NAT blocks, but in runtime, we check empty NAT blocks to load free nids
> > without loading any NAT pages from disk.
> >
> > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> > ---
> > fs/f2fs/checkpoint.c | 29 +++++++-
> > fs/f2fs/debug.c | 1 +
> > fs/f2fs/f2fs.h | 23 +++++-
> > fs/f2fs/node.c | 188 +++++++++++++++++++++++++++++++++++++++++++-----
> > fs/f2fs/segment.c | 2 +-
> > include/linux/f2fs_fs.h | 1 +
> > 6 files changed, 224 insertions(+), 20 deletions(-)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 042f8d9afe44..783c5c3f16a4 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -1024,6 +1024,10 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> >
> > spin_lock(&sbi->cp_lock);
> >
> > + if (ckpt->cp_pack_total_block_count >
> > + sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks)
> > + disable_nat_bits(sbi, false);
>
> I think we need to drop the NAT full/empty bitmaps only if there is not enough
> space in the CP area while doing umount; otherwise we can keep them in memory.
Yup.
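
To make the space check under discussion concrete, here is a minimal user-space sketch of the arithmetic, not the kernel code. It mirrors the size calculation in the quoted __get_nat_bitmaps() and the comparison in update_ckpt_flags(). The function names nat_bits_blocks() and nat_bits_fit() are hypothetical, and the 4 KiB block size is an assumption.

```c
#include <stdint.h>

#define BLKSIZE_BITS 12u		/* assumption: 4 KiB F2FS blocks */
#define BLKSIZE (1u << BLKSIZE_BITS)

/*
 * Blocks needed for the nat_bits area: an 8-byte crc|cp_ver header followed
 * by two bitmaps (full and empty), each nat_blocks / 8 bytes long, rounded
 * up to whole blocks.
 */
static uint32_t nat_bits_blocks(uint32_t nat_blocks)
{
	uint32_t nat_bits_bytes = nat_blocks / 8;

	return ((nat_bits_bytes << 1) + 8 + BLKSIZE - 1) >> BLKSIZE_BITS;
}

/*
 * Nonzero if the checkpoint pack still leaves room for the nat_bits blocks
 * at the tail of the checkpoint segment. Callers must ensure that
 * nat_bits_blks <= blocks_per_seg.
 */
static int nat_bits_fit(uint32_t cp_pack_blocks, uint32_t blocks_per_seg,
			uint32_t nat_bits_blks)
{
	return cp_pack_blocks <= blocks_per_seg - nat_bits_blks;
}
```

With 16384 NAT blocks, for instance, each bitmap takes 2048 bytes, so the header plus both bitmaps need two blocks under these assumptions.
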
>
> > +
> > if (cpc->reason == CP_UMOUNT)
> > __set_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
> > else
> > @@ -1136,6 +1140,29 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> >
> > start_blk = __start_cp_next_addr(sbi);
> >
> > + /* write nat bits */
...
> > +static int scan_nat_bits(struct f2fs_sb_info *sbi)
> > +{
> > + struct f2fs_nm_info *nm_i = NM_I(sbi);
> > + struct page *page;
> > + unsigned int i = 0;
> > + nid_t target = FREE_NID_PAGES * NAT_ENTRY_PER_BLOCK;
> > + nid_t nid;
> > +
> > + if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG))
> > + return -EAGAIN;
> > +
> > + down_read(&nm_i->nat_tree_lock);
> > +check_empty:
> > + i = find_next_bit_le(nm_i->empty_nat_bits, nm_i->nat_blocks, i);
> > + if (i >= nm_i->nat_blocks) {
> > + i = 0;
> > + goto check_partial;
> > + }
> > +
> > + for (nid = i * NAT_ENTRY_PER_BLOCK; nid < (i + 1) * NAT_ENTRY_PER_BLOCK;
> > + nid++) {
> > + if (unlikely(nid >= nm_i->max_nid))
> > + break;
> > + add_free_nid(sbi, nid, true);
> > + }
> > +
> > + if (nm_i->nid_cnt[FREE_NID_LIST] >= target)
> > + goto out;
> > + i++;
> > + goto check_empty;
> > +
> > +check_partial:
> > + i = find_next_zero_bit_le(nm_i->full_nat_bits, nm_i->nat_blocks, i);
> > + if (i >= nm_i->nat_blocks) {
> > + disable_nat_bits(sbi, true);
>
> Can this happen in the real world? Wouldn't it indicate a bug somewhere?
It can happen, since the current design treats full_nat_bits updates as
optional, in order to avoid scanning a whole NAT page just to set the bit back
from 0 to 1.
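
The two-phase lookup in the quoted scan_nat_bits() can be sketched in user space as follows. This is only an illustration under stated assumptions: next_bit_eq() and pick_nat_block() are hypothetical names, plain byte arrays stand in for the kernel bitmaps, and the kernel itself uses find_next_bit_le() and find_next_zero_bit_le().

```c
#include <stdint.h>

/* Hypothetical helper: little-endian bit test on a byte array. */
static int test_bit_u8(const uint8_t *map, unsigned int nr)
{
	return (map[nr / 8] >> (nr % 8)) & 1;
}

/*
 * Return the first index in [start, size) whose bit equals `want`,
 * or `size` if there is none.
 */
static unsigned int next_bit_eq(const uint8_t *map, unsigned int size,
				unsigned int start, int want)
{
	unsigned int i;

	for (i = start; i < size; i++)
		if (test_bit_u8(map, i) == want)
			return i;
	return size;
}

/*
 * Pick a NAT block to harvest free nids from: prefer a block marked empty
 * (no NAT page has to be read), otherwise any block not marked full.
 * Returns -1 when every block is marked full, which corresponds to the
 * branch where the quoted code calls disable_nat_bits() and returns -EINVAL.
 */
static int pick_nat_block(const uint8_t *empty_bits, const uint8_t *full_bits,
			  unsigned int nat_blocks)
{
	unsigned int i;

	i = next_bit_eq(empty_bits, nat_blocks, 0, 1);
	if (i < nat_blocks)
		return (int)i;

	i = next_bit_eq(full_bits, nat_blocks, 0, 0);
	if (i < nat_blocks)
		return (int)i;

	return -1;
}
```

Unlike the kernel loop, this sketch returns a single block index instead of iterating until the free nid target is reached.
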
>
> > + return -EINVAL;
> > + }
> > +
> > + nid = i * NAT_ENTRY_PER_BLOCK;
> > + page = get_current_nat_page(sbi, nid);
> > + scan_nat_page(sbi, page, nid);
> > + f2fs_put_page(page, 1);
> > +
> > + if (nm_i->nid_cnt[FREE_NID_LIST] < target) {
> > + i++;
> > + goto check_partial;
> > + }
> > +out:
> > + up_read(&nm_i->nat_tree_lock);
> > + return 0;
> > +}
> > +
> > +static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
> > {
> > struct f2fs_nm_info *nm_i = NM_I(sbi);
> > struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
> > @@ -1854,6 +1911,20 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync)
> > if (!sync && !available_free_memory(sbi, FREE_NIDS))
> > return;
> >
> > + /* try to find free nids with nat_bits */
> > + if (!mount && !scan_nat_bits(sbi) && nm_i->nid_cnt[FREE_NID_LIST])
> > + return;
> > +
> > + /* find next valid candidate */
>
> Is this just for the mount case?
Yup, it reuses free nids in dirty NAT blocks, so that we can turn them into full
NAT pages.
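
As a rough user-space illustration of the mount-time choice of the first nid to scan (hypothetical function name, and NAT_ENTRY_PER_BLOCK assumed to be 455, i.e. a 4096-byte block holding 9-byte NAT entries):

```c
#include <stdint.h>

#define NAT_ENTRY_PER_BLOCK 455u	/* assumption: 4096 / 9-byte entries */

/*
 * Return the first nid of the first NAT block whose full bit is clear,
 * or -1 if every block is marked full. In the quoted patch the all-full
 * case sets SBI_NEED_FSCK and leaves nid unchanged; the -1 here is only
 * a stand-in for that branch.
 */
static int64_t mount_scan_start_nid(const uint8_t *full_bits,
				    uint32_t nat_blocks)
{
	uint32_t idx;

	for (idx = 0; idx < nat_blocks; idx++) {
		if (!((full_bits[idx / 8] >> (idx % 8)) & 1))
			return (int64_t)idx * NAT_ENTRY_PER_BLOCK;
	}
	return -1;
}
```

Starting from the lowest non-full block is what lets partially used blocks fill up over time, as described above.
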
Thanks,
>
> > + if (is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG)) {
> > + int idx = find_next_zero_bit_le(nm_i->full_nat_bits,
> > + nm_i->nat_blocks, 0);
> > + if (idx >= nm_i->nat_blocks)
> > + set_sbi_flag(sbi, SBI_NEED_FSCK);
> > + else
> > + nid = idx * NAT_ENTRY_PER_BLOCK;
> > + }
> > +
> > /* readahead nat pages to be scanned */
> > ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES,
> > META_NAT, true);
> > @@ -1896,10 +1967,10 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync)
> > nm_i->ra_nid_pages, META_NAT, false);
> > }
> >
> > -void build_free_nids(struct f2fs_sb_info *sbi, bool sync)
> > +void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
> > {
> > mutex_lock(&NM_I(sbi)->build_lock);
> > - __build_free_nids(sbi, sync);
> > + __build_free_nids(sbi, sync, mount);
> > mutex_unlock(&NM_I(sbi)->build_lock);
> > }
> >
> > @@ -1941,7 +2012,7 @@ bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
> > spin_unlock(&nm_i->nid_list_lock);
> >
> > /* Let's scan nat pages and its caches to get free nids */
> > - build_free_nids(sbi, true);
> > + build_free_nids(sbi, true, false);
> > goto retry;
> > }
> >
> > @@ -2233,8 +2304,39 @@ static void __adjust_nat_entry_set(struct nat_entry_set *nes,
> > list_add_tail(&nes->set_list, head);
> > }
> >
> > +void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
> > + struct page *page)
> > +{
> > + struct f2fs_nm_info *nm_i = NM_I(sbi);
> > + unsigned int nat_index = start_nid / NAT_ENTRY_PER_BLOCK;
> > + struct f2fs_nat_block *nat_blk = page_address(page);
> > + int valid = 0;
> > + int i;
> > +
> > + if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG))
> > + return;
> > +
> > + for (i = 0; i < NAT_ENTRY_PER_BLOCK; i++) {
> > + if (start_nid == 0 && i == 0)
> > + valid++;
> > + if (nat_blk->entries[i].block_addr)
> > + valid++;
> > + }
> > + if (valid == 0) {
> > + test_and_set_bit_le(nat_index, nm_i->empty_nat_bits);
> > + test_and_clear_bit_le(nat_index, nm_i->full_nat_bits);
>
> set_bit_le/clear_bit_le
>
> > + return;
> > + }
> > +
> > + test_and_clear_bit_le(nat_index, nm_i->empty_nat_bits);
>
> ditto
>
> > + if (valid == NAT_ENTRY_PER_BLOCK)
> > + test_and_set_bit_le(nat_index, nm_i->full_nat_bits);
> > + else
> > + test_and_clear_bit_le(nat_index, nm_i->full_nat_bits);
>
> ditto
>
> > +}
> > +
> > static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
> > - struct nat_entry_set *set)
> > + struct nat_entry_set *set, struct cp_control *cpc)
> > {
> > struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
> > struct f2fs_journal *journal = curseg->journal;
> > @@ -2249,7 +2351,8 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
> > * #1, flush nat entries to journal in current hot data summary block.
> > * #2, flush nat entries to nat page.
> > */
> > - if (!__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL))
> > + if (cpc->reason == CP_UMOUNT ||
>
> if ((cpc->reason == CP_UMOUNT && is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG)) ||
>
> > + !__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL))
> > to_journal = false;
> >
> > if (to_journal) {
> > @@ -2289,10 +2392,12 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
> > }
> > }
> >
> > - if (to_journal)
> > + if (to_journal) {
> > up_write(&curseg->journal_rwsem);
> > - else
> > + } else {
> > + __update_nat_bits(sbi, start_nid, page);
> > f2fs_put_page(page, 1);
> > + }
> >
> > f2fs_bug_on(sbi, set->entry_cnt);
> >
> > @@ -2303,7 +2408,7 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
> > /*
> > * This function is called during the checkpointing process.
> > */
> > -void flush_nat_entries(struct f2fs_sb_info *sbi)
> > +void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
> > {
> > struct f2fs_nm_info *nm_i = NM_I(sbi);
> > struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
> > @@ -2324,7 +2429,8 @@ void flush_nat_entries(struct f2fs_sb_info *sbi)
> > * entries, remove all entries from journal and merge them
> > * into nat entry set.
> > */
> > - if (!__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
> > + if (cpc->reason == CP_UMOUNT ||
>
> if ((cpc->reason == CP_UMOUNT && is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG)) ||
>
> > + !__has_cursum_space(journal, nm_i->dirty_nat_cnt, NAT_JOURNAL))
> > remove_nats_in_journal(sbi);
> >
> > while ((found = __gang_lookup_nat_set(nm_i,
> > @@ -2338,27 +2444,72 @@ void flush_nat_entries(struct f2fs_sb_info *sbi)
> >
> > /* flush dirty nats in nat entry set */
> > list_for_each_entry_safe(set, tmp, &sets, set_list)
> > - __flush_nat_entry_set(sbi, set);
> > + __flush_nat_entry_set(sbi, set, cpc);
> >
> > up_write(&nm_i->nat_tree_lock);
> >
> > f2fs_bug_on(sbi, nm_i->dirty_nat_cnt);
> > }
> >
> > +static int __get_nat_bitmaps(struct f2fs_sb_info *sbi)
> > +{
> > + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
> > + struct f2fs_nm_info *nm_i = NM_I(sbi);
> > + unsigned int nat_bits_bytes = nm_i->nat_blocks / BITS_PER_BYTE;
> > + unsigned int i;
> > + __u64 cp_ver = le64_to_cpu(ckpt->checkpoint_ver);
>
> __u64 cp_ver = cur_cp_version(ckpt);
>
> Thanks,
>
> > + size_t crc_offset = le32_to_cpu(ckpt->checksum_offset);
> > + __u64 crc = le32_to_cpu(*((__le32 *)
> > + ((unsigned char *)ckpt + crc_offset)));
> > + block_t nat_bits_addr;
> > +
> > + if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG))
> > + return 0;
> > +
> > + nm_i->nat_bits_blocks = F2FS_BYTES_TO_BLK((nat_bits_bytes << 1) + 8 +
> > + F2FS_BLKSIZE - 1);
> > + nm_i->nat_bits = kzalloc(nm_i->nat_bits_blocks << F2FS_BLKSIZE_BITS,
> > + GFP_KERNEL);
> > + if (!nm_i->nat_bits)
> > + return -ENOMEM;
> > +
> > + nat_bits_addr = __start_cp_addr(sbi) + sbi->blocks_per_seg -
> > + nm_i->nat_bits_blocks;
> > + for (i = 0; i < nm_i->nat_bits_blocks; i++) {
> > + struct page *page = get_meta_page(sbi, nat_bits_addr++);
> > +
> > + memcpy(nm_i->nat_bits + (i << F2FS_BLKSIZE_BITS),
> > + page_address(page), F2FS_BLKSIZE);
> > + f2fs_put_page(page, 1);
> > + }
> > +
> > + cp_ver |= (crc << 32);
> > + if (cpu_to_le64(cp_ver) != *(__le64 *)nm_i->nat_bits) {
> > + disable_nat_bits(sbi, true);
> > + return 0;
> > + }
> > +
> > + nm_i->full_nat_bits = nm_i->nat_bits + 8;
> > + nm_i->empty_nat_bits = nm_i->full_nat_bits + nat_bits_bytes;
> > +
> > + f2fs_msg(sbi->sb, KERN_NOTICE, "Found nat_bits in checkpoint");
> > + return 0;
> > +}
> > +
> > static int init_node_manager(struct f2fs_sb_info *sbi)
> > {
> > struct f2fs_super_block *sb_raw = F2FS_RAW_SUPER(sbi);
> > struct f2fs_nm_info *nm_i = NM_I(sbi);
> > unsigned char *version_bitmap;
> > - unsigned int nat_segs, nat_blocks;
> > + unsigned int nat_segs;
> > + int err;
> >
> > nm_i->nat_blkaddr = le32_to_cpu(sb_raw->nat_blkaddr);
> >
> > /* segment_count_nat includes pair segment so divide to 2. */
> > nat_segs = le32_to_cpu(sb_raw->segment_count_nat) >> 1;
> > - nat_blocks = nat_segs << le32_to_cpu(sb_raw->log_blocks_per_seg);
> > -
> > - nm_i->max_nid = NAT_ENTRY_PER_BLOCK * nat_blocks;
> > + nm_i->nat_blocks = nat_segs << le32_to_cpu(sb_raw->log_blocks_per_seg);
> > + nm_i->max_nid = NAT_ENTRY_PER_BLOCK * nm_i->nat_blocks;
> >
> > /* not used nids: 0, node, meta, (and root counted as valid node) */
> > nm_i->available_nids = nm_i->max_nid - sbi->total_valid_node_count -
> > @@ -2392,6 +2543,10 @@ static int init_node_manager(struct f2fs_sb_info *sbi)
> > if (!nm_i->nat_bitmap)
> > return -ENOMEM;
> >
> > + err = __get_nat_bitmaps(sbi);
> > + if (err)
> > + return err;
> > +
> > #ifdef CONFIG_F2FS_CHECK_FS
> > nm_i->nat_bitmap_mir = kmemdup(version_bitmap, nm_i->bitmap_size,
> > GFP_KERNEL);
> > @@ -2414,7 +2569,7 @@ int build_node_manager(struct f2fs_sb_info *sbi)
> > if (err)
> > return err;
> >
> > - build_free_nids(sbi, true);
> > + build_free_nids(sbi, true, true);
> > return 0;
> > }
> >
> > @@ -2473,6 +2628,7 @@ void destroy_node_manager(struct f2fs_sb_info *sbi)
> > up_write(&nm_i->nat_tree_lock);
> >
> > kfree(nm_i->nat_bitmap);
> > + kfree(nm_i->nat_bits);
> > #ifdef CONFIG_F2FS_CHECK_FS
> > kfree(nm_i->nat_bitmap_mir);
> > #endif
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index df2ff5cfe8f4..8e1ec248c653 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -386,7 +386,7 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi)
> > if (!available_free_memory(sbi, FREE_NIDS))
> > try_to_free_nids(sbi, MAX_FREE_NIDS);
> > else
> > - build_free_nids(sbi, false);
> > + build_free_nids(sbi, false, false);
> >
> > if (!is_idle(sbi))
> > return;
> > diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
> > index f0748524ca8c..1c92ace2e8f8 100644
> > --- a/include/linux/f2fs_fs.h
> > +++ b/include/linux/f2fs_fs.h
> > @@ -114,6 +114,7 @@ struct f2fs_super_block {
> > /*
> > * For checkpoint
> > */
> > +#define CP_NAT_BITS_FLAG 0x00000080
> > #define CP_CRC_RECOVERY_FLAG 0x00000040
> > #define CP_FASTBOOT_FLAG 0x00000020
> > #define CP_FSCK_FLAG 0x00000010
> >