From: Dan Williams <dan.j.williams@intel.com>
To: dennis.wu <dennis.wu@intel.com>, <nvdimm@lists.linux.dev>
Cc: <vishal.l.verma@intel.com>, <dan.j.williams@intel.com>,
<dave.jiang@intel.com>, dennis.wu <dennis.wu@intel.com>
Subject: RE: [PATCH] BTT: Use dram freelist and remove bflog to otpimize perf
Date: Mon, 11 Jul 2022 22:06:14 -0700
Message-ID: <62cd01462c460_5c814294e@dwillia2-xfh.notmuch>
In-Reply-To: <20220630134244.685331-1-dennis.wu@intel.com>
dennis.wu wrote:
> Dependency:
> [PATCH] nvdimm: Add NVDIMM_NO_DEEPFLUSH flag to control btt
> data deepflush
> https://lore.kernel.org/nvdimm/20220629135801.192821-1-dennis.wu@intel.com/T/#u
>
> Reason:
> In BTT, each write writes the sector data, updates a 4-byte btt_map
> entry, and updates a 16-byte bflog entry (two 8-byte atomic writes). The
> metadata write overhead is large, and we can optimize the algorithm
> by not using the bflog. Each write then updates only the sector
> data and the 4-byte btt_map entry.
>
> How:
> 1. Scan the btt_map to generate the ABA mapping bitmap; if an
> internal ABA is in use, its bit is set.
> 2. Generate the in-memory freelist from the ABA bitmap. The
> freelist is an array that records all the free ABAs, like:
> | 340 | 422 | 578 |...
> which means ABAs 340, 422, and 578 are free. The last nfree (nlane)
> records in the array are used for each lane at the beginning.
> 3. Get the free ABA of a lane and write data to that ABA. If the premap
> btt_map entry is in the initialization state (e_flag=0, z_flag=0), get
> a free ABA from the free ABA array for the lane. If the premap
> btt_map entry is not in the initialization state, the ABA in the
> btt_map entry is treated as the free ABA of the lane. Once
> the number of free ABAs equals nfree, the arena is fully written and
> we can free the whole freelist (not implemented yet).
> 4. In the code, "version_major == 2" is the new algorithm and
> the logic in the else branch is the old algorithm.
>
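For anyone following along, steps 1-3 as described above can be
sketched roughly as below. This is a userspace illustration, not code
from the patch. The struct, the function names, and the two mask macros
are hypothetical, and the map entry encoding (top two bits as the z/e
flags, low 30 bits as the ABA) is an assumption about the existing
format. Media I/O, locking, and error handling are left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed map entry encoding: top two bits are z/e flags, low 30 bits the ABA. */
#define MAP_FLAG_MASK 0xC0000000u
#define MAP_LBA_MASK  0x3FFFFFFFu

/* Hypothetical in-memory state; none of these names come from the patch. */
struct freelist {
	uint32_t *aba;   /* free ABAs, consumed from the tail */
	uint32_t count;
};

/* Steps 1 and 2: scan the map, mark used ABAs, collect the unmarked ones. */
static int freelist_build(struct freelist *fl, const uint32_t *map,
			  uint32_t external_nlba, uint32_t internal_nlba)
{
	size_t words = ((size_t)internal_nlba + 63) / 64;
	uint64_t *used = calloc(words + 1, sizeof(*used));
	uint32_t i;

	fl->aba = malloc(((size_t)internal_nlba + 1) * sizeof(*fl->aba));
	fl->count = 0;
	if (!used || !fl->aba) {
		free(used);
		free(fl->aba);
		fl->aba = NULL;
		return -1;
	}
	for (i = 0; i < external_nlba; i++) {
		uint32_t aba = map[i] & MAP_LBA_MASK;

		/* Entries still in the initial state own no ABA yet. */
		if ((map[i] & MAP_FLAG_MASK) != 0 && aba < internal_nlba)
			used[aba / 64] |= (uint64_t)1 << (aba % 64);
	}
	for (i = 0; i < internal_nlba; i++)
		if (!(used[i / 64] & ((uint64_t)1 << (i % 64))))
			fl->aba[fl->count++] = i;
	free(used);
	return 0;
}

/*
 * Step 3: called after the data has been written to *lane_free.
 * Publishes that ABA in the map and picks the lane's next free ABA.
 */
static int lane_commit(struct freelist *fl, uint32_t *map, uint32_t premap,
		       uint32_t *lane_free)
{
	uint32_t old = map[premap];
	uint32_t next;

	if ((old & MAP_FLAG_MASK) == 0) {
		if (fl->count == 0)
			return -1;               /* nothing left to hand out */
		next = fl->aba[--fl->count];     /* initial entry: take from the array */
	} else {
		next = old & MAP_LBA_MASK;       /* the overwritten block becomes free */
	}
	map[premap] = MAP_FLAG_MASK | *lane_free; /* the single 4-byte map update */
	*lane_free = next;
	return 0;
}
```

Note that in this reading nothing on media records which ABA a lane
currently holds, so the freelist can only be recovered by rescanning
the whole map.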
> Result:
> 1. Write performance can improve by ~50%, and latency is also
> reduced to 60% of the original algorithm's.
How does this improvement affect a real-world workload vs a
microbenchmark?
> 2. During initialization, scanning the btt_map and generating the freelist
> takes time and makes namespace enable take longer. With 4K sectors and a
> 1TB namespace, the enable time is less than 4s. This only happens
> once, during initialization.
> 3. The freelist takes 4 bytes of memory per sector. But once
> the arena is fully written, the freelist can be freed. As we know, in
> the storage case the disk is always fully written in use, so
> there is no memory space overhead.
>
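For scale, the "4 bytes per sector" figure can be worked out as in the
sketch below. The helper name is hypothetical, and the numbers assume
1TB means 2^40 bytes and 4K means 4096 bytes, ignoring arena metadata
and the nfree spare blocks.

```c
#include <stdint.h>

/* Hypothetical helper: DRAM for a freelist with one 4-byte entry per sector. */
static uint64_t freelist_bytes(uint64_t capacity_bytes, uint32_t sector_size)
{
	if (sector_size == 0)
		return 0;
	return (capacity_bytes / sector_size) * sizeof(uint32_t);
}
```

Under those assumptions, freelist_bytes(1ULL << 40, 4096) is 2^30
bytes, i.e. about 1 GiB of DRAM per 1TB namespace until the arena is
fully written. With 512-byte sectors the same namespace would need
about 8 GiB.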
> Compatibility:
> 1. The new algorithm keeps the bflog layout and only ignores its
> logic, meaning the bflog is not updated under the new algorithm.
> 2. If a namespace was created with the old algorithm and layout, you can
> switch to the new algorithm seamlessly without any specific operation.
> 3. The bflog is not updated once you move to the new
> algorithm, so after you write data with the new algorithm, you
> can't switch back from the new algorithm to the old algorithm.
Before digging deeper into the implementation, this needs a better
compatibility story. It is not acceptable to break the on-media format
like this. Consider someone bisecting a kernel problem over this
change, or someone reverting to an older kernel after encountering a
regression. As far as I can see this would need to be a BTT3 layout and
require explicit opt-in to move to the new format.