* Journal question
@ 2012-12-24 18:46 ` Alex Pyrgiotis
From: Alex Pyrgiotis @ 2012-12-24 18:46 UTC (permalink / raw)
To: linux-bcache-u79uwXL29TY76Z2rM5mHXA
Hi Kent,
I was reading the code documentation on your site, and I saw that you
have written this:
"... In the future, the journal will contain a strong checksum of the
data so that the write will complete when the journal entry finishes..."
My question is, is this feature implemented? As far as I can see by
looking at journal.c, you are journaling only the keys and not the data
they point to (it has been issued asynchronously a few steps back to the
SSD, right?).
Also, if this is a feature to be implemented, wouldn't it create more
stress on the CPUs, as well as more latency to issue the IO request to
the SSDs (unless you issue it asynchronously while processing the keylist)?
I'm just curious about how you'd handle it.
Thanks,
Alex
* Re: Journal question
@ 2012-12-26 18:32 ` Kent Overstreet
From: Kent Overstreet @ 2012-12-26 18:32 UTC (permalink / raw)
To: Alex Pyrgiotis; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA
On Mon, Dec 24, 2012 at 08:46:30PM +0200, Alex Pyrgiotis wrote:
> Hi Kent,
>
> I was reading the code documentation on your site, and I saw that
> you have written this:
>
> "... In the future, the journal will contain a strong checksum of
> the data so that the write will complete when the journal entry
> finishes..."
I think that was poorly worded. The idea is that if the pointer has a
data checksum, we can allow the journal entry to be written before all
the data it points to has been written, and then on recovery we can
figure out if we should replay keys in the journal entry by checking if
the checksums match (which means the data write did finish).
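To make the recovery idea concrete, here is a minimal sketch in Python rather than bcache's actual C code. Everything in it is an assumption for illustration: the `Key` type, `make_key`, `keys_to_replay`, the use of zlib's CRC-32 in place of whatever checksum bcache would use, and the choice to decide replay per key rather than per journal entry. The "device" is just a bytearray.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Key:
    """Hypothetical journaled key: where the data lives plus its checksum."""
    offset: int
    length: int
    checksum: int


def make_key(offset, data):
    # The checksum is computed from the data at journaling time, so the key
    # can be journaled before the data write itself has completed.
    return Key(offset, len(data), zlib.crc32(data))


def keys_to_replay(journal_entry, device):
    """Return only the keys whose data is actually present on the device."""
    replay = []
    for key in journal_entry:
        on_disk = bytes(device[key.offset:key.offset + key.length])
        # A matching checksum is taken to mean the data write did finish.
        if zlib.crc32(on_disk) == key.checksum:
            replay.append(key)
    return replay


# Simulated crash: both keys were journaled, but only the first data write
# reached the device before power was lost.
device = bytearray(32)
first = b"AAAAAAAA"
second = b"BBBBBBBB"
journal_entry = [make_key(0, first), make_key(8, second)]
device[0:8] = first

survivors = keys_to_replay(journal_entry, device)
```

In this sketch only the key at offset 0 survives recovery; the key at offset 8 is skipped because the bytes on the device do not match its journaled checksum.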
> My question is, is this feature implemented?
Nah, we'd need to finish data checksumming first. I'm also not sure
it's really worth implementing, since a while back I made it so that only
cache flushes wait on journal writes. In the absence of cache flushes
we only write journal entries every ~100 ms by default.
Since cache flushes have to wait on outstanding data writes anyway, it
wouldn't buy us much; it'd just let the data writes and the journal write
go in parallel.
> As far as I can see by
> looking at journal.c, you are journaling only the keys and not the
> data they point to (it has been issued asynchronously a few steps
> back to the SSD, right?).
Yup, that's correct.
> Also, if this is a feature to be implemented, wouldn't it create
> more stress on the CPUs, as well as more latency to issue the IO
> request to the SSDs (unless you issue it asynchronously while
> processing the keylist)?
>
> I'm just curious about how you'd handle it.
Not really. crc32c is ridiculously fast since it's implemented in
hardware, and the 64 bit checksum bcache uses for some stuff is also
plenty fast. The checksumming overhead would only be noticeable if we
were using something like sha1. There are also advantages to using a crc
instead of a cryptographic hash: you can merge crcs (which we'd want to
do when merging extents), but you can't merge cryptographic hashes.
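As an illustration of what "merging crcs" means, the sketch below combines the checksums of two adjacent buffers into the checksum of their concatenation without touching the original data. It is not bcache code: it uses Python's `zlib.crc32` (the IEEE CRC-32 polynomial, not crc32c), and `crc32_combine` is a hypothetical helper name. It relies on the fact that a CRC is affine over GF(2), and it runs in time proportional to the second buffer's length; production implementations typically use a faster matrix-based method.

```python
import zlib


def crc32_combine(crc_a, crc_b, len_b):
    """Return crc32(A + B) given only crc32(A), crc32(B) and len(B)."""
    zeros = bytes(len_b)
    # Continue A's running CRC over len_b zero bytes: this is crc32(A + zeros).
    shifted_a = zlib.crc32(zeros, crc_a)
    # XOR in B's CRC, then cancel the init/final-XOR contribution that the
    # zero padding and B's own CRC each carry.
    return shifted_a ^ crc_b ^ zlib.crc32(zeros)


# Two adjacent "extents" checksummed separately, then merged.
extent_a = b"first extent data"
extent_b = b"second extent data"
merged_crc = crc32_combine(zlib.crc32(extent_a), zlib.crc32(extent_b),
                           len(extent_b))
direct_crc = zlib.crc32(extent_a + extent_b)
```

Here `merged_crc` equals `direct_crc`. No comparable shortcut exists for a cryptographic hash such as sha1, where the merged data would have to be rehashed.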