From: Akira Hayakawa <ruby.wktk@gmail.com>
To: dm-devel@redhat.com
Cc: gregkh@linuxfoundation.org, masami.hiramatsu@gmail.com,
	snitzer@redhat.com
Subject: dm-writeboost: An idea of adding read-caching
Date: Sat, 06 Dec 2014 10:42:56 +0900
Message-ID: <54825F20.7060206@gmail.com>

Hi,

Let me share my idea of implementing read-caching for Writeboost, my log-structured SSD-caching driver.

This would be the next big improvement that I want to work on in staging.

# Background
As of now, Writeboost provides only write-caching. This means it never stages data from the HDD to the SSD. The reason is that the page cache is sufficient for this purpose in most cases, and stacking another read-caching target will complement it if the page cache is not large enough for the workload.

In the discussion quoted below (sorry to dig up an old thread), Mike said a target should provide both write and read caching, because stacking targets is simple in concept but not in practice.

> This idea that a single target cannot provide meaningful caching for
> both reads and writes is really unwelcome.  Conceptually stacking is
> simple, but in practice the management layers that need to configure
> these stacks is fairly cumbersome.
https://www.redhat.com/archives/dm-devel/2014-January/msg00078.html

At that time, I didn't think read-caching could be implemented simply in Writeboost, but I have recently come up with an idea for implementing it.

# Idea
The idea is, conceptually, to resend the read data (from the HDD) to Writeboost itself as a "fake" write request.
As a result, both writes and reads will be put into the log and written to the cache device sequentially.

There are a few requirements that read-caching should meet:
- Staged data shouldn't be written back, because it is clean. This is for performance only; writing it back wouldn't be a logical bug.
- Clean data on the cache device shouldn't be discarded after reboot.
- Sequential reads that are too large (e.g. >128KB) shouldn't be staged. This limit is called the threshold.

The basic implementation would be:
1. Store the read data in a buffer in endio (does the bio still have the read data while in endio?)
2. If the buffer is full, wake up a worker to submit the data as "fake" write requests to itself.
   (It doesn't really submit a bio through generic_make_request; it only passes the data through the internal write path.)
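
The two steps above can be sketched in user-space C as follows. This is only a rough illustration, not Writeboost code: the 4KB cell size, the buffer capacity, and every name (read_cache_buffer, stage_read, flush_staged, write_path_fn, count_write) are assumptions made up for the sketch. In the kernel, stage_read would run from the read completion path and flush_staged from a worker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CELL_SIZE 4096 /* assumption: reads are staged in 4KB units */
#define NR_CELLS  4    /* assumption: tiny capacity, for illustration only */

struct read_cache_cell {
	uint64_t sector;         /* sector on the backing device (HDD) */
	uint8_t data[CELL_SIZE]; /* copy of the data the read returned */
};

struct read_cache_buffer {
	struct read_cache_cell cells[NR_CELLS];
	size_t nr_used; /* cells filled since the last flush */
};

/* Hypothetical stand-in for the driver's internal write path. */
typedef void (*write_path_fn)(uint64_t sector, const uint8_t *data, void *ctx);

/*
 * Step 1: called when a read from the HDD completes.
 * Returns true when the buffer is full, i.e. the worker should be woken.
 * A read that arrives while the buffer is already full is not staged.
 */
static bool stage_read(struct read_cache_buffer *buf, uint64_t sector,
		       const uint8_t *data)
{
	if (buf->nr_used < NR_CELLS) {
		struct read_cache_cell *cell = &buf->cells[buf->nr_used++];

		cell->sector = sector;
		memcpy(cell->data, data, CELL_SIZE);
	}
	return buf->nr_used == NR_CELLS;
}

/*
 * Step 2: run by the worker. Passes every staged cell to the write path
 * as a "fake" write and empties the buffer. Returns the number of cells sent.
 */
static size_t flush_staged(struct read_cache_buffer *buf, write_path_fn write,
			   void *ctx)
{
	size_t n = buf->nr_used;

	for (size_t i = 0; i < n; i++)
		write(buf->cells[i].sector, buf->cells[i].data, ctx);
	buf->nr_used = 0;
	return n;
}

/* Example write path that only counts the fake writes it receives. */
static void count_write(uint64_t sector, const uint8_t *data, void *ctx)
{
	(void)sector;
	(void)data;
	(*(size_t *)ctx)++;
}
```

The write path is passed in as a callback only so that the sketch stays independent of the driver's internals; the real driver would call its own write path directly.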

The threshold can be implemented by keeping a pointer into the buffer and treating the buffer like a stack.
(If the series of reads acked forms a sequential run longer than the threshold, move the pointer back by the cancelled distance.)
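
A minimal sketch of the stack-pointer idea, again in user-space C and under assumptions of my own: cells are 4KB (8 sectors), the threshold is counted in cells, and the names (stage_stack, stage_push, stage_stack_init) are hypothetical. Only the sector of each staged cell is tracked, to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CELL_SECTORS 8 /* assumption: one 4KB cell = 8 sectors of 512 bytes */
#define NR_CELLS     64

struct stage_stack {
	uint64_t sectors[NR_CELLS]; /* start sector of each staged cell */
	size_t top;                 /* the "pointer": next free slot */
	size_t run_cells;           /* cells of the current run still on the stack */
	uint64_t run_len;           /* length of the current run, in cells */
	uint64_t next_sector;       /* sector that would continue the run */
	bool have_prev;
};

static void stage_stack_init(struct stage_stack *st)
{
	memset(st, 0, sizeof(*st));
}

/*
 * Try to stage one completed read. Returns true if the cell was staged.
 * Once a sequential run grows past threshold_cells, the cells already
 * staged for that run are cancelled by moving the pointer back, and the
 * rest of the run is skipped.
 */
static bool stage_push(struct stage_stack *st, uint64_t sector,
		       uint64_t threshold_cells)
{
	if (st->have_prev && sector == st->next_sector) {
		st->run_len++;
	} else {
		/* a new run starts here */
		st->run_len = 1;
		st->run_cells = 0;
	}
	st->have_prev = true;
	st->next_sector = sector + CELL_SECTORS;

	if (st->run_len > threshold_cells) {
		st->top -= st->run_cells; /* move back by the cancelled distance */
		st->run_cells = 0;
		return false;
	}
	if (st->top == NR_CELLS)
		return false; /* full: the caller should flush first */

	st->sectors[st->top++] = sector;
	st->run_cells++;
	return true;
}
```

With a threshold of 2 cells, reads at sectors 500, 0, 8 and 16 leave only the cell for sector 500 staged: the run 0, 8, 16 exceeds the threshold, so its two staged cells are popped. A real implementation would also have to reset run_cells whenever the buffer is flushed in the middle of a run.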

I think the only interface change would be adding a tunable like "read_cache_threshold (int)":
read caching is disabled when the value is zero, and
a non-zero value is the threshold.
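
The intended semantics of the tunable could look roughly like this. The unit of the threshold is not decided above, so kilobytes are assumed here purely for illustration, and the struct and function names (wb_tunables, read_caching_enabled, may_stage) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct wb_tunables {
	int read_cache_threshold; /* 0 = read caching off; unit assumed to be KB */
};

static bool read_caching_enabled(const struct wb_tunables *t)
{
	return t->read_cache_threshold > 0;
}

/* Decide whether a sequential run of run_kb kilobytes may be staged. */
static bool may_stage(const struct wb_tunables *t, uint64_t run_kb)
{
	if (!read_caching_enabled(t))
		return false;
	return run_kb <= (uint64_t)t->read_cache_threshold;
}
```

With read_cache_threshold set to 128, a 128KB run may still be staged and anything larger is skipped, which matches the ">128KB" example in the requirements.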

It sounds easy, but there is one thing that really bothers me: the problem of possibly
resending stale data. I think I need to add some data structure to avoid this problem, but I am not sure what it would look like.

Thank you for reading,

- Akira
