All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@fb.com>
Cc: linux-raid@vger.kernel.org, Kernel-team@fb.com,
	songliubraving@fb.com, hch@infradead.org,
	dan.j.williams@intel.com
Subject: Re: [PATCH 4/9] raid5: log reclaim support
Date: Wed, 12 Aug 2015 13:50:08 +1000	[thread overview]
Message-ID: <20150812135008.7ee44eef@noble> (raw)
In-Reply-To: <20150805213412.GA3167799@devbig257.prn2.facebook.com>

On Wed, 5 Aug 2015 14:34:21 -0700 Shaohua Li <shli@fb.com> wrote:

> On Wed, Aug 05, 2015 at 01:43:30PM +1000, NeilBrown wrote:
> > On Wed, 29 Jul 2015 17:38:44 -0700 Shaohua Li <shli@fb.com> wrote:
> > 
> > > This is the reclaim support for raid5 log. A stripe write will have
> > > following steps:
> > > 
> > > 1. reconstruct the stripe, read data/calculate parity. ops_run_io
> > > prepares to write data/parity to raid disks
> > > 2. hijack ops_run_io. stripe data/parity is appending to log disk
> > > 3. flush log disk cache
> > > 4. ops_run_io run again and do normal operation. stripe data/parity is
> > > written in raid array disks. raid core can return io to upper layer.
> > > 5. flush cache of all raid array disks
> > > 6. update super block
> > > 7. log disk space used by the stripe can be reused
> > > 
> > > In practice, several stripes consist of an io_unit and we will batch
> > > several io_unit in different steps, but the whole process doesn't
> > > change.
> > > 
> > > It's possible io return just after data/parity hit log disk, but then
> > > read IO will need read from log disk. For simplicity, IO return happens
> > > at step 4, where read IO can directly read from raid disks.
> > > 
> > > Currently reclaim run every minute or out of space. Reclaim is just to
> > > free log disk spaces, it doesn't impact data consistency.
> > 
> > Having arbitrary times lines "every minute" is a warning sign.
> > "As soon as possible" and "Just it time" can both make sense easily.
> > "every minute" needs more justification.
> > 
> > I'll probably say more when I find the code.
> 
> The idea is if we reclaim periodically, recovery could scan less log
> space. It's insane recovery scans a 1T disk. As I said this is just to
> free disk spaces. It's not a signal we will lose data in minute
> interval. I can change the relaim to run every 1G reclaimable space for
> example.

There seem to be two issues here and I might be confusing them.

Firstly there is the question of when a stripe gets written back to the
array.  Once the data is safe in the log this doesn't have to happen in
any great hurry, but I suspect it should still happen sooner rather
than later.

Presumably as soon as data/parity of a stripe is safe in the log,
that stripe will be scheduled to be written to the array - is that
correct?

As these writes-to-the-array complete the counter in the io_unit will
decrease.  when it reaches zero the io_unit can be freed and the
recovery_offset in the superblock can, potentially be updated.

Secondly there is the question of how often the superblock is updated.
As you say; delaying the updates indefinitely could lead to a recovery
having to examine a very large part of the log - maybe more than
necessary (though if that might be a problem, the simple solution is to
use a smaller log).

I would probably feel most comfortable scheduling a superblock update
whenever the amount of log space that it would reclaim exceeds 1/4 of
the log size.  That should be often enough without imposing a
completely arbitrary number.

Make sense?

thanks,
NeilBrown

  reply	other threads:[~2015-08-12  3:50 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-30  0:38 [PATCH 0/9]raid5: fix write hole Shaohua Li
2015-07-30  0:38 ` [PATCH 1/9] MD: add a new disk role to present cache device Shaohua Li
2015-08-04 14:28   ` Christoph Hellwig
2015-08-04 18:17     ` Song Liu
2015-08-05  0:25       ` NeilBrown
2015-08-05  1:05   ` NeilBrown
2015-07-30  0:38 ` [PATCH 2/9] md: override md superblock recovery_offset for " Shaohua Li
2015-08-04 14:30   ` Christoph Hellwig
2015-08-05  1:08   ` NeilBrown
2015-07-30  0:38 ` [PATCH 3/9] raid5: add basic stripe log Shaohua Li
2015-08-05  3:07   ` NeilBrown
2015-08-05 21:19     ` Shaohua Li
2015-08-12  3:20       ` NeilBrown
2015-07-30  0:38 ` [PATCH 4/9] raid5: log reclaim support Shaohua Li
2015-08-05  3:43   ` NeilBrown
2015-08-05 21:34     ` Shaohua Li
2015-08-12  3:50       ` NeilBrown [this message]
2015-08-05  3:52   ` NeilBrown
2015-07-30  0:38 ` [PATCH 5/9] raid5: log recovery Shaohua Li
2015-08-05  4:05   ` NeilBrown
2015-08-05 21:39     ` Shaohua Li
2015-08-12  3:51       ` NeilBrown
2015-07-30  0:38 ` [PATCH 6/9] raid5: disable batch with log enabled Shaohua Li
2015-07-30  0:38 ` [PATCH 7/9] raid5: don't allow resize/reshape with cache(log) support Shaohua Li
2015-08-05  4:13   ` NeilBrown
2015-08-05 21:42     ` Shaohua Li
2015-08-12  3:57       ` NeilBrown
2015-07-30  0:38 ` [PATCH 8/9] raid5: enable log for raid array with cache disk Shaohua Li
2015-07-30  0:38 ` [PATCH 9/9] raid5: skip resync if cache(log) is enabled Shaohua Li
2015-08-05  4:16   ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150812135008.7ee44eef@noble \
    --to=neilb@suse.com \
    --cc=Kernel-team@fb.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@fb.com \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.