* raid5-cache I/O path improvements V2
@ 2015-09-12 6:17 Christoph Hellwig
2015-09-12 6:17 ` [PATCH 01/12] raid5-cache: port to 4.3-rc Christoph Hellwig
` (12 more replies)
0 siblings, 13 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Hi Shaohua, hi Neil,
this series contains a few updates to the raid5-cache feature.
The first patch just ports it to the post-4.2 block layer. As part of that
I noticed that it currently doesn't handle I/O errors - fixes for that will
follow.
The second and third patch simplify the I/O unit state machine and reduce
latency and memory usage for the I/O units. The remainder are just a couple
of cleanups in this area that I stumbled upon.
Changes since V1:
- only use REQ_FUA if supported natively by the log device
* [PATCH 01/12] raid5-cache: port to 4.3-rc
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 02/12] raid5-cache: free I/O units earlier Christoph Hellwig
` (11 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Port for changes in the post-4.2 block layer: bio_endio and the endio
handlers no longer take a separate error argument, and bio_get_nr_vecs is gone.
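For context, the completion status now lives in the bio itself (the
bi_error field added in 4.3); a minimal sketch of an endio handler under
the new convention (example_endio and the message are illustrative, not
part of this patch):
	/* 4.3-style completion: no error argument, read bio->bi_error */
	static void example_endio(struct bio *bio)
	{
		if (bio->bi_error)
			pr_err("log write failed: %d\n", bio->bi_error);
		bio_put(bio);
	}
Likewise, bio_kmalloc callers now size bios with the BIO_MAX_PAGES
constant instead of querying the queue through bio_get_nr_vecs.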
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 785749b1..c345479 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -222,7 +222,8 @@ static void __r5l_set_io_unit_state(struct r5l_io_unit *io,
io->state = state;
}
-static inline void r5l_log_endio(struct bio *bio, int error)
+/* XXX: totally ignores I/O errors */
+static void r5l_log_endio(struct bio *bio)
{
struct r5l_io_unit *io = bio->bi_private;
struct r5l_log *log = io->log;
@@ -288,8 +289,7 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
io->meta_offset = sizeof(struct r5l_meta_block);
io->seq = log->seq;
- bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL,
- bio_get_nr_vecs(log->rdev->bdev));
+ bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL, BIO_MAX_PAGES);
io->current_bio = bio;
bio->bi_rw = WRITE;
bio->bi_bdev = log->rdev->bdev;
@@ -358,8 +358,7 @@ static void r5l_append_payload_page(struct r5l_log *log, struct page *page)
alloc_bio:
if (!io->current_bio) {
struct bio *bio;
- bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL,
- bio_get_nr_vecs(log->rdev->bdev));
+ bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL, BIO_MAX_PAGES);
bio->bi_rw = WRITE;
bio->bi_bdev = log->rdev->bdev;
bio->bi_iter.bi_sector = log->log_start;
@@ -518,7 +517,7 @@ int r5l_handle_flush_request(struct r5l_log *log, struct bio *bio)
* don't need to flush again
* */
if (bio->bi_iter.bi_size == 0) {
- bio_endio(bio, 0);
+ bio_endio(bio);
return 0;
}
bio->bi_rw &= ~REQ_FLUSH;
@@ -581,7 +580,7 @@ void r5l_stripe_write_finished(struct stripe_head *sh)
__r5l_stripe_write_finished(io);
}
-static void r5l_log_flush_endio(struct bio *bio, int error)
+static void r5l_log_flush_endio(struct bio *bio)
{
struct r5l_log *log = container_of(bio, struct r5l_log,
flush_bio);
--
1.9.1
* [PATCH 02/12] raid5-cache: free I/O units earlier
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
2015-09-12 6:17 ` [PATCH 01/12] raid5-cache: port to 4.3-rc Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-15 7:00 ` Neil Brown
2015-09-15 8:07 ` Neil Brown
2015-09-12 6:17 ` [PATCH 03/12] raid5-cache: rename flushed_ios to finished_ios Christoph Hellwig
` (10 subsequent siblings)
12 siblings, 2 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
There is no good reason to keep the I/O unit structures around after the
stripe has been written back to the RAID array. The only information
we need is the log sequence number, and the checkpoint offset of the
highest successful writeback. Store those in the log structure, and
free the IO units from __r5l_stripe_write_finished.
Besides simplifying the code, this also avoids having to keep the allocation
for the I/O unit around for a potentially long time as superblock updates
that checkpoint the log do not happen very often.
This also fixes the previously incorrect calculation of 'free' in
r5l_do_reclaim as a side effect: previously it took the last unit, which
isn't checkpointed yet, into account.
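For context, the new accounting is a simple ring distance between the two
checkpoint pointers; a plausible sketch of such a helper, assuming the log
wraps at log->device_size (the in-tree r5l_ring_distance may differ in
detail):
	static sector_t ring_distance(struct r5l_log *log,
				      sector_t start, sector_t end)
	{
		/* distance from start to end on the circular log device */
		if (end >= start)
			return end - start;
		return end + log->device_size - start;
	}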
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 145 ++++++++++++++++++-----------------------------
1 file changed, 55 insertions(+), 90 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index c345479..803bcc6 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -51,6 +51,9 @@ struct r5l_log {
sector_t log_start; /* log head. where new data appends */
u64 seq; /* log head sequence */
+ sector_t next_checkpoint;
+ u64 next_cp_seq;
+
struct mutex io_mutex;
struct r5l_io_unit *current_io; /* current io_unit accepting new data */
@@ -65,10 +68,6 @@ struct r5l_log {
* cache flush */
struct list_head flushed_ios; /* io_units which settle down in log disk */
struct bio flush_bio;
- struct list_head stripe_end_ios; /* io_units which have been
- * completely written to the RAID *
- * but have not yet been considered *
- * for updating super */
struct kmem_cache *io_kc;
@@ -185,35 +184,6 @@ static void r5l_move_io_unit_list(struct list_head *from, struct list_head *to,
}
}
-/*
- * We don't want too many io_units reside in stripe_end_ios list, which will
- * waste a lot of memory. So we try to remove some. But we must keep at least 2
- * io_units. The superblock must point to a valid meta, if it's the last meta,
- * recovery can scan less
- * */
-static void r5l_compress_stripe_end_list(struct r5l_log *log)
-{
- struct r5l_io_unit *first, *last, *io;
-
- first = list_first_entry(&log->stripe_end_ios,
- struct r5l_io_unit, log_sibling);
- last = list_last_entry(&log->stripe_end_ios,
- struct r5l_io_unit, log_sibling);
- if (first == last)
- return;
- list_del(&first->log_sibling);
- list_del(&last->log_sibling);
- while (!list_empty(&log->stripe_end_ios)) {
- io = list_first_entry(&log->stripe_end_ios,
- struct r5l_io_unit, log_sibling);
- list_del(&io->log_sibling);
- first->log_end = io->log_end;
- r5l_free_io_unit(log, io);
- }
- list_add_tail(&first->log_sibling, &log->stripe_end_ios);
- list_add_tail(&last->log_sibling, &log->stripe_end_ios);
-}
-
static void __r5l_set_io_unit_state(struct r5l_io_unit *io,
enum r5l_io_unit_state state)
{
@@ -540,31 +510,53 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
spin_unlock(&log->no_space_stripes_lock);
}
+static sector_t r5l_reclaimable_space(struct r5l_log *log)
+{
+ return r5l_ring_distance(log, log->last_checkpoint,
+ log->next_checkpoint);
+}
+
+static bool r5l_complete_flushed_ios(struct r5l_log *log)
+{
+ struct r5l_io_unit *io, *next;
+ bool found = false;
+
+ assert_spin_locked(&log->io_list_lock);
+
+ list_for_each_entry_safe(io, next, &log->flushed_ios, log_sibling) {
+ /* don't change list order */
+ if (io->state < IO_UNIT_STRIPE_END)
+ break;
+
+ log->next_checkpoint = io->log_start;
+ log->next_cp_seq = io->seq;
+
+ list_del(&io->log_sibling);
+ r5l_free_io_unit(log, io);
+
+ found = true;
+ }
+
+ return found;
+}
+
static void __r5l_stripe_write_finished(struct r5l_io_unit *io)
{
struct r5l_log *log = io->log;
- struct r5l_io_unit *last;
- sector_t reclaimable_space;
unsigned long flags;
spin_lock_irqsave(&log->io_list_lock, flags);
__r5l_set_io_unit_state(io, IO_UNIT_STRIPE_END);
- /* might move 0 entry */
- r5l_move_io_unit_list(&log->flushed_ios, &log->stripe_end_ios,
- IO_UNIT_STRIPE_END);
- if (list_empty(&log->stripe_end_ios)) {
+
+ if (!r5l_complete_flushed_ios(log)) {
spin_unlock_irqrestore(&log->io_list_lock, flags);
return;
}
- last = list_last_entry(&log->stripe_end_ios,
- struct r5l_io_unit, log_sibling);
- reclaimable_space = r5l_ring_distance(log, log->last_checkpoint,
- last->log_end);
- if (reclaimable_space >= log->max_free_space)
+ if (r5l_reclaimable_space(log) > log->max_free_space)
r5l_wake_reclaim(log, 0);
- r5l_compress_stripe_end_list(log);
spin_unlock_irqrestore(&log->io_list_lock, flags);
wake_up(&log->iounit_wait);
}
@@ -640,13 +632,6 @@ void r5l_flush_stripe_to_raid(struct r5l_log *log)
submit_bio(WRITE_FLUSH, &log->flush_bio);
}
-static void r5l_kick_io_unit(struct r5l_log *log)
-{
- md_wakeup_thread(log->rdev->mddev->thread);
- wait_event_lock_irq(log->iounit_wait, !list_empty(&log->stripe_end_ios),
- log->io_list_lock);
-}
-
static void r5l_write_super(struct r5l_log *log, sector_t cp);
static void r5l_write_super_and_discard_space(struct r5l_log *log,
sector_t end)
@@ -678,10 +663,10 @@ static void r5l_write_super_and_discard_space(struct r5l_log *log,
static void r5l_do_reclaim(struct r5l_log *log)
{
- struct r5l_io_unit *io, *last;
- LIST_HEAD(list);
- sector_t free = 0;
sector_t reclaim_target = xchg(&log->reclaim_target, 0);
+ sector_t reclaimable;
+ sector_t next_checkpoint;
+ u64 next_cp_seq;
spin_lock_irq(&log->io_list_lock);
/*
@@ -690,60 +675,41 @@ static void r5l_do_reclaim(struct r5l_log *log)
* shouldn't reuse space of an unreclaimable io_unit
* */
while (1) {
- struct list_head *target_list = NULL;
-
- while (!list_empty(&log->stripe_end_ios)) {
- io = list_first_entry(&log->stripe_end_ios,
- struct r5l_io_unit, log_sibling);
- list_move_tail(&io->log_sibling, &list);
- free += r5l_ring_distance(log, io->log_start,
- io->log_end);
- }
-
- if (free >= reclaim_target ||
+ reclaimable = r5l_reclaimable_space(log);
+ if (reclaimable >= reclaim_target ||
(list_empty(&log->running_ios) &&
list_empty(&log->io_end_ios) &&
list_empty(&log->flushing_ios) &&
list_empty(&log->flushed_ios)))
break;
- /* Below waiting mostly happens when we shutdown the raid */
- if (!list_empty(&log->flushed_ios))
- target_list = &log->flushed_ios;
- else if (!list_empty(&log->flushing_ios))
- target_list = &log->flushing_ios;
- else if (!list_empty(&log->io_end_ios))
- target_list = &log->io_end_ios;
- else if (!list_empty(&log->running_ios))
- target_list = &log->running_ios;
-
- r5l_kick_io_unit(log);
+ md_wakeup_thread(log->rdev->mddev->thread);
+ wait_event_lock_irq(log->iounit_wait,
+ r5l_reclaimable_space(log) > reclaimable,
+ log->io_list_lock);
}
+
+ next_checkpoint = log->next_checkpoint;
+ next_cp_seq = log->next_cp_seq;
spin_unlock_irq(&log->io_list_lock);
- if (list_empty(&list))
+ BUG_ON(reclaimable < 0);
+ if (reclaimable == 0)
return;
- /* super always point to last valid meta */
- last = list_last_entry(&list, struct r5l_io_unit, log_sibling);
/*
* write_super will flush cache of each raid disk. We must write super
* here, because the log area might be reused soon and we don't want to
* confuse recovery
* */
- r5l_write_super_and_discard_space(log, last->log_start);
+ r5l_write_super_and_discard_space(log, next_checkpoint);
mutex_lock(&log->io_mutex);
- log->last_checkpoint = last->log_start;
- log->last_cp_seq = last->seq;
+ log->last_checkpoint = next_checkpoint;
+ log->last_cp_seq = next_cp_seq;
mutex_unlock(&log->io_mutex);
- r5l_run_no_space_stripes(log);
- while (!list_empty(&list)) {
- io = list_first_entry(&list, struct r5l_io_unit, log_sibling);
- list_del(&io->log_sibling);
- r5l_free_io_unit(log, io);
- }
+ r5l_run_no_space_stripes(log);
}
static void r5l_reclaim_thread(struct md_thread *thread)
@@ -1105,7 +1071,6 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
spin_lock_init(&log->io_list_lock);
INIT_LIST_HEAD(&log->running_ios);
INIT_LIST_HEAD(&log->io_end_ios);
- INIT_LIST_HEAD(&log->stripe_end_ios);
INIT_LIST_HEAD(&log->flushing_ios);
INIT_LIST_HEAD(&log->flushed_ios);
bio_init(&log->flush_bio);
--
1.9.1
* [PATCH 03/12] raid5-cache: rename flushed_ios to finished_ios
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
2015-09-12 6:17 ` [PATCH 01/12] raid5-cache: port to 4.3-rc Christoph Hellwig
2015-09-12 6:17 ` [PATCH 02/12] raid5-cache: free I/O units earlier Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 04/12] raid5-cache: factor out a helper to run all stripes for an I/O unit Christoph Hellwig
` (9 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
After adding FUA support we won't necessarily have flushed the
bios on this list, so give it a more neutral name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 803bcc6..4442dd3 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -66,7 +66,7 @@ struct r5l_log {
* to the RAID */
struct list_head flushing_ios; /* io_units which are waiting for log
* cache flush */
- struct list_head flushed_ios; /* io_units which settle down in log disk */
+ struct list_head finished_ios; /* io_units which settle down in log disk */
struct bio flush_bio;
struct kmem_cache *io_kc;
@@ -516,14 +516,14 @@ static sector_t r5l_reclaimable_space(struct r5l_log *log)
log->next_checkpoint);
}
-static bool r5l_complete_flushed_ios(struct r5l_log *log)
+static bool r5l_complete_finished_ios(struct r5l_log *log)
{
struct r5l_io_unit *io, *next;
bool found = false;
assert_spin_locked(&log->io_list_lock);
- list_for_each_entry_safe(io, next, &log->flushed_ios, log_sibling) {
+ list_for_each_entry_safe(io, next, &log->finished_ios, log_sibling) {
/* don't change list order */
if (io->state < IO_UNIT_STRIPE_END)
break;
@@ -549,7 +549,7 @@ static void __r5l_stripe_write_finished(struct r5l_io_unit *io)
spin_lock_irqsave(&log->io_list_lock, flags);
__r5l_set_io_unit_state(io, IO_UNIT_STRIPE_END);
- if (!r5l_complete_flushed_ios(log)) {
+ if (!r5l_complete_finished_ios(log)) {
spin_unlock_irqrestore(&log->io_list_lock, flags);
return;
}
@@ -590,7 +590,7 @@ static void r5l_log_flush_endio(struct bio *bio)
raid5_release_stripe(sh);
}
}
- list_splice_tail_init(&log->flushing_ios, &log->flushed_ios);
+ list_splice_tail_init(&log->flushing_ios, &log->finished_ios);
spin_unlock_irqrestore(&log->io_list_lock, flags);
}
@@ -680,7 +680,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
(list_empty(&log->running_ios) &&
list_empty(&log->io_end_ios) &&
list_empty(&log->flushing_ios) &&
- list_empty(&log->flushed_ios)))
+ list_empty(&log->finished_ios)))
break;
md_wakeup_thread(log->rdev->mddev->thread);
@@ -1072,7 +1072,7 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
INIT_LIST_HEAD(&log->running_ios);
INIT_LIST_HEAD(&log->io_end_ios);
INIT_LIST_HEAD(&log->flushing_ios);
- INIT_LIST_HEAD(&log->flushed_ios);
+ INIT_LIST_HEAD(&log->finished_ios);
bio_init(&log->flush_bio);
log->io_kc = KMEM_CACHE(r5l_io_unit, 0);
--
1.9.1
* [PATCH 04/12] raid5-cache: factor out a helper to run all stripes for an I/O unit
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (2 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 03/12] raid5-cache: rename flushed_ios to finished_ios Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 05/12] raid5-cache: use FUA writes for the log Christoph Hellwig
` (8 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 4442dd3..c288c43 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -192,6 +192,17 @@ static void __r5l_set_io_unit_state(struct r5l_io_unit *io,
io->state = state;
}
+static void r5l_io_run_stripes(struct r5l_io_unit *io)
+{
+ struct stripe_head *sh, *next;
+
+ list_for_each_entry_safe(sh, next, &io->stripe_list, log_list) {
+ list_del_init(&sh->log_list);
+ set_bit(STRIPE_HANDLE, &sh->state);
+ raid5_release_stripe(sh);
+ }
+}
+
/* XXX: totally ignores I/O errors */
static void r5l_log_endio(struct bio *bio)
{
@@ -578,18 +589,10 @@ static void r5l_log_flush_endio(struct bio *bio)
flush_bio);
unsigned long flags;
struct r5l_io_unit *io;
- struct stripe_head *sh;
spin_lock_irqsave(&log->io_list_lock, flags);
- list_for_each_entry(io, &log->flushing_ios, log_sibling) {
- while (!list_empty(&io->stripe_list)) {
- sh = list_first_entry(&io->stripe_list,
- struct stripe_head, log_list);
- list_del_init(&sh->log_list);
- set_bit(STRIPE_HANDLE, &sh->state);
- raid5_release_stripe(sh);
- }
- }
+ list_for_each_entry(io, &log->flushing_ios, log_sibling)
+ r5l_io_run_stripes(io);
list_splice_tail_init(&log->flushing_ios, &log->finished_ios);
spin_unlock_irqrestore(&log->io_list_lock, flags);
}
--
1.9.1
* [PATCH 05/12] raid5-cache: use FUA writes for the log
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (3 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 04/12] raid5-cache: factor out a helper to run all stripes for an I/O unit Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 06/12] raid5-cache: clean up r5l_get_meta Christoph Hellwig
` (7 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
If we submit log writes with the FUA bit set they are guaranteed to be
on stable storage once the endio callback is called. This allows us to
simplify the I/O unit state machine and decreases latency a lot when the
device supports FUA. Use this path on devices that support FUA, and keep
the old flush-based code if the device doesn't support it.
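In sketch form the submission-time difference looks like this (illustrative
only; the actual patch folds it into a single submit_bio call):
	if (log->use_fua) {
		/* durable once the endio callback runs */
		submit_bio(WRITE | REQ_FUA, bio);
	} else {
		/* may still sit in a volatile cache: an explicit cache
		 * flush must complete before the stripe may be written
		 * back to the RAID disks */
		submit_bio(WRITE, bio);
	}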
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index c288c43..d813c89 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -81,6 +81,8 @@ struct r5l_log {
struct list_head no_space_stripes; /* pending stripes, log has no space */
spinlock_t no_space_stripes_lock;
+
+ bool use_fua;
};
/*
@@ -203,6 +205,22 @@ static void r5l_io_run_stripes(struct r5l_io_unit *io)
}
}
+static void r5l_log_run_stripes(struct r5l_log *log)
+{
+ struct r5l_io_unit *io, *next;
+
+ assert_spin_locked(&log->io_list_lock);
+
+ list_for_each_entry_safe(io, next, &log->running_ios, log_sibling) {
+ /* don't change list order */
+ if (io->state < IO_UNIT_IO_END)
+ break;
+
+ list_move_tail(&io->log_sibling, &log->finished_ios);
+ r5l_io_run_stripes(io);
+ }
+}
+
/* XXX: totally ignores I/O errors */
static void r5l_log_endio(struct bio *bio)
{
@@ -217,11 +235,15 @@ static void r5l_log_endio(struct bio *bio)
spin_lock_irqsave(&log->io_list_lock, flags);
__r5l_set_io_unit_state(io, IO_UNIT_IO_END);
- r5l_move_io_unit_list(&log->running_ios, &log->io_end_ios,
- IO_UNIT_IO_END);
+ if (log->use_fua)
+ r5l_log_run_stripes(log);
+ else
+ r5l_move_io_unit_list(&log->running_ios, &log->io_end_ios,
+ IO_UNIT_IO_END);
spin_unlock_irqrestore(&log->io_list_lock, flags);
- md_wakeup_thread(log->rdev->mddev->thread);
+ if (!log->use_fua)
+ md_wakeup_thread(log->rdev->mddev->thread);
}
static void r5l_submit_current_io(struct r5l_log *log)
@@ -248,7 +270,7 @@ static void r5l_submit_current_io(struct r5l_log *log)
while ((bio = bio_list_pop(&io->bios))) {
/* all IO must start from rdev->data_offset */
bio->bi_iter.bi_sector += log->rdev->data_offset;
- submit_bio(WRITE, bio);
+ submit_bio(WRITE | (log->use_fua ? REQ_FUA : 0), bio);
}
}
@@ -614,7 +636,8 @@ static void r5l_log_flush_endio(struct bio *bio)
void r5l_flush_stripe_to_raid(struct r5l_log *log)
{
bool do_flush;
- if (!log)
+
+ if (!log || log->use_fua)
return;
spin_lock_irq(&log->io_list_lock);
@@ -1066,6 +1089,8 @@ int r5l_init_log(struct r5conf *conf, struct md_rdev *rdev)
return -ENOMEM;
log->rdev = rdev;
+ log->use_fua = (rdev->bdev->bd_disk->queue->flush_flags & REQ_FUA);
+
log->uuid_checksum = crc32_le(~0, (void *)rdev->mddev->uuid,
sizeof(rdev->mddev->uuid));
--
1.9.1
* [PATCH 06/12] raid5-cache: clean up r5l_get_meta
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (4 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 05/12] raid5-cache: use FUA writes for the log Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 07/12] raid5-cache: refactor bio allocation Christoph Hellwig
` (6 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Remove the only partially used local 'io' variable to simplify the code
flow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index d813c89..f9aaf2b 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -320,16 +320,12 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
static int r5l_get_meta(struct r5l_log *log, unsigned int payload_size)
{
- struct r5l_io_unit *io;
-
- io = log->current_io;
- if (io && io->meta_offset + payload_size > PAGE_SIZE)
+ if (log->current_io &&
+ log->current_io->meta_offset + payload_size > PAGE_SIZE)
r5l_submit_current_io(log);
- io = log->current_io;
- if (io)
- return 0;
- log->current_io = r5l_new_meta(log);
+ if (!log->current_io)
+ log->current_io = r5l_new_meta(log);
return 0;
}
--
1.9.1
* [PATCH 07/12] raid5-cache: refactor bio allocation
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (5 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 06/12] raid5-cache: clean up r5l_get_meta Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 08/12] raid5-cache: take rdev->data_offset into account early on Christoph Hellwig
` (5 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Split out a helper to allocate a bio for log writes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 44 ++++++++++++++++++++------------------------
1 file changed, 20 insertions(+), 24 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index f9aaf2b..fb1ac4c 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -274,11 +274,25 @@ static void r5l_submit_current_io(struct r5l_log *log)
}
}
+static struct bio *r5l_bio_alloc(struct r5l_log *log, struct r5l_io_unit *io)
+{
+ struct bio *bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL, BIO_MAX_PAGES);
+
+ bio->bi_rw = WRITE;
+ bio->bi_bdev = log->rdev->bdev;
+ bio->bi_iter.bi_sector = log->log_start;
+ bio->bi_end_io = r5l_log_endio;
+ bio->bi_private = io;
+
+ bio_list_add(&io->bios, bio);
+ atomic_inc(&io->pending_io);
+ return bio;
+}
+
static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
{
struct r5l_io_unit *io;
struct r5l_meta_block *block;
- struct bio *bio;
io = r5l_alloc_io_unit(log);
@@ -292,17 +306,8 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
io->meta_offset = sizeof(struct r5l_meta_block);
io->seq = log->seq;
- bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL, BIO_MAX_PAGES);
- io->current_bio = bio;
- bio->bi_rw = WRITE;
- bio->bi_bdev = log->rdev->bdev;
- bio->bi_iter.bi_sector = log->log_start;
- bio_add_page(bio, io->meta_page, PAGE_SIZE, 0);
- bio->bi_end_io = r5l_log_endio;
- bio->bi_private = io;
-
- bio_list_add(&io->bios, bio);
- atomic_inc(&io->pending_io);
+ io->current_bio = r5l_bio_alloc(log, io);
+ bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
log->seq++;
log->log_start = r5l_ring_add(log, log->log_start, BLOCK_SECTORS);
@@ -355,18 +360,9 @@ static void r5l_append_payload_page(struct r5l_log *log, struct page *page)
struct r5l_io_unit *io = log->current_io;
alloc_bio:
- if (!io->current_bio) {
- struct bio *bio;
- bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL, BIO_MAX_PAGES);
- bio->bi_rw = WRITE;
- bio->bi_bdev = log->rdev->bdev;
- bio->bi_iter.bi_sector = log->log_start;
- bio->bi_end_io = r5l_log_endio;
- bio->bi_private = io;
- bio_list_add(&io->bios, bio);
- atomic_inc(&io->pending_io);
- io->current_bio = bio;
- }
+ if (!io->current_bio)
+ io->current_bio = r5l_bio_alloc(log, io);
+
if (!bio_add_page(io->current_bio, page, PAGE_SIZE, 0)) {
io->current_bio = NULL;
goto alloc_bio;
--
1.9.1
* [PATCH 08/12] raid5-cache: take rdev->data_offset into account early on
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (6 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 07/12] raid5-cache: refactor bio allocation Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 09/12] raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta Christoph Hellwig
` (4 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Set up bi_sector properly when we allocate a bio instead of updating it
at submission time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index fb1ac4c..df24e9c 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -267,11 +267,8 @@ static void r5l_submit_current_io(struct r5l_log *log)
__r5l_set_io_unit_state(io, IO_UNIT_IO_START);
spin_unlock_irqrestore(&log->io_list_lock, flags);
- while ((bio = bio_list_pop(&io->bios))) {
- /* all IO must start from rdev->data_offset */
- bio->bi_iter.bi_sector += log->rdev->data_offset;
+ while ((bio = bio_list_pop(&io->bios)))
submit_bio(WRITE | (log->use_fua ? REQ_FUA : 0), bio);
- }
}
static struct bio *r5l_bio_alloc(struct r5l_log *log, struct r5l_io_unit *io)
@@ -280,7 +277,7 @@ static struct bio *r5l_bio_alloc(struct r5l_log *log, struct r5l_io_unit *io)
bio->bi_rw = WRITE;
bio->bi_bdev = log->rdev->bdev;
- bio->bi_iter.bi_sector = log->log_start;
+ bio->bi_iter.bi_sector = log->rdev->data_offset + log->log_start;
bio->bi_end_io = r5l_log_endio;
bio->bi_private = io;
--
1.9.1
* [PATCH 09/12] raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (7 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 08/12] raid5-cache: take rdev->data_offset into account early on Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 10/12] raid5-cache: new helper: r5_reserve_log_entry Christoph Hellwig
` (3 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
This is the only user, and keeping all the code initializing the io_unit
structure together improves readability.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 26 ++++++++------------------
1 file changed, 8 insertions(+), 18 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index df24e9c..82a9d32 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -148,23 +148,6 @@ static bool r5l_has_free_space(struct r5l_log *log, sector_t size)
return log->device_size > used_size + size;
}
-static struct r5l_io_unit *r5l_alloc_io_unit(struct r5l_log *log)
-{
- struct r5l_io_unit *io;
- /* We can't handle memory allocate failure so far */
- gfp_t gfp = GFP_NOIO | __GFP_NOFAIL;
-
- io = kmem_cache_zalloc(log->io_kc, gfp);
- io->log = log;
- io->meta_page = alloc_page(gfp | __GFP_ZERO);
-
- bio_list_init(&io->bios);
- INIT_LIST_HEAD(&io->log_sibling);
- INIT_LIST_HEAD(&io->stripe_list);
- io->state = IO_UNIT_RUNNING;
- return io;
-}
-
static void r5l_free_io_unit(struct r5l_log *log, struct r5l_io_unit *io)
{
__free_page(io->meta_page);
@@ -291,8 +274,15 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
struct r5l_io_unit *io;
struct r5l_meta_block *block;
- io = r5l_alloc_io_unit(log);
+ /* We can't handle memory allocate failure so far */
+ io = kmem_cache_zalloc(log->io_kc, GFP_NOIO | __GFP_NOFAIL);
+ io->log = log;
+ bio_list_init(&io->bios);
+ INIT_LIST_HEAD(&io->log_sibling);
+ INIT_LIST_HEAD(&io->stripe_list);
+ io->state = IO_UNIT_RUNNING;
+ io->meta_page = alloc_page(GFP_NOIO | __GFP_NOFAIL | __GFP_ZERO);
block = page_address(io->meta_page);
block->magic = cpu_to_le32(R5LOG_MAGIC);
block->version = R5LOG_VERSION;
--
1.9.1
* [PATCH 10/12] raid5-cache: new helper: r5_reserve_log_entry
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (8 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 09/12] raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 11/12] raid5-cache: small log->seq cleanup Christoph Hellwig
` (2 subsequent siblings)
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Factor out code to reserve log space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 82a9d32..7d9fa29 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -269,6 +269,23 @@ static struct bio *r5l_bio_alloc(struct r5l_log *log, struct r5l_io_unit *io)
return bio;
}
+static void r5_reserve_log_entry(struct r5l_log *log, struct r5l_io_unit *io)
+{
+ log->log_start = r5l_ring_add(log, log->log_start, BLOCK_SECTORS);
+
+ /*
+ * If we filled up the log device, start from the beginning again,
+ * which will require a new bio.
+ *
+ * Note: for this to work properly the log size needs to be a multiple
+ * of BLOCK_SECTORS.
+ */
+ if (log->log_start == 0)
+ io->current_bio = NULL;
+
+ io->log_end = log->log_start;
+}
+
static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
{
struct r5l_io_unit *io;
@@ -297,11 +314,7 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
log->seq++;
- log->log_start = r5l_ring_add(log, log->log_start, BLOCK_SECTORS);
- io->log_end = log->log_start;
- /* current bio hit disk end */
- if (log->log_start == 0)
- io->current_bio = NULL;
+ r5_reserve_log_entry(log, io);
spin_lock_irq(&log->io_list_lock);
list_add_tail(&io->log_sibling, &log->running_ios);
@@ -354,13 +367,8 @@ alloc_bio:
io->current_bio = NULL;
goto alloc_bio;
}
- log->log_start = r5l_ring_add(log, log->log_start,
- BLOCK_SECTORS);
- /* current bio hit disk end */
- if (log->log_start == 0)
- io->current_bio = NULL;
- io->log_end = log->log_start;
+ r5_reserve_log_entry(log, io);
}
static void r5l_log_stripe(struct r5l_log *log, struct stripe_head *sh,
--
1.9.1
* [PATCH 11/12] raid5-cache: small log->seq cleanup
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (9 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 10/12] raid5-cache: new helper: r5_reserve_log_entry Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-12 6:17 ` [PATCH 12/12] raid5-cache: use bio chaining Christoph Hellwig
2015-09-14 19:11 ` raid5-cache I/O path improvements V2 Shaohua Li
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 7d9fa29..858bfcf 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -308,12 +308,11 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
io->log_start = log->log_start;
io->meta_offset = sizeof(struct r5l_meta_block);
- io->seq = log->seq;
+ io->seq = log->seq++;
io->current_bio = r5l_bio_alloc(log, io);
bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
- log->seq++;
r5_reserve_log_entry(log, io);
spin_lock_irq(&log->io_list_lock);
--
1.9.1
* [PATCH 12/12] raid5-cache: use bio chaining
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (10 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 11/12] raid5-cache: small log->seq cleanup Christoph Hellwig
@ 2015-09-12 6:17 ` Christoph Hellwig
2015-09-14 19:11 ` raid5-cache I/O path improvements V2 Shaohua Li
12 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-12 6:17 UTC (permalink / raw)
To: Shaohua Li, neilb; +Cc: linux-raid, Kernel-team, dan.j.williams
Simplify the bio completion handler by using bio chaining and submitting
bios as soon as they are full.
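For context, bio_chain(child, parent) defers the parent's bi_end_io until
the child has completed as well, so only the first bio of an I/O unit needs
a real completion handler. A rough sketch of the pattern (names as in this
series, details illustrative):
	struct bio *prev = io->current_bio;	/* holds r5l_log_endio,
						   directly or via the chain */
	io->current_bio = r5l_bio_alloc(log);
	bio_chain(io->current_bio, prev);	/* prev completes only after
						   the new bio does */
	submit_bio(WRITE, prev);		/* submit as soon as it's full */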
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5-cache.c | 38 ++++++++++++++++----------------------
1 file changed, 16 insertions(+), 22 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 858bfcf..50ffc9b 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -98,8 +98,6 @@ struct r5l_io_unit {
struct page *meta_page; /* store meta block */
int meta_offset; /* current offset in meta_page */
- struct bio_list bios;
- atomic_t pending_io; /* pending bios not written to log yet */
struct bio *current_bio; /* current_bio accepting new data */
atomic_t pending_stripe; /* how many stripes not flushed to raid */
@@ -110,6 +108,7 @@ struct r5l_io_unit {
struct list_head stripe_list; /* stripes added to the io_unit */
int state;
+ bool need_split_bio;
};
/* r5l_io_unit state */
@@ -213,9 +212,6 @@ static void r5l_log_endio(struct bio *bio)
bio_put(bio);
- if (!atomic_dec_and_test(&io->pending_io))
- return;
-
spin_lock_irqsave(&log->io_list_lock, flags);
__r5l_set_io_unit_state(io, IO_UNIT_IO_END);
if (log->use_fua)
@@ -233,7 +229,6 @@ static void r5l_submit_current_io(struct r5l_log *log)
{
struct r5l_io_unit *io = log->current_io;
struct r5l_meta_block *block;
- struct bio *bio;
unsigned long flags;
u32 crc;
@@ -250,22 +245,17 @@ static void r5l_submit_current_io(struct r5l_log *log)
__r5l_set_io_unit_state(io, IO_UNIT_IO_START);
spin_unlock_irqrestore(&log->io_list_lock, flags);
- while ((bio = bio_list_pop(&io->bios)))
- submit_bio(WRITE | (log->use_fua ? REQ_FUA : 0), bio);
+ submit_bio(WRITE | (log->use_fua ? REQ_FUA : 0), io->current_bio);
}
-static struct bio *r5l_bio_alloc(struct r5l_log *log, struct r5l_io_unit *io)
+static struct bio *r5l_bio_alloc(struct r5l_log *log)
{
struct bio *bio = bio_kmalloc(GFP_NOIO | __GFP_NOFAIL, BIO_MAX_PAGES);
bio->bi_rw = WRITE;
bio->bi_bdev = log->rdev->bdev;
bio->bi_iter.bi_sector = log->rdev->data_offset + log->log_start;
- bio->bi_end_io = r5l_log_endio;
- bio->bi_private = io;
- bio_list_add(&io->bios, bio);
- atomic_inc(&io->pending_io);
return bio;
}
@@ -281,7 +271,7 @@ static void r5_reserve_log_entry(struct r5l_log *log, struct r5l_io_unit *io)
* of BLOCK_SECTORS.
*/
if (log->log_start == 0)
- io->current_bio = NULL;
+ io->need_split_bio = true;
io->log_end = log->log_start;
}
@@ -294,7 +284,6 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
/* We can't handle memory allocate failure so far */
io = kmem_cache_zalloc(log->io_kc, GFP_NOIO | __GFP_NOFAIL);
io->log = log;
- bio_list_init(&io->bios);
INIT_LIST_HEAD(&io->log_sibling);
INIT_LIST_HEAD(&io->stripe_list);
io->state = IO_UNIT_RUNNING;
@@ -310,7 +299,9 @@ static struct r5l_io_unit *r5l_new_meta(struct r5l_log *log)
io->meta_offset = sizeof(struct r5l_meta_block);
io->seq = log->seq++;
- io->current_bio = r5l_bio_alloc(log, io);
+ io->current_bio = r5l_bio_alloc(log);
+ io->current_bio->bi_end_io = r5l_log_endio;
+ io->current_bio->bi_private = io;
bio_add_page(io->current_bio, io->meta_page, PAGE_SIZE, 0);
r5_reserve_log_entry(log, io);
@@ -358,15 +349,18 @@ static void r5l_append_payload_page(struct r5l_log *log, struct page *page)
{
struct r5l_io_unit *io = log->current_io;
-alloc_bio:
- if (!io->current_bio)
- io->current_bio = r5l_bio_alloc(log, io);
+ if (io->need_split_bio) {
+ struct bio *prev = io->current_bio;
- if (!bio_add_page(io->current_bio, page, PAGE_SIZE, 0)) {
- io->current_bio = NULL;
- goto alloc_bio;
+ io->current_bio = r5l_bio_alloc(log);
+ bio_chain(io->current_bio, prev);
+
+ submit_bio(WRITE | (log->use_fua ? REQ_FUA : 0), prev);
}
+ if (!bio_add_page(io->current_bio, page, PAGE_SIZE, 0))
+ BUG();
+
r5_reserve_log_entry(log, io);
}
--
1.9.1
* Re: raid5-cache I/O path improvements V2
2015-09-12 6:17 raid5-cache I/O path improvements V2 Christoph Hellwig
` (11 preceding siblings ...)
2015-09-12 6:17 ` [PATCH 12/12] raid5-cache: use bio chaining Christoph Hellwig
@ 2015-09-14 19:11 ` Shaohua Li
2015-09-15 7:23 ` Neil Brown
12 siblings, 1 reply; 24+ messages in thread
From: Shaohua Li @ 2015-09-14 19:11 UTC (permalink / raw)
To: Christoph Hellwig, neilb@suse.de
Cc: linux-raid@vger.kernel.org, Kernel Team, dan.j.williams@intel.com
On 9/11/15, 11:17 PM, "Christoph Hellwig" <hch@lst.de> wrote:
>Hi Shaohua, hi Neil,
>
>this series contains a few updates to the raid5-cache feature.
>
>The first patch just ports it to the post-4.2 block layer. As part of
>that
>I noticed that it currently doesn't handle I/O errors - fixes for that
>will
>follow.
>
>The second and third patch simplify the I/O unit state machine and reduce
>latency and memory usage for the I/O units. The remainder are just a
>couple
>of cleanups in this area that I stumbled upon.
>
>Changes since V1:
> - only use REQ_FUA if supported natively by the log device
Hi Christoph,
I finally got some data with a Samsung SSD which supports FUA; the controller
is AHCI.
The test is a simple fio run with all full-stripe writes.
libata.fua=0: throughput 247 MB/s
libata.fua=1: throughput 74 MB/s
FUA is significantly slower. I think we need a sysfs config to enable FUA.
Thanks,
Shaohua
* Re: [PATCH 02/12] raid5-cache: free I/O units earlier
2015-09-12 6:17 ` [PATCH 02/12] raid5-cache: free I/O units earlier Christoph Hellwig
@ 2015-09-15 7:00 ` Neil Brown
2015-09-17 1:50 ` Christoph Hellwig
2015-09-15 8:07 ` Neil Brown
1 sibling, 1 reply; 24+ messages in thread
From: Neil Brown @ 2015-09-15 7:00 UTC (permalink / raw)
To: Christoph Hellwig, Shaohua Li; +Cc: linux-raid, Kernel-team, dan.j.williams
Christoph Hellwig <hch@lst.de> writes:
> There is no good reason to keep the I/O unit structures around after the
> stripe has been written back to the RAID array. The only information
> we need is the log sequence number, and the checkpoint offset of the
> highest successful writeback. Store those in the log structure, and
> free the IO units from __r5l_stripe_write_finished.
>
> Besides simplifying the code, this also avoids having to keep the allocation
> for the I/O unit around for a potentially long time as superblock updates
> that checkpoint the log do not happen very often.
This only saves the allocation for two I/O units, doesn't it? So not a
big saving, but still a good clean-up.
>
> This also fixes the previously incorrect calculation of 'free' in
> r5l_do_reclaim as a side effect: previously it took the last unit, which
> isn't checkpointed yet, into account.
It took me a while to see that - so just to be sure I understand.
There is a list of I/O units which have been completely written to the
log and to the RAID. The last one of these is used as a marker for the
start of the log, so it cannot be freed yet. All the earlier ones can
be freed.
The previous code added up the sizes of all of the units in this list,
so it incorrectly included that last unit.
Your new code calculates the difference between the start of the first
unit and the start of the last unit, and so correctly excluded the last
unit from the free space calculation.
Did I get that right?
> @@ -690,60 +675,41 @@ static void r5l_do_reclaim(struct r5l_log *log)
> * shouldn't reuse space of an unreclaimable io_unit
> * */
> while (1) {
> - struct list_head *target_list = NULL;
> -
> - while (!list_empty(&log->stripe_end_ios)) {
> - io = list_first_entry(&log->stripe_end_ios,
> - struct r5l_io_unit, log_sibling);
> - list_move_tail(&io->log_sibling, &list);
> - free += r5l_ring_distance(log, io->log_start,
> - io->log_end);
> - }
> -
> - if (free >= reclaim_target ||
> + reclaimable = r5l_reclaimable_space(log);
> + if (reclaimable >= reclaim_target ||
> (list_empty(&log->running_ios) &&
> list_empty(&log->io_end_ios) &&
> list_empty(&log->flushing_ios) &&
> list_empty(&log->flushed_ios)))
> break;
>
> - /* Below waiting mostly happens when we shutdown the raid */
> - if (!list_empty(&log->flushed_ios))
> - target_list = &log->flushed_ios;
> - else if (!list_empty(&log->flushing_ios))
> - target_list = &log->flushing_ios;
> - else if (!list_empty(&log->io_end_ios))
> - target_list = &log->io_end_ios;
> - else if (!list_empty(&log->running_ios))
> - target_list = &log->running_ios;
> -
> - r5l_kick_io_unit(log);
> + md_wakeup_thread(log->rdev->mddev->thread);
> + wait_event_lock_irq(log->iounit_wait,
> + r5l_reclaimable_space(log) > reclaimable,
> + log->io_list_lock);
> }
This is ringing warning bells.... I'm not sure it's wrong, but I'm not
sure it is right.
I feel that if the thread gets woken and finds that
r5l_reclaimable_space is still too low, it should wake up the thread
again, just in case.
Also, it should test against reclaim_target, not reclaimable, shouldn't
it?
Instead of a wait_event_lock_irq inside a loop, could we just have a
wait_event_*??
wait_event_lock_irq_cmd(log->iounit_wait,
r5l_reclaimable_space(log) >= reclaim_target ||
(list_empty() && .... list_empty()),
log->io_list_lock,
md_wakeup_thread(log->rdev->mddev->thread));
??
Otherwise I like it. Thanks.
NeilBrown
* Re: raid5-cache I/O path improvements V2
2015-09-14 19:11 ` raid5-cache I/O path improvements V2 Shaohua Li
@ 2015-09-15 7:23 ` Neil Brown
2015-09-15 21:54 ` Shaohua Li
0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2015-09-15 7:23 UTC (permalink / raw)
To: Shaohua Li, Christoph Hellwig
Cc: linux-raid@vger.kernel.org, Kernel Team, dan.j.williams@intel.com
Shaohua Li <shli@fb.com> writes:
> On 9/11/15, 11:17 PM, "Christoph Hellwig" <hch@lst.de> wrote:
>
>>Hi Shaohua, hi Neil,
>>
>>this series contains a few updates to the raid5-cache feature.
>>
>>The first patch just ports it to the post-4.2 block layer. As part of
>>that
>>I noticed that it currently doesn't handle I/O errors - fixes for that
>>will
>>follow.
>>
>>The second and third patch simplify the I/O unit state machine and reduce
>>latency and memory usage for the I/O units. The remainder are just a
>>couple
>>of cleanups in this area that I stumbled upon.
>>
>>Changes since V1:
>> - only use REQ_FUA if supported natively by the log device
>
> Hi Christoph,
>
> I finally got some data with a Samsung SSD which supports FUA; the controller
> is AHCI.
> The test is a simple fio run with all full-stripe writes.
>
> libata.fua=0: throughput 247 MB/s
> libata.fua=1: throughput 74 MB/s
Eek! That's a big price to pay!
>
> FUA is significantly slower. I think we need a sysfs config to enable FUA.
I don't want a sysfs config if we can possibly avoid it.
Christoph's code sets FUA on every block written to the log, both data
and metadata. Is that really what we want?
I don't know much of the hardware details, but wouldn't setting FUA and
FLUSH on the last block written be just as effective and possibly faster
(by giving more flexibility to lower layers)??
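Roughly, and purely as a sketch (bios[]/nr_bios are illustrative;
WRITE_FLUSH_FUA is the existing shorthand for a preflush+FUA write):
	/* plain writes for the body of the transaction ... */
	for (i = 0; i < nr_bios - 1; i++)
		submit_bio(WRITE, bios[i]);
	/* ... and flush + FUA only on the final block */
	submit_bio(WRITE_FLUSH_FUA, bios[nr_bios - 1]);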
NeilBrown
* Re: [PATCH 02/12] raid5-cache: free I/O units earlier
2015-09-12 6:17 ` [PATCH 02/12] raid5-cache: free I/O units earlier Christoph Hellwig
2015-09-15 7:00 ` Neil Brown
@ 2015-09-15 8:07 ` Neil Brown
2015-09-17 1:48 ` Christoph Hellwig
1 sibling, 1 reply; 24+ messages in thread
From: Neil Brown @ 2015-09-15 8:07 UTC (permalink / raw)
To: Christoph Hellwig, Shaohua Li; +Cc: linux-raid, Kernel-team, dan.j.williams
Christoph Hellwig <hch@lst.de> writes:
> + sector_t reclaimable;
sector_t is unsigned (u64 or "unsigned long"), so
> + BUG_ON(reclaimable < 0);
This could only be a compiler bug.
NeilBrown
* Re: raid5-cache I/O path improvements V2
2015-09-15 7:23 ` Neil Brown
@ 2015-09-15 21:54 ` Shaohua Li
2015-09-17 1:53 ` Christoph Hellwig
2015-09-28 14:01 ` Christoph Hellwig
0 siblings, 2 replies; 24+ messages in thread
From: Shaohua Li @ 2015-09-15 21:54 UTC (permalink / raw)
To: Neil Brown
Cc: Christoph Hellwig, linux-raid@vger.kernel.org, Kernel Team,
dan.j.williams@intel.com
On Tue, Sep 15, 2015 at 09:23:44AM +0200, Neil Brown wrote:
> Shaohua Li <shli@fb.com> writes:
>
> > On 9/11/15, 11:17 PM, "Christoph Hellwig" <hch@lst.de> wrote:
> >
> >>Hi Shaohua, hi Neil,
> >>
> >>this series contains a few updates to the raid5-cache feature.
> >>
> >>The first patch just ports it to the post-4.2 block layer. As part of
> >>that
> >>I noticed that it currently doesn't handle I/O errors - fixes for that
> >>will
> >>follow.
> >>
> >>The second and third patch simplify the I/O unit state machine and reduce
> >>latency and memory usage for the I/O units. The remainder are just a
> >>couple
> >>of cleanups in this area that I stumbled upon.
> >>
> >>Changes since V1:
> >> - only use REQ_FUA if supported natively by the log device
> >
> > Hi Christoph,
> >
> > I finally got some data with a Samsung SSD which supports FUA; the controller
> > is AHCI.
> > The test is a simple fio run with all full-stripe writes.
> >
> > libata.fua=0: throughput 247 MB/s
> > libata.fua=1: throughput 74 MB/s
>
> Eek! That's a big price to pay!
>
> >
> > FUA is significantly slower. I think we need a sysfs config to enable FUA.
>
> I don't want a sysfs config if we can possibly avoid it.
>
> Christoph's code sets FUA on every block written to the log, both data
> and metadata. Is that really what we want?
>
> I don't know much of the hardware details, but wouldn't setting FUA and
> FLUSH on the last block written be just as effective and possibly faster
> (by giving more flexibility to lower layers)??
How is that different from not using FUA, e.g. doing a flush after several bios?
Thanks,
Shaohua
* Re: [PATCH 02/12] raid5-cache: free I/O units earlier
2015-09-15 8:07 ` Neil Brown
@ 2015-09-17 1:48 ` Christoph Hellwig
0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-17 1:48 UTC (permalink / raw)
To: Neil Brown; +Cc: Shaohua Li, linux-raid, Kernel-team, dan.j.williams
On Tue, Sep 15, 2015 at 10:07:12AM +0200, Neil Brown wrote:
> Christoph Hellwig <hch@lst.de> writes:
>
> > + sector_t reclaimable;
>
> sector_t is unsigned (u64 or "unsigned long"), so
>
> > + BUG_ON(reclaimable < 0);
>
> This could only be a compiler bug.
Indeed, I'll fix this up.
* Re: [PATCH 02/12] raid5-cache: free I/O units earlier
2015-09-15 7:00 ` Neil Brown
@ 2015-09-17 1:50 ` Christoph Hellwig
0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-17 1:50 UTC (permalink / raw)
To: Neil Brown; +Cc: Shaohua Li, linux-raid, Kernel-team, dan.j.williams
On Tue, Sep 15, 2015 at 09:00:31AM +0200, Neil Brown wrote:
> This only saves the allocation for two I/O units, doesn't it? So not a
> big saving, but still a good clean-up.
Yes.
> > r5l_do_reclaim as a side effect: previously it took the last unit, which
> > isn't checkpointed yet, into account.
>
> It took me a while to see that - so just to be sure I understand.
>
> There is a list of I/O units which have been completely written to the
> log and to the RAID. The last one of these is used as a marker for the
> start of the log, so it cannot be freed yet. All the earlier ones can
> be freed.
> The previous code added up the sizes of all of the units in this list,
> so it incorrectly included that last unit.
> Your new code calculates the difference between the start of the first
> unit and the start of the last unit, and so correctly excluded the last
> unit from the free space calculation.
>
> Did I get that right?
You did - or at the very least your understanding matches my understanding of
the old code.
> > + md_wakeup_thread(log->rdev->mddev->thread);
> > + wait_event_lock_irq(log->iounit_wait,
> > + r5l_reclaimable_space(log) > reclaimable,
> > + log->io_list_lock);
> > }
>
> This is ringing warning bells.... I'm not sure it's wrong, but I'm not
> sure it is right.
> I feel that if the thread gets woken and finds that
> r5l_reclaimable_space is still too low, it should wake up the thread
> again, just in case.
> Also, it should test against reclaim_target, not reclaimable, shouldn't
> it?
>
> Instead of a wait_event_lock_irq inside a loop, could we just have a
> wake_event_*??
>
> wait_event_lock_irq_cmd(log->iounit_wait,
> r5l_reclaimable_space(log) >= reclaim_target ||
> (list_empty() && .... list_empty()),
> log->io_list_lock,
> md_wakeup_thread(log->rdev->mddev->thread));
>
> ??
>
>
> Otherwise I like it. Thanks.
I agree with a lot of your points, but I tried to match the old behavior
as closely as possible. I can change it to something closer to your
suggestion if Shaohua agrees.
* Re: raid5-cache I/O path improvements V2
2015-09-15 21:54 ` Shaohua Li
@ 2015-09-17 1:53 ` Christoph Hellwig
2015-09-28 14:01 ` Christoph Hellwig
1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-17 1:53 UTC (permalink / raw)
To: Shaohua Li
Cc: Neil Brown, Christoph Hellwig, linux-raid@vger.kernel.org,
Kernel Team, dan.j.williams@intel.com
On Tue, Sep 15, 2015 at 02:54:59PM -0700, Shaohua Li wrote:
> > I don't want a sysfs config if we can possibly avoid it.
> >
> > Christoph's code sets FUA on every block written to the log, both data
> > and metadata. Is that really what we want?
> >
> > I don't know much of the hardware details, but wouldn't setting FUA and
> > FLUSH on the last block written be just as effective and possibly faster
> > (by giving more flexibility to lower layers)??
>
> How is it different against without FUA, eg, doing a flush after several bios?
It's just a more complicated version of doing the same thing.
* Re: raid5-cache I/O path improvements V2
2015-09-15 21:54 ` Shaohua Li
2015-09-17 1:53 ` Christoph Hellwig
@ 2015-09-28 14:01 ` Christoph Hellwig
2015-09-30 5:39 ` Neil Brown
1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-28 14:01 UTC (permalink / raw)
To: Shaohua Li
Cc: Neil Brown, Christoph Hellwig, linux-raid@vger.kernel.org,
Kernel Team, dan.j.williams@intel.com
So the summary is that for now you want me to resend with a patch
to opt into using FUA?
Also do you have a git tree as a baseline somewhere? I've been collecting
the patches you've sent, but with increasing time I fear I might have
missed something.
* Re: raid5-cache I/O path improvements V2
2015-09-28 14:01 ` Christoph Hellwig
@ 2015-09-30 5:39 ` Neil Brown
2015-09-30 15:00 ` Christoph Hellwig
0 siblings, 1 reply; 24+ messages in thread
From: Neil Brown @ 2015-09-30 5:39 UTC (permalink / raw)
To: Shaohua Li
Cc: Christoph Hellwig, linux-raid@vger.kernel.org, Kernel Team,
dan.j.williams@intel.com
Christoph Hellwig <hch@lst.de> writes:
> So the summary is that for now you want me to resend with a patch
> to opt into using FUA?
I'd like to avoid "opt in" if at all possible.
Shaohua measured that using "FUA" for all writes to the journal
hurt performance on at least one device. Do you have a different device
where it demonstrably helps?
Is there any chance of automatically detecting which is which?
>
> Also do you have a git tree as a baseline somewhere? I've been collecting
> the patches you've sent, but with increasing time I fear I might have
> missed something.
(almost) all at git://neil.brown.name/md in 'devel' branch.
Thanks,
NeilBrown
* Re: raid5-cache I/O path improvements V2
2015-09-30 5:39 ` Neil Brown
@ 2015-09-30 15:00 ` Christoph Hellwig
0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2015-09-30 15:00 UTC (permalink / raw)
To: Neil Brown
Cc: Christoph Hellwig, Shaohua Li, linux-raid@vger.kernel.org,
Kernel Team, dan.j.williams@intel.com
On Wed, Sep 30, 2015 at 03:39:52PM +1000, Neil Brown wrote:
> Christoph Hellwig <hch@lst.de> writes:
>
> > So the summary is that for now you want me to resend with a patch
> > to opt into using FUA?
>
> I'd like to avoid "opt in" if at all possible.
> Shaohua measured that using "FUA" for all writes to the journal
> hurt performance on at least one device. Do you have a different device
> where it demonstrably helps?
> Is there any chance of automatically detecting which is which?
I have a high-end SAS SSD where it helps, but the real use case where
it makes a major difference is battery-backed DIMMs (NV-DIMMs) or
other devices where we don't even need the FUA bit as they don't have
a cache at all. The important part is to avoid batching up for
the non-existent flush in that case.
So I could definitely default the code to on only for those, but not
even allowing a tunable for devices that have the FUA bit seems like
an odd restriction.