[PATCH 0/2] patches to improve cluster raid1 performance [V2]

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] patches to improve cluster raid1 performance [V2]
@ 2013-09-26 10:50 dongmao zhang
  2013-09-26 10:50 ` [PATCH 1/2] improve the performance of dm-log-userspace dongmao zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: dongmao zhang @ 2013-09-26 10:50 UTC (permalink / raw)
  To: dm-devel; +Cc: dongmao zhang, agk

This patch change DM_ULOG_REQUEST_VERSION from 2 to 3. It could
tell cmirrord that kernel is now supporting delay some flushes, and
cmirrord will do someting accordingly.

Based on my test result, the cluster raid1 writes loss 80% performance. I found
that the most time is occupied by the function userspace_flush.

Usually userspace_flush needs three stages(mark, clear, flush)to communicate with cmirrord.

From the cmirrord's perspective, mark and flush functions run cluster_send first and then return to 
the kernel.

In other words, the requests of mark_region and flush in userspace_flush 
at least require a cluster_send period to finish. this is the root cause
of bad performance.

The idea is to merge flush and mark request together in both cmirrord and dm-log-userspace.

We run clog_flush directly after clog_mark_region. So the userspace_flush do not 
have to request flush again after requesting mark_region.  Moreover, I think the flush 
of clear region could be delayed. If we have both mark and clear region, only sending 
a mark_region request is OK, because clog_flush will automatically run.(ignore the clean_region 
time). It only takes one cluster_send period. If we have only mark region, 
a mark_region request is also OK, it takes one cluster_send period. If we have only 
clear_region, we could delay the flush of cleared_region for 3 seconds.

Overall, before the patch, mark_region require approximately two cluster_send
period(mark_region and flush), after the patch, mark_region only needs one 
cluster_send period. Based on my test, the performance is as twice as before.

dongmao zhang (2):
  improve the performance of dm-log-userspace
  change API of dm-log-userspace to support delay flush

 drivers/md/dm-log-userspace-base.c    |  109 ++++++++++++++++++++++++++++++---
 include/uapi/linux/dm-log-userspace.h |    2 +-
 2 files changed, 101 insertions(+), 10 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] improve the performance of dm-log-userspace
  2013-09-26 10:50 [PATCH 0/2] patches to improve cluster raid1 performance [V2] dongmao zhang
@ 2013-09-26 10:50 ` dongmao zhang
  2013-09-26 10:50 ` [PATCH 2/2] change API of dm-log-userspace to support delay flush dongmao zhang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: dongmao zhang @ 2013-09-26 10:50 UTC (permalink / raw)
  To: dm-devel; +Cc: dongmao zhang, agk

In the cluster evironment, cluster write has poor performance.
Because userspace_flush has to access userspace program(cmirrord)
for clear/mark/flush request. But the mark and flush requests
require cmirrord send message to  all the cluster nodes. This behave
is realy slow. So the idea is merging mark and flush
request together and to reduce the kernel-userspace-kernel time.
Moreover, when only sending clear request, the flush request could
be delayed. Added a workqueue to run delayed flush request.

Signed-off-by: dongmao zhang <dmzhang@suse.com>
---
 drivers/md/dm-log-userspace-base.c |   65 +++++++++++++++++++++++++++++++++---
 1 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-log-userspace-base.c b/drivers/md/dm-log-userspace-base.c
index 9429159..d5f4a1c 100644
--- a/drivers/md/dm-log-userspace-base.c
+++ b/drivers/md/dm-log-userspace-base.c
@@ -11,6 +11,7 @@
 #include <linux/dm-log-userspace.h>
 #include <linux/module.h>
 
+#include <linux/workqueue.h>
 #include "dm-log-userspace-transfer.h"
 
 #define DM_LOG_USERSPACE_VSN "1.1.0"
@@ -58,6 +59,11 @@ struct log_c {
 	spinlock_t flush_lock;
 	struct list_head mark_list;
 	struct list_head clear_list;
+
+	/*work queue for flush clear region*/
+	struct workqueue_struct *dmlog_wq;
+	struct delayed_work flush_log_work;
+	atomic_t sched_flush;
 };
 
 static mempool_t *flush_entry_pool;
@@ -141,6 +147,17 @@ static int build_constructor_string(struct dm_target *ti,
 	return str_size;
 }
 
+static void do_flush(struct delayed_work *work)
+{
+	int r;
+	struct log_c *lc = container_of(work, struct log_c, flush_log_work);
+	r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH,
+						 NULL, 0, NULL, NULL);
+	atomic_set(&lc->sched_flush, 0);
+	if (r)
+		dm_table_event(lc->ti->table);
+}
+
 /*
  * userspace_ctr
  *
@@ -234,6 +251,17 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
 	lc->region_size = (uint32_t)rdata;
 	lc->region_count = dm_sector_div_up(ti->len, lc->region_size);
 
+	lc->dmlog_wq =  alloc_workqueue("dmlogd",
+			WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
+	if (!lc->dmlog_wq) {
+		DMERR("couldn't start dmlogd");
+		r = -ENOMEM;
+		goto out;
+	}
+
+	INIT_DELAYED_WORK(&lc->flush_log_work, do_flush);
+	atomic_set(&lc->sched_flush, 0);
+
 	if (devices_rdata_size) {
 		if (devices_rdata[devices_rdata_size - 1] != '\0') {
 			DMERR("DM_ULOG_CTR device return string not properly terminated");
@@ -264,6 +292,13 @@ static void userspace_dtr(struct dm_dirty_log *log)
 {
 	struct log_c *lc = log->context;
 
+	/*flush workqueue*/
+	if (atomic_read(&lc->sched_flush))
+		flush_delayed_work(&lc->flush_log_work);
+
+	flush_workqueue(lc->dmlog_wq);
+	destroy_workqueue(lc->dmlog_wq);
+
 	(void) dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_DTR,
 				 NULL, 0,
 				 NULL, NULL);
@@ -294,6 +329,10 @@ static int userspace_postsuspend(struct dm_dirty_log *log)
 	int r;
 	struct log_c *lc = log->context;
 
+	/*run planed flush*/
+	if (atomic_read(&lc->sched_flush))
+		flush_delayed_work(&lc->flush_log_work);
+
 	r = dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_POSTSUSPEND,
 				 NULL, 0,
 				 NULL, NULL);
@@ -474,6 +513,8 @@ static int userspace_flush(struct dm_dirty_log *log)
 	int r = 0;
 	unsigned long flags;
 	struct log_c *lc = log->context;
+	int is_mark_list_empty;
+	int is_clear_list_empty;
 	LIST_HEAD(mark_list);
 	LIST_HEAD(clear_list);
 	struct flush_entry *fe, *tmp_fe;
@@ -483,19 +524,33 @@ static int userspace_flush(struct dm_dirty_log *log)
 	list_splice_init(&lc->clear_list, &clear_list);
 	spin_unlock_irqrestore(&lc->flush_lock, flags);
 
-	if (list_empty(&mark_list) && list_empty(&clear_list))
+	is_mark_list_empty = list_empty(&mark_list);
+	is_clear_list_empty = list_empty(&clear_list);
+
+	if (is_mark_list_empty && is_clear_list_empty)
 		return 0;
 
-	r = flush_by_group(lc, &mark_list);
+
+	r = flush_by_group(lc, &clear_list);
 	if (r)
 		goto fail;
 
-	r = flush_by_group(lc, &clear_list);
+	r = flush_by_group(lc, &mark_list);
 	if (r)
 		goto fail;
 
-	r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH,
-				 NULL, 0, NULL, NULL);
+	if (!is_clear_list_empty && is_mark_list_empty
+				 && !atomic_read(&lc->sched_flush)) {
+		/* plan a flush */
+		queue_delayed_work(lc->dmlog_wq, &lc->flush_log_work , 3 * HZ);
+		atomic_set(&lc->sched_flush, 1);
+	} else {
+		/* cancel pending flush because already flushed in mark_region*/
+		cancel_delayed_work(&lc->flush_log_work);
+		atomic_set(&lc->sched_flush, 0);
+	}
+
+
 
 fail:
 	/*
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] change API of dm-log-userspace to support delay flush
  2013-09-26 10:50 [PATCH 0/2] patches to improve cluster raid1 performance [V2] dongmao zhang
  2013-09-26 10:50 ` [PATCH 1/2] improve the performance of dm-log-userspace dongmao zhang
@ 2013-09-26 10:50 ` dongmao zhang
  2013-09-26 17:17 ` [PATCH 0/2] patches to improve cluster raid1 performance [V2] Brassow Jonathan
  2013-09-30 14:24 ` Brassow Jonathan
  3 siblings, 0 replies; 10+ messages in thread
From: dongmao zhang @ 2013-09-26 10:50 UTC (permalink / raw)
  To: dm-devel; +Cc: dongmao zhang, agk

Signed-off-by: dongmao zhang <dmzhang@suse.com>
---
 drivers/md/dm-log-userspace-base.c    |   80 ++++++++++++++++++++++++---------
 include/uapi/linux/dm-log-userspace.h |    2 +-
 2 files changed, 59 insertions(+), 23 deletions(-)

diff --git a/drivers/md/dm-log-userspace-base.c b/drivers/md/dm-log-userspace-base.c
index d5f4a1c..f455d95 100644
--- a/drivers/md/dm-log-userspace-base.c
+++ b/drivers/md/dm-log-userspace-base.c
@@ -14,7 +14,7 @@
 #include <linux/workqueue.h>
 #include "dm-log-userspace-transfer.h"
 
-#define DM_LOG_USERSPACE_VSN "1.1.0"
+#define DM_LOG_USERSPACE_VSN "1.2.0"
 
 struct flush_entry {
 	int type;
@@ -30,6 +30,8 @@ struct flush_entry {
  */
 #define MAX_FLUSH_GROUP_COUNT 32
 
+#define SUPPORT_DELAY_FLUSH "support_delay_flush"
+
 struct log_c {
 	struct dm_target *ti;
 	struct dm_dev *log_dev;
@@ -64,6 +66,7 @@ struct log_c {
 	struct workqueue_struct *dmlog_wq;
 	struct delayed_work flush_log_work;
 	atomic_t sched_flush;
+	uint32_t support_delay_flush;
 };
 
 static mempool_t *flush_entry_pool;
@@ -158,6 +161,8 @@ static void do_flush(struct delayed_work *work)
 		dm_table_event(lc->ti->table);
 }
 
+
+
 /*
  * userspace_ctr
  *
@@ -181,7 +186,11 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
 	uint64_t rdata;
 	size_t rdata_size = sizeof(rdata);
 	char *devices_rdata = NULL;
-	size_t devices_rdata_size = DM_NAME_LEN;
+
+	char *logdev = NULL;
+	char *delay_flush_string = NULL;
+
+	size_t devices_rdata_size = DM_NAME_LEN + sizeof(SUPPORT_DELAY_FLUSH);
 
 	if (argc < 3) {
 		DMWARN("Too few arguments to userspace dirty log");
@@ -251,16 +260,8 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
 	lc->region_size = (uint32_t)rdata;
 	lc->region_count = dm_sector_div_up(ti->len, lc->region_size);
 
-	lc->dmlog_wq =  alloc_workqueue("dmlogd",
-			WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
-	if (!lc->dmlog_wq) {
-		DMERR("couldn't start dmlogd");
-		r = -ENOMEM;
-		goto out;
-	}
 
-	INIT_DELAYED_WORK(&lc->flush_log_work, do_flush);
-	atomic_set(&lc->sched_flush, 0);
+	lc->support_delay_flush = 0;
 
 	if (devices_rdata_size) {
 		if (devices_rdata[devices_rdata_size - 1] != '\0') {
@@ -268,12 +269,34 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
 			r = -EINVAL;
 			goto out;
 		}
-		r = dm_get_device(ti, devices_rdata,
+
+		logdev = strsep(&devices_rdata, " ");
+		delay_flush_string = strsep(&devices_rdata, " ");
+
+		if (delay_flush_string &&
+			!strcmp(delay_flush_string, SUPPORT_DELAY_FLUSH))
+			lc->support_delay_flush = 1;
+
+		r = dm_get_device(ti, logdev,
 				  dm_table_get_mode(ti->table), &lc->log_dev);
 		if (r)
 			DMERR("Failed to register %s with device-mapper",
-			      devices_rdata);
+			      logdev);
 	}
+
+	if (lc->support_delay_flush) {
+		lc->dmlog_wq =  alloc_workqueue("dmlogd",
+				WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
+		if (!lc->dmlog_wq) {
+			DMERR("couldn't start dmlogd");
+			r = -ENOMEM;
+			goto out;
+		}
+
+		INIT_DELAYED_WORK(&lc->flush_log_work, do_flush);
+		atomic_set(&lc->sched_flush, 0);
+	}
+
 out:
 	kfree(devices_rdata);
 	if (r) {
@@ -292,12 +315,14 @@ static void userspace_dtr(struct dm_dirty_log *log)
 {
 	struct log_c *lc = log->context;
 
-	/*flush workqueue*/
-	if (atomic_read(&lc->sched_flush))
-		flush_delayed_work(&lc->flush_log_work);
+	if (lc->support_delay_flush) {
+		/*flush workqueue*/
+		if (atomic_read(&lc->sched_flush))
+			flush_delayed_work(&lc->flush_log_work);
 
-	flush_workqueue(lc->dmlog_wq);
-	destroy_workqueue(lc->dmlog_wq);
+		flush_workqueue(lc->dmlog_wq);
+		destroy_workqueue(lc->dmlog_wq);
+	}
 
 	(void) dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_DTR,
 				 NULL, 0,
@@ -329,8 +354,8 @@ static int userspace_postsuspend(struct dm_dirty_log *log)
 	int r;
 	struct log_c *lc = log->context;
 
-	/*run planed flush*/
-	if (atomic_read(&lc->sched_flush))
+	/*run planed flush earlier*/
+	if (lc->support_delay_flush && atomic_read(&lc->sched_flush))
 		flush_delayed_work(&lc->flush_log_work);
 
 	r = dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_POSTSUSPEND,
@@ -539,13 +564,24 @@ static int userspace_flush(struct dm_dirty_log *log)
 	if (r)
 		goto fail;
 
+	/* If we do not support delay flush, just contact
+	 * userspace immediately.
+	 * If we support delay flush and only has clear region,
+	 * we could postpone * the flush command
+	 */
+	if (!lc->support_delay_flush) {
+		r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH,
+				NULL, 0, NULL, NULL);
+		goto fail;
+	}
+
+	/* when only has clear region,we plan a flush in the furture*/
 	if (!is_clear_list_empty && is_mark_list_empty
 				 && !atomic_read(&lc->sched_flush)) {
-		/* plan a flush */
 		queue_delayed_work(lc->dmlog_wq, &lc->flush_log_work , 3 * HZ);
 		atomic_set(&lc->sched_flush, 1);
+	/* cancel pending flush because we have already flushed in mark_region*/
 	} else {
-		/* cancel pending flush because already flushed in mark_region*/
 		cancel_delayed_work(&lc->flush_log_work);
 		atomic_set(&lc->sched_flush, 0);
 	}
diff --git a/include/uapi/linux/dm-log-userspace.h b/include/uapi/linux/dm-log-userspace.h
index 0678c2a..5fbeb2c 100644
--- a/include/uapi/linux/dm-log-userspace.h
+++ b/include/uapi/linux/dm-log-userspace.h
@@ -386,7 +386,7 @@
  *	            device name that is to be registered with DM via
  *	            'dm_get_device'.
  */
-#define DM_ULOG_REQUEST_VERSION 2
+#define DM_ULOG_REQUEST_VERSION 3
 
 struct dm_ulog_request {
 	/*
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] patches to improve cluster raid1 performance [V2]
  2013-09-26 10:50 [PATCH 0/2] patches to improve cluster raid1 performance [V2] dongmao zhang
  2013-09-26 10:50 ` [PATCH 1/2] improve the performance of dm-log-userspace dongmao zhang
  2013-09-26 10:50 ` [PATCH 2/2] change API of dm-log-userspace to support delay flush dongmao zhang
@ 2013-09-26 17:17 ` Brassow Jonathan
  2013-09-27  7:59   ` Dongmao Zhang
  2013-09-30 14:24 ` Brassow Jonathan
  3 siblings, 1 reply; 10+ messages in thread
From: Brassow Jonathan @ 2013-09-26 17:17 UTC (permalink / raw)
  To: device-mapper development; +Cc: agk, dongmao zhang

I'll try to look these patches over during the next couple days.  Thanks for sending.

One thing I'll be looking for is that any delays in the flush don't violate the need for dm-raid1.c to ensure that the log changes have hit the disk before performing any writes.

 brassow

On Sep 26, 2013, at 5:50 AM, dongmao zhang wrote:

> This patch change DM_ULOG_REQUEST_VERSION from 2 to 3. It could
> tell cmirrord that kernel is now supporting delay some flushes, and
> cmirrord will do someting accordingly.
> 
> 
> Based on my test result, the cluster raid1 writes loss 80% performance. I found
> that the most time is occupied by the function userspace_flush.
> 
> Usually userspace_flush needs three stages(mark, clear, flush)to communicate with cmirrord.
> 
>> From the cmirrord's perspective, mark and flush functions run cluster_send first and then return to 
> the kernel.
> 
> In other words, the requests of mark_region and flush in userspace_flush 
> at least require a cluster_send period to finish. this is the root cause
> of bad performance.
> 
> The idea is to merge flush and mark request together in both cmirrord and dm-log-userspace.
> 
> We run clog_flush directly after clog_mark_region. So the userspace_flush do not 
> have to request flush again after requesting mark_region.  Moreover, I think the flush 
> of clear region could be delayed. If we have both mark and clear region, only sending 
> a mark_region request is OK, because clog_flush will automatically run.(ignore the clean_region 
> time). It only takes one cluster_send period. If we have only mark region, 
> a mark_region request is also OK, it takes one cluster_send period. If we have only 
> clear_region, we could delay the flush of cleared_region for 3 seconds.
> 
> Overall, before the patch, mark_region require approximately two cluster_send
> period(mark_region and flush), after the patch, mark_region only needs one 
> cluster_send period. Based on my test, the performance is as twice as before.
> 
> 
> 
> dongmao zhang (2):
>  improve the performance of dm-log-userspace
>  change API of dm-log-userspace to support delay flush
> 
> drivers/md/dm-log-userspace-base.c    |  109 ++++++++++++++++++++++++++++++---
> include/uapi/linux/dm-log-userspace.h |    2 +-
> 2 files changed, 101 insertions(+), 10 deletions(-)
> 
> -- 
> 1.7.3.4
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] patches to improve cluster raid1 performance [V2]
  2013-09-26 17:17 ` [PATCH 0/2] patches to improve cluster raid1 performance [V2] Brassow Jonathan
@ 2013-09-27  7:59   ` Dongmao Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Dongmao Zhang @ 2013-09-27  7:59 UTC (permalink / raw)
  To: Brassow Jonathan; +Cc: dm-devel, agk

great, thanks for your response:)


于 2013年09月27日 01:17, Brassow Jonathan 写道:
> I'll try to look these patches over during the next couple days.  Thanks for sending.
>
> One thing I'll be looking for is that any delays in the flush don't violate the need for dm-raid1.c to ensure that the log changes have hit the disk before performing any writes.
>
>   brassow
>
> On Sep 26, 2013, at 5:50 AM, dongmao zhang wrote:
>
>> This patch change DM_ULOG_REQUEST_VERSION from 2 to 3. It could
>> tell cmirrord that kernel is now supporting delay some flushes, and
>> cmirrord will do someting accordingly.
>>
>>
>> Based on my test result, the cluster raid1 writes loss 80% performance. I found
>> that the most time is occupied by the function userspace_flush.
>>
>> Usually userspace_flush needs three stages(mark, clear, flush)to communicate with cmirrord.
>>
>>>  From the cmirrord's perspective, mark and flush functions run cluster_send first and then return to
>> the kernel.
>>
>> In other words, the requests of mark_region and flush in userspace_flush
>> at least require a cluster_send period to finish. this is the root cause
>> of bad performance.
>>
>> The idea is to merge flush and mark request together in both cmirrord and dm-log-userspace.
>>
>> We run clog_flush directly after clog_mark_region. So the userspace_flush do not
>> have to request flush again after requesting mark_region.  Moreover, I think the flush
>> of clear region could be delayed. If we have both mark and clear region, only sending
>> a mark_region request is OK, because clog_flush will automatically run.(ignore the clean_region
>> time). It only takes one cluster_send period. If we have only mark region,
>> a mark_region request is also OK, it takes one cluster_send period. If we have only
>> clear_region, we could delay the flush of cleared_region for 3 seconds.
>>
>> Overall, before the patch, mark_region require approximately two cluster_send
>> period(mark_region and flush), after the patch, mark_region only needs one
>> cluster_send period. Based on my test, the performance is as twice as before.
>>
>>
>>
>> dongmao zhang (2):
>>   improve the performance of dm-log-userspace
>>   change API of dm-log-userspace to support delay flush
>>
>> drivers/md/dm-log-userspace-base.c    |  109 ++++++++++++++++++++++++++++++---
>> include/uapi/linux/dm-log-userspace.h |    2 +-
>> 2 files changed, 101 insertions(+), 10 deletions(-)
>>
>> --
>> 1.7.3.4
>>
>> --
>> dm-devel mailing list
>> dm-devel@redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
>
>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] patches to improve cluster raid1 performance [V2]
  2013-09-26 10:50 [PATCH 0/2] patches to improve cluster raid1 performance [V2] dongmao zhang
                   ` (2 preceding siblings ...)
  2013-09-26 17:17 ` [PATCH 0/2] patches to improve cluster raid1 performance [V2] Brassow Jonathan
@ 2013-09-30 14:24 ` Brassow Jonathan
  2013-10-08  8:33   ` Dong Mao Zhang
  3 siblings, 1 reply; 10+ messages in thread
From: Brassow Jonathan @ 2013-09-30 14:24 UTC (permalink / raw)
  To: device-mapper development; +Cc: Alasdair Kergon, dongmao zhang

Hi,

The idea behind these patches (both the kernel and userspace) seems promising.  Basically, you want the userspace daemon (cmirrord) to implicitly flush when a 'mark' request is received.  The kernel is then allowed to skip 'flush' requests that it knows userspace is going to take care of anyway.  Further, since delaying 'clear' requests does not have an affect on data integrity after a machine failure, it is ok to do so and gain some additional performance during nominal operation for some minor potential slowdown during recovery scenarios.

Couple quick questions/comments:
1) Can you tell me more about how you are testing and comparing results?

2) I don't see a benefit to splitting the kernel updates into two patches - one patch will probably do.  For the userspace patches, I would combine 1 and 2 (cmirrord daemon related patches) and then combine 3 and 4 (core-LVM related patches).

There are some comments and other minor things to clean-up after that.  For example, I don't know if I like the name 'DM_SUPPORT_DELAY_FLUSH'...  I might rather prefer something that indicates that the 'mark' request is also responsible for 'flush'.  What is happening must be clear to anyone who might write a new log daemon in the future.  They must realize that the key change for them is that the 'mark' must handle the 'flush' - everything else is the same from their perspective.  It amounts to an API change.

thanks,
 brassow

On Sep 26, 2013, at 5:50 AM, dongmao zhang wrote:

> This patch change DM_ULOG_REQUEST_VERSION from 2 to 3. It could
> tell cmirrord that kernel is now supporting delay some flushes, and
> cmirrord will do someting accordingly.
> 
> 
> Based on my test result, the cluster raid1 writes loss 80% performance. I found
> that the most time is occupied by the function userspace_flush.
> 
> Usually userspace_flush needs three stages(mark, clear, flush)to communicate with cmirrord.
> 
>> From the cmirrord's perspective, mark and flush functions run cluster_send first and then return to 
> the kernel.
> 
> In other words, the requests of mark_region and flush in userspace_flush 
> at least require a cluster_send period to finish. this is the root cause
> of bad performance.
> 
> The idea is to merge flush and mark request together in both cmirrord and dm-log-userspace.
> 
> We run clog_flush directly after clog_mark_region. So the userspace_flush do not 
> have to request flush again after requesting mark_region.  Moreover, I think the flush 
> of clear region could be delayed. If we have both mark and clear region, only sending 
> a mark_region request is OK, because clog_flush will automatically run.(ignore the clean_region 
> time). It only takes one cluster_send period. If we have only mark region, 
> a mark_region request is also OK, it takes one cluster_send period. If we have only 
> clear_region, we could delay the flush of cleared_region for 3 seconds.
> 
> Overall, before the patch, mark_region require approximately two cluster_send
> period(mark_region and flush), after the patch, mark_region only needs one 
> cluster_send period. Based on my test, the performance is as twice as before.
> 
> 
> 
> dongmao zhang (2):
>  improve the performance of dm-log-userspace
>  change API of dm-log-userspace to support delay flush
> 
> drivers/md/dm-log-userspace-base.c    |  109 ++++++++++++++++++++++++++++++---
> include/uapi/linux/dm-log-userspace.h |    2 +-
> 2 files changed, 101 insertions(+), 10 deletions(-)
> 
> -- 
> 1.7.3.4
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] patches to improve cluster raid1 performance [V2]
  2013-09-30 14:24 ` Brassow Jonathan
@ 2013-10-08  8:33   ` Dong Mao Zhang
  2013-10-10 16:29     ` Brassow Jonathan
  0 siblings, 1 reply; 10+ messages in thread
From: Dong Mao Zhang @ 2013-10-08  8:33 UTC (permalink / raw)
  To: dm-devel, jbrassow



>>> Brassow Jonathan  09/30/13 10:28 PM >>>
Hi,

Couple quick questions/comments:
>> 1) Can you tell me more about how you are testing and comparing results?

I just setup a cluster raid1, and compare the performance before and after this patch by 'dd' command.(the block size in dd command is 
4k, 8k).
And it was tested on two kvm nodes. I guess I will test it on the real nodes in the next two week.
Do you have other suggestions on the test scenario?


>>There are some comments and other minor things to clean-up after that.  For example, I don't know if I like the name 'DM_SUPPORT_DELAY_FLUSH'...  I might >>rather prefer something that indicates that the 'mark' request is also responsible for 'flush'.  What is happening must be clear to anyone who might write a new >>log daemon in the future.  They must realize that the key change for them is that the 'mark' must handle the 'flush' - everything else is the same from their >>perspective.  It amounts to an API change.
Yes, I agree to the patch combines.
And how about the name 'DM_FLUSH_WITH_MARK'? any other ideas?



Thank you.




On Sep 26, 2013, at 5:50 AM, dongmao zhang wrote:

> This patch change DM_ULOG_REQUEST_VERSION from 2 to 3. It could
> tell cmirrord that kernel is now supporting delay some flushes, and
> cmirrord will do someting accordingly.
> 
> 
> Based on my test result, the cluster raid1 writes loss 80% performance. I found
> that the most time is occupied by the function userspace_flush.
> 
> Usually userspace_flush needs three stages(mark, clear, flush)to communicate with cmirrord.
> 
>> From the cmirrord's perspective, mark and flush functions run cluster_send first and then return to 
> the kernel.
> 
> In other words, the requests of mark_region and flush in userspace_flush 
> at least require a cluster_send period to finish. this is the root cause
> of bad performance.
> 
> The idea is to merge flush and mark request together in both cmirrord and dm-log-userspace.
> 
> We run clog_flush directly after clog_mark_region. So the userspace_flush do not 
> have to request flush again after requesting mark_region.  Moreover, I think the flush 
> of clear region could be delayed. If we have both mark and clear region, only sending 
> a mark_region request is OK, because clog_flush will automatically run.(ignore the clean_region 
> time). It only takes one cluster_send period. If we have only mark region, 
> a mark_region request is also OK, it takes one cluster_send period. If we have only 
> clear_region, we could delay the flush of cleared_region for 3 seconds.
> 
> Overall, before the patch, mark_region require approximately two cluster_send
> period(mark_region and flush), after the patch, mark_region only needs one 
> cluster_send period. Based on my test, the performance is as twice as before.
> 
> 
> 
> dongmao zhang (2):
>  improve the performance of dm-log-userspace
>  change API of dm-log-userspace to support delay flush
> 
> drivers/md/dm-log-userspace-base.c    |  109 ++++++++++++++++++++++++++++++---
> include/uapi/linux/dm-log-userspace.h |    2 +-
> 2 files changed, 101 insertions(+), 10 deletions(-)
> 
> -- 
> 1.7.3.4
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] patches to improve cluster raid1 performance [V2]
  2013-10-08  8:33   ` Dong Mao Zhang
@ 2013-10-10 16:29     ` Brassow Jonathan
  2013-10-15  6:32       ` Dongmao Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Brassow Jonathan @ 2013-10-10 16:29 UTC (permalink / raw)
  To: Dong Mao Zhang; +Cc: dm-devel

On Oct 8, 2013, at 3:33 AM, Dong Mao Zhang wrote:

>>>> 
>>>> Brassow Jonathan  09/30/13 10:28 PM >>>
> Hi,
> 
> Couple quick questions/comments:
>>> 1) Can you tell me more about how you are testing and comparing results?
> 
> I just setup a cluster raid1, and compare the performance before and after this patch by 'dd' command.(the block size in dd command is 
> 4k, 8k).
> And it was tested on two kvm nodes. I guess I will test it on the real nodes in the next two week.
> Do you have other suggestions on the test scenario?

Serial I/O is a good start, but won't show how things would perform in the real world.  If you are using cluster mirrors, then you are almost certainly using a clustered file system (e.g. GFS2).  What is the difference in performance when running file system tests on GFS when it is sitting on a linear LV vs a mirror LV vs a mirror LV with your changes?  What happens when you run these benchmarks in parallel from different machines on the same file system?

In the past, I have performed a test that involved untarring 'X' number of linux kernels - each into their own directory which were themselves in a directory that was named after the machine.  Once untarred, the directory and files are removed.  This can be performed for a number of iterations.  Performing a test in a directory named after the machine allows you to run the test in parallel on different machines while reducing file system clustered lock contention - meaning more of the focus is on the I/O performance of the lower layers.  This is just one example of a test that would create many files and directories.  Couple this test with your serial I/O test and you begin to get an idea of what performance would be like for different workloads.  If you want to go further, there are other
  benchmarks our there.

This type of testing satisfies a curiosity about quantifying the improvement, but understanding the patches you sent has already made it obvious that there will be at least some improvement.

> 
> 
>>> There are some comments and other minor things to clean-up after that.  For example, I don't know if I like the name 'DM_SUPPORT_DELAY_FLUSH'...  I might >>rather prefer something that indicates that the 'mark' request is also responsible for 'flush'.  What is happening must be clear to anyone who might write a new >>log daemon in the future.  They must realize that the key change for them is that the 'mark' must handle the 'flush' - everything else is the same from their >>perspective.  It amounts to an API change.
> Yes, I agree to the patch combines.
> And how about the name 'DM_FLUSH_WITH_MARK'? any other ideas?
> 

Sounds fine.  Although it can be confusing whether it should be DM_FLUSH_WITH_MARK (i.e. "flush when mark is performed") or DM_MARK_WITH_FLUSH (i.e. "mark has flush capability").  Perhaps DM_MARK_AND_FLUSH or DM_MARK_PERFORMS_FLUSH or DM_INTEGRATED_MARK_AND_FLUSH...

Thinking more about it...  From the kernel's perspective, it is the /flush/ function that sends the 'mark' requests (but doesn't then send the flush request - it is implicit).  From the userspace perspective, it is the /mark/ function that performs the 'flush' action.  It works and I get it, but it does seem like it could be a source of confusion.  It also makes it tough to come up with a descriptive name, since the kernel and userspace are doing opposite things.

Perhaps a better thing to do is to integrate the marks (and possibly clears) into the flush - DM_INTEGRATED_FLUSH.  The payload for the DM_ULOG_FLUSH flush communication is empty.  It seems to me now that it would make more sense to make use of that empty space and fill it with mark/clear requests.  What do you think?

 brassow

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] patches to improve cluster raid1 performance [V2]
  2013-10-10 16:29     ` Brassow Jonathan
@ 2013-10-15  6:32       ` Dongmao Zhang
  2013-10-18 21:04         ` Brassow Jonathan
  0 siblings, 1 reply; 10+ messages in thread
From: Dongmao Zhang @ 2013-10-15  6:32 UTC (permalink / raw)
  To: Brassow Jonathan; +Cc: dm-devel

于 2013年10月11日 00:29, Brassow Jonathan 写道:
> Perhaps a better thing to do is to integrate the marks (and possibly
> clears) into the flush - DM_INTEGRATED_FLUSH.  The payload for the
> DM_ULOG_FLUSH flush communication is empty.  It seems to me now that
> it would make more sense to make use of that empty space and fill it
> with mark/clear requests.  What do you think?
>

DM_INTEGRATED_FLUSH is a good idea. It just simplify the confusion of
DM_FLUSH_WITH_MARK thing. My plan is to add (mark and clear) payload to
flush. It is like this:

'X' means we have this kind of request, while '0' means none.

mark request :     X               X               0

clear request:     X               0               X

flush method:    payload_flush   payload_flush   delayed flush.

If there is only clear requests, we can send the clear request first, 
and send normal flush later.  how do you like this strategy?

As for the testing, I only have ocfs2 available.(not GFS2). I will
test on that;

Dongmao Zhang.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] patches to improve cluster raid1 performance [V2]
  2013-10-15  6:32       ` Dongmao Zhang
@ 2013-10-18 21:04         ` Brassow Jonathan
  0 siblings, 0 replies; 10+ messages in thread
From: Brassow Jonathan @ 2013-10-18 21:04 UTC (permalink / raw)
  To: Dongmao Zhang; +Cc: dm-devel


On Oct 15, 2013, at 1:32 AM, Dongmao Zhang wrote:

> 于 2013年10月11日 00:29, Brassow Jonathan 写道:
>> Perhaps a better thing to do is to integrate the marks (and possibly
>> clears) into the flush - DM_INTEGRATED_FLUSH.  The payload for the
>> DM_ULOG_FLUSH flush communication is empty.  It seems to me now that
>> it would make more sense to make use of that empty space and fill it
>> with mark/clear requests.  What do you think?
>> 
> 
> DM_INTEGRATED_FLUSH is a good idea. It just simplify the confusion of
> DM_FLUSH_WITH_MARK thing. My plan is to add (mark and clear) payload to
> flush. It is like this:
> 
> 'X' means we have this kind of request, while '0' means none.
> 
> mark request :     X               X               0
> 
> clear request:     X               0               X
> 
> flush method:    payload_flush   payload_flush   delayed flush.
> 
> If there is only clear requests, we can send the clear request first, and send normal flush later.  how do you like this strategy?
> 
> As for the testing, I only have ocfs2 available.(not GFS2). I will
> test on that;

All sounds good.

 brassow


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-10-18 21:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-26 10:50 [PATCH 0/2] patches to improve cluster raid1 performance [V2] dongmao zhang
2013-09-26 10:50 ` [PATCH 1/2] improve the performance of dm-log-userspace dongmao zhang
2013-09-26 10:50 ` [PATCH 2/2] change API of dm-log-userspace to support delay flush dongmao zhang
2013-09-26 17:17 ` [PATCH 0/2] patches to improve cluster raid1 performance [V2] Brassow Jonathan
2013-09-27  7:59   ` Dongmao Zhang
2013-09-30 14:24 ` Brassow Jonathan
2013-10-08  8:33   ` Dong Mao Zhang
2013-10-10 16:29     ` Brassow Jonathan
2013-10-15  6:32       ` Dongmao Zhang
2013-10-18 21:04         ` Brassow Jonathan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.