linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/4] zram: locking redesign
@ 2014-01-15  1:11 Minchan Kim
  2014-01-15  1:11 ` [PATCH v2 1/4] zram: use atomic operation for stat Minchan Kim
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Minchan Kim @ 2014-01-15  1:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Nitin Gupta, Sergey Senozhatsky, Jerome Marchand,
	Minchan Kim

Currently, the zram->lock rw_semaphore is coarse-grained, which
hurts scalability. This patchset tries to improve that by removing
the lock from the read path.

[1] uses atomic operations, removing the 32-bit stats' dependency
on zram->lock.
[2] introduces the table's own lock instead of relying on zram->lock.
[3] removes the pending-slot-free mess, leaving the core much cleaner.
[4] finally removes zram->lock from the read path and replaces it
with a mutex.

The result is excellent: the mixed read/write workload performs 11
times better than before, and write concurrency also improves
because mutex supports SPIN_ON_OWNER while rw_semaphore does not yet.
(I know there was a recent effort from Tim Chen to add that to
rw_semaphore, but I am not sure it was merged. Either way, we no
longer need it, and there is no reason to prevent read-write
concurrency.)
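
For reference, here is a condensed before/after sketch of the I/O
dispatch path, boiled down from the patches below (the _old/_new
suffixes are only for side-by-side illustration):

	/* old: one coarse rw_semaphore serializes readers against writers */
	static int zram_bvec_rw_old(struct zram *zram, struct bio_vec *bvec,
				u32 index, int offset, struct bio *bio, int rw)
	{
		int ret;

		if (rw == READ) {
			down_read(&zram->lock);
			ret = zram_bvec_read(zram, bvec, index, offset, bio);
			up_read(&zram->lock);
		} else {
			down_write(&zram->lock);
			ret = zram_bvec_write(zram, bvec, index, offset);
			up_write(&zram->lock);
		}
		return ret;
	}

	/* new: no top-level lock; reads take meta->tb_lock briefly inside,
	 * writes serialize on the meta->buffer_lock mutex */
	static int zram_bvec_rw_new(struct zram *zram, struct bio_vec *bvec,
				u32 index, int offset, struct bio *bio, int rw)
	{
		if (rw == READ)
			return zram_bvec_read(zram, bvec, index, offset, bio);
		return zram_bvec_write(zram, bvec, index, offset);
	}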

Thanks.

Minchan Kim (4):
  [1] zram: use atomic operation for stat
  [2] zram: introduce zram->tb_lock
  [3] zram: remove workqueue for freeing removed pending slot
  [4] zram: Remove zram->lock in read path and change it with mutex

 drivers/staging/zram/zram_drv.c | 117 ++++++++++++++++------------------------
 drivers/staging/zram/zram_drv.h |  27 +++-------
 2 files changed, 51 insertions(+), 93 deletions(-)

-- 
1.8.5.2



* [PATCH v2 1/4] zram: use atomic operation for stat
  2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
@ 2014-01-15  1:11 ` Minchan Kim
  2014-01-15  1:11 ` [PATCH v2 2/4] zram: introduce zram->tb_lock Minchan Kim
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2014-01-15  1:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Nitin Gupta, Sergey Senozhatsky, Jerome Marchand,
	Minchan Kim

Some of the fields in zram->stats are protected by zram->lock,
which is rather coarse-grained, so let's use atomic operations
without explicit locking.

This patch prepares for removing the read path's dependency on
zram->lock, which is a very coarse-grained rw_semaphore.
Of course, this patch adds new atomic operations, so it might slow
things down, but my 12-CPU test could not spot any regression;
every gain/loss is marginal, within stddev.
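
In short, each 32-bit counter becomes an atomic_t, so an update that
used to require zram->lock turns into a single lock-free operation.
A sketch (count_zero_page is a hypothetical helper, not in the
driver):

	#include <linux/atomic.h>

	/* hypothetical helper showing the before/after pattern */
	static void count_zero_page(struct zram *zram)
	{
		/* before: zram->stats.pages_zero++ with zram->lock held */
		atomic_inc(&zram->stats.pages_zero);
	}

	/* readers such as the sysfs show path need no lock either */
	static ssize_t zero_pages_show(struct device *dev,
			struct device_attribute *attr, char *buf)
	{
		struct zram *zram = dev_to_zram(dev);

		return sprintf(buf, "%u\n", atomic_read(&zram->stats.pages_zero));
	}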

iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0

==Initial write	==Initial write
records: 50		records: 50
avg:  412875.17		avg:  415638.23
std:   38543.12 (9.34%)	std:   36601.11 (8.81%)
max:  521262.03		max:  502976.72
min:  343263.13		min:  351389.12
==Rewrite	==Rewrite
records: 50		records: 50
avg:  416640.34		avg:  397914.33
std:   60798.92 (14.59%)	std:   46150.42 (11.60%)
max:  543057.07		max:  522669.17
min:  304071.67		min:  316588.77
==Read	==Read
records: 50		records: 50
avg: 4147338.63		avg: 4070736.51
std:  179333.25 (4.32%)	std:  223499.89 (5.49%)
max: 4459295.28		max: 4539514.44
min: 3753057.53		min: 3444686.31
==Re-read	==Re-read
records: 50		records: 50
avg: 4096706.71		avg: 4117218.57
std:  229735.04 (5.61%)	std:  171676.25 (4.17%)
max: 4430012.09		max: 4459263.94
min: 2987217.80		min: 3666904.28
==Reverse Read	==Reverse Read
records: 50		records: 50
avg: 4062763.83		avg: 4078508.32
std:  186208.46 (4.58%)	std:  172684.34 (4.23%)
max: 4401358.78		max: 4424757.22
min: 3381625.00		min: 3679359.94
==Stride read	==Stride read
records: 50		records: 50
avg: 4094933.49		avg: 4082170.22
std:  185710.52 (4.54%)	std:  196346.68 (4.81%)
max: 4478241.25		max: 4460060.97
min: 3732593.23		min: 3584125.78
==Random read	==Random read
records: 50		records: 50
avg: 4031070.04		avg: 4074847.49
std:  192065.51 (4.76%)	std:  206911.33 (5.08%)
max: 4356931.16		max: 4399442.56
min: 3481619.62		min: 3548372.44
==Mixed workload	==Mixed workload
records: 50		records: 50
avg:  149925.73		avg:  149675.54
std:    7701.26 (5.14%)	std:    6902.09 (4.61%)
max:  191301.56		max:  175162.05
min:  133566.28		min:  137762.87
==Random write	==Random write
records: 50		records: 50
avg:  404050.11		avg:  393021.47
std:   58887.57 (14.57%)	std:   42813.70 (10.89%)
max:  601798.09		max:  524533.43
min:  325176.99		min:  313255.34
==Pwrite	==Pwrite
records: 50		records: 50
avg:  411217.70		avg:  411237.96
std:   43114.99 (10.48%)	std:   33136.29 (8.06%)
max:  530766.79		max:  471899.76
min:  320786.84		min:  317906.94
==Pread	==Pread
records: 50		records: 50
avg: 4154908.65		avg: 4087121.92
std:  151272.08 (3.64%)	std:  219505.04 (5.37%)
max: 4459478.12		max: 4435857.38
min: 3730512.41		min: 3101101.67

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/staging/zram/zram_drv.c | 20 ++++++++++----------
 drivers/staging/zram/zram_drv.h | 16 ++++++----------
 2 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 7889dd6048b9..6613225dfca1 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -104,7 +104,7 @@ static ssize_t zero_pages_show(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
 
-	return sprintf(buf, "%u\n", zram->stats.pages_zero);
+	return sprintf(buf, "%u\n", atomic_read(&zram->stats.pages_zero));
 }
 
 static ssize_t orig_data_size_show(struct device *dev,
@@ -113,7 +113,7 @@ static ssize_t orig_data_size_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	return sprintf(buf, "%llu\n",
-		(u64)(zram->stats.pages_stored) << PAGE_SHIFT);
+		(u64)(atomic_read(&zram->stats.pages_stored)) << PAGE_SHIFT);
 }
 
 static ssize_t compr_data_size_show(struct device *dev,
@@ -292,21 +292,21 @@ static void zram_free_page(struct zram *zram, size_t index)
 		 */
 		if (zram_test_flag(meta, index, ZRAM_ZERO)) {
 			zram_clear_flag(meta, index, ZRAM_ZERO);
-			zram->stats.pages_zero--;
+			atomic_dec(&zram->stats.pages_zero);
 		}
 		return;
 	}
 
 	if (unlikely(size > max_zpage_size))
-		zram->stats.bad_compress--;
+		atomic_dec(&zram->stats.bad_compress);
 
 	zs_free(meta->mem_pool, handle);
 
 	if (size <= PAGE_SIZE / 2)
-		zram->stats.good_compress--;
+		atomic_dec(&zram->stats.good_compress);
 
 	atomic64_sub(meta->table[index].size, &zram->stats.compr_size);
-	zram->stats.pages_stored--;
+	atomic_dec(&zram->stats.pages_stored);
 
 	meta->table[index].handle = 0;
 	meta->table[index].size = 0;
@@ -434,7 +434,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		/* Free memory associated with this sector now. */
 		zram_free_page(zram, index);
 
-		zram->stats.pages_zero++;
+		atomic_inc(&zram->stats.pages_zero);
 		zram_set_flag(meta, index, ZRAM_ZERO);
 		ret = 0;
 		goto out;
@@ -455,7 +455,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	}
 
 	if (unlikely(clen > max_zpage_size)) {
-		zram->stats.bad_compress++;
+		atomic_inc(&zram->stats.bad_compress);
 		clen = PAGE_SIZE;
 		src = NULL;
 		if (is_partial_io(bvec))
@@ -492,9 +492,9 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	/* Update stats */
 	atomic64_add(clen, &zram->stats.compr_size);
-	zram->stats.pages_stored++;
+	atomic_inc(&zram->stats.pages_stored);
 	if (clen <= PAGE_SIZE / 2)
-		zram->stats.good_compress++;
+		atomic_inc(&zram->stats.good_compress);
 
 out:
 	if (is_partial_io(bvec))
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 97a3acf6ab76..459483966c3d 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -69,10 +69,6 @@ struct table {
 	u8 flags;
 } __aligned(4);
 
-/*
- * All 64bit fields should only be manipulated by 64bit atomic accessors.
- * All modifications to 32bit counter should be protected by zram->lock.
- */
 struct zram_stats {
 	atomic64_t compr_size;	/* compressed size of pages stored */
 	atomic64_t num_reads;	/* failed + successful */
@@ -81,10 +77,10 @@ struct zram_stats {
 	atomic64_t failed_writes;	/* can happen when memory is too low */
 	atomic64_t invalid_io;	/* non-page-aligned I/O requests */
 	atomic64_t notify_free;	/* no. of swap slot free notifications */
-	u32 pages_zero;		/* no. of zero filled pages */
-	u32 pages_stored;	/* no. of pages currently stored */
-	u32 good_compress;	/* % of pages with compression ratio<=50% */
-	u32 bad_compress;	/* % of pages with compression ratio>=75% */
+	atomic_t pages_zero;		/* no. of zero filled pages */
+	atomic_t pages_stored;	/* no. of pages currently stored */
+	atomic_t good_compress;	/* % of pages with compression ratio<=50% */
+	atomic_t bad_compress;	/* % of pages with compression ratio>=75% */
 };
 
 struct zram_meta {
@@ -102,8 +98,8 @@ struct zram_slot_free {
 struct zram {
 	struct zram_meta *meta;
 	struct rw_semaphore lock; /* protect compression buffers, table,
-				   * 32bit stat counters against concurrent
-				   * notifications, reads and writes */
+				   * reads and writes
+				   */
 
 	struct work_struct free_work;  /* handle pending free request */
 	struct zram_slot_free *slot_free_rq; /* list head of free request */
-- 
1.8.5.2



* [PATCH v2 2/4] zram: introduce zram->tb_lock
  2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
  2014-01-15  1:11 ` [PATCH v2 1/4] zram: use atomic operation for stat Minchan Kim
@ 2014-01-15  1:11 ` Minchan Kim
  2014-01-15  1:11 ` [PATCH v2 3/4] zram: remove workqueue for freeing removed pending slot Minchan Kim
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2014-01-15  1:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Nitin Gupta, Sergey Senozhatsky, Jerome Marchand,
	Minchan Kim

Currently, the table is protected by zram->lock, but that is a
rather coarse-grained lock and it hurts scalability.

Let's give the table its own rwlock instead of depending on
zram->lock. This patch adds new locking, so it could obviously slow
things down, but it is just preparation for removing the
coarse-grained rw_semaphore (ie, zram->lock), which is the hurdle
for zram scalability.

The final patch in this series will remove the lock from the read
path and replace the rw_semaphore with a mutex in the write path.
As a bonus, we can drop the pending-slot-free mess in the next
patch. The new locking discipline is sketched below.
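
A condensed view of that discipline, lifted from the hunks below
(abbreviated; error handling omitted):

	/* readers snapshot (and use) table entries under the read side */
	read_lock(&meta->tb_lock);
	handle = meta->table[index].handle;
	size = meta->table[index].size;
	/* ... map and decompress the object ... */
	read_unlock(&meta->tb_lock);

	/* writers free and update entries under the write side */
	write_lock(&zram->meta->tb_lock);
	zram_free_page(zram, index);
	meta->table[index].handle = handle;
	meta->table[index].size = clen;
	write_unlock(&zram->meta->tb_lock);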

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/staging/zram/zram_drv.c | 26 +++++++++++++++++++++-----
 drivers/staging/zram/zram_drv.h |  3 ++-
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 6613225dfca1..8636f8511518 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -140,6 +140,7 @@ static ssize_t mem_used_total_show(struct device *dev,
 	return sprintf(buf, "%llu\n", val);
 }
 
+/* flag operations need meta->tb_lock */
 static int zram_test_flag(struct zram_meta *meta, u32 index,
 			enum zram_pageflags flag)
 {
@@ -227,6 +228,7 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
 		goto free_table;
 	}
 
+	rwlock_init(&meta->tb_lock);
 	return meta;
 
 free_table:
@@ -279,6 +281,7 @@ static void handle_zero_page(struct bio_vec *bvec)
 	flush_dcache_page(page);
 }
 
+/* NOTE: caller should hold meta->tb_lock with write-side */
 static void zram_free_page(struct zram *zram, size_t index)
 {
 	struct zram_meta *meta = zram->meta;
@@ -318,20 +321,26 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
 	size_t clen = PAGE_SIZE;
 	unsigned char *cmem;
 	struct zram_meta *meta = zram->meta;
-	unsigned long handle = meta->table[index].handle;
+	unsigned long handle;
+	u16 size;
+
+	read_lock(&meta->tb_lock);
+	handle = meta->table[index].handle;
+	size = meta->table[index].size;
 
 	if (!handle || zram_test_flag(meta, index, ZRAM_ZERO)) {
+		read_unlock(&meta->tb_lock);
 		clear_page(mem);
 		return 0;
 	}
 
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
-	if (meta->table[index].size == PAGE_SIZE)
+	if (size == PAGE_SIZE)
 		copy_page(mem, cmem);
 	else
-		ret = lzo1x_decompress_safe(cmem, meta->table[index].size,
-						mem, &clen);
+		ret = lzo1x_decompress_safe(cmem, size, mem, &clen);
 	zs_unmap_object(meta->mem_pool, handle);
+	read_unlock(&meta->tb_lock);
 
 	/* Should NEVER happen. Return bio error if it does. */
 	if (unlikely(ret != LZO_E_OK)) {
@@ -352,11 +361,14 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 	struct zram_meta *meta = zram->meta;
 	page = bvec->bv_page;
 
+	read_lock(&meta->tb_lock);
 	if (unlikely(!meta->table[index].handle) ||
 			zram_test_flag(meta, index, ZRAM_ZERO)) {
+		read_unlock(&meta->tb_lock);
 		handle_zero_page(bvec);
 		return 0;
 	}
+	read_unlock(&meta->tb_lock);
 
 	if (is_partial_io(bvec))
 		/* Use  a temporary buffer to decompress the page */
@@ -432,10 +444,12 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	if (page_zero_filled(uncmem)) {
 		kunmap_atomic(user_mem);
 		/* Free memory associated with this sector now. */
+		write_lock(&zram->meta->tb_lock);
 		zram_free_page(zram, index);
+		zram_set_flag(meta, index, ZRAM_ZERO);
+		write_unlock(&zram->meta->tb_lock);
 
 		atomic_inc(&zram->stats.pages_zero);
-		zram_set_flag(meta, index, ZRAM_ZERO);
 		ret = 0;
 		goto out;
 	}
@@ -485,10 +499,12 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	 * Free memory associated with this sector
 	 * before overwriting unused sectors.
 	 */
+	write_lock(&zram->meta->tb_lock);
 	zram_free_page(zram, index);
 
 	meta->table[index].handle = handle;
 	meta->table[index].size = clen;
+	write_unlock(&zram->meta->tb_lock);
 
 	/* Update stats */
 	atomic64_add(clen, &zram->stats.compr_size);
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 459483966c3d..cf64bea3f7cc 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -84,6 +84,7 @@ struct zram_stats {
 };
 
 struct zram_meta {
+	rwlock_t tb_lock;	/* protect table */
 	void *compress_workmem;
 	void *compress_buffer;
 	struct table *table;
@@ -97,7 +98,7 @@ struct zram_slot_free {
 
 struct zram {
 	struct zram_meta *meta;
-	struct rw_semaphore lock; /* protect compression buffers, table,
+	struct rw_semaphore lock; /* protect compression buffers,
 				   * reads and writes
 				   */
 
-- 
1.8.5.2



* [PATCH v2 3/4] zram: remove workqueue for freeing removed pending slot
  2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
  2014-01-15  1:11 ` [PATCH v2 1/4] zram: use atomic operation for stat Minchan Kim
  2014-01-15  1:11 ` [PATCH v2 2/4] zram: introduce zram->tb_lock Minchan Kim
@ 2014-01-15  1:11 ` Minchan Kim
  2014-01-15  1:11 ` [PATCH v2 4/4] zram: Remove zram->lock in read path and change it with mutex Minchan Kim
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2014-01-15  1:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Nitin Gupta, Sergey Senozhatsky, Jerome Marchand,
	Minchan Kim

[1] introduced pending-free-request code to avoid sleeping on a
mutex under a spinlock, and it was a mess that made the code
lengthy and increased overhead.

Now we no longer need zram->lock to free a slot, so this patch
reverts that scheme; tb_lock protects the free path instead, as
the reworked notify hook below shows.
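
A minimal sketch of the resulting notify path (it matches the hunk
below):

	static void zram_slot_free_notify(struct block_device *bdev,
					unsigned long index)
	{
		struct zram *zram = bdev->bd_disk->private_data;
		struct zram_meta *meta = zram->meta;

		/* tb_lock is a spinning rwlock, so taking it here in
		 * atomic context is fine; no workqueue deferral needed */
		write_lock(&meta->tb_lock);
		zram_free_page(zram, index);
		write_unlock(&meta->tb_lock);
		atomic64_inc(&zram->stats.notify_free);
	}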

[1] a0c516c, zram: don't grab mutex in zram_slot_free_notify
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/staging/zram/zram_drv.c | 54 +++++------------------------------------
 drivers/staging/zram/zram_drv.h | 10 --------
 2 files changed, 6 insertions(+), 58 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 8636f8511518..cdc8697476b3 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -521,20 +521,6 @@ out:
 	return ret;
 }
 
-static void handle_pending_slot_free(struct zram *zram)
-{
-	struct zram_slot_free *free_rq;
-
-	spin_lock(&zram->slot_free_lock);
-	while (zram->slot_free_rq) {
-		free_rq = zram->slot_free_rq;
-		zram->slot_free_rq = free_rq->next;
-		zram_free_page(zram, free_rq->index);
-		kfree(free_rq);
-	}
-	spin_unlock(&zram->slot_free_lock);
-}
-
 static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 			int offset, struct bio *bio, int rw)
 {
@@ -546,7 +532,6 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 		up_read(&zram->lock);
 	} else {
 		down_write(&zram->lock);
-		handle_pending_slot_free(zram);
 		ret = zram_bvec_write(zram, bvec, index, offset);
 		up_write(&zram->lock);
 	}
@@ -565,8 +550,6 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 		return;
 	}
 
-	flush_work(&zram->free_work);
-
 	meta = zram->meta;
 	zram->init_done = 0;
 
@@ -766,40 +749,19 @@ error:
 	bio_io_error(bio);
 }
 
-static void zram_slot_free(struct work_struct *work)
-{
-	struct zram *zram;
-
-	zram = container_of(work, struct zram, free_work);
-	down_write(&zram->lock);
-	handle_pending_slot_free(zram);
-	up_write(&zram->lock);
-}
-
-static void add_slot_free(struct zram *zram, struct zram_slot_free *free_rq)
-{
-	spin_lock(&zram->slot_free_lock);
-	free_rq->next = zram->slot_free_rq;
-	zram->slot_free_rq = free_rq;
-	spin_unlock(&zram->slot_free_lock);
-}
-
 static void zram_slot_free_notify(struct block_device *bdev,
 				unsigned long index)
 {
 	struct zram *zram;
-	struct zram_slot_free *free_rq;
+	struct zram_meta *meta;
 
 	zram = bdev->bd_disk->private_data;
-	atomic64_inc(&zram->stats.notify_free);
-
-	free_rq = kmalloc(sizeof(struct zram_slot_free), GFP_ATOMIC);
-	if (!free_rq)
-		return;
+	meta = zram->meta;
 
-	free_rq->index = index;
-	add_slot_free(zram, free_rq);
-	schedule_work(&zram->free_work);
+	write_lock(&meta->tb_lock);
+	zram_free_page(zram, index);
+	write_unlock(&meta->tb_lock);
+	atomic64_inc(&zram->stats.notify_free);
 }
 
 static const struct block_device_operations zram_devops = {
@@ -846,10 +808,6 @@ static int create_device(struct zram *zram, int device_id)
 	init_rwsem(&zram->lock);
 	init_rwsem(&zram->init_lock);
 
-	INIT_WORK(&zram->free_work, zram_slot_free);
-	spin_lock_init(&zram->slot_free_lock);
-	zram->slot_free_rq = NULL;
-
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!zram->queue) {
 		pr_err("Error allocating disk queue for device %d\n",
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index cf64bea3f7cc..b371ae23421a 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -91,20 +91,11 @@ struct zram_meta {
 	struct zs_pool *mem_pool;
 };
 
-struct zram_slot_free {
-	unsigned long index;
-	struct zram_slot_free *next;
-};
-
 struct zram {
 	struct zram_meta *meta;
 	struct rw_semaphore lock; /* protect compression buffers,
 				   * reads and writes
 				   */
-
-	struct work_struct free_work;  /* handle pending free request */
-	struct zram_slot_free *slot_free_rq; /* list head of free request */
-
 	struct request_queue *queue;
 	struct gendisk *disk;
 	int init_done;
@@ -115,7 +106,6 @@ struct zram {
 	 * we can store in a disk.
 	 */
 	u64 disksize;	/* bytes */
-	spinlock_t slot_free_lock;
 
 	struct zram_stats stats;
 };
-- 
1.8.5.2



* [PATCH v2 4/4] zram: Remove zram->lock in read path and change it with mutex
  2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
                   ` (2 preceding siblings ...)
  2014-01-15  1:11 ` [PATCH v2 3/4] zram: remove workqueue for freeing removed pending slot Minchan Kim
@ 2014-01-15  1:11 ` Minchan Kim
  2014-01-15  1:29 ` [PATCH v2 0/4] zram: locking redesign Minchan Kim
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2014-01-15  1:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Nitin Gupta, Sergey Senozhatsky, Jerome Marchand,
	Minchan Kim

Finally, we have separated the 32-bit stat and table handling from
zram->lock, so there is no reason to keep a rw_semaphore between the
read and write paths. This patch removes the lock from the read path
entirely and replaces the rw_semaphore with a mutex, so we go from:

old:

read-read: OK
read-write: NO
write-write: NO

new:

read-read: OK
read-write: OK
write-write: NO

The data below (before on the left, after on the right) shows the
mixed workload performing 11 times better, and the write-write path
also improves because the current rw_semaphore does not support
SPIN_ON_OWNER. That is a side effect, but a welcome one for us.

Write-related tests perform better (from 61% up to 1058%), while the
read path swings both ways (from -2.22% to +1.45%), all marginal
within stddev.
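
The write path now only needs to guard the shared compression
buffers, which it does with a per-meta mutex (abbreviated from the
hunks below):

	static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
				u32 index, int offset)
	{
		struct zram_meta *meta = zram->meta;
		int ret;

		/* ... */
		mutex_lock(&meta->buffer_lock);	/* protects compress buffers */
		/* compress into meta->compress_buffer, update the table
		 * under meta->tb_lock, bump the atomic stats */
		mutex_unlock(&meta->buffer_lock);
		/* ... */
		return ret;
	}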

CPU 12
iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0

==Initial write           ==Initial write
records: 10               records: 10
avg:  516189.16           avg:  839907.96
std:   22486.53 (4.36%)   std:   47902.17 (5.70%)
max:  546970.60           max:  909910.35
min:  481131.54           min:  751148.38
==Rewrite                 ==Rewrite
records: 10               records: 10
avg:  509527.98           avg: 1050156.37
std:   45799.94 (8.99%)   std:   40695.44 (3.88%)
max:  611574.27           max: 1111929.26
min:  443679.95           min:  980409.62
==Read                    ==Read
records: 10               records: 10
avg: 4408624.17           avg: 4472546.76
std:  281152.61 (6.38%)   std:  163662.78 (3.66%)
max: 4867888.66           max: 4727351.03
min: 4058347.69           min: 4126520.88
==Re-read                 ==Re-read
records: 10               records: 10
avg: 4462147.53           avg: 4363257.75
std:  283546.11 (6.35%)   std:  247292.63 (5.67%)
max: 4912894.44           max: 4677241.75
min: 4131386.50           min: 4035235.84
==Reverse Read            ==Reverse Read
records: 10               records: 10
avg: 4565865.97           avg: 4485818.08
std:  313395.63 (6.86%)   std:  248470.10 (5.54%)
max: 5232749.16           max: 4789749.94
min: 4185809.62           min: 3963081.34
==Stride read             ==Stride read
records: 10               records: 10
avg: 4515981.80           avg: 4418806.01
std:  211192.32 (4.68%)   std:  212837.97 (4.82%)
max: 4889287.28           max: 4686967.22
min: 4210362.00           min: 4083041.84
==Random read             ==Random read
records: 10               records: 10
avg: 4410525.23           avg: 4387093.18
std:  236693.22 (5.37%)   std:  235285.23 (5.36%)
max: 4713698.47           max: 4669760.62
min: 4057163.62           min: 3952002.16
==Mixed workload          ==Mixed workload
records: 10               records: 10
avg:  243234.25           avg: 2818677.27
std:   28505.07 (11.72%)  std:  195569.70 (6.94%)
max:  288905.23           max: 3126478.11
min:  212473.16           min: 2484150.69
==Random write            ==Random write
records: 10               records: 10
avg:  555887.07           avg: 1053057.79
std:   70841.98 (12.74%)  std:   35195.36 (3.34%)
max:  683188.28           max: 1096125.73
min:  437299.57           min:  992481.93
==Pwrite                  ==Pwrite
records: 10               records: 10
avg:  501745.93           avg:  810363.09
std:   16373.54 (3.26%)   std:   19245.01 (2.37%)
max:  518724.52           max:  833359.70
min:  464208.73           min:  765501.87
==Pread                   ==Pread
records: 10               records: 10
avg: 4539894.60           avg: 4457680.58
std:  197094.66 (4.34%)   std:  188965.60 (4.24%)
max: 4877170.38           max: 4689905.53
min: 4226326.03           min: 4095739.72

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/staging/zram/zram_drv.c | 17 ++++++++---------
 drivers/staging/zram/zram_drv.h |  4 +---
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index cdc8697476b3..2d0966def066 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -229,6 +229,7 @@ static struct zram_meta *zram_meta_alloc(u64 disksize)
 	}
 
 	rwlock_init(&meta->tb_lock);
+	mutex_init(&meta->buffer_lock);
 	return meta;
 
 free_table:
@@ -411,6 +412,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	struct page *page;
 	unsigned char *user_mem, *cmem, *src, *uncmem = NULL;
 	struct zram_meta *meta = zram->meta;
+	bool locked = false;
 
 	page = bvec->bv_page;
 	src = meta->compress_buffer;
@@ -430,6 +432,8 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 			goto out;
 	}
 
+	mutex_lock(&meta->buffer_lock);
+	locked = true;
 	user_mem = kmap_atomic(page);
 
 	if (is_partial_io(bvec)) {
@@ -456,7 +460,6 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	ret = lzo1x_1_compress(uncmem, PAGE_SIZE, src, &clen,
 			       meta->compress_workmem);
-
 	if (!is_partial_io(bvec)) {
 		kunmap_atomic(user_mem);
 		user_mem = NULL;
@@ -513,6 +516,8 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		atomic_inc(&zram->stats.good_compress);
 
 out:
+	if (locked)
+		mutex_unlock(&meta->buffer_lock);
 	if (is_partial_io(bvec))
 		kfree(uncmem);
 
@@ -526,15 +531,10 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 {
 	int ret;
 
-	if (rw == READ) {
-		down_read(&zram->lock);
+	if (rw == READ)
 		ret = zram_bvec_read(zram, bvec, index, offset, bio);
-		up_read(&zram->lock);
-	} else {
-		down_write(&zram->lock);
+	else
 		ret = zram_bvec_write(zram, bvec, index, offset);
-		up_write(&zram->lock);
-	}
 
 	return ret;
 }
@@ -805,7 +805,6 @@ static int create_device(struct zram *zram, int device_id)
 {
 	int ret = -ENOMEM;
 
-	init_rwsem(&zram->lock);
 	init_rwsem(&zram->init_lock);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index b371ae23421a..70835c286728 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -89,13 +89,11 @@ struct zram_meta {
 	void *compress_buffer;
 	struct table *table;
 	struct zs_pool *mem_pool;
+	struct mutex buffer_lock; /* protect compress buffers */
 };
 
 struct zram {
 	struct zram_meta *meta;
-	struct rw_semaphore lock; /* protect compression buffers,
-				   * reads and writes
-				   */
 	struct request_queue *queue;
 	struct gendisk *disk;
 	int init_done;
-- 
1.8.5.2



* Re: [PATCH v2 0/4] zram: locking redesign
  2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
                   ` (3 preceding siblings ...)
  2014-01-15  1:11 ` [PATCH v2 4/4] zram: Remove zram->lock in read path and change it with mutex Minchan Kim
@ 2014-01-15  1:29 ` Minchan Kim
  2014-01-15  1:34   ` Andrew Morton
  2014-01-15  9:21 ` Jerome Marchand
  2014-01-15  9:25 ` Sergey Senozhatsky
  6 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2014-01-15  1:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Nitin Gupta, Sergey Senozhatsky, Jerome Marchand

On Wed, Jan 15, 2014 at 10:11:06AM +0900, Minchan Kim wrote:
> Currently, the zram->lock rw_semaphore is coarse-grained, which
> hurts scalability. This patchset tries to improve that by removing
> the lock from the read path.
> 
> [1] uses atomic operations, removing the 32-bit stats' dependency
> on zram->lock.
> [2] introduces the table's own lock instead of relying on zram->lock.
> [3] removes the pending-slot-free mess, leaving the core much cleaner.
> [4] finally removes zram->lock from the read path and replaces it
> with a mutex.
> 
> The result is excellent: the mixed read/write workload performs 11
> times better than before, and write concurrency also improves
> because mutex supports SPIN_ON_OWNER while rw_semaphore does not yet.
> (I know there was a recent effort from Tim Chen to add that to
> rw_semaphore, but I am not sure it was merged. Either way, we no
> longer need it, and there is no reason to prevent read-write
> concurrency.)
> 
> Thanks.
> 
> Minchan Kim (4):
>   [1] zram: use atomic operation for stat
>   [2] zram: introduce zram->tb_lock
>   [3] zram: remove workqueue for freeing removed pending slot
>   [4] zram: Remove zram->lock in read path and change it with mutex
> 
>  drivers/staging/zram/zram_drv.c | 117 ++++++++++++++++------------------------
>  drivers/staging/zram/zram_drv.h |  27 +++-------
>  2 files changed, 51 insertions(+), 93 deletions(-)
> 
> -- 
> 1.8.5.2
> 

Oops, I missed Sergey's Tested-by. Really sorry.
If I get a chance to resend after review, I will add it. Otherwise,
Andrew, could you add his Tested-by?

This patchset is the same as v1; I only updated the descriptions
based on Andrew's and Jerome's comments.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH v2 0/4] zram: locking redesign
  2014-01-15  1:29 ` [PATCH v2 0/4] zram: locking redesign Minchan Kim
@ 2014-01-15  1:34   ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2014-01-15  1:34 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-kernel, Nitin Gupta, Sergey Senozhatsky, Jerome Marchand

On Wed, 15 Jan 2014 10:29:48 +0900 Minchan Kim <minchan@kernel.org> wrote:

> Oops, I missed Sergey's Tested-by. Really sorry.
> If I get a chance to resend after review, I will add it. Otherwise,
> Andrew, could you add his Tested-by?

Sure.

I normally wouldn't consider such a patchset this late in the kernel
cycle.  But these patches speed up some writes by 10x and zram is
officially still in staging/.  So I guess we could squeak these into
3.14.

Prompt review and testing by others would be appreciated.


* Re: [PATCH v2 0/4] zram: locking redesign
  2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
                   ` (4 preceding siblings ...)
  2014-01-15  1:29 ` [PATCH v2 0/4]zram: locking redesign Minchan Kim
@ 2014-01-15  9:21 ` Jerome Marchand
  2014-01-15  9:25 ` Sergey Senozhatsky
  6 siblings, 0 replies; 9+ messages in thread
From: Jerome Marchand @ 2014-01-15  9:21 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, Nitin Gupta, Sergey Senozhatsky

On 01/15/2014 02:11 AM, Minchan Kim wrote:
> Currently, the zram->lock rw_semaphore is coarse-grained, which
> hurts scalability. This patchset tries to improve that by removing
> the lock from the read path.
> 
> [1] uses atomic operations, removing the 32-bit stats' dependency
> on zram->lock.
> [2] introduces the table's own lock instead of relying on zram->lock.
> [3] removes the pending-slot-free mess, leaving the core much cleaner.
> [4] finally removes zram->lock from the read path and replaces it
> with a mutex.
> 
> The result is excellent: the mixed read/write workload performs 11
> times better than before, and write concurrency also improves
> because mutex supports SPIN_ON_OWNER while rw_semaphore does not yet.
> (I know there was a recent effort from Tim Chen to add that to
> rw_semaphore, but I am not sure it was merged. Either way, we no
> longer need it, and there is no reason to prevent read-write
> concurrency.)
> 
> Thanks.
> 
> Minchan Kim (4):
>   [1] zram: use atomic operation for stat
>   [2] zram: introduce zram->tb_lock
>   [3] zram: remove workqueue for freeing removed pending slot
>   [4] zram: Remove zram->lock in read path and change it with mutex
> 
>  drivers/staging/zram/zram_drv.c | 117 ++++++++++++++++------------------------
>  drivers/staging/zram/zram_drv.h |  27 +++-------
>  2 files changed, 51 insertions(+), 93 deletions(-)
> 

The new locking scheme seems sound to me.

Acked-by: Jerome Marchand <jmarchan@redhat.com>


* Re: [PATCH v2 0/4] zram: locking redesign
  2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
                   ` (5 preceding siblings ...)
  2014-01-15  9:21 ` Jerome Marchand
@ 2014-01-15  9:25 ` Sergey Senozhatsky
  6 siblings, 0 replies; 9+ messages in thread
From: Sergey Senozhatsky @ 2014-01-15  9:25 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, Nitin Gupta, Jerome Marchand

On (01/15/14 10:11), Minchan Kim wrote:
> Currently, the zram->lock rw_semaphore is coarse-grained, which
> hurts scalability. This patchset tries to improve that by removing
> the lock from the read path.
> 
> [1] uses atomic operations, removing the 32-bit stats' dependency
> on zram->lock.
> [2] introduces the table's own lock instead of relying on zram->lock.
> [3] removes the pending-slot-free mess, leaving the core much cleaner.
> [4] finally removes zram->lock from the read path and replaces it
> with a mutex.
> 
> The result is excellent: the mixed read/write workload performs 11
> times better than before, and write concurrency also improves
> because mutex supports SPIN_ON_OWNER while rw_semaphore does not yet.
> (I know there was a recent effort from Tim Chen to add that to
> rw_semaphore, but I am not sure it was merged. Either way, we no
> longer need it, and there is no reason to prevent read-write
> concurrency.)
> 
> Thanks.
> 

Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

> Minchan Kim (4):
>   [1] zram: use atomic operation for stat
>   [2] zram: introduce zram->tb_lock
>   [3] zram: remove workqueue for freeing removed pending slot
>   [4] zram: Remove zram->lock in read path and change it with mutex
> 
>  drivers/staging/zram/zram_drv.c | 117 ++++++++++++++++------------------------
>  drivers/staging/zram/zram_drv.h |  27 +++-------
>  2 files changed, 51 insertions(+), 93 deletions(-)
> 
> -- 
> 1.8.5.2
> 


end of thread

Thread overview: 9+ messages
2014-01-15  1:11 [PATCH v2 0/4] zram: locking redesign Minchan Kim
2014-01-15  1:11 ` [PATCH v2 1/4] zram: use atomic operation for stat Minchan Kim
2014-01-15  1:11 ` [PATCH v2 2/4] zram: introduce zram->tb_lock Minchan Kim
2014-01-15  1:11 ` [PATCH v2 3/4] zram: remove workqueue for freeing removed pending slot Minchan Kim
2014-01-15  1:11 ` [PATCH v2 4/4] zram: Remove zram->lock in read path and change it with mutex Minchan Kim
2014-01-15  1:29 ` [PATCH v2 0/4] zram: locking redesign Minchan Kim
2014-01-15  1:34   ` Andrew Morton
2014-01-15  9:21 ` Jerome Marchand
2014-01-15  9:25 ` Sergey Senozhatsky
