linux-btrfs.vger.kernel.org archive mirror
* [PATCH 0/6] v4 block refcount conversion patches
@ 2017-10-20  8:15 Elena Reshetova
  2017-10-20  8:15 ` [PATCH 1/6] block: convert bio.__bi_cnt from atomic_t to refcount_t Elena Reshetova
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-10-20  8:15 UTC (permalink / raw)
  To: axboe
  Cc: james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook, Elena Reshetova

Changes in v4:
 - Improved commit messages and signoff info.
 - Rebase on top of linux-next as of yesterday.
 - WARN_ONs are restored since x86 refcount_t does not WARN on zero
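
As a rough userspace illustration of the restored checks (not kernel code:
demo_obj, demo_warn_on_once and demo_get are hypothetical names, and C11
atomics stand in for the kernel primitives), an explicit warn-on-zero test
ahead of the increment could be sketched as below. Because the counter is
read as an unsigned value, the old "<= 0" test becomes "== 0".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a refcounted object such as blkcg_gq. */
struct demo_obj {
	atomic_uint refcnt;
};

/* Rough analogue of WARN_ON_ONCE(): report only the first violation. */
static bool demo_warn_on_once(bool cond, const char *what)
{
	static atomic_flag warned = ATOMIC_FLAG_INIT;

	if (cond && !atomic_flag_test_and_set(&warned))
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Take a reference; returns false if the object was already at zero. */
static bool demo_get(struct demo_obj *obj)
{
	/* The count is unsigned, so "<= 0" collapses to "== 0". */
	bool dead = demo_warn_on_once(atomic_load(&obj->refcnt) == 0,
				      "get on zero refcount");

	/* Like the patched helpers, warn but still perform the increment. */
	atomic_fetch_add(&obj->refcnt, 1);
	return !dead;
}

/* Usage: a live object passes the check, a zero count trips it. */
static void demo_usage(void)
{
	struct demo_obj obj;

	atomic_init(&obj.refcnt, 1);
	assert(demo_get(&obj));
	atomic_store(&obj.refcnt, 0);
	assert(!demo_get(&obj));	/* emits the warning at most once */
}
```

The single shared flag is a simplification made for this sketch; it only
approximates the once-only behaviour of the kernel macro.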

Changes in v3:
No changes in the patches apart from trivial rebases, but refcount_t now
defaults to atomic_t and uses the standard atomic operations
unless CONFIG_REFCOUNT_FULL is enabled. This is a compromise for
performance-critical systems that cannot accept even a
slight delay in refcounter operations.

Changes in v2:
Unneeded WARNs are removed since refcount_t warns by itself.
BUG_ONs are left as they are, since refcount_t doesn't BUG by default.

This series replaces atomic_t reference counters in the block subsystem
with the new refcount_t type and API (see include/linux/refcount.h).
Doing so prevents intentional or accidental underflows and overflows
that can lead to use-after-free vulnerabilities.

The patches are fully independent and can be cherry-picked separately.
If there are no objections to the patches, please merge them via the respective trees.
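
To make the motivation concrete, here is a minimal userspace sketch of a
checked reference counter. It is not the kernel's refcount_t: struct demo_ref
and the demo_ref_*() helpers are hypothetical names, C11 atomics are used in
place of the kernel primitives, and saturating at UINT_MAX is a choice made
for this sketch. It shows the two properties the series relies on: a count
that has reached zero is never incremented again, and a count that would
overflow stays pinned instead of wrapping.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for refcount_t. */
struct demo_ref {
	atomic_uint count;
};

static void demo_ref_set(struct demo_ref *r, unsigned int n)
{
	atomic_store(&r->count, n);
}

static unsigned int demo_ref_read(struct demo_ref *r)
{
	return atomic_load(&r->count);
}

/* Increment, refusing to resurrect a zero count and saturating at UINT_MAX. */
static bool demo_ref_inc(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;	/* object already released */
		if (old == UINT_MAX)
			return true;	/* saturated: leak rather than wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

	return true;
}

/* Decrement; true only for the caller that drops the last reference. */
static bool demo_ref_dec_and_test(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;	/* underflow or saturated: never free */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));

	return old == 1;
}

/* Usage: init to 1, two gets, three puts; returns how often "free" ran. */
static int demo_lifecycle(void)
{
	struct demo_ref ref;
	int frees = 0;

	demo_ref_set(&ref, 1);		/* creator holds the first reference */
	demo_ref_inc(&ref);
	demo_ref_inc(&ref);
	for (int i = 0; i < 3; i++)
		if (demo_ref_dec_and_test(&ref))
			frees++;	/* only the last put frees */
	assert(demo_ref_read(&ref) == 0);
	return frees;
}
```

With a plain atomic counter, the same sequence followed by one extra get and
put would free the object a second time; the checks above turn that into a
refused operation.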

Elena Reshetova (6):
  block: convert bio.__bi_cnt from atomic_t to refcount_t
  block: convert blk_queue_tag.refcnt from atomic_t to refcount_t
  block: convert blkcg_gq.refcnt from atomic_t to refcount_t
  block: convert io_context.active_ref from atomic_t to refcount_t
  block: convert bsg_device.ref_count from atomic_t to refcount_t
  drivers, block: convert xen_blkif.refcnt from atomic_t to refcount_t

 block/bfq-iosched.c                |  2 +-
 block/bio.c                        |  6 +++---
 block/blk-cgroup.c                 |  2 +-
 block/blk-ioc.c                    |  4 ++--
 block/blk-tag.c                    |  8 ++++----
 block/bsg.c                        |  9 +++++----
 block/cfq-iosched.c                |  4 ++--
 drivers/block/xen-blkback/common.h |  7 ++++---
 drivers/block/xen-blkback/xenbus.c |  2 +-
 fs/btrfs/volumes.c                 |  2 +-
 include/linux/bio.h                |  4 ++--
 include/linux/blk-cgroup.h         | 11 ++++++-----
 include/linux/blk_types.h          |  3 ++-
 include/linux/blkdev.h             |  3 ++-
 include/linux/iocontext.h          |  7 ++++---
 15 files changed, 40 insertions(+), 34 deletions(-)

-- 
2.7.4



* [PATCH 1/6] block: convert bio.__bi_cnt from atomic_t to refcount_t
  2017-10-20  8:15 [PATCH 0/6] v4 block refcount conversion patches Elena Reshetova
@ 2017-10-20  8:15 ` Elena Reshetova
  2017-10-20  8:15 ` [PATCH 2/6] block: convert blk_queue_tag.refcnt " Elena Reshetova
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-10-20  8:15 UTC (permalink / raw)
  To: axboe
  Cc: james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable bio.__bi_cnt is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 block/bio.c               | 6 +++---
 fs/btrfs/volumes.c        | 2 +-
 include/linux/bio.h       | 4 ++--
 include/linux/blk_types.h | 3 ++-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 101c2a9..58edc1b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -279,7 +279,7 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 {
 	memset(bio, 0, sizeof(*bio));
 	atomic_set(&bio->__bi_remaining, 1);
-	atomic_set(&bio->__bi_cnt, 1);
+	refcount_set(&bio->__bi_cnt, 1);
 
 	bio->bi_io_vec = table;
 	bio->bi_max_vecs = max_vecs;
@@ -557,12 +557,12 @@ void bio_put(struct bio *bio)
 	if (!bio_flagged(bio, BIO_REFFED))
 		bio_free(bio);
 	else {
-		BIO_BUG_ON(!atomic_read(&bio->__bi_cnt));
+		BIO_BUG_ON(!refcount_read(&bio->__bi_cnt));
 
 		/*
 		 * last put frees it
 		 */
-		if (atomic_dec_and_test(&bio->__bi_cnt))
+		if (refcount_dec_and_test(&bio->__bi_cnt))
 			bio_free(bio);
 	}
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b397375..11812ee 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -450,7 +450,7 @@ static noinline void run_scheduled_bios(struct btrfs_device *device)
 		    waitqueue_active(&fs_info->async_submit_wait))
 			wake_up(&fs_info->async_submit_wait);
 
-		BUG_ON(atomic_read(&cur->__bi_cnt) == 0);
+		BUG_ON(refcount_read(&cur->__bi_cnt) == 0);
 
 		/*
 		 * if we're doing the sync list, record that our
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 275c91c..0fa4dd2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -253,7 +253,7 @@ static inline void bio_get(struct bio *bio)
 {
 	bio->bi_flags |= (1 << BIO_REFFED);
 	smp_mb__before_atomic();
-	atomic_inc(&bio->__bi_cnt);
+	refcount_inc(&bio->__bi_cnt);
 }
 
 static inline void bio_cnt_set(struct bio *bio, unsigned int count)
@@ -262,7 +262,7 @@ static inline void bio_cnt_set(struct bio *bio, unsigned int count)
 		bio->bi_flags |= (1 << BIO_REFFED);
 		smp_mb__before_atomic();
 	}
-	atomic_set(&bio->__bi_cnt, count);
+	refcount_set(&bio->__bi_cnt, count);
 }
 
 static inline bool bio_flagged(struct bio *bio, unsigned int bit)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a2d2aa7..1ec370e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -7,6 +7,7 @@
 
 #include <linux/types.h>
 #include <linux/bvec.h>
+#include <linux/refcount.h>
 
 struct bio_set;
 struct bio;
@@ -104,7 +105,7 @@ struct bio {
 
 	unsigned short		bi_max_vecs;	/* max bvl_vecs we can hold */
 
-	atomic_t		__bi_cnt;	/* pin count */
+	refcount_t		__bi_cnt;	/* pin count */
 
 	struct bio_vec		*bi_io_vec;	/* the actual vec list */
 
-- 
2.7.4



* [PATCH 2/6] block: convert blk_queue_tag.refcnt from atomic_t to refcount_t
  2017-10-20  8:15 [PATCH 0/6] v4 block refcount conversion patches Elena Reshetova
  2017-10-20  8:15 ` [PATCH 1/6] block: convert bio.__bi_cnt from atomic_t to refcount_t Elena Reshetova
@ 2017-10-20  8:15 ` Elena Reshetova
  2017-10-20  8:15 ` [PATCH 3/6] block: convert blkcg_gq.refcnt " Elena Reshetova
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-10-20  8:15 UTC (permalink / raw)
  To: axboe
  Cc: james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable blk_queue_tag.refcnt is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 block/blk-tag.c        | 8 ++++----
 include/linux/blkdev.h | 3 ++-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/blk-tag.c b/block/blk-tag.c
index e1a9c15..a7263e3 100644
--- a/block/blk-tag.c
+++ b/block/blk-tag.c
@@ -35,7 +35,7 @@ EXPORT_SYMBOL(blk_queue_find_tag);
  */
 void blk_free_tags(struct blk_queue_tag *bqt)
 {
-	if (atomic_dec_and_test(&bqt->refcnt)) {
+	if (refcount_dec_and_test(&bqt->refcnt)) {
 		BUG_ON(find_first_bit(bqt->tag_map, bqt->max_depth) <
 							bqt->max_depth);
 
@@ -130,7 +130,7 @@ static struct blk_queue_tag *__blk_queue_init_tags(struct request_queue *q,
 	if (init_tag_map(q, tags, depth))
 		goto fail;
 
-	atomic_set(&tags->refcnt, 1);
+	refcount_set(&tags->refcnt, 1);
 	tags->alloc_policy = alloc_policy;
 	tags->next_tag = 0;
 	return tags;
@@ -180,7 +180,7 @@ int blk_queue_init_tags(struct request_queue *q, int depth,
 		queue_flag_set(QUEUE_FLAG_QUEUED, q);
 		return 0;
 	} else
-		atomic_inc(&tags->refcnt);
+		refcount_inc(&tags->refcnt);
 
 	/*
 	 * assign it, all done
@@ -225,7 +225,7 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 	 * Currently cannot replace a shared tag map with a new
 	 * one, so error out if this is the case
 	 */
-	if (atomic_read(&bqt->refcnt) != 1)
+	if (refcount_read(&bqt->refcnt) != 1)
 		return -EBUSY;
 
 	/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02fa42d..1fefdbb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -26,6 +26,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
 #include <linux/blkzoned.h>
+#include <linux/refcount.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -295,7 +296,7 @@ struct blk_queue_tag {
 	unsigned long *tag_map;		/* bit map of free/busy tags */
 	int max_depth;			/* what we will send to device */
 	int real_max_depth;		/* what the array can hold */
-	atomic_t refcnt;		/* map can be shared */
+	refcount_t refcnt;		/* map can be shared */
 	int alloc_policy;		/* tag allocation policy */
 	int next_tag;			/* next tag */
 };
-- 
2.7.4



* [PATCH 3/6] block: convert blkcg_gq.refcnt from atomic_t to refcount_t
  2017-10-20  8:15 [PATCH 0/6] v4 block refcount conversion patches Elena Reshetova
  2017-10-20  8:15 ` [PATCH 1/6] block: convert bio.__bi_cnt from atomic_t to refcount_t Elena Reshetova
  2017-10-20  8:15 ` [PATCH 2/6] block: convert blk_queue_tag.refcnt " Elena Reshetova
@ 2017-10-20  8:15 ` Elena Reshetova
  2017-10-20  8:16 ` [PATCH 4/6] block: convert io_context.active_ref " Elena Reshetova
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-10-20  8:15 UTC (permalink / raw)
  To: axboe
  Cc: james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable blkcg_gq.refcnt is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 block/blk-cgroup.c         |  2 +-
 include/linux/blk-cgroup.h | 11 ++++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index d3f56ba..1e7cedc 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -107,7 +107,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
 	blkg->q = q;
 	INIT_LIST_HEAD(&blkg->q_node);
 	blkg->blkcg = blkcg;
-	atomic_set(&blkg->refcnt, 1);
+	refcount_set(&blkg->refcnt, 1);
 
 	/* root blkg uses @q->root_rl, init rl only for !root blkgs */
 	if (blkcg != &blkcg_root) {
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 9d92153..c95d29d 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -19,6 +19,7 @@
 #include <linux/radix-tree.h>
 #include <linux/blkdev.h>
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 
 /* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */
 #define BLKG_STAT_CPU_BATCH	(INT_MAX / 2)
@@ -122,7 +123,7 @@ struct blkcg_gq {
 	struct request_list		rl;
 
 	/* reference count */
-	atomic_t			refcnt;
+	refcount_t			refcnt;
 
 	/* is this blkg online? protected by both blkcg and q locks */
 	bool				online;
@@ -354,8 +355,8 @@ static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen)
  */
 static inline void blkg_get(struct blkcg_gq *blkg)
 {
-	WARN_ON_ONCE(atomic_read(&blkg->refcnt) <= 0);
-	atomic_inc(&blkg->refcnt);
+	WARN_ON_ONCE(refcount_read(&blkg->refcnt) == 0);
+	refcount_inc(&blkg->refcnt);
 }
 
 void __blkg_release_rcu(struct rcu_head *rcu);
@@ -366,8 +367,8 @@ void __blkg_release_rcu(struct rcu_head *rcu);
  */
 static inline void blkg_put(struct blkcg_gq *blkg)
 {
-	WARN_ON_ONCE(atomic_read(&blkg->refcnt) <= 0);
-	if (atomic_dec_and_test(&blkg->refcnt))
+	WARN_ON_ONCE(refcount_read(&blkg->refcnt) == 0);
+	if (refcount_dec_and_test(&blkg->refcnt))
 		call_rcu(&blkg->rcu_head, __blkg_release_rcu);
 }
 
-- 
2.7.4



* [PATCH 4/6] block: convert io_context.active_ref from atomic_t to refcount_t
  2017-10-20  8:15 [PATCH 0/6] v4 block refcount conversion patches Elena Reshetova
                   ` (2 preceding siblings ...)
  2017-10-20  8:15 ` [PATCH 3/6] block: convert blkcg_gq.refcnt " Elena Reshetova
@ 2017-10-20  8:16 ` Elena Reshetova
  2017-10-20  8:16 ` [PATCH 5/6] block: convert bsg_device.ref_count " Elena Reshetova
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-10-20  8:16 UTC (permalink / raw)
  To: axboe
  Cc: james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable io_context.active_ref is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 block/bfq-iosched.c       | 2 +-
 block/blk-ioc.c           | 4 ++--
 block/cfq-iosched.c       | 4 ++--
 include/linux/iocontext.h | 7 ++++---
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a4783da..1ec9b22 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4030,7 +4030,7 @@ static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
 	 * bfqq. Otherwise check average think time to
 	 * decide whether to mark as has_short_ttime
 	 */
-	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
+	if (refcount_read(&bic->icq.ioc->active_ref) == 0 ||
 	    (bfq_sample_valid(bfqq->ttime.ttime_samples) &&
 	     bfqq->ttime.ttime_mean > bfqd->bfq_slice_idle))
 		has_short_ttime = false;
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 63898d2..69704d2 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -176,7 +176,7 @@ void put_io_context_active(struct io_context *ioc)
 	unsigned long flags;
 	struct io_cq *icq;
 
-	if (!atomic_dec_and_test(&ioc->active_ref)) {
+	if (!refcount_dec_and_test(&ioc->active_ref)) {
 		put_io_context(ioc);
 		return;
 	}
@@ -275,7 +275,7 @@ int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
 	/* initialize */
 	atomic_long_set(&ioc->refcount, 1);
 	atomic_set(&ioc->nr_tasks, 1);
-	atomic_set(&ioc->active_ref, 1);
+	refcount_set(&ioc->active_ref, 1);
 	spin_lock_init(&ioc->lock);
 	INIT_RADIX_TREE(&ioc->icq_tree, GFP_ATOMIC | __GFP_HIGH);
 	INIT_HLIST_HEAD(&ioc->icq_list);
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9f342ef..e6d5d6d 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2941,7 +2941,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 	 * task has exited, don't wait
 	 */
 	cic = cfqd->active_cic;
-	if (!cic || !atomic_read(&cic->icq.ioc->active_ref))
+	if (!cic || !refcount_read(&cic->icq.ioc->active_ref))
 		return;
 
 	/*
@@ -3933,7 +3933,7 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 
 	if (cfqq->next_rq && req_noidle(cfqq->next_rq))
 		enable_idle = 0;
-	else if (!atomic_read(&cic->icq.ioc->active_ref) ||
+	else if (!refcount_read(&cic->icq.ioc->active_ref) ||
 		 !cfqd->cfq_slice_idle ||
 		 (!cfq_cfqq_deep(cfqq) && CFQQ_SEEKY(cfqq)))
 		enable_idle = 0;
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index df38db2..a1e28c3 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -3,6 +3,7 @@
 
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
+#include <linux/refcount.h>
 #include <linux/workqueue.h>
 
 enum {
@@ -96,7 +97,7 @@ struct io_cq {
  */
 struct io_context {
 	atomic_long_t refcount;
-	atomic_t active_ref;
+	refcount_t active_ref;
 	atomic_t nr_tasks;
 
 	/* all the fields below are protected by this lock */
@@ -128,9 +129,9 @@ struct io_context {
 static inline void get_io_context_active(struct io_context *ioc)
 {
 	WARN_ON_ONCE(atomic_long_read(&ioc->refcount) <= 0);
-	WARN_ON_ONCE(atomic_read(&ioc->active_ref) <= 0);
+	WARN_ON_ONCE(refcount_read(&ioc->active_ref) == 0);
 	atomic_long_inc(&ioc->refcount);
-	atomic_inc(&ioc->active_ref);
+	refcount_inc(&ioc->active_ref);
 }
 
 static inline void ioc_task_link(struct io_context *ioc)
-- 
2.7.4



* [PATCH 5/6] block: convert bsg_device.ref_count from atomic_t to refcount_t
  2017-10-20  8:15 [PATCH 0/6] v4 block refcount conversion patches Elena Reshetova
                   ` (3 preceding siblings ...)
  2017-10-20  8:16 ` [PATCH 4/6] block: convert io_context.active_ref " Elena Reshetova
@ 2017-10-20  8:16 ` Elena Reshetova
  2017-10-20  8:16 ` [PATCH 6/6] drivers, block: convert xen_blkif.refcnt " Elena Reshetova
  2017-10-20  8:43 ` [PATCH 0/6] v4 block refcount conversion patches Johannes Thumshirn
  6 siblings, 0 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-10-20  8:16 UTC (permalink / raw)
  To: axboe
  Cc: james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable bsg_device.ref_count is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 block/bsg.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index ee1335c..6c98422 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -21,6 +21,7 @@
 #include <linux/idr.h>
 #include <linux/bsg.h>
 #include <linux/slab.h>
+#include <linux/refcount.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_ioctl.h>
@@ -38,7 +39,7 @@ struct bsg_device {
 	struct list_head busy_list;
 	struct list_head done_list;
 	struct hlist_node dev_list;
-	atomic_t ref_count;
+	refcount_t ref_count;
 	int queued_cmds;
 	int done_cmds;
 	wait_queue_head_t wq_done;
@@ -710,7 +711,7 @@ static int bsg_put_device(struct bsg_device *bd)
 
 	mutex_lock(&bsg_mutex);
 
-	do_free = atomic_dec_and_test(&bd->ref_count);
+	do_free = refcount_dec_and_test(&bd->ref_count);
 	if (!do_free) {
 		mutex_unlock(&bsg_mutex);
 		goto out;
@@ -768,7 +769,7 @@ static struct bsg_device *bsg_add_device(struct inode *inode,
 
 	bsg_set_block(bd, file);
 
-	atomic_set(&bd->ref_count, 1);
+	refcount_set(&bd->ref_count, 1);
 	mutex_lock(&bsg_mutex);
 	hlist_add_head(&bd->dev_list, bsg_dev_idx_hash(iminor(inode)));
 
@@ -788,7 +789,7 @@ static struct bsg_device *__bsg_get_device(int minor, struct request_queue *q)
 
 	hlist_for_each_entry(bd, bsg_dev_idx_hash(minor), dev_list) {
 		if (bd->queue == q) {
-			atomic_inc(&bd->ref_count);
+			refcount_inc(&bd->ref_count);
 			goto found;
 		}
 	}
-- 
2.7.4



* [PATCH 6/6] drivers, block: convert xen_blkif.refcnt from atomic_t to refcount_t
  2017-10-20  8:15 [PATCH 0/6] v4 block refcount conversion patches Elena Reshetova
                   ` (4 preceding siblings ...)
  2017-10-20  8:16 ` [PATCH 5/6] block: convert bsg_device.ref_count " Elena Reshetova
@ 2017-10-20  8:16 ` Elena Reshetova
  2017-10-20  8:43 ` [PATCH 0/6] v4 block refcount conversion patches Johannes Thumshirn
  6 siblings, 0 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-10-20  8:16 UTC (permalink / raw)
  To: axboe
  Cc: james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook, Elena Reshetova

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable xen_blkif.refcnt is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 drivers/block/xen-blkback/common.h | 7 ++++---
 drivers/block/xen-blkback/xenbus.c | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index ecb35fe..0c3320d 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -35,6 +35,7 @@
 #include <linux/wait.h>
 #include <linux/io.h>
 #include <linux/rbtree.h>
+#include <linux/refcount.h>
 #include <asm/setup.h>
 #include <asm/pgalloc.h>
 #include <asm/hypervisor.h>
@@ -319,7 +320,7 @@ struct xen_blkif {
 	struct xen_vbd		vbd;
 	/* Back pointer to the backend_info. */
 	struct backend_info	*be;
-	atomic_t		refcnt;
+	refcount_t		refcnt;
 	/* for barrier (drain) requests */
 	struct completion	drain_complete;
 	atomic_t		drain;
@@ -372,10 +373,10 @@ struct pending_req {
 			 (_v)->bdev->bd_part->nr_sects : \
 			  get_capacity((_v)->bdev->bd_disk))
 
-#define xen_blkif_get(_b) (atomic_inc(&(_b)->refcnt))
+#define xen_blkif_get(_b) (refcount_inc(&(_b)->refcnt))
 #define xen_blkif_put(_b)				\
 	do {						\
-		if (atomic_dec_and_test(&(_b)->refcnt))	\
+		if (refcount_dec_and_test(&(_b)->refcnt))	\
 			schedule_work(&(_b)->free_work);\
 	} while (0)
 
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 21c1be1..5955b61 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -176,7 +176,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 		return ERR_PTR(-ENOMEM);
 
 	blkif->domid = domid;
-	atomic_set(&blkif->refcnt, 1);
+	refcount_set(&blkif->refcnt, 1);
 	init_completion(&blkif->drain_complete);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
-- 
2.7.4



* Re: [PATCH 0/6] v4 block refcount conversion patches
  2017-10-20  8:15 [PATCH 0/6] v4 block refcount conversion patches Elena Reshetova
                   ` (5 preceding siblings ...)
  2017-10-20  8:16 ` [PATCH 6/6] drivers, block: convert xen_blkif.refcnt " Elena Reshetova
@ 2017-10-20  8:43 ` Johannes Thumshirn
  2017-10-20 10:25   ` Reshetova, Elena
  6 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2017-10-20  8:43 UTC (permalink / raw)
  To: Elena Reshetova
  Cc: axboe, james.bottomley, linux-kernel, linux-block, linux-scsi,
	linux-btrfs, peterz, gregkh, fujita.tomonori, mingo, clm, jbacik,
	dsterba, keescook

Elena Reshetova <elena.reshetova@intel.com> writes:
> Elena Reshetova (6):
>   block: convert bio.__bi_cnt from atomic_t to refcount_t
>   block: convert blk_queue_tag.refcnt from atomic_t to refcount_t
>   block: convert blkcg_gq.refcnt from atomic_t to refcount_t
>   block: convert io_context.active_ref from atomic_t to refcount_t
>   block: convert bsg_device.ref_count from atomic_t to refcount_t
>   drivers, block: convert xen_blkif.refcnt from atomic_t to refcount_t

Hi Elena,

While the bsg ref_count is cheap, do you have any numbers how the other
conversions compare in performance (throughput and latency) vs atomics?

It should be quite easy to measure against a null_blk device.

Thanks a lot,
       Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* RE: [PATCH 0/6] v4 block refcount conversion patches
  2017-10-20  8:43 ` [PATCH 0/6] v4 block refcount conversion patches Johannes Thumshirn
@ 2017-10-20 10:25   ` Reshetova, Elena
  0 siblings, 0 replies; 9+ messages in thread
From: Reshetova, Elena @ 2017-10-20 10:25 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: axboe@kernel.dk, james.bottomley@hansenpartnership.com,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-btrfs@vger.kernel.org,
	peterz@infradead.org, gregkh@linuxfoundation.org,
	fujita.tomonori@lab.ntt.co.jp, mingo@redhat.com, clm@fb.com,
	jbacik@fb.com, dsterba@suse.com, keescook@chromium.org


> Elena Reshetova <elena.reshetova@intel.com> writes:
> > Elena Reshetova (6):
> >   block: convert bio.__bi_cnt from atomic_t to refcount_t
> >   block: convert blk_queue_tag.refcnt from atomic_t to refcount_t
> >   block: convert blkcg_gq.refcnt from atomic_t to refcount_t
> >   block: convert io_context.active_ref from atomic_t to refcount_t
> >   block: convert bsg_device.ref_count from atomic_t to refcount_t
> >   drivers, block: convert xen_blkif.refcnt from atomic_t to refcount_t
> 
> Hi Elena,
> 
> While the bsg ref_count is cheap, do you have any numbers how the other
> conversions compare in performance (throughput and latency) vs atomics?
Hi Johannes,

The performance depends on which "breed" of refcount_t is used underneath.
We currently have three versions:

- refcount_t defaults to atomic_t (CONFIG_REFCOUNT_FULL not enabled, no arch. support).
  The impact is zero in this case, since only the plain atomic functions are used.
- refcount_t uses an arch-specific implementation (the arch. enables ARCH_HAS_REFCOUNT).
  The impact depends on the arch. implementation. Currently only x86 provides one.
- refcount_t uses the "full" arch-independent implementation.
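
The build-time choice between the first and the third variant can be
sketched in userspace roughly as follows. This is an illustration only:
DEMO_REFCOUNT_FULL, struct demo_ref and the helpers are hypothetical names
standing in for the real config option and API, C11 atomics replace the
kernel primitives, and the arch-specific middle variant is not modelled
because it depends on architecture code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical stand-in for refcount_t. */
struct demo_ref {
	atomic_uint count;
};

#ifdef DEMO_REFCOUNT_FULL
/* "Full" variant: a compare-and-swap loop with explicit checks. */
static void demo_ref_inc(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == UINT_MAX)
			return;	/* refuse inc-from-zero, saturate on overflow */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
}
#else
/* Default variant: a plain atomic increment, no extra cost, no checks. */
static void demo_ref_inc(struct demo_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}
#endif

/* Helper for the usage below: increment once starting from a given value. */
static unsigned int demo_inc_from(unsigned int start)
{
	struct demo_ref r;

	atomic_init(&r.count, start);
	demo_ref_inc(&r);
	return atomic_load(&r.count);
}

/* Usage: an ordinary get behaves the same in both variants. */
static void demo_usage(void)
{
	assert(demo_inc_from(1) == 2);
}
```

The two variants differ only at the edges: in this sketch the plain version
wraps from UINT_MAX to 0 and happily increments from 0, while the checked
version leaves both values untouched.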

Here are cycle counts comparing these three variants, copied from
https://lwn.net/Articles/728626/ for convenience:

> These are the cycle counts comparing a loop of refcount_inc() from 1
> to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
> unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
> (refcount_t-full), and this overflow-protected refcount (refcount_t-fast):
>
> 2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
>
>                         cycles  protections
> atomic_t           82249267387  none
> refcount_t-fast    82211446892  overflow, untested dec-to-zero
> refcount_t-full   144814735193  overflow, untested dec-to-zero, inc-from-zero

So the middle option (called refcount_t-fast here), with an arch-specific
implementation, has negligible impact. The "full" one is more expensive, but it is
disabled by default anyway, so only people who want strict security enable it.
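
For reference, the shape of the quoted loop can be approximated in userspace
as in the sketch below. It is not the benchmark behind the numbers above:
demo_bench and struct demo_result are hypothetical names, a plain C11 atomic
is used as the counter, and clock() reports CPU time rather than cycle
counts. With limit set to INT_MAX it performs the same 2147483646 increments
and 2147483647 decrements.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical result record for this sketch. */
struct demo_result {
	double seconds;		/* CPU time spent in both loops */
	unsigned long incs;	/* number of increments performed */
	unsigned long decs;	/* number of decrements performed */
	bool freed_on_last_dec;	/* last decrement observed the drop to zero */
};

/* Count up from 1 to limit, then back down to 0, timing the whole run. */
static struct demo_result demo_bench(unsigned int limit)
{
	struct demo_result res = { 0 };
	atomic_uint count;
	clock_t start;

	assert(limit >= 1);
	atomic_init(&count, 1);
	start = clock();

	/* Models the refcount_inc() half of the loop. */
	while (atomic_load(&count) < limit) {
		atomic_fetch_add(&count, 1);
		res.incs++;
	}

	/* Models the refcount_dec_and_test() half: stop at the 1 -> 0 step. */
	for (;;) {
		res.decs++;
		if (atomic_fetch_sub(&count, 1) == 1) {
			res.freed_on_last_dec = true;
			break;
		}
	}

	res.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
	return res;
}
```

Swapping the two atomic calls for checked variants and comparing the timings
would give a rough userspace analogue of the table, although the absolute
numbers would not be comparable to in-kernel cycle counts.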

Are these numbers convincing enough that we don't have to measure
the block devices? :)

Best Regards,
Elena.


> 
> It should be quite easy to measure against a null_blk device.
> 
> Thanks a lot,
>        Johannes
> 
> --
> Johannes Thumshirn                                          Storage
> jthumshirn@suse.de                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
