dm-cache race on nr_dirty in set_dirty/clear

All of lore.kernel.org
 help / color / mirror / Atom feed

* dm-cache race on nr_dirty in set_dirty/clear_dirty?
@ 2014-07-26  4:07 Anssi Hannula
  2014-07-31 18:31 ` [PATCH] dm cache: fix race affecting dirty block count Anssi Hannula
  0 siblings, 1 reply; 7+ messages in thread
From: Anssi Hannula @ 2014-07-26  4:07 UTC (permalink / raw)
  To: dm-devel

Hi,

I'm seeing the following "dmsetup status" on one volume:
> 0 5368709120 cache 8 3926/32768 128 335265/1978880 5589425 8052258 2254781 3910141 0 335265 4294967293 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 512 discard_promote_adjustment 1 read_promote_adjustment 4 write_promote_adjustment 8

Note the clearly wrong 4294967293 in the nr_dirty field.

Looking at the code, I see nr_dirty is set in the following functions in
dm-cache-target.c:

> static void set_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cblock)
> {
>         if (!test_and_set_bit(from_cblock(cblock), cache->dirty_bitset)) {
>                 cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) + 1);
>                 policy_set_dirty(cache->policy, oblock);
>         }
> }
> 
> static void clear_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cblock)
> {
>         if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) {
>                 policy_clear_dirty(cache->policy, oblock);
>                 cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) - 1);
>                 if (!from_cblock(cache->nr_dirty))
>                         dm_table_event(cache->ti->table);
>         }
> }

That looks like a race to me? As nothing is protecting cache->nr_dirty
from multiple access (unlike cache->dirty_bitset).

Unless I'm missing something, as I'm not familiar with this code...

-- 
Anssi Hannula

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] dm cache: fix race affecting dirty block count
  2014-07-26  4:07 dm-cache race on nr_dirty in set_dirty/clear_dirty? Anssi Hannula
@ 2014-07-31 18:31 ` Anssi Hannula
  2014-08-01 15:17   ` Joe Thornber
  0 siblings, 1 reply; 7+ messages in thread
From: Anssi Hannula @ 2014-07-31 18:31 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: Joe Thornber, dm-devel, stable

nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks.

Fix that by using an atomic type for nr_dirty.

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Cc: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org

---

So here's a patch for the issue I reported a few days ago.

I'm not sure what the actual effects of having wrong nr_dirty are
beyond the wrong value reported to userspace. Having wrong nr_dirty
causes dm_table_event(cache->ti->table) to be called at incorrect times,
but I'm not sure what the implications of that are.

So someone more knowledged should really amend my commit message
accordingly so that it explains the user-visible issues :)

I did notice that after a reboot (after having incorrect nr_dirty) the
entire cache was considered to be dirty and a full writeback occurred,
but I don't know if that was related at all.


 drivers/md/dm-cache-target.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 0df3ec085ebb..83a7eb5ef113 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -154,7 +154,7 @@ struct cache {
 	/*
 	 * cache_size entries, dirty if set
 	 */
-	dm_cblock_t nr_dirty;
+	atomic_t nr_dirty;
 	unsigned long *dirty_bitset;
 
 	/*
@@ -403,7 +403,7 @@ static bool is_dirty(struct cache *cache, dm_cblock_t b)
 static void set_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cblock)
 {
 	if (!test_and_set_bit(from_cblock(cblock), cache->dirty_bitset)) {
-		cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) + 1);
+		atomic_inc(&cache->nr_dirty);
 		policy_set_dirty(cache->policy, oblock);
 	}
 }
@@ -412,8 +412,7 @@ static void clear_dirty(struct cache *cache, dm_oblock_t oblock, dm_cblock_t cbl
 {
 	if (test_and_clear_bit(from_cblock(cblock), cache->dirty_bitset)) {
 		policy_clear_dirty(cache->policy, oblock);
-		cache->nr_dirty = to_cblock(from_cblock(cache->nr_dirty) - 1);
-		if (!from_cblock(cache->nr_dirty))
+		if (atomic_dec_return(&cache->nr_dirty) == 0)
 			dm_table_event(cache->ti->table);
 	}
 }
@@ -2003,7 +2002,7 @@ static int cache_create(struct cache_args *ca, struct cache **result)
 	init_waitqueue_head(&cache->migration_wait);
 
 	r = -ENOMEM;
-	cache->nr_dirty = 0;
+	atomic_set(&cache->nr_dirty, 0);
 	cache->dirty_bitset = alloc_bitset(from_cblock(cache->cache_size));
 	if (!cache->dirty_bitset) {
 		*error = "could not allocate dirty bitset";
@@ -2513,7 +2512,7 @@ static void cache_status(struct dm_target *ti, status_type_t type,
 		       (unsigned) atomic_read(&cache->stats.demotion),
 		       (unsigned) atomic_read(&cache->stats.promotion),
 		       (unsigned long long) from_cblock(residency),
-		       cache->nr_dirty);
+		       (unsigned) atomic_read(&cache->nr_dirty));
 
 		if (cache->features.write_through)
 			DMEMIT("1 writethrough ");
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] dm cache: fix race affecting dirty block count
  2014-07-31 18:31 ` [PATCH] dm cache: fix race affecting dirty block count Anssi Hannula
@ 2014-08-01 15:17   ` Joe Thornber
  2014-08-01 21:02     ` Anssi Hannula
  2014-08-17 20:24     ` Anssi Hannula
  0 siblings, 2 replies; 7+ messages in thread
From: Joe Thornber @ 2014-08-01 15:17 UTC (permalink / raw)
  To: Anssi Hannula; +Cc: Alasdair G Kergon, Joe Thornber, dm-devel, stable

On Thu, Jul 31, 2014 at 09:31:27PM +0300, Anssi Hannula wrote:
> I did notice that after a reboot (after having incorrect nr_dirty) the
> entire cache was considered to be dirty and a full writeback occurred,
> but I don't know if that was related at all.

Did you take the cache down cleanly?  Normally this happens when you
shutdown without destroying the cache device first.

Patch looks good.

- Joe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] dm cache: fix race affecting dirty block count
  2014-08-01 15:17   ` Joe Thornber
@ 2014-08-01 21:02     ` Anssi Hannula
  2014-08-17 20:24     ` Anssi Hannula
  1 sibling, 0 replies; 7+ messages in thread
From: Anssi Hannula @ 2014-08-01 21:02 UTC (permalink / raw)
  To: Joe Thornber; +Cc: dm-devel, stable, Alasdair G Kergon

01.08.2014, 18:17, Joe Thornber kirjoitti:
> On Thu, Jul 31, 2014 at 09:31:27PM +0300, Anssi Hannula wrote:
>> I did notice that after a reboot (after having incorrect nr_dirty) the
>> entire cache was considered to be dirty and a full writeback occurred,
>> but I don't know if that was related at all.
> 
> Did you take the cache down cleanly?  Normally this happens when you
> shutdown without destroying the cache device first.

Apparently not. Looking at the logs systemd-cryptsetup failed to
deactivate the dm-crypt volume on top of the LV, as it was still mounted
as it seems systemd (v208) tried to unmount them way too early, while
they were still in use. Or maybe it should have retried later but didn't
for some reason... will have to debug further.

In any case, not a dm-cache issue it seems.

> Patch looks good.
> 
> - Joe
> 


-- 
Anssi Hannula

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] dm cache: fix race affecting dirty block count
  2014-08-01 15:17   ` Joe Thornber
  2014-08-01 21:02     ` Anssi Hannula
@ 2014-08-17 20:24     ` Anssi Hannula
  2014-09-05  0:11       ` [PATCH] dm cache: fix race causing dirty blocks to be marked as clean Anssi Hannula
  1 sibling, 1 reply; 7+ messages in thread
From: Anssi Hannula @ 2014-08-17 20:24 UTC (permalink / raw)
  To: Joe Thornber; +Cc: dm-devel, Alasdair G Kergon

01.08.2014, 18:17, Joe Thornber kirjoitti:
> On Thu, Jul 31, 2014 at 09:31:27PM +0300, Anssi Hannula wrote:
>> nr_dirty is updated without locking, causing it to drift so that it is
>> non-zero (either a small positive integer, or a very large one when an
>> underflow occurs) even when there are no actual dirty blocks.
>> 
>> Fix that by using an atomic type for nr_dirty.
[...]
> Patch looks good.

Unfortunately it seems there is some other potentially more serious bug
still in there...

I'm again seeing spurious dirty counts on two volumes [1]. As expected
with the previous fix, they are now both positive (as nr_dirty should
now be perfectly in sync with dirty_bitset and the bitset can't have a
negative count).


My first suspect was this kind of scenario:

Thread 1                      Thread 2
enter clear_dirty(61)         .
clear dirty bit               .
.                             enter set_dirty(61)
.                             set dirty bit
.                             policy_set_dirty()
.                             => mq adds to dirty queue
policy_clear_dirty()          .
=> mq drops from dirty queue  .


So the block remains dirty on target side but clean on mq side, and it
never gets written back since it is not on the mq dirty queue.

However, if the above is really what happens, i.e. concurrent calls to
set_dirty()/clear_dirty() for the same block, is that something that is
supposed to happen? and is the block then really dirty or not?

Or maybe something else is going on...?

This is on 3.15.7 + previous patch.

[1]

0 101876654080 cache 8 30512/32768 128 936412/1978944 8028417 56720875
10942325 11368177 0 744175 13 1 writeback 2 migration_threshold 2048 mq
10 random_threshold 4 sequential_threshold 512
discard_promote_adjustment 1 read_promote_adjustment 4
write_promote_adjustment 8

0 5368709120 cache 8 7184/32768 128 366217/1978880 39346555 8471758
13307043 4709237 0 7055 1 1 writeback 2 migration_threshold 2048 mq 10
random_threshold 4 sequential_threshold 512 discard_promote_adjustment 1
read_promote_adjustment 4 write_promote_adjustment 8

-- 
Anssi Hannula

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] dm cache: fix race causing dirty blocks to be marked as clean
  2014-08-17 20:24     ` Anssi Hannula
@ 2014-09-05  0:11       ` Anssi Hannula
  2014-09-09  9:54         ` Joe Thornber
  0 siblings, 1 reply; 7+ messages in thread
From: Anssi Hannula @ 2014-09-05  0:11 UTC (permalink / raw)
  To: Alasdair G Kergon, Joe Thornber; +Cc: dm-devel, linux-kernel, stable

When a writeback or a promotion of a block is completed, the cell of
that block is removed from the prison, the block is marked as clean, and
the clear_dirty() callback of the cache policy is called.

Unfortunately, performing those actions in this order allows an incoming
new write bio for that block to come in before clearing the dirty status
is completed and therefore possibly causing one of these two scenarios:

Scenario A:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called,
.                               but block already dirty
.                               => it does nothing
clear_dirty()                 .
- block marked clean          .
- policy clear_dirty() called .

Result: Block is marked clean even though it is actually dirty. No
writeback will occur.

Scenario B:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
clear_dirty()                 .
- block marked clean          .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called
.                             - block marked dirty
.                             - policy set_dirty() called
- policy clear_dirty() called .

Result: Block is properly marked as dirty, but policy thinks it is clean
and therefore never asks us to writeback it.
This case is visible in "dmsetup status" dirty block count (which
normally decreases to 0 on a quiet device).

Fix these issues by calling clear_dirty() before calling cell_defer().
Incoming bios for that block will then be detained in the cell and
released only after clear_dirty() has completed, so the race will not
occur.

Found by inspecting the code after noticing spurious dirty counts
(scenario B).

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Cc: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org
---

> Unfortunately it seems there is some other potentially more serious bug
> still in there...

After looking through the code that indeed seems to be the case, as
explained above.

Unless I'm missing something?

I can't say with 100% certainty if this fixes the spurious counts I saw
since those took quite a long time (1-2 weeks?) to appear and the load
of that system is somewhat irregular.


 drivers/md/dm-cache-target.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 1af40ee209e2..7130505c2425 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -895,8 +895,8 @@ static void migration_success_pre_commit(struct dm_cache_migration *mg)
 	struct cache *cache = mg->cache;
 
 	if (mg->writeback) {
-		cell_defer(cache, mg->old_ocell, false);
 		clear_dirty(cache, mg->old_oblock, mg->cblock);
+		cell_defer(cache, mg->old_ocell, false);
 		cleanup_migration(mg);
 		return;
 
@@ -951,13 +951,13 @@ static void migration_success_post_commit(struct dm_cache_migration *mg)
 		}
 
 	} else {
+		clear_dirty(cache, mg->new_oblock, mg->cblock);
 		if (mg->requeue_holder)
 			cell_defer(cache, mg->new_ocell, true);
 		else {
 			bio_endio(mg->new_ocell->holder, 0);
 			cell_defer(cache, mg->new_ocell, false);
 		}
-		clear_dirty(cache, mg->new_oblock, mg->cblock);
 		cleanup_migration(mg);
 	}
 }
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] dm cache: fix race causing dirty blocks to be marked as clean
  2014-09-05  0:11       ` [PATCH] dm cache: fix race causing dirty blocks to be marked as clean Anssi Hannula
@ 2014-09-09  9:54         ` Joe Thornber
  0 siblings, 0 replies; 7+ messages in thread
From: Joe Thornber @ 2014-09-09  9:54 UTC (permalink / raw)
  To: Anssi Hannula
  Cc: Alasdair G Kergon, Joe Thornber, dm-devel, linux-kernel, stable

Ack.  Thanks.

On Fri, Sep 05, 2014 at 03:11:28AM +0300, Anssi Hannula wrote:
> When a writeback or a promotion of a block is completed, the cell of
> that block is removed from the prison, the block is marked as clean, and
> the clear_dirty() callback of the cache policy is called.
> 
> Unfortunately, performing those actions in this order allows an incoming
> new write bio for that block to come in before clearing the dirty status
> is completed and therefore possibly causing one of these two scenarios:
> 
> Scenario A:
> 
> Thread 1                      Thread 2
> cell_defer()                  .
> - cell removed from prison    .
> - detained bios queued        .
> .                             incoming write bio
> .                             remapped to cache
> .                             set_dirty() called,
> .                               but block already dirty
> .                               => it does nothing
> clear_dirty()                 .
> - block marked clean          .
> - policy clear_dirty() called .
> 
> Result: Block is marked clean even though it is actually dirty. No
> writeback will occur.
> 
> Scenario B:
> 
> Thread 1                      Thread 2
> cell_defer()                  .
> - cell removed from prison    .
> - detained bios queued        .
> clear_dirty()                 .
> - block marked clean          .
> .                             incoming write bio
> .                             remapped to cache
> .                             set_dirty() called
> .                             - block marked dirty
> .                             - policy set_dirty() called
> - policy clear_dirty() called .
> 
> Result: Block is properly marked as dirty, but policy thinks it is clean
> and therefore never asks us to writeback it.
> This case is visible in "dmsetup status" dirty block count (which
> normally decreases to 0 on a quiet device).
> 
> Fix these issues by calling clear_dirty() before calling cell_defer().
> Incoming bios for that block will then be detained in the cell and
> released only after clear_dirty() has completed, so the race will not
> occur.
> 
> Found by inspecting the code after noticing spurious dirty counts
> (scenario B).
> 
> Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
> Cc: Joe Thornber <ejt@redhat.com>
> Cc: stable@vger.kernel.org
> ---
> 
> > Unfortunately it seems there is some other potentially more serious bug
> > still in there...
> 
> After looking through the code that indeed seems to be the case, as
> explained above.
> 
> Unless I'm missing something?
> 
> I can't say with 100% certainty if this fixes the spurious counts I saw
> since those took quite a long time (1-2 weeks?) to appear and the load
> of that system is somewhat irregular.
> 
> 
>  drivers/md/dm-cache-target.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
> index 1af40ee209e2..7130505c2425 100644
> --- a/drivers/md/dm-cache-target.c
> +++ b/drivers/md/dm-cache-target.c
> @@ -895,8 +895,8 @@ static void migration_success_pre_commit(struct dm_cache_migration *mg)
>  	struct cache *cache = mg->cache;
>  
>  	if (mg->writeback) {
> -		cell_defer(cache, mg->old_ocell, false);
>  		clear_dirty(cache, mg->old_oblock, mg->cblock);
> +		cell_defer(cache, mg->old_ocell, false);
>  		cleanup_migration(mg);
>  		return;
>  
> @@ -951,13 +951,13 @@ static void migration_success_post_commit(struct dm_cache_migration *mg)
>  		}
>  
>  	} else {
> +		clear_dirty(cache, mg->new_oblock, mg->cblock);
>  		if (mg->requeue_holder)
>  			cell_defer(cache, mg->new_ocell, true);
>  		else {
>  			bio_endio(mg->new_ocell->holder, 0);
>  			cell_defer(cache, mg->new_ocell, false);
>  		}
> -		clear_dirty(cache, mg->new_oblock, mg->cblock);
>  		cleanup_migration(mg);
>  	}
>  }
> -- 
> 1.8.4.5
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-09-09  9:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-26  4:07 dm-cache race on nr_dirty in set_dirty/clear_dirty? Anssi Hannula
2014-07-31 18:31 ` [PATCH] dm cache: fix race affecting dirty block count Anssi Hannula
2014-08-01 15:17   ` Joe Thornber
2014-08-01 21:02     ` Anssi Hannula
2014-08-17 20:24     ` Anssi Hannula
2014-09-05  0:11       ` [PATCH] dm cache: fix race causing dirty blocks to be marked as clean Anssi Hannula
2014-09-09  9:54         ` Joe Thornber

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.