* [PATCH] kasan: fix races in quarantine_remove_cache()
@ 2017-03-08 15:15 Dmitry Vyukov
2017-03-08 23:11 ` Andrew Morton
2017-03-09 9:25 ` Andrey Ryabinin
0 siblings, 2 replies; 8+ messages in thread
From: Dmitry Vyukov @ 2017-03-08 15:15 UTC (permalink / raw)
To: aryabinin, linux-mm, akpm; +Cc: Dmitry Vyukov, kasan-dev, Greg Thelen
quarantine_remove_cache() frees all pending objects that belong to the
cache before we destroy the cache itself. However, there are currently
two ways in which it can fail to do so.

First, another thread can hold some of the objects from the cache in
its temporary list in quarantine_put(). quarantine_put() has a window of
enabled interrupts, and on_each_cpu() in quarantine_remove_cache() can
finish right in that window. These objects will later be freed into the
destroyed cache.

Second, quarantine_reduce() has the same problem. It grabs a batch of
objects from the global quarantine, then unlocks quarantine_lock and
frees the batch. quarantine_remove_cache() can finish while some
objects from the cache are still in the local to_free list in
quarantine_reduce().

Fix the race with quarantine_put() by disabling interrupts for the
whole duration of quarantine_put(). In combination with on_each_cpu()
in quarantine_remove_cache(), this ensures that quarantine_remove_cache()
either sees the objects in the per-cpu list or in the global list.

Fix the race with quarantine_reduce() by protecting quarantine_reduce()
with an srcu read-side critical section and then doing synchronize_srcu()
at the end of quarantine_remove_cache().
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Greg Thelen <gthelen@google.com>
---
I've done some assessment of how well synchronize_srcu() works in this
case. On a 4-CPU VM I see that it blocks waiting for pending read-side
critical sections in about 2-3% of cases, which looks good to me.
I suspect that these races are the root cause of some GPFs that
I episodically hit. Previously I did not have any explanation for them.
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
IP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
PGD 6aeea067
PUD 60ed7067
PMD 0
Oops: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 13667 Comm: syz-executor2 Not tainted 4.10.0+ #60
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88005f948040 task.stack: ffff880069818000
RIP: 0010:qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
RSP: 0018:ffff88006981f298 EFLAGS: 00010246
RAX: ffffea0000ffff00 RBX: 0000000000000000 RCX: ffffea0000ffff1f
RDX: 0000000000000000 RSI: ffff88003fffc3e0 RDI: 0000000000000000
RBP: ffff88006981f2c0 R08: ffff88002fed7bd8 R09: 00000001001f000d
R10: 00000000001f000d R11: ffff88006981f000 R12: ffff88003fffc3e0
R13: ffff88006981f2d0 R14: ffffffff81877fae R15: 0000000080000000
FS: 00007fb911a2d700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c8 CR3: 0000000060ed6000 CR4: 00000000000006f0
Call Trace:
quarantine_reduce+0x10e/0x120 mm/kasan/quarantine.c:239
kasan_kmalloc+0xca/0xe0 mm/kasan/kasan.c:590
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
slab_post_alloc_hook mm/slab.h:456 [inline]
slab_alloc_node mm/slub.c:2718 [inline]
kmem_cache_alloc_node+0x1d3/0x280 mm/slub.c:2754
__alloc_skb+0x10f/0x770 net/core/skbuff.c:219
alloc_skb include/linux/skbuff.h:932 [inline]
_sctp_make_chunk+0x3b/0x260 net/sctp/sm_make_chunk.c:1388
sctp_make_data net/sctp/sm_make_chunk.c:1420 [inline]
sctp_make_datafrag_empty+0x208/0x360 net/sctp/sm_make_chunk.c:746
sctp_datamsg_from_user+0x7e8/0x11d0 net/sctp/chunk.c:266
sctp_sendmsg+0x2611/0x3970 net/sctp/socket.c:1962
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x660/0x810 net/socket.c:1685
SyS_sendto+0x40/0x50 net/socket.c:1653
---
mm/kasan/quarantine.c | 42 ++++++++++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 6f1ed1630873..075422c3cee3 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -27,6 +27,7 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>
+#include <linux/srcu.h>
#include "../slab.h"
#include "kasan.h"
@@ -103,6 +104,7 @@ static int quarantine_tail;
/* Total size of all objects in global_quarantine across all batches. */
static unsigned long quarantine_size;
static DEFINE_SPINLOCK(quarantine_lock);
+DEFINE_STATIC_SRCU(remove_cache_srcu);
/* Maximum size of the global queue. */
static unsigned long quarantine_max_size;
@@ -173,17 +175,22 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
struct qlist_head *q;
struct qlist_head temp = QLIST_INIT;
+ /*
+ * Note: irq must be disabled until after we move the batch to the
+ * global quarantine. Otherwise quarantine_remove_cache() can miss
+ * some objects belonging to the cache if they are in our local temp
+ * list. quarantine_remove_cache() executes on_each_cpu() at the
+ * beginning which ensures that it either sees the objects in per-cpu
+ * lists or in the global quarantine.
+ */
local_irq_save(flags);
q = this_cpu_ptr(&cpu_quarantine);
qlist_put(q, &info->quarantine_link, cache->size);
- if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE))
+ if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE)) {
qlist_move_all(q, &temp);
- local_irq_restore(flags);
-
- if (unlikely(!qlist_empty(&temp))) {
- spin_lock_irqsave(&quarantine_lock, flags);
+ spin_lock(&quarantine_lock);
WRITE_ONCE(quarantine_size, quarantine_size + temp.bytes);
qlist_move_all(&temp, &global_quarantine[quarantine_tail]);
if (global_quarantine[quarantine_tail].bytes >=
@@ -196,20 +203,33 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
if (new_tail != quarantine_head)
quarantine_tail = new_tail;
}
- spin_unlock_irqrestore(&quarantine_lock, flags);
+ spin_unlock(&quarantine_lock);
}
+
+ local_irq_restore(flags);
}
void quarantine_reduce(void)
{
size_t total_size, new_quarantine_size, percpu_quarantines;
unsigned long flags;
+ int srcu_idx;
struct qlist_head to_free = QLIST_INIT;
if (likely(READ_ONCE(quarantine_size) <=
READ_ONCE(quarantine_max_size)))
return;
+ /*
+ * srcu critical section ensures that quarantine_remove_cache()
+ * will not miss objects belonging to the cache while they are in our
+ * local to_free list. srcu is chosen because (1) it gives us private
+ * grace period domain that does not interfere with anything else,
+ * and (2) it allows synchronize_srcu() to return without waiting
+ * if there are no pending read critical sections (which is the
+ * expected case).
+ */
+ srcu_idx = srcu_read_lock(&remove_cache_srcu);
spin_lock_irqsave(&quarantine_lock, flags);
/*
@@ -237,6 +257,7 @@ void quarantine_reduce(void)
spin_unlock_irqrestore(&quarantine_lock, flags);
qlist_free_all(&to_free, NULL);
+ srcu_read_unlock(&remove_cache_srcu, srcu_idx);
}
static void qlist_move_cache(struct qlist_head *from,
@@ -280,6 +301,13 @@ void quarantine_remove_cache(struct kmem_cache *cache)
unsigned long flags, i;
struct qlist_head to_free = QLIST_INIT;
+ /*
+ * Must be careful to not miss any objects that are being moved from
+ * per-cpu list to the global quarantine in quarantine_put(),
+ * nor objects being freed in quarantine_reduce(). on_each_cpu()
+ * achieves the first goal, while synchronize_srcu() achieves the
+ * second.
+ */
on_each_cpu(per_cpu_remove_cache, cache, 1);
spin_lock_irqsave(&quarantine_lock, flags);
@@ -288,4 +316,6 @@ void quarantine_remove_cache(struct kmem_cache *cache)
spin_unlock_irqrestore(&quarantine_lock, flags);
qlist_free_all(&to_free, cache);
+
+ synchronize_srcu(&remove_cache_srcu);
}
--
2.12.0.246.ga2ecc84866-goog
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [PATCH] kasan: fix races in quarantine_remove_cache()
2017-03-08 15:15 [PATCH] kasan: fix races in quarantine_remove_cache() Dmitry Vyukov
@ 2017-03-08 23:11 ` Andrew Morton
2017-03-09 8:52 ` Dmitry Vyukov
2017-03-09 9:25 ` Andrey Ryabinin
1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2017-03-08 23:11 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: aryabinin, linux-mm, kasan-dev, Greg Thelen
On Wed, 8 Mar 2017 16:15:32 +0100 Dmitry Vyukov <dvyukov@google.com> wrote:
> quarantine_remove_cache() frees all pending objects that belong to the
> cache before we destroy the cache itself. However, there are currently
> two ways in which it can fail to do so.
>
> First, another thread can hold some of the objects from the cache in
> its temporary list in quarantine_put(). quarantine_put() has a window of
> enabled interrupts, and on_each_cpu() in quarantine_remove_cache() can
> finish right in that window. These objects will later be freed into the
> destroyed cache.
>
> Second, quarantine_reduce() has the same problem. It grabs a batch of
> objects from the global quarantine, then unlocks quarantine_lock and
> frees the batch. quarantine_remove_cache() can finish while some
> objects from the cache are still in the local to_free list in
> quarantine_reduce().
>
> Fix the race with quarantine_put() by disabling interrupts for the
> whole duration of quarantine_put(). In combination with on_each_cpu()
> in quarantine_remove_cache(), this ensures that quarantine_remove_cache()
> either sees the objects in the per-cpu list or in the global list.
>
> Fix the race with quarantine_reduce() by protecting quarantine_reduce()
> with an srcu read-side critical section and then doing synchronize_srcu()
> at the end of quarantine_remove_cache().
>
> ...
>
> I suspect that these races are the root cause of some GPFs that
> I episodically hit. Previously I did not have any explanation for them.
The changelog doesn't convey a sense of how serious this bug is, so I'm
not in a good position to decide whether this fix should be backported.
The patch looks fairly intrusive so I tentatively decided that it
needn't be backported. Perhaps that was wrong.
Please be more careful in describing the end-user visible impact of
bugs when fixing them.
* Re: [PATCH] kasan: fix races in quarantine_remove_cache()
2017-03-08 23:11 ` Andrew Morton
@ 2017-03-09 8:52 ` Dmitry Vyukov
0 siblings, 0 replies; 8+ messages in thread
From: Dmitry Vyukov @ 2017-03-09 8:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andrey Ryabinin, linux-mm@kvack.org, kasan-dev, Greg Thelen
On Thu, Mar 9, 2017 at 12:11 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 8 Mar 2017 16:15:32 +0100 Dmitry Vyukov <dvyukov@google.com> wrote:
>
>> quarantine_remove_cache() frees all pending objects that belong to the
>> cache before we destroy the cache itself. However, there are currently
>> two ways in which it can fail to do so.
>>
>> First, another thread can hold some of the objects from the cache in
>> its temporary list in quarantine_put(). quarantine_put() has a window of
>> enabled interrupts, and on_each_cpu() in quarantine_remove_cache() can
>> finish right in that window. These objects will later be freed into the
>> destroyed cache.
>>
>> Second, quarantine_reduce() has the same problem. It grabs a batch of
>> objects from the global quarantine, then unlocks quarantine_lock and
>> frees the batch. quarantine_remove_cache() can finish while some
>> objects from the cache are still in the local to_free list in
>> quarantine_reduce().
>>
>> Fix the race with quarantine_put() by disabling interrupts for the
>> whole duration of quarantine_put(). In combination with on_each_cpu()
>> in quarantine_remove_cache(), this ensures that quarantine_remove_cache()
>> either sees the objects in the per-cpu list or in the global list.
>>
>> Fix the race with quarantine_reduce() by protecting quarantine_reduce()
>> with an srcu read-side critical section and then doing synchronize_srcu()
>> at the end of quarantine_remove_cache().
>>
>> ...
>>
>> I suspect that these races are the root cause of some GPFs that
>> I episodically hit. Previously I did not have any explanation for them.
>
> The changelog doesn't convey a sense of how serious this bug is, so I'm
> not in a good position to decide whether this fix should be backported.
> The patch looks fairly intrusive so I tentatively decided that it
> needn't be backported. Perhaps that was wrong.
>
> Please be more careful in describing the end-user visible impact of
> bugs when fixing them.
Will try to do better next time. Thanks for the feedback.
I am not sure myself about backporting. The bug is quite hard to
trigger; I've seen it a few times during our massive continuous testing
(however, it could be the cause of some other episodic stray crashes,
as it leads to memory corruption...). If it is triggered, the
consequences are very bad -- almost certain memory corruption. The fix
is non-trivial and risks introducing new bugs. I am also not sure how
actively people use KASAN on older releases.
Can we flag it for backporting later if/when we see real need?
* Re: [PATCH] kasan: fix races in quarantine_remove_cache()
2017-03-08 15:15 [PATCH] kasan: fix races in quarantine_remove_cache() Dmitry Vyukov
2017-03-08 23:11 ` Andrew Morton
@ 2017-03-09 9:25 ` Andrey Ryabinin
2017-03-09 9:37 ` Dmitry Vyukov
1 sibling, 1 reply; 8+ messages in thread
From: Andrey Ryabinin @ 2017-03-09 9:25 UTC (permalink / raw)
To: Dmitry Vyukov, linux-mm, akpm; +Cc: kasan-dev, Greg Thelen
On 03/08/2017 06:15 PM, Dmitry Vyukov wrote:
> diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> index 6f1ed1630873..075422c3cee3 100644
> --- a/mm/kasan/quarantine.c
> +++ b/mm/kasan/quarantine.c
> @@ -27,6 +27,7 @@
> #include <linux/slab.h>
> #include <linux/string.h>
> #include <linux/types.h>
> +#include <linux/srcu.h>
>
Nit: keep alphabetical order please.
>
> void quarantine_reduce(void)
> {
> size_t total_size, new_quarantine_size, percpu_quarantines;
> unsigned long flags;
> + int srcu_idx;
> struct qlist_head to_free = QLIST_INIT;
>
> if (likely(READ_ONCE(quarantine_size) <=
> READ_ONCE(quarantine_max_size)))
> return;
>
> + /*
> + * srcu critical section ensures that quarantine_remove_cache()
> + * will not miss objects belonging to the cache while they are in our
> + * local to_free list. srcu is chosen because (1) it gives us private
> + * grace period domain that does not interfere with anything else,
> + * and (2) it allows synchronize_srcu() to return without waiting
> + * if there are no pending read critical sections (which is the
> + * expected case).
> + */
> + srcu_idx = srcu_read_lock(&remove_cache_srcu);
I'm puzzled why SRCU and not RCU? Given that we take the spin_lock on
the next line, we certainly don't need the ability to sleep in the
read-side critical section.
> spin_lock_irqsave(&quarantine_lock, flags);
>
> /*
> @@ -237,6 +257,7 @@ void quarantine_reduce(void)
> spin_unlock_irqrestore(&quarantine_lock, flags);
>
> qlist_free_all(&to_free, NULL);
> + srcu_read_unlock(&remove_cache_srcu, srcu_idx);
> }
>
* Re: [PATCH] kasan: fix races in quarantine_remove_cache()
2017-03-09 9:25 ` Andrey Ryabinin
@ 2017-03-09 9:37 ` Dmitry Vyukov
2017-03-09 10:29 ` Andrey Ryabinin
0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2017-03-09 9:37 UTC (permalink / raw)
To: Andrey Ryabinin; +Cc: linux-mm@kvack.org, Andrew Morton, kasan-dev, Greg Thelen
On Thu, Mar 9, 2017 at 10:25 AM, Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
> On 03/08/2017 06:15 PM, Dmitry Vyukov wrote:
>
>> diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
>> index 6f1ed1630873..075422c3cee3 100644
>> --- a/mm/kasan/quarantine.c
>> +++ b/mm/kasan/quarantine.c
>> @@ -27,6 +27,7 @@
>> #include <linux/slab.h>
>> #include <linux/string.h>
>> #include <linux/types.h>
>> +#include <linux/srcu.h>
>>
>
> Nit: keep alphabetical order please.
Doh, we really need clang-format. This is not productive.
Will send v2.
>> void quarantine_reduce(void)
>> {
>> size_t total_size, new_quarantine_size, percpu_quarantines;
>> unsigned long flags;
>> + int srcu_idx;
>> struct qlist_head to_free = QLIST_INIT;
>>
>> if (likely(READ_ONCE(quarantine_size) <=
>> READ_ONCE(quarantine_max_size)))
>> return;
>>
>> + /*
>> + * srcu critical section ensures that quarantine_remove_cache()
>> + * will not miss objects belonging to the cache while they are in our
>> + * local to_free list. srcu is chosen because (1) it gives us private
>> + * grace period domain that does not interfere with anything else,
>> + * and (2) it allows synchronize_srcu() to return without waiting
>> + * if there are no pending read critical sections (which is the
>> + * expected case).
>> + */
>> + srcu_idx = srcu_read_lock(&remove_cache_srcu);
>
> I'm puzzled why SRCU and not RCU? Given that we take the spin_lock on
> the next line, we certainly don't need the ability to sleep in the
> read-side critical section.
I've explained it in the comment above.
>> spin_lock_irqsave(&quarantine_lock, flags);
>>
>> /*
>> @@ -237,6 +257,7 @@ void quarantine_reduce(void)
>> spin_unlock_irqrestore(&quarantine_lock, flags);
>>
>> qlist_free_all(&to_free, NULL);
>> + srcu_read_unlock(&remove_cache_srcu, srcu_idx);
>> }
* Re: [PATCH] kasan: fix races in quarantine_remove_cache()
2017-03-09 9:37 ` Dmitry Vyukov
@ 2017-03-09 10:29 ` Andrey Ryabinin
2017-03-09 10:43 ` Dmitry Vyukov
0 siblings, 1 reply; 8+ messages in thread
From: Andrey Ryabinin @ 2017-03-09 10:29 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: linux-mm@kvack.org, Andrew Morton, kasan-dev, Greg Thelen
On 03/09/2017 12:37 PM, Dmitry Vyukov wrote:
>>> void quarantine_reduce(void)
>>> {
>>> size_t total_size, new_quarantine_size, percpu_quarantines;
>>> unsigned long flags;
>>> + int srcu_idx;
>>> struct qlist_head to_free = QLIST_INIT;
>>>
>>> if (likely(READ_ONCE(quarantine_size) <=
>>> READ_ONCE(quarantine_max_size)))
>>> return;
>>>
>>> + /*
>>> + * srcu critical section ensures that quarantine_remove_cache()
>>> + * will not miss objects belonging to the cache while they are in our
>>> + * local to_free list. srcu is chosen because (1) it gives us private
>>> + * grace period domain that does not interfere with anything else,
>>> + * and (2) it allows synchronize_srcu() to return without waiting
>>> + * if there are no pending read critical sections (which is the
>>> + * expected case).
>>> + */
>>> + srcu_idx = srcu_read_lock(&remove_cache_srcu);
>>
>> I'm puzzled why SRCU and not RCU? Given that we take the spin_lock on
>> the next line, we certainly don't need the ability to sleep in the
>> read-side critical section.
>
> I've explained it in the comment above.
I've read it. It doesn't explain to me why SRCU is better than RCU here.
a) We can't sleep in the read-side critical section. Given that RCU is
almost always faster than SRCU, RCU seems preferable.
b) synchronize_rcu() indeed might take longer to complete. But does it
matter? We do synchronize_[s]rcu() only on cache destruction, which is a
relatively rare operation and not a hotpath. Performance of
quarantine_reduce() is more important.
* Re: [PATCH] kasan: fix races in quarantine_remove_cache()
2017-03-09 10:29 ` Andrey Ryabinin
@ 2017-03-09 10:43 ` Dmitry Vyukov
2017-03-09 11:09 ` Andrey Ryabinin
0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2017-03-09 10:43 UTC (permalink / raw)
To: Andrey Ryabinin; +Cc: linux-mm@kvack.org, Andrew Morton, kasan-dev, Greg Thelen
On Thu, Mar 9, 2017 at 11:29 AM, Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
> On 03/09/2017 12:37 PM, Dmitry Vyukov wrote:
>>>> void quarantine_reduce(void)
>>>> {
>>>> size_t total_size, new_quarantine_size, percpu_quarantines;
>>>> unsigned long flags;
>>>> + int srcu_idx;
>>>> struct qlist_head to_free = QLIST_INIT;
>>>>
>>>> if (likely(READ_ONCE(quarantine_size) <=
>>>> READ_ONCE(quarantine_max_size)))
>>>> return;
>>>>
>>>> + /*
>>>> + * srcu critical section ensures that quarantine_remove_cache()
>>>> + * will not miss objects belonging to the cache while they are in our
>>>> + * local to_free list. srcu is chosen because (1) it gives us private
>>>> + * grace period domain that does not interfere with anything else,
>>>> + * and (2) it allows synchronize_srcu() to return without waiting
>>>> + * if there are no pending read critical sections (which is the
>>>> + * expected case).
>>>> + */
>>>> + srcu_idx = srcu_read_lock(&remove_cache_srcu);
>>>
>>> I'm puzzled why SRCU and not RCU? Given that we take the spin_lock on
>>> the next line, we certainly don't need the ability to sleep in the
>>> read-side critical section.
>>
>> I've explained it in the comment above.
>
> I've read it. It doesn't explain to me why SRCU is better than RCU here.
> a) We can't sleep in the read-side critical section. Given that RCU is
> almost always faster than SRCU, RCU seems preferable.
> b) synchronize_rcu() indeed might take longer to complete. But does it
> matter? We do synchronize_[s]rcu() only on cache destruction, which is a
> relatively rare operation and not a hotpath. Performance of
> quarantine_reduce() is more important.
As far as I understand, container destruction causes destruction of
a bunch of caches. synchronize_sched() caused serious problems on
these paths in the past, see 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6.
srcu_read_lock/unlock are not too expensive: a few atomic operations
on per-cpu variables, so cheaper than the existing spinlock. And this
is already not the fast-fast-path (which is kmalloc/free). But hundreds
of synchronize_rcu() calls in a row can cause a hangup and panic; the
fact that it's a rare operation won't help. Also, if we free a
substantial batch of objects under the rcu read lock, it will affect
the latency of all rcu callbacks in the kernel, which can have
undesired effects.
I am trying to make this more predictable and tolerant of unexpected
workloads, rather than sacrifice everything in the name of fast-path
performance.
* Re: [PATCH] kasan: fix races in quarantine_remove_cache()
2017-03-09 10:43 ` Dmitry Vyukov
@ 2017-03-09 11:09 ` Andrey Ryabinin
0 siblings, 0 replies; 8+ messages in thread
From: Andrey Ryabinin @ 2017-03-09 11:09 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: linux-mm@kvack.org, Andrew Morton, kasan-dev, Greg Thelen
On 03/09/2017 01:43 PM, Dmitry Vyukov wrote:
> On Thu, Mar 9, 2017 at 11:29 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>> On 03/09/2017 12:37 PM, Dmitry Vyukov wrote:
>>>>> void quarantine_reduce(void)
>>>>> {
>>>>> size_t total_size, new_quarantine_size, percpu_quarantines;
>>>>> unsigned long flags;
>>>>> + int srcu_idx;
>>>>> struct qlist_head to_free = QLIST_INIT;
>>>>>
>>>>> if (likely(READ_ONCE(quarantine_size) <=
>>>>> READ_ONCE(quarantine_max_size)))
>>>>> return;
>>>>>
>>>>> + /*
>>>>> + * srcu critical section ensures that quarantine_remove_cache()
>>>>> + * will not miss objects belonging to the cache while they are in our
>>>>> + * local to_free list. srcu is chosen because (1) it gives us private
>>>>> + * grace period domain that does not interfere with anything else,
>>>>> + * and (2) it allows synchronize_srcu() to return without waiting
>>>>> + * if there are no pending read critical sections (which is the
>>>>> + * expected case).
>>>>> + */
>>>>> + srcu_idx = srcu_read_lock(&remove_cache_srcu);
>>>>
>>>> I'm puzzled why SRCU and not RCU? Given that we take the spin_lock on
>>>> the next line, we certainly don't need the ability to sleep in the
>>>> read-side critical section.
>>>
>>> I've explained it in the comment above.
>>
>> I've read it. It doesn't explain to me why SRCU is better than RCU here.
>> a) We can't sleep in the read-side critical section. Given that RCU is
>> almost always faster than SRCU, RCU seems preferable.
>> b) synchronize_rcu() indeed might take longer to complete. But does it
>> matter? We do synchronize_[s]rcu() only on cache destruction, which is a
>> relatively rare operation and not a hotpath. Performance of
>> quarantine_reduce() is more important.
>
>
> As far as I understand, container destruction causes destruction of
> a bunch of caches. synchronize_sched() caused serious problems on
> these paths in the past, see 86d9f48534e800e4d62cdc1b5aaf539f4c1d47d6.
> srcu_read_lock/unlock are not too expensive: a few atomic operations
> on per-cpu variables, so cheaper than the existing spinlock. And this
> is already not the fast-fast-path (which is kmalloc/free). But hundreds
> of synchronize_rcu() calls in a row can cause a hangup and panic; the
> fact that it's a rare operation won't help. Also, if we free a
> substantial batch of objects under the rcu read lock, it will affect
> the latency of all rcu callbacks in the kernel, which can have
> undesired effects.
> I am trying to make this more predictable and tolerant of unexpected
> workloads, rather than sacrifice everything in the name of fast-path
> performance.
>
Ok, fair enough.