From: Jens Axboe <axboe@kernel.dk>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>, LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH RFC] lib: Make radix_tree_node_alloc() irq safe
Date: Thu, 18 Jul 2013 15:25:41 -0600
Message-ID: <51E85D55.9000501@kernel.dk>
In-Reply-To: <20130717161200.40a97074623be2685beb8156@linux-foundation.org>

On 07/17/2013 05:12 PM, Andrew Morton wrote:
> On Tue, 16 Jul 2013 19:06:30 +0200 Jan Kara <jack@suse.cz> wrote:
> 
>> With users of radix_tree_preload() run from interrupt (CFQ is one such
>> possible user), the following race can happen:
>>
>> radix_tree_preload()
>> ...
>> radix_tree_insert()
>>   radix_tree_node_alloc()
>>     if (rtp->nr) {
>>       ret = rtp->nodes[rtp->nr - 1];
>> <interrupt>
>> ...
>> radix_tree_preload()
>> ...
>> radix_tree_insert()
>>   radix_tree_node_alloc()
>>     if (rtp->nr) {
>>       ret = rtp->nodes[rtp->nr - 1];
>>
>> And we give out one radix tree node twice. That clearly results in radix
>> tree corruption with different results (usually OOPS) depending on which
>> two users of radix tree race.
>>
>> Fix the problem by disabling interrupts when working with rtp variable.
>> In-interrupt user can still deplete our preloaded nodes but at least we
>> won't corrupt radix trees.
>>
>> ...
>>
>>   There are some questions regarding this patch:
>> Do we really want to allow in-interrupt users of radix_tree_preload()?  CFQ
>> could certainly do this in older kernels but that particular call site where I
>> saw the bug hit isn't there anymore so I'm not sure this can really happen with
>> recent kernels.
> 
> Well, it was never anticipated that interrupt-time code would run
> radix_tree_preload().  The whole point in the preloading was to be able
> to perform GFP_KERNEL allocations before entering the spinlocked region
> which needs to allocate memory.
> 
> Doing all that from within an interrupt is daft, because the interrupt code
> can't use GFP_KERNEL anyway.
> 
>> Also it is actually harmful to do preloading if you are in interrupt context
>> anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
>> that we would need to tweak radix_tree_node_alloc() to somehow recognize
>> whether the caller wants it to use preloaded nodes or not and that callers
>> would have to get it right (although maybe some magic in radix_tree_preload()
>> could handle that).
>>
>> Opinions?
> 
> BUG_ON(in_interrupt()) :)
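
The race quoted above can be modelled in userspace. The following is only a
sketch under stated assumptions: struct preload_pool, node_alloc(),
irq_handler() and race_demo() are hypothetical stand-ins, node ids replace
pointers, and the "interrupt" is a plain callback invoked between reading the
top node and updating nr. It is not the kernel code.

```c
#include <assert.h>
#include <stdio.h>

#define POOL_SIZE 4
#define NO_NODE   (-1)

/* Hypothetical stand-in for the per-CPU preload state (rtp in the thread). */
struct preload_pool {
	int nr;                 /* number of preloaded nodes */
	int nodes[POOL_SIZE];   /* node ids instead of node pointers */
};

/* Callback that plays the role of an interrupt arriving mid-allocation. */
typedef void (*irq_hook)(struct preload_pool *);

int irq_got = NO_NODE;

/* Unprotected sequence: read the top node, then clear the slot and drop nr. */
int node_alloc(struct preload_pool *p, irq_hook irq)
{
	int ret = NO_NODE;

	if (p->nr) {
		ret = p->nodes[p->nr - 1];
		if (irq)
			irq(p);         /* "interrupt" fires before nr is updated */
		p->nodes[p->nr - 1] = NO_NODE;
		p->nr--;
	}
	return ret;
}

/* The interrupting user allocates from the same pool. */
void irq_handler(struct preload_pool *p)
{
	irq_got = node_alloc(p, NULL);
}

/* Returns 1 if the same node was handed out to both users. */
int race_demo(void)
{
	struct preload_pool pool = {
		.nr = 2,
		.nodes = { 10, 11, NO_NODE, NO_NODE },
	};
	int task_got;

	irq_got = NO_NODE;
	task_got = node_alloc(&pool, irq_handler);
	printf("task got node %d, irq got node %d, nr is now %d\n",
	       task_got, irq_got, pool.nr);
	return task_got == irq_got;
}
```

In this model both callers receive node 11, and node 10 is dropped from the
pool without ever being handed out.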

Good point, Andrew. It'd be better to "document" the restriction (since
the use is nonsensical). It's actually not CFQ code that does this;
it's the io context management.

Excuse the crappy mailer, but something à la:

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9c4bb82..bcb9b17 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -366,7 +366,7 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct
        if (!icq)
                return NULL;

-       if (radix_tree_preload(gfp_mask) < 0) {
+       if ((gfp_mask & __GFP_WAIT) && radix_tree_preload(gfp_mask) < 0) {
                kmem_cache_free(et->icq_cache, icq);
                return NULL;
        }
@@ -394,7 +394,10 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct

        spin_unlock(&ioc->lock);
        spin_unlock_irq(q->queue_lock);
-       radix_tree_preload_end();
+
+       if (gfp_mask & __GFP_WAIT)
+               radix_tree_preload_end();
+
        return icq;
 }
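
The pairing that the diff relies on can be sketched in userspace:
radix_tree_preload() returns with preemption disabled on success, so
radix_tree_preload_end() must run only when the preload actually ran. All
names below (MAY_WAIT, model_preload(), model_preload_end(), create_entry(),
preempt_depth) are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <assert.h>

#define MAY_WAIT 0x1u   /* hypothetical stand-in for __GFP_WAIT */

int preempt_depth;      /* models the preempt-disable nesting count */

/* Models a successful preload: returns 0 with "preemption" disabled. */
int model_preload(unsigned int mask)
{
	(void)mask;
	preempt_depth++;
	return 0;
}

/* Models the matching end call: re-enables "preemption". */
void model_preload_end(void)
{
	preempt_depth--;
}

/* Preload only when the caller may block, and undo it under the same test. */
int create_entry(unsigned int mask)
{
	int preloaded = (mask & MAY_WAIT) != 0;

	if (preloaded && model_preload(mask) < 0)
		return -1;

	/* ... the locked insert would happen here ... */

	if (preloaded)
		model_preload_end();
	return 0;
}
```

Calling create_entry(MAY_WAIT) or create_entry(0) leaves preempt_depth at 0
in both cases; dropping the second test would push it negative for the
non-blocking caller.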



-- 
Jens Axboe



