linux-raid.vger.kernel.org archive mirror
* [BUG] possible race between md_free_disk and md_notify_reboot
@ 2025-02-19 19:43 Guillaume Morin
  2025-02-20  1:26 ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Guillaume Morin @ 2025-02-19 19:43 UTC (permalink / raw)
  To: linux-raid; +Cc: linux-kernel, song, yukuai3, guillaume

Hello, we experienced the following GPF during a systemd shutdown:

[ 1221.198632] systemd-shutdown[1]: Rebooting.                                                                                                                                                                        
[ 1221.602276] md: md0: resync interrupted.                                                                                                                                                                           
[ 1221.604492] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI                                                                                                   
[ 1221.613621] CPU: 87 PID: 1 Comm: systemd-shutdow Kdump: loaded Tainted: G           O       6.6.57 #1                                                                                                       
[ 1221.632201] RIP: 0010:md_notify_reboot+0xe8/0x150                                                                                                                                                                  
[ 1221.635868] Code: c6 58 59 40 b9 4c 89 f7 e8 65 d3 23 00 85 c0 74 08 4c 89 e7 e8 49 3b ff ff 48 c7 c7 58 59 40 b9 e8 ad 0f 28 00 b9 01 00 00 00 <48> 8b 83 d8 03 00 00 48 8d 93 d8 03 00 00 49 89 dc 48 2d d8 03 00
[ 1221.653555] RSP: 0018:ffffa7f200067d28 EFLAGS: 00010202                                                                                                                                                            
[ 1221.657740] RAX: 0000000000001002 RBX: deacfffffffffd28 RCX: 0000000000000001                                                                                                                                      
[ 1221.663829] RDX: ffff8a32cf2c33d8 RSI: ffffffffb9405958 RDI: ffffffffb9405958                                                                                                                                      
[ 1221.669918] RBP: ffffa7f200067d48 R08: 0000000000000000 R09: ffffa7f200067c70                                                                                                                                      
[ 1221.676008] R10: 0000000000000005 R11: 0000000000000057 R12: ffff8a32cf2c3000                                                                                                                                      
[ 1221.682097] R13: ffffffffb9405958 R14: ffff89b356ca0218 R15: ffffffffb8f03620                                                                                                                                      
[ 1221.688186] FS:  00007f6877c24900(0000) GS:ffff8a30bffc0000(0000) knlGS:0000000000000000                                                                                                                           
[ 1221.695226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                                                                      
[ 1221.699930] CR2: 00007f6877c015a0 CR3: 00000003bb402005 CR4: 0000000000770ee0                                                                                                                                      
[ 1221.706019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                                                                                      
[ 1221.712110] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                                                                                                                      
[ 1221.718197] PKRU: 55555554                                                                                                                                                                                         
[ 1221.719885] Call Trace:                                                                                                                                                                                            
[ 1221.721311]  <TASK>                                                                                                                                                                                                
[ 1221.722394]  ? show_regs+0x6a/0x80                                                                                                                                                                                 
[ 1221.724772]  ? die_addr+0x38/0xa0                                                                                                                                                                                  
[ 1221.727065]  ? exc_general_protection+0x192/0x2f0                                                                                                                                                                  
[ 1221.730739]  ? asm_exc_general_protection+0x27/0x30                                                                                                                                                                
[ 1221.734589]  ? md_notify_reboot+0xe8/0x150                                                                                                                                                                         
[ 1221.737660]  ? md_notify_reboot+0xe3/0x150                                                                                                                                                                         
[ 1221.740727]  notifier_call_chain+0x5c/0xc0                                                                                                                                                                         
[ 1221.743799]  blocking_notifier_call_chain+0x43/0x60                                                                                                                                                                
[ 1221.747648]  kernel_restart+0x22/0xa0                                                                                                                                                                              
[ 1221.750286]  __do_sys_reboot+0x1a2/0x220                                                                                                                                                                           
[ 1221.753183]  ? vfs_writev+0xd8/0x150                                                                                                                                                                               
[ 1221.755735]  ? __fput+0x198/0x320                                                                                                                                                                                  
[ 1221.758028]  ? do_writev+0x75/0x120                                                                                                                                                                                
[ 1221.760491]  __x64_sys_reboot+0x1e/0x30                                                                                                                                                                            
[ 1221.763304]  x64_sys_call+0x2000/0x2020                                                                                                                                                                            
[ 1221.766114]  do_syscall_64+0x33/0x80                                                                                                                                                                               
[ 1221.768665]  entry_SYSCALL_64_after_hwframe+0x78/0xe2                                                                                                                                                              
[ 1221.772689] RIP: 0033:0x7f68776d0453                                                                                                                                                                               
[ 1221.775258] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 09 fa 52 00 f7 d8
[ 1221.793040] RSP: 002b:00007ffdfee13e28 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9                                                                                                                                 
[ 1221.799570] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f68776d0453                                                                                                                                      
[ 1221.805666] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead                                                                                                                                      
[ 1221.811764] RBP: 00007f6877c247b0 R08: 0000000000000000 R09: 00007ffdfee13230                                                                                                                                      
[ 1221.817862] R10: 00007f6877c247b0 R11: 0000000000000206 R12: 0000000000000000                                                                                                                                      
[ 1221.823960] R13: 00007ffdfee13e90 R14: 00000000ffffffff R15: 0000000000000000                                                                                                                                      
[ 1221.830059]  </TASK>                                        

md_notify_reboot() tried to load a list poison value, which triggered the
GPF.  The poison values are set during list_del(&mddev->all_mddevs).  The
only list_del() call for this list is in mddev_free(), called by
md_free_disk() (well, it is also called in md_alloc() on failure, but that
is not relevant here).  The kdump shows all CPUs besides the one triggering
the GPF to be idle.  Note that this crash happened on 6.6.x, but afaict the
related code is mostly the same in the current tree.
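
For reference, the faulting address matches LIST_POISON1: with the default
CONFIG_ILLEGAL_POINTER_VALUE on x86_64 (0xdead000000000000), LIST_POISON1
works out to 0xdead000000000100.  list_del() leaves these poison values
behind; I am quoting include/linux/list.h from memory here, so treat this
as a sketch rather than a verbatim excerpt:

	static inline void list_del(struct list_head *entry)
	{
		__list_del_entry(entry);
		entry->next = LIST_POISON1;	/* 0xdead000000000100 on x86_64 */
		entry->prev = LIST_POISON2;	/* 0xdead000000000122 on x86_64 */
	}

Assuming all_mddevs sits at offset 0x3d8 of struct mddev in this build
(which is what RDX/R12 in the oops suggest), RBX = deacfffffffffd28 is
exactly LIST_POISON1 - 0x3d8, i.e. container_of() applied to a poisoned
list pointer rather than to a real mddev.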

We believe there is a simple race between the list_for_each_entry_safe()
loop in md_notify_reboot(), which releases all_mddevs_lock during its
iteration over all_mddevs, and mddev_free(), which removes the mddev from
this list.  mddev_free() only grabs that lock around its list_del() call.
If the item pointed to by the "n" pointer used by
list_for_each_entry_safe() is removed while we are processing the current
item, n->all_mddevs will contain poisoned values.  Additionally, since
mddev_free() also frees the mddev, there might be a use-after-free issue
as well, since mddev_free() seems to make no attempt to avoid concurrent
access to the mddev.
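
To make the window concrete, the loop looks roughly like this (paraphrased
from the 6.6 sources rather than quoted verbatim), with the spot where we
believe "n" can go stale marked:

	/* md_notify_reboot(), abridged sketch -- not a verbatim excerpt */
	spin_lock(&all_mddevs_lock);
	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
		if (!mddev_get(mddev))
			continue;
		spin_unlock(&all_mddevs_lock);
		/*
		 * Window: all_mddevs_lock is not held here.  If mddev_free()
		 * now does list_del() on the entry that "n" points to, the
		 * next step of the iteration reads the poisoned
		 * n->all_mddevs pointers and computes a bogus mddev.
		 */
		if (mddev_trylock(mddev)) {
			/* ... __md_stop_writes() etc. ... */
			mddev_unlock(mddev);
		}
		need_delay = 1;
		mddev_put(mddev);
		spin_lock(&all_mddevs_lock);
	}
	spin_unlock(&all_mddevs_lock);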

There seems to be nothing in the code that tries to prevent this
specific race, and, not being familiar with this code, I am not sure what
the right fix would be.  In the current state of the code, it does not
seem safe to release all_mddevs_lock if mddev_free() can be called
concurrently.

Please let me know if we're missing anything or can provide some
additional info.

HTH

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>


* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-19 19:43 [BUG] possible race between md_free_disk and md_notify_reboot Guillaume Morin
@ 2025-02-20  1:26 ` Yu Kuai
  2025-02-20  3:05   ` Guillaume Morin
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-02-20  1:26 UTC (permalink / raw)
  To: Guillaume Morin, linux-raid; +Cc: linux-kernel, song, yukuai (C)

Hi

On 2025/02/20 3:43, Guillaume Morin wrote:
> There seems to be nothing in the code that tries to prevent this
> specific race

Did you notice that mddev_get() is called from md_notify_reboot() while
the lock is still held?  How can mddev_free() race here if mddev_get()
succeeds?

Thanks,
Kuai



* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-20  1:26 ` Yu Kuai
@ 2025-02-20  3:05   ` Guillaume Morin
  2025-02-20  3:19     ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Guillaume Morin @ 2025-02-20  3:05 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, linux-kernel, song, yukuai



> On Feb 19, 2025, at 10:26 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> 
> Hi
> 
>> On 2025/02/20 3:43, Guillaume Morin wrote:
>> There seems to be nothing in the code that tries to prevent this
>> specific race
> 
> Did you notice that mddev_get() is called from md_notify_reboot() while
> the lock is still held?  How can mddev_free() race here if mddev_get()
> succeeds?
> 

Yes, I did notice that.  Though it was not clear to me how it was
guaranteed that mddev_get() would fail as mddev_free() does not check or
synchronize with the active atomic.  But I might be missing some logic
outside of md.c.  If there is a guarantee that mddev_get() fails after
mddev_free() is called, this would definitely address the UAF concern.

However, the main race we have reported is the one where the item pointed
to by the *next* pointer (n) in the loop is erased by mddev_free().  The
next pointer is saved by the list iteration macro, and afaict there is
nothing preventing this item from being deleted while the lock is
released, with the poisoned values then being accessed at the end of the
loop when the next pointer is used.

If this is incorrect, I am not sure how the crash would happen.  I should
have mentioned it originally, but this is reproducible, so it is not a
one-off issue.  Do you see an alternative explanation for the crash?

Thanks in advance for your help


> Thanks,
> Kuai
> 


* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-20  3:05   ` Guillaume Morin
@ 2025-02-20  3:19     ` Yu Kuai
  2025-02-20  3:45       ` Guillaume Morin
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-02-20  3:19 UTC (permalink / raw)
  To: Guillaume Morin, Yu Kuai; +Cc: linux-raid, linux-kernel, song

Hi,

On 2025/02/20 11:05, Guillaume Morin wrote:
> how it was guaranteed that mddev_get() would fail as mddev_free() does not check or synchronize with the active atomic

Please check how mddev is freed, start from mddev_put(). There might be
something wrong, but it's not what you said.


* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-20  3:19     ` Yu Kuai
@ 2025-02-20  3:45       ` Guillaume Morin
  2025-02-20  4:06         ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Guillaume Morin @ 2025-02-20  3:45 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Guillaume Morin, Yu Kuai, linux-raid, linux-kernel, song

On 20 Feb 11:19, Yu Kuai wrote:
>
> Hi,
> 
> On 2025/02/20 11:05, Guillaume Morin wrote:
> > how it was guaranteed that mddev_get() would fail as mddev_free() does not check or synchronize with the active atomic
> 
> Please check how mddev is freed, start from mddev_put(). There might be
> something wrong, but it's not what you said.

I will take a look.  Though if you're confident that this logic protects
against any UAF, that makes sense to me.

However, as I mentioned, this is not what the crash was about (I mentioned
the UAF only in passing).  The GPF seems to be about the item pointed to by
the _next_ pointer being deleted while we iterate over all mddevs.  The
mddev_get() on the current item is not going to help with this.

-- 
Guillaume Morin <guillaume@morinfr.org>


* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-20  3:45       ` Guillaume Morin
@ 2025-02-20  4:06         ` Yu Kuai
  2025-02-20 11:55           ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-02-20  4:06 UTC (permalink / raw)
  To: Guillaume Morin; +Cc: Yu Kuai, linux-raid, linux-kernel, song, yukuai (C)

Hi,

On 2025/02/20 11:45, Guillaume Morin wrote:
> On 20 Feb 11:19, Yu Kuai wrote:
>>
>> Hi,
>>
>> On 2025/02/20 11:05, Guillaume Morin wrote:
>>> how it was guaranteed that mddev_get() would fail as mddev_free() does not check or synchronize with the active atomic
>>
>> Please check how mddev is freed, start from mddev_put(). There might be
>> something wrong, but it's not what you said.
> 
> I will take a look.  Though if you're confident that this logic protects
> against any UAF, that makes sense to me.
>
> However, as I mentioned, this is not what the crash was about (I mentioned
> the UAF only in passing).  The GPF seems to be about the item pointed to by
> the _next_ pointer being deleted while we iterate over all mddevs.  The
> mddev_get() on the current item is not going to help with this.
> 

You don't need to emphasize this; it is still speculation without a solid
theoretical analysis.  The point about mddev_get() is that it is done
inside the lock, which should guarantee that continuing the iteration is
fine.

I just took a quick look, and the problem looks obvious to me; see how
md_seq_show() handles the iteration.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 465ca2af1e6e..7c7a58f618c1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9911,8 +9911,11 @@ static int md_notify_reboot(struct notifier_block *this,
                         mddev_unlock(mddev);
                 }
                 need_delay = 1;
-               mddev_put(mddev);
-               spin_lock(&all_mddevs_lock);
+
+               spin_lock(&all_mddevs_lock);
+               if (atomic_dec_and_test(&mddev->active))
+                       __mddev_put(mddev);
+
         }
         spin_unlock(&all_mddevs_lock);


Thanks,
Kuai



* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-20  4:06         ` Yu Kuai
@ 2025-02-20 11:55           ` Yu Kuai
  2025-02-20 13:39             ` Guillaume Morin
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-02-20 11:55 UTC (permalink / raw)
  To: Yu Kuai, Guillaume Morin; +Cc: linux-raid, linux-kernel, song, yukuai (C)

Hi,

On 2025/02/20 12:06, Yu Kuai wrote:
> Hi,
> 
> On 2025/02/20 11:45, Guillaume Morin wrote:
>> On 20 Feb 11:19, Yu Kuai wrote:
>>>
>>> Hi,
>>>
>>> On 2025/02/20 11:05, Guillaume Morin wrote:
>>>> how it was guaranteed that mddev_get() would fail as mddev_free() 
>>>> does not check or synchronize with the active atomic
>>>
>>> Please check how mddev is freed, start from mddev_put(). There might be
>>> something wrong, but it's not what you said.
>>
>> I will take a look.  Though if you're confident that this logic protects
>> against any UAF, that makes sense to me.
>>
>> However, as I mentioned, this is not what the crash was about (I mentioned
>> the UAF only in passing).  The GPF seems to be about the item pointed to by
>> the _next_ pointer being deleted while we iterate over all mddevs.  The
>> mddev_get() on the current item is not going to help with this.
>>
> 
> You don't need to emphasize this; it is still speculation without a solid
> theoretical analysis.  The point about mddev_get() is that it is done
> inside the lock, which should guarantee that continuing the iteration is
> fine.
>
> I just took a quick look, and the problem looks obvious to me; see how
> md_seq_show() handles the iteration.
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 465ca2af1e6e..7c7a58f618c1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9911,8 +9911,11 @@ static int md_notify_reboot(struct notifier_block *this,
>                          mddev_unlock(mddev);
>                  }
>                  need_delay = 1;
> -               mddev_put(mddev);
> -               spin_lock(&all_mddevs_lock);
> +
> +               spin_lock(&all_mddevs_lock);
> +               if (atomic_dec_and_test(&mddev->active))
> +                       __mddev_put(mddev);
> +
>          }
>          spin_unlock(&all_mddevs_lock);

While cooking the patch, I realized this is not enough;
list_for_each_entry_safe() should be replaced with list_for_each_entry()
as well.

Will send the patch soon, with:

Reported-by: Guillaume Morin <guillaume@morinfr.org>

Thanks,
Kuai
> 
> 
> Thanks,
> Kuai
> 
> .
> 



* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-20 11:55           ` Yu Kuai
@ 2025-02-20 13:39             ` Guillaume Morin
  2025-02-21  1:27               ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Guillaume Morin @ 2025-02-20 13:39 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Guillaume Morin, linux-raid, linux-kernel, song, yukuai (C)

On 20 Feb 19:55, Yu Kuai wrote:
>
> > I just took a quick look, and the problem looks obvious to me; see how
> > md_seq_show() handles the iteration.
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 465ca2af1e6e..7c7a58f618c1 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -9911,8 +9911,11 @@ static int md_notify_reboot(struct notifier_block *this,
> >                          mddev_unlock(mddev);
> >                  }
> >                  need_delay = 1;
> > -               mddev_put(mddev);
> > -               spin_lock(&all_mddevs_lock);
> > +
> > +               spin_lock(&all_mddevs_lock);
> > +               if (atomic_dec_and_test(&mddev->active))
> > +                       __mddev_put(mddev);
> > +
> >          }
> >          spin_unlock(&all_mddevs_lock);
> 
> While cooking the patch, I realized this is not enough;
> list_for_each_entry_safe() should be replaced with list_for_each_entry()
> as well.
> 
> Will send the patch soon, with:
> 
> Reported-by: Guillaume Morin <guillaume@morinfr.org>

Thank you! I just saw the patch and we are going to test it and let you
know.

The issue with the next pointer seems to be fixed with your change.
Though I am still unclear how the 2nd potential issue I mentioned -
where the current item would be freed concurrently by mddev_free() - is
prevented. I am not finding anything in the code that seems to prevent a
concurrent call to mddev_free() for the current item in the
list_for_each_entry() loop (and therefore accessing mddev after the
kfree()).

I understand that we are getting a reference through the active atomic
in mddev_get() under the lock in md_notify_reboot() but how is that
preventing mddev_free() from freeing the mddev as soon as we release the
all_mddevs_lock in the loop?

I am not familiar with this code so I am most likely missing
something, but if you had the time to explain, that would be very
helpful.

TIA

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>


* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-20 13:39             ` Guillaume Morin
@ 2025-02-21  1:27               ` Yu Kuai
  2025-02-21 15:19                 ` Guillaume Morin
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2025-02-21  1:27 UTC (permalink / raw)
  To: Guillaume Morin, Yu Kuai; +Cc: linux-raid, linux-kernel, song, yukuai (C)

Hi,

On 2025/02/20 21:39, Guillaume Morin wrote:
> On 20 Feb 19:55, Yu Kuai wrote:
>>
>>> I just took a quick look, and the problem looks obvious to me; see how
>>> md_seq_show() handles the iteration.
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 465ca2af1e6e..7c7a58f618c1 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -9911,8 +9911,11 @@ static int md_notify_reboot(struct notifier_block *this,
>>>                           mddev_unlock(mddev);
>>>                   }
>>>                   need_delay = 1;
>>> -               mddev_put(mddev);
>>> -               spin_lock(&all_mddevs_lock);
>>> +
>>> +               spin_lock(&all_mddevs_lock);
>>> +               if (atomic_dec_and_test(&mddev->active))
>>> +                       __mddev_put(mddev);
>>> +
>>>           }
>>>           spin_unlock(&all_mddevs_lock);
>>
>> While cooking the patch, I realized this is not enough;
>> list_for_each_entry_safe() should be replaced with list_for_each_entry()
>> as well.
>>
>> Will send the patch soon, with:
>>
>> Reported-by: Guillaume Morin <guillaume@morinfr.org>
> 
> Thank you! I just saw the patch and we are going to test it and let you
> know.
> 
> The issue with the next pointer seems to be fixed with your change.
> Though I am still unclear how the 2nd potential issue I mentioned -
> where the current item would be freed concurrently by mddev_free() - is
> prevented. I am not finding anything in the code that seems to prevent a
> concurrent call to mddev_free() for the current item in the
> list_for_each_entry() loop (and therefore accessing mddev after the
> kfree()).
> 
> I understand that we are getting a reference through the active atomic
> in mddev_get() under the lock in md_notify_reboot() but how is that
> preventing mddev_free() from freeing the mddev as soon as we release the
> all_mddevs_lock in the loop?
> 
> I am not familiar with this code so I am most likely missing
> something, but if you had the time to explain, that would be very
> helpful.

I'm not quite sure what you're confused about.  The mddev lifetime is
protected by both the lock and the reference.

In this case:

hold lock
get first mddev
release lock
// handle first mddev

hold lock
release mddev
get next mddev
release lock
-> mddev can be freed now
// handle the next mddev
...
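
Roughly, the helpers behind this look like the following (paraphrased from
memory rather than quoted verbatim; the details differ a bit between 6.6
and the current tree, where __mddev_put() exists as a separate helper):

	static struct mddev *mddev_get(struct mddev *mddev)
	{
		lockdep_assert_held(&all_mddevs_lock);

		/* once the mddev is marked deleted, no new references */
		if (test_bit(MD_DELETED, &mddev->flags))
			return NULL;
		atomic_inc(&mddev->active);
		return mddev;
	}

	void mddev_put(struct mddev *mddev)
	{
		/* the last reference is dropped with all_mddevs_lock held */
		if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
			return;

		/*
		 * Marking the mddev deleted and scheduling the teardown that
		 * eventually reaches md_free_disk()/mddev_free() happens
		 * under the same lock, so it cannot race with a concurrent
		 * mddev_get() on the same mddev.
		 */
		__mddev_put(mddev);
		spin_unlock(&all_mddevs_lock);
	}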

Thanks,
Kuai

> 
> TIA
> 
> Guillaume.
> 



* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-21  1:27               ` Yu Kuai
@ 2025-02-21 15:19                 ` Guillaume Morin
  2025-02-22  1:08                   ` Yu Kuai
  0 siblings, 1 reply; 11+ messages in thread
From: Guillaume Morin @ 2025-02-21 15:19 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Guillaume Morin, linux-raid, linux-kernel, song, yukuai (C)

On 21 Feb  9:27, Yu Kuai wrote:
> > The issue with the next pointer seems to be fixed with your change.
> > Though I am still unclear how the 2nd potential issue I mentioned -
> > where the current item would be freed concurrently by mddev_free() - is
> > prevented. I am not finding anything in the code that seems to prevent a
> > concurrent call to mddev_free() for the current item in the
> > list_for_each_entry() loop (and therefore accessing mddev after the
> > kfree()).
> > 
> > I understand that we are getting a reference through the active atomic
> > in mddev_get() under the lock in md_notify_reboot() but how is that
> > preventing mddev_free() from freeing the mddev as soon as we release the
> > all_mddevs_lock in the loop?
> > 
> > I am not familiar with this code so I am most likely missing
> > something, but if you had the time to explain, that would be very
> > helpful.
> 
> I'm not quite sure what you're confused about.  The mddev lifetime is
> protected by both the lock and the reference.
> 
> In this case:
> 
> hold lock
> get first mddev
> release lock
> // handle first mddev
> 
> hold lock
> release mddev
> get next mddev
> release lock
> -> mddev can be freed now
> // handle the next mddev
> ...
> 

In my original message, I mentioned 2 potential issues.
Let's say md_notify_reboot() is handling mddev N from the all_mddevs
list.

1) The GPF we pasted in the original message happened when mddev_free()
was called concurrently for mddev N+1.  This led to the GPF since the CPU
would try to load the list poison values from the 'n' pointer.

Your patch definitely fixes this race and we cannot reproduce the GPF
anymore.

2) Instead of mddev_free() being called for mddev N+1 like in 1), I wonder
what is preventing mddev_free() from being called for mddev N (the one we
are currently iterating on).  Something like:

CPU1							CPU2
list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
if (!mddev_get(mddev))
    continue;
spin_unlock(&all_mddevs_lock);
						        mddev_free(mddev) (same mddev as CPU1)

mddev_free() does not check the active atomic, or acquire the
reconfig_mutex/md_lock and will kfree() mddev. So the loop execution
on CPU1 after spin_unlock() could be a UAF.
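
To spell out what worries me, mddev_free() boils down to this (outline
only, with the intermediate teardown elided, so not a verbatim excerpt):

	static void mddev_free(struct mddev *mddev)
	{
		spin_lock(&all_mddevs_lock);
		list_del(&mddev->all_mddevs);	/* poisons mddev->all_mddevs */
		spin_unlock(&all_mddevs_lock);

		/* ... no check of mddev->active anywhere in here ... */

		kfree(mddev);
	}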

So I was wondering if you could clarify what is preventing race 2,
i.e. what is preventing mddev_free(mddev) from calling kfree(mddev)
while the md_notify_reboot() loop is handling mddev.

-- 
Guillaume Morin <guillaume@morinfr.org>


* Re: [BUG] possible race between md_free_disk and md_notify_reboot
  2025-02-21 15:19                 ` Guillaume Morin
@ 2025-02-22  1:08                   ` Yu Kuai
  0 siblings, 0 replies; 11+ messages in thread
From: Yu Kuai @ 2025-02-22  1:08 UTC (permalink / raw)
  To: Guillaume Morin, Yu Kuai; +Cc: linux-raid, linux-kernel, song, yukuai (C)

Hi,

On 2025/02/21 23:19, Guillaume Morin wrote:
> 2) Instead of mddev_free() being called for mddev N+1 like in 1), I wonder
> what is preventing mddev_free() from being called for mddev N (the one we
> are currently iterating on).  Something like:
> 
> CPU1							CPU2
> list_for_each_entry(mddev, &all_mddevs, all_mddevs) {
> if (!mddev_get(mddev))
>      continue;
> spin_unlock(&all_mddevs_lock);
> 						        mddev_free(mddev) (same mddev as CPU1)
> 
> mddev_free() does not check the active atomic, or acquire the
> reconfig_mutex/md_lock and will kfree() mddev. So the loop execution
> on CPU1 after spin_unlock() could be a UAF.
> 
> So I was wondering if you could clarify what is preventing race 2,
> i.e. what is preventing mddev_free(mddev) from calling kfree(mddev)
> while the md_notify_reboot() loop is handling mddev.

As I said already, please take a look at how mddev_put() works;
mddev_free() can't be called until the reference is released after
mddev_get().

Thanks,
Kuai


