cgroup null pointer dereference

* cgroup null pointer dereference
From: Kamaljit Singh @ 2025-04-23 17:30 UTC
  To: cgroups@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org

Hello,

While running I/O to an NVMe over Fabrics target, we're hitting the NULL pointer dereference
below, which causes CPU hard lockups and NMIs. Before the lockups, the Medusa I/O workload
ran successfully for ~23 hours.

I did not find any panics that list nvme or block driver calls.

RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
points to rstat.c, cgroup_rstat_push_children(), line 162, inside the second while() loop:

160                 /* updated_next is parent cgroup terminated */
161                 while (child != parent) {
162                         child->rstat_flush_next = head;
163                         head = child;
164                         crstatc = cgroup_rstat_cpu(child, cpu);
165                         grandchild = crstatc->updated_children;

In my test environment I've added a NULL check for 'child' and am re-running the long-term test.
I'm wondering whether this patch addresses the underlying issue or is just a band-aid.
Please share any known patches or suggestions.

-                while (child != parent) {
+                while (child && child != parent) {

Reference: git://git.infradead.org/nvme.git tags/nvme-6.15-2025-04-10

===========================
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] BUG: kernel NULL pointer dereference, address: 00000000000003d8
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] #PF: supervisor read access in kernel mode
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] #PF: error_code(0x0000) - not-present page
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] PGD 0 P4D 0
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Oops: Oops: 0000 [#1] SMP NOPTI
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] CPU: 19 UID: 0 PID: 349623 Comm: kworker/u1029:0 Tainted: G            E       6.14.0+ #1 PREEMPT(voluntary)
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Tainted: [E]=UNSIGNED_MODULE
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Hardware name: Supermicro AS -1124US-TNRP/H12DSU-iN, BIOS 1.2 08/10/2020
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Workqueue: events_unbound flush_memcg_stats_dwork
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Code: 0f 85 90 00 00 00 48 85 d2 0f 84 95 00 00 00 4c 8b b2 c0 00 00 00 4c 8b 82 00 04 00 00 49 39 d6 75 08 e9 d8 03 00 00 48 89 f2 <48> 8b 82 d8 03 00 00 4c 89 ba 00 04 00 00 49 81 fd 00 20 00 00 0f
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RSP: 0018:ffffd08eb9a8bd90 EFLAGS: 00010086
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RAX: ffff8eefcb7c9760 RBX: 0000000000000013 RCX: ffff8ef0dd42c000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RBP: ffffd08eb9a8be00 R08: 0000000000000000 R09: 0000000000000000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff89bfd200
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] R13: 0000000000000013 R14: ffffffff89bfd200 R15: ffff8eb1979db000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] FS:  0000000000000000(0000) GS:ffff8ef041434000(0000) knlGS:0000000000000000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] CR2: 00000000000003d8 CR3: 000000113b642000 CR4: 0000000000350ef0
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Call Trace:
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  <TASK>
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  __mem_cgroup_flush_stats+0xf6/0x100
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  flush_memcg_stats_dwork+0x1a/0x50
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  process_one_work+0x191/0x3e0
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  worker_thread+0x2e3/0x420
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? srso_return_thunk+0x5/0x5f
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? __pfx_worker_thread+0x10/0x10
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  kthread+0x10d/0x230
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? __pfx_kthread+0x10/0x10
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ret_from_fork+0x47/0x70
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? __pfx_kthread+0x10/0x10
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ret_from_fork_asm+0x1a/0x30
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  </TASK>

* Re: cgroup null pointer dereference
From: Waiman Long @ 2025-04-23 21:26 UTC
  To: Kamaljit Singh, cgroups@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org


On 4/23/25 1:30 PM, Kamaljit Singh wrote:
> Hello,
>
> While running IOs to an nvme fabrics target we're hitting this null pointer which causes
> CPU hard lockups and NMI. Before the lockups, the Medusa IOs ran successfully for ~23 hours.
>
> I did not find any panics listing nvme or block driver calls.
>
> RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
> points to rstat.c, cgroup_rstat_push_children(), line 162 under second while() to the following code.
>
> 160                 /* updated_next is parent cgroup terminated */
> 161                 while (child != parent) {
> 162                         child->rstat_flush_next = head;
> 163                         head = child;
> 164                         crstatc = cgroup_rstat_cpu(child, cpu);
> 165                         grandchild = crstatc->updated_children;
>
> In my test env I've added a null check to 'child' and re-running the long-term test.
> I'm wondering if this patch is sufficient to address any underlying issue or is just a band-aid.
> Please share any known patches or suggestions.
>               -          while (child != parent) {
>               +         while (child && child != parent) {

'child' can become NULL only if the updated_next list isn't parent
terminated, which should not happen. If it really does happen, a warning
is needed. I will take a further look to see if there is a bug somewhere.

Cheers,
Longman


* Re: cgroup null pointer dereference
From: Kamaljit Singh @ 2025-04-25  0:53 UTC
  To: Waiman Long, cgroups@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org

Hi Waiman,

>On 4/23/25 1:30 PM, Kamaljit Singh wrote:
>> Hello,
>>
>> While running IOs to an nvme fabrics target we're hitting this null pointer which causes
>> CPU hard lockups and NMI. Before the lockups, the Medusa IOs ran successfully for ~23 hours.
>>
>> I did not find any panics listing nvme or block driver calls.
>>
>> RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
>> points to rstat.c, cgroup_rstat_push_children(), line 162 under second while() to the following code.
>>
>> 160                 /* updated_next is parent cgroup terminated */
>> 161                 while (child != parent) {
>> 162                         child->rstat_flush_next = head;
>> 163                         head = child;
>> 164                         crstatc = cgroup_rstat_cpu(child, cpu);
>> 165                         grandchild = crstatc->updated_children;
>>
>> In my test env I've added a null check to 'child' and re-running the long-term test.
>> I'm wondering if this patch is sufficient to address any underlying issue or is just a band-aid.
>> Please share any known patches or suggestions.
>>               -          while (child != parent) {
>>               +         while (child && child != parent) {
>
>Child can become NULL only if the updated_next list isn't parent
>terminated. This should not happen. A warning is needed if it really
>happens. I will take a further look to see if there is a bug somewhere.

With this patch, my test re-ran for 36+ hours without any CPU lockups or NMIs. The patch seems to have helped.

* Re: cgroup null pointer dereference
From: Waiman Long @ 2025-04-25  1:33 UTC
  To: Kamaljit Singh, Waiman Long, cgroups@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org


On 4/24/25 8:53 PM, Kamaljit Singh wrote:
> Hi Waiman,
>
>> On 4/23/25 1:30 PM, Kamaljit Singh wrote:
>>> Hello,
>>>
>>> While running IOs to an nvme fabrics target we're hitting this null pointer which causes
>>> CPU hard lockups and NMI. Before the lockups, the Medusa IOs ran successfully for ~23 hours.
>>>
>>> I did not find any panics listing nvme or block driver calls.
>>>
>>> RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
>>> points to rstat.c, cgroup_rstat_push_children(), line 162 under second while() to the following code.
>>>
>>> 160                 /* updated_next is parent cgroup terminated */
>>> 161                 while (child != parent) {
>>> 162                         child->rstat_flush_next = head;
>>> 163                         head = child;
>>> 164                         crstatc = cgroup_rstat_cpu(child, cpu);
>>> 165                         grandchild = crstatc->updated_children;
>>>
>>> In my test env I've added a null check to 'child' and re-running the long-term test.
>>> I'm wondering if this patch is sufficient to address any underlying issue or is just a band-aid.
>>> Please share any known patches or suggestions.
>>>                -          while (child != parent) {
>>>                +         while (child && child != parent) {
>> Child can become NULL only if the updated_next list isn't parent
>> terminated. This should not happen. A warning is needed if it really
>> happens. I will take a further look to see if there is a bug somewhere.
> My test re-ran for 36+ hours without any CPU lockups or NMI. This patch seems to have helped.
>
I now see what is wrong. The cgroup_rstat_push_children() function is
supposed to be called with cgroup_rstat_lock held, but commit
093c8812de2d3 ("cgroup: rstat: Cleanup flushing functions and locking")
changed that. As a result, racing callers can corrupt the list. I will
work on a patch to fix that regression.

Cheers,
Longman


* Re: cgroup null pointer dereference
From: Waiman Long @ 2025-04-25  1:43 UTC
  To: Waiman Long, Kamaljit Singh, cgroups@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org

On 4/24/25 9:33 PM, Waiman Long wrote:
>
> On 4/24/25 8:53 PM, Kamaljit Singh wrote:
>> Hi Waiman,
>>
>>> On 4/23/25 1:30 PM, Kamaljit Singh wrote:
>>>> Hello,
>>>>
>>>> While running IOs to an nvme fabrics target we're hitting this null 
>>>> pointer which causes
>>>> CPU hard lockups and NMI. Before the lockups, the Medusa IOs ran 
>>>> successfully for ~23 hours.
>>>>
>>>> I did not find any panics listing nvme or block driver calls.
>>>>
>>>> RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
>>>> points to rstat.c, cgroup_rstat_push_children(), line 162 under 
>>>> second while() to the following code.
>>>>
>>>> 160                 /* updated_next is parent cgroup terminated */
>>>> 161                 while (child != parent) {
>>>> 162                         child->rstat_flush_next = head;
>>>> 163                         head = child;
>>>> 164                         crstatc = cgroup_rstat_cpu(child, cpu);
>>>> 165                         grandchild = crstatc->updated_children;
>>>>
>>>> In my test env I've added a null check to 'child' and re-running 
>>>> the long-term test.
>>>> I'm wondering if this patch is sufficient to address any underlying 
>>>> issue or is just a band-aid.
>>>> Please share any known patches or suggestions.
>>>>                -          while (child != parent) {
>>>>                +         while (child && child != parent) {
>>> Child can become NULL only if the updated_next list isn't parent
>>> terminated. This should not happen. A warning is needed if it really
>>> happens. I will take a further look to see if there is a bug somewhere.
>> My test re-ran for 36+ hours without any CPU lockups or NMI. This 
>> patch seems to have helped.
>>
> I now see what is wrong. The cgroup_rstat_push_children() function is 
> supposed to be called with cgroup_rstat_lock held, but commit 
> 093c8812de2d3 ("cgroup: rstat: Cleanup flushing functions and 
> locking") changes that. Hence racing can corrupt the list. I will work 
> on a patch to fix that regression.

Oh, this problem has already been fixed in the cgroup/for-6.16 commit 
7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with 
cgroup_rstat_lock"). It should be in the linux-next tree.

Cheers,
Longman



* Re: cgroup null pointer dereference
From: Waiman Long @ 2025-04-25  1:49 UTC
  To: Waiman Long, Kamaljit Singh, cgroups@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org


On 4/24/25 9:33 PM, Waiman Long wrote:
>
> On 4/24/25 8:53 PM, Kamaljit Singh wrote:
>> Hi Waiman,
>>
>>> On 4/23/25 1:30 PM, Kamaljit Singh wrote:
>>>> Hello,
>>>>
>>>> While running IOs to an nvme fabrics target we're hitting this null 
>>>> pointer which causes
>>>> CPU hard lockups and NMI. Before the lockups, the Medusa IOs ran 
>>>> successfully for ~23 hours.
>>>>
>>>> I did not find any panics listing nvme or block driver calls.
>>>>
>>>> RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
>>>> points to rstat.c, cgroup_rstat_push_children(), line 162 under 
>>>> second while() to the following code.
>>>>
>>>> 160                 /* updated_next is parent cgroup terminated */
>>>> 161                 while (child != parent) {
>>>> 162                         child->rstat_flush_next = head;
>>>> 163                         head = child;
>>>> 164                         crstatc = cgroup_rstat_cpu(child, cpu);
>>>> 165                         grandchild = crstatc->updated_children;
>>>>
>>>> In my test env I've added a null check to 'child' and re-running 
>>>> the long-term test.
>>>> I'm wondering if this patch is sufficient to address any underlying 
>>>> issue or is just a band-aid.
>>>> Please share any known patches or suggestions.
>>>>                -          while (child != parent) {
>>>>                +         while (child && child != parent) {
>>> Child can become NULL only if the updated_next list isn't parent
>>> terminated. This should not happen. A warning is needed if it really
>>> happens. I will take a further look to see if there is a bug somewhere.
>> My test re-ran for 36+ hours without any CPU lockups or NMI. This 
>> patch seems to have helped.
>>
> I now see what is wrong. The cgroup_rstat_push_children() function is 
> supposed to be called with cgroup_rstat_lock held, but commit 
> 093c8812de2d3 ("cgroup: rstat: Cleanup flushing functions and 
> locking") changes that. Hence racing can corrupt the list. I will work 
> on a patch to fix that regression.

It should also be in the v6.15-rc1 branch, but it is missing from the nvme
branch that you are using. So you will need to use a more up-to-date nvme
branch, when available, to avoid this problem.

Cheers,
Longman

>
> Cheers,
> Longman
>


* Re: cgroup null pointer dereference
From: Kamaljit Singh @ 2025-04-25  2:22 UTC
  To: Waiman Long, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org, hch, kbusch@kernel.org,
	sagi@grimberg.me
  Cc: linux-kernel@vger.kernel.org

Waiman,

>>>>> In my test env I've added a null check to 'child' and re-running
>>>>> the long-term test.
>>>>> I'm wondering if this patch is sufficient to address any underlying
>>>>> issue or is just a band-aid.
>>>>> Please share any known patches or suggestions.
>>>>>                -          while (child != parent) {
>>>>>                +         while (child && child != parent) {
>>>> Child can become NULL only if the updated_next list isn't parent
>>>> terminated. This should not happen. A warning is needed if it really
>>>> happens. I will take a further look to see if there is a bug somewhere.
>>> My test re-ran for 36+ hours without any CPU lockups or NMI. This
>>> patch seems to have helped.
>>>
>> I now see what is wrong. The cgroup_rstat_push_children() function is
>> supposed to be called with cgroup_rstat_lock held, but commit
>> 093c8812de2d3 ("cgroup: rstat: Cleanup flushing functions and
>> locking") changes that. Hence racing can corrupt the list. I will work
>> on a patch to fix that regression.
>
>It should also be in v6.15-rc1 branch but is missing in the nvme branch
>that you are using. So you need to use a more updated nvme, when
>available, to avoid this problem.
>
Thank you for finding that commit. I'll look for it.

Christoph, Sagi, Keith, Others,
Can this commit be merged into the nvme-6.15 branch please?

Thanks & Regards,
Kamaljit

* Re: cgroup null pointer dereference
From: hch @ 2025-04-25 14:54 UTC
  To: Kamaljit Singh
  Cc: Waiman Long, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org, hch, kbusch@kernel.org,
	sagi@grimberg.me, linux-kernel@vger.kernel.org

On Fri, Apr 25, 2025 at 02:22:31AM +0000, Kamaljit Singh wrote:
> >It should also be in v6.15-rc1 branch but is missing in the nvme branch
> >that you are using. So you need to use a more updated nvme, when
> >available, to avoid this problem.
> >
> Thank you for finding that commit. I'll look for it.
> 
> Christoph, Sagi, Keith, Others,
> Can this commit be merged into the nvme-6.15 branch please?

What commit?


* Re: cgroup null pointer dereference
From: Waiman Long @ 2025-04-25 15:04 UTC
  To: hch, Kamaljit Singh
  Cc: Waiman Long, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org, kbusch@kernel.org,
	sagi@grimberg.me, linux-kernel@vger.kernel.org


On 4/25/25 10:54 AM, hch wrote:
> On Fri, Apr 25, 2025 at 02:22:31AM +0000, Kamaljit Singh wrote:
>>> It should also be in v6.15-rc1 branch but is missing in the nvme branch
>>> that you are using. So you need to use a more updated nvme, when
>>> available, to avoid this problem.
>>>
>> Thank you for finding that commit. I'll look for it.
>>
>> Christoph, Sagi, Keith, Others,
>> Can this commit be merged into the nvme-6.15 branch please?
> What commit?
>
commit 7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with 
cgroup_rstat_lock")

Cheers,
Longman


* Re: cgroup null pointer dereference
From: hch @ 2025-04-25 15:11 UTC
  To: Waiman Long
  Cc: hch, Kamaljit Singh, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org, kbusch@kernel.org,
	sagi@grimberg.me, linux-kernel@vger.kernel.org

On Fri, Apr 25, 2025 at 11:04:58AM -0400, Waiman Long wrote:
>
> On 4/25/25 10:54 AM, hch wrote:
>> On Fri, Apr 25, 2025 at 02:22:31AM +0000, Kamaljit Singh wrote:
>>>> It should also be in v6.15-rc1 branch but is missing in the nvme branch
>>>> that you are using. So you need to use a more updated nvme, when
>>>> available, to avoid this problem.
>>>>
>>> Thank you for finding that commit. I'll look for it.
>>>
>>> Christoph, Sagi, Keith, Others,
>>> Can this commit be merged into the nvme-6.15 branch please?
>> What commit?
>>
> commit 7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with 
> cgroup_rstat_lock")

I don't see how that is relevant for the nvme tree?


* Re: cgroup null pointer dereference
From: Waiman Long @ 2025-04-25 15:22 UTC
  To: hch, Waiman Long
  Cc: Kamaljit Singh, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org, kbusch@kernel.org,
	sagi@grimberg.me, linux-kernel@vger.kernel.org


On 4/25/25 11:11 AM, hch wrote:
> On Fri, Apr 25, 2025 at 11:04:58AM -0400, Waiman Long wrote:
>> On 4/25/25 10:54 AM, hch wrote:
>>> On Fri, Apr 25, 2025 at 02:22:31AM +0000, Kamaljit Singh wrote:
>>>>> It should also be in v6.15-rc1 branch but is missing in the nvme branch
>>>>> that you are using. So you need to use a more updated nvme, when
>>>>> available, to avoid this problem.
>>>>>
>>>> Thank you for finding that commit. I'll look for it.
>>>>
>>>> Christoph, Sagi, Keith, Others,
>>>> Can this commit be merged into the nvme-6.15 branch please?
>>> What commit?
>>>
>> commit 7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with
>> cgroup_rstat_lock")
> I don't see how that is relevant for the nvme tree?
>
The nvme-6.15-2025-04-10 branch used by Kamaljit includes some v6.15
commits, like the cgroup commit 093c8812de2d3 ("cgroup: rstat:
Cleanup flushing functions and locking"), but not its fix commit
7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with
cgroup_rstat_lock"). That can cause a system crash in some cases. The
problem will be resolved if nvme is rebased on top of v6.15-rc1 or
later, as the fix commit will then be included.

Cheers,
Longman


* Re: cgroup null pointer dereference
From: hch @ 2025-04-25 15:26 UTC
  To: Waiman Long
  Cc: hch, Kamaljit Singh, cgroups@vger.kernel.org,
	linux-nvme@lists.infradead.org, kbusch@kernel.org,
	sagi@grimberg.me, linux-kernel@vger.kernel.org

On Fri, Apr 25, 2025 at 11:22:50AM -0400, Waiman Long wrote:
> The nvme-6.15-2025-04-10 branch used by Kamaljit includes some v6.15
> commits like the cgroup commit 093c8812de2d3 ("cgroup: rstat:
> Cleanup flushing functions and locking") but not its fix commit
> 7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with
> cgroup_rstat_lock"). That can cause system crash in some cases. That
> problem will be resolved if nvme is rebased on top of v6.15-rc1 or
> later as the fix commit will be included.

The nvme branches are always rebased on top of the current relevant block
branches, i.e. block-6.15 in this case.  Everything else would create
merge issues.

* Re: cgroup null pointer dereference
From: Kamaljit Singh @ 2025-04-25 17:20 UTC
  To: hch, Waiman Long
  Cc: hch, cgroups@vger.kernel.org, linux-nvme@lists.infradead.org,
	kbusch@kernel.org, sagi@grimberg.me, linux-kernel@vger.kernel.org

Christoph,

>On Fri, Apr 25, 2025 at 11:22:50AM -0400, Waiman Long wrote:
>> The nvme-6.15-2025-04-10 branch used by Kamaljit includes some v6.15
>> commits like the cgroup commit 093c8812de2d3 ("cgroup: rstat:
>> Cleanup flushing functions and locking") but not its fix commit
>> 7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with
>> cgroup_rstat_lock"). That can cause system crash in some cases. That
>> problem will be resolved if nvme is rebased on top of v6.15-rc1 or
>> later as the fix commit will be included.
>
>The nvme branches are always rebased on top of the current relevant block
>branches, i.e. block-6.15 in this case.  Everything else would create
>merge issues.
Please ignore my request. 

I'll pull in from the mainline. Thanks Damien!
