linux-mm.kvack.org archive mirror
* [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
@ 2023-08-18  6:05 Chris Li
  2023-08-18  6:05 ` [PATCH RFC 1/2] mm/page_alloc: safeguard free_pcppages_bulk Chris Li
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Chris Li @ 2023-08-18  6:05 UTC (permalink / raw)
  To: Andrew Morton, Kemeng Shi
  Cc: akpm, baolin.wang, mgorman, Michal Hocko, david, willy, linux-mm,
	Namhyung Kim, Greg Thelen, linux-kernel, Chris Li, John Sperbeck

In this patch series I want to safeguard
free_pcppages_bulk() against changes to pcp->count
outside of this function, e.g. by a BPF program
injected on the function tracepoint.

I break up the patches into two separate patches
for the safeguard and the clean up.

Hopefully that is easier to review.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
Chris Li (2):
      mm/page_alloc: safeguard free_pcppages_bulk
      mm/page_alloc: free_pcppages_bulk clean up

 mm/page_alloc.c | 44 +++++++++++++-------------------------------
 1 file changed, 13 insertions(+), 31 deletions(-)
---
base-commit: 5fb2ea3111f4ecc6dc4891ce5b00f0217aae9a04
change-id: 20230817-free_pcppages_bulk-facc18d6fee7

Best regards,
-- 
Chris Li <chrisl@kernel.org>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH RFC 1/2] mm/page_alloc: safeguard free_pcppages_bulk
  2023-08-18  6:05 [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Chris Li
@ 2023-08-18  6:05 ` Chris Li
  2023-08-18  6:05 ` [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up Chris Li
  2023-08-21 10:32 ` [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Mel Gorman
  2 siblings, 0 replies; 17+ messages in thread
From: Chris Li @ 2023-08-18  6:05 UTC (permalink / raw)
  To: Andrew Morton, Kemeng Shi
  Cc: akpm, baolin.wang, mgorman, Michal Hocko, david, willy, linux-mm,
	Namhyung Kim, Greg Thelen, linux-kernel, Chris Li, John Sperbeck

The current free_pcppages_bulk() can panic when
pcp->count is changed outside of this function by
a BPF program injected at the ftrace function entry.

Commit c66a36af7ba3a628 fixed this on the BPF program side
by not allocating memory inside the spinlock.

But the kernel can still panic when loading a similar BPF
program without that fix. Here are the steps to reproduce it:

$ git checkout 19030564ab116757e32
$ cd tools/perf
$ make perf
$ ./perf lock con -ab -- ./perf bench sched messaging

You should be able to see the kernel panic within 20 seconds.

Here is what happened in the panic:

count = min(pcp->count, count);

free_pcppages_bulk() assumes count and pcp->count are in sync,
i.e. that pcp->count does not change outside of this function.

That assumption is broken when the BPF lock contention code
allocates memory inside the spinlock: pcp->count becomes one
less than "count". The loop only checks against "count" and
runs into an endless loop once pcp->count drops to zero and
all lists are empty. In that endless loop, pindex_min can grow
bigger than pindex_max, and pindex_max can go negative. The
kernel panic happens when pindex is used to access outside
the bounds of pcp->lists.

Notice that this is just one of the (buggy) BPF programs that
can break it. Other than the spinlock, other function
tracepoints under this function can be hooked up to a BPF
program which allocates memory and changes pcp->count.

One argument is that BPF should not allocate memory under the
spinlock. On the other hand, the kernel can just check pcp->count
inside the loop to avoid the kernel panic.

Signed-off-by: Chris Li <chrisl@kernel.org>
Reported-by: John Sperbeck <jsperbeck@google.com>
---
 mm/page_alloc.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1eb3864e1dbc7..347cb93081a02 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1215,12 +1215,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	bool isolated_pageblocks;
 	struct page *page;
 
-	/*
-	 * Ensure proper count is passed which otherwise would stuck in the
-	 * below while (list_empty(list)) loop.
-	 */
-	count = min(pcp->count, count);
-
 	/* Ensure requested pindex is drained first. */
 	pindex = pindex - 1;
 
@@ -1266,7 +1260,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 			__free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
 			trace_mm_page_pcpu_drain(page, order, mt);
-		} while (count > 0 && !list_empty(list));
+		} while (count > 0 && pcp->count > 0 && !list_empty(list));
 	}
 
 	spin_unlock_irqrestore(&zone->lock, flags);

-- 
2.42.0.rc1.204.g551eb34607-goog




* [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up
  2023-08-18  6:05 [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Chris Li
  2023-08-18  6:05 ` [PATCH RFC 1/2] mm/page_alloc: safeguard free_pcppages_bulk Chris Li
@ 2023-08-18  6:05 ` Chris Li
  2023-08-24  6:28   ` kernel test robot
  2023-08-21 10:32 ` [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Mel Gorman
  2 siblings, 1 reply; 17+ messages in thread
From: Chris Li @ 2023-08-18  6:05 UTC (permalink / raw)
  To: Andrew Morton, Kemeng Shi
  Cc: akpm, baolin.wang, mgorman, Michal Hocko, david, willy, linux-mm,
	Namhyung Kim, Greg Thelen, linux-kernel, Chris Li

This patch does not have functional change; it is just a pure clean up.

It removes pindex_max and pindex_min and replaces them with a simpler
loop.

It uses list_for_each_entry_safe_reverse() to replace the loop over
list_last_entry(). It produces slightly better machine code.

Signed-off-by: Chris Li <chrisl@kernel.org>
---
 mm/page_alloc.c | 38 +++++++++++++-------------------------
 1 file changed, 13 insertions(+), 25 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 347cb93081a02..d64d0f5ec70b4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1209,11 +1209,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 					int pindex)
 {
 	unsigned long flags;
-	int min_pindex = 0;
-	int max_pindex = NR_PCP_LISTS - 1;
 	unsigned int order;
 	bool isolated_pageblocks;
-	struct page *page;
+	int i;
 
 	/* Ensure requested pindex is drained first. */
 	pindex = pindex - 1;
@@ -1221,31 +1219,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	spin_lock_irqsave(&zone->lock, flags);
 	isolated_pageblocks = has_isolate_pageblock(zone);
 
-	while (count > 0) {
+	for (i = 0; i < NR_PCP_LISTS; i++, pindex++) {
 		struct list_head *list;
 		int nr_pages;
+		struct page *page, *next;
 
-		/* Remove pages from lists in a round-robin fashion. */
-		do {
-			if (++pindex > max_pindex)
-				pindex = min_pindex;
-			list = &pcp->lists[pindex];
-			if (!list_empty(list))
-				break;
-
-			if (pindex == max_pindex)
-				max_pindex--;
-			if (pindex == min_pindex)
-				min_pindex++;
-		} while (1);
-
+		if (pindex == NR_PCP_LISTS)
+			pindex = 0;
+		list = pcp->lists + pindex;
 		order = pindex_to_order(pindex);
 		nr_pages = 1 << order;
-		do {
-			int mt;
-
-			page = list_last_entry(list, struct page, pcp_list);
-			mt = get_pcppage_migratetype(page);
+		list_for_each_entry_safe_reverse(page, next, list, lru) {
+			int mt = get_pcppage_migratetype(page);
 
 			/* must delete to avoid corrupting pcp list */
 			list_del(&page->pcp_list);
@@ -1260,9 +1245,12 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 			__free_one_page(page, page_to_pfn(page), zone, order, mt, FPI_NONE);
 			trace_mm_page_pcpu_drain(page, order, mt);
-		} while (count > 0 && pcp->count > 0 && !list_empty(list));
-	}
 
+			if (count <= 0 || pcp->count <= 0)
+				goto out;
+		}
+	}
+out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 

-- 
2.42.0.rc1.204.g551eb34607-goog




* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-18  6:05 [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Chris Li
  2023-08-18  6:05 ` [PATCH RFC 1/2] mm/page_alloc: safeguard free_pcppages_bulk Chris Li
  2023-08-18  6:05 ` [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up Chris Li
@ 2023-08-21 10:32 ` Mel Gorman
  2023-08-22  1:27   ` Kemeng Shi
  2023-08-22 17:48   ` Chris Li
  2 siblings, 2 replies; 17+ messages in thread
From: Mel Gorman @ 2023-08-21 10:32 UTC (permalink / raw)
  To: Chris Li
  Cc: Andrew Morton, Kemeng Shi, baolin.wang, Michal Hocko, david,
	willy, linux-mm, Namhyung Kim, Greg Thelen, linux-kernel,
	John Sperbeck

On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> In this patch series I want to safeguard
> free_pcppages_bulk() against changes to pcp->count
> outside of this function, e.g. by a BPF program
> injected on the function tracepoint.
> 
> I break up the patches into two separate patches
> for the safeguard and the clean up.
> 
> Hopefully that is easier to review.
> 
> Signed-off-by: Chris Li <chrisl@kernel.org>

This sounds like a maintenance nightmare if internal state can be arbitrarily
modified by a BPF program and still be expected to work properly in all cases.
Every review would have to take into account "what if a BPF script modifies
state behind our back?"

-- 
Mel Gorman
SUSE Labs



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-21 10:32 ` [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Mel Gorman
@ 2023-08-22  1:27   ` Kemeng Shi
  2023-08-22 21:14     ` Chris Li
  2023-08-22 17:48   ` Chris Li
  1 sibling, 1 reply; 17+ messages in thread
From: Kemeng Shi @ 2023-08-22  1:27 UTC (permalink / raw)
  To: Mel Gorman, Chris Li
  Cc: Andrew Morton, baolin.wang, Michal Hocko, david, willy, linux-mm,
	Namhyung Kim, Greg Thelen, linux-kernel, John Sperbeck



on 8/21/2023 6:32 PM, Mel Gorman wrote:
> On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
>> In this patch series I want to safeguard
>> free_pcppages_bulk() against changes to pcp->count
>> outside of this function, e.g. by a BPF program
>> injected on the function tracepoint.
>>
>> I break up the patches into two separate patches
>> for the safeguard and the clean up.
>>
>> Hopefully that is easier to review.
>>
>> Signed-off-by: Chris Li <chrisl@kernel.org>
> 
> This sounds like a maintenance nightmare if internal state can be arbitrarily
> modified by a BPF program and still be expected to work properly in all cases.
> Every review would have to take into account "what if a BPF script modifies
> state behind our back?"
> 
Agreed. We assume pcp->count is protected by pcp->lock. Instead of making the
code work in case pcp->count is changed without the lock held, it is more
reasonable to modify pcp->count with pcp->lock held in the BPF program.




* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-21 10:32 ` [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Mel Gorman
  2023-08-22  1:27   ` Kemeng Shi
@ 2023-08-22 17:48   ` Chris Li
  2023-08-22 18:28     ` Matthew Wilcox
  2023-08-22 18:57     ` Alexei Starovoitov
  1 sibling, 2 replies; 17+ messages in thread
From: Chris Li @ 2023-08-22 17:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Kemeng Shi, baolin.wang, Michal Hocko, david,
	willy, linux-mm, Namhyung Kim, Greg Thelen, linux-kernel,
	John Sperbeck, Huang Ying, Alexei Starovoitov

Hi Mel,

Adding Alexei to the discussion.

On Mon, Aug 21, 2023 at 3:32 AM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> > In this patch series I want to safeguard
> > free_pcppages_bulk() against changes to pcp->count
> > outside of this function, e.g. by a BPF program
> > injected on the function tracepoint.
> >
> > I break up the patches into two separate patches
> > for the safeguard and the clean up.
> >
> > Hopefully that is easier to review.
> >
> > Signed-off-by: Chris Li <chrisl@kernel.org>
>
> This sounds like a maintenance nightmare if internal state can be arbitrarily
> modified by a BPF program and still be expected to work properly in all cases.
> Every review would have to take into account "what if a BPF script modifies
> state behind our back?"

Thanks for the feedback.

I agree that it is hard to support if we allow BPF to change any internal
state as a rule.  That is why it is an RFC. Would you consider it on a
case-by-case basis?
The kernel panic is bad, and the first patch is actually very small. I can
also change it to generate warnings if we detect the inconsistent state.

How about the second (clean up) patch or Kemeng's clean up version? I can
modify it to take out the pcp->count check if the verdict is just not
supporting BPF changing internal state at all. I do wish to get rid of
the pindex_min and pindex_max.

Thanks

Chris



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 17:48   ` Chris Li
@ 2023-08-22 18:28     ` Matthew Wilcox
  2023-08-22 18:57     ` Alexei Starovoitov
  1 sibling, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2023-08-22 18:28 UTC (permalink / raw)
  To: Chris Li
  Cc: Mel Gorman, Andrew Morton, Kemeng Shi, baolin.wang, Michal Hocko,
	david, linux-mm, Namhyung Kim, Greg Thelen, linux-kernel,
	John Sperbeck, Huang Ying, Alexei Starovoitov

On Tue, Aug 22, 2023 at 10:48:42AM -0700, Chris Li wrote:
> Hi Mel,
> 
> Adding Alexei to the discussion.
> 
> On Mon, Aug 21, 2023 at 3:32 AM Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> > > In this patch series I want to safeguard
> > > free_pcppages_bulk() against changes to pcp->count
> > > outside of this function, e.g. by a BPF program
> > > injected on the function tracepoint.
> > >
> > > I break up the patches into two separate patches
> > > for the safeguard and the clean up.
> > >
> > > Hopefully that is easier to review.
> > >
> > > Signed-off-by: Chris Li <chrisl@kernel.org>
> >
> > This sounds like a maintenance nightmare if internal state can be arbitrarily
> > modified by a BPF program and still be expected to work properly in all cases.
> > Every review would have to take into account "what if a BPF script modifies
> > state behind our back?"
> 
> Thanks for the feedback.
> 
> I agree that it is hard to support if we allow BPF to change any internal
> state as a rule.  That is why it is an RFC. Would you consider it on a
> case-by-case basis?
> The kernel panic is bad, and the first patch is actually very small. I can
> also change it to generate warnings if we detect the inconsistent state.

We wouldn't allow C code that hooks spinlocks (e.g. lockdep) to allocate
memory.  I don't understand why BPF code should allocate memory either.



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 17:48   ` Chris Li
  2023-08-22 18:28     ` Matthew Wilcox
@ 2023-08-22 18:57     ` Alexei Starovoitov
  2023-08-22 21:34       ` Chris Li
  1 sibling, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 18:57 UTC (permalink / raw)
  To: Chris Li
  Cc: Mel Gorman, Andrew Morton, Kemeng Shi, baolin.wang, Michal Hocko,
	David Hildenbrand, Matthew Wilcox, linux-mm, Namhyung Kim,
	Greg Thelen, LKML, John Sperbeck, Huang Ying, Alexei Starovoitov

On Tue, Aug 22, 2023 at 10:48 AM Chris Li <chrisl@kernel.org> wrote:
>
> Hi Mel,
>
> Adding Alexei to the discussion.
>
> On Mon, Aug 21, 2023 at 3:32 AM Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> > > In this patch series I want to safeguard
> > > free_pcppages_bulk() against changes to pcp->count
> > > outside of this function, e.g. by a BPF program
> > > injected on the function tracepoint.
> > >
> > > I break up the patches into two separate patches
> > > for the safeguard and the clean up.
> > >
> > > Hopefully that is easier to review.
> > >
> > > Signed-off-by: Chris Li <chrisl@kernel.org>
> >
> > This sounds like a maintenance nightmare if internal state can be arbitrarily
> > modified by a BPF program and still be expected to work properly in all cases.
> > Every review would have to take into account "what if a BPF script modifies
> > state behind our back?"

Where did this concern come from?
Since when can BPF modify arbitrary state?

But I wasn't cc-ed on the original patch, so not sure what it attempts to do.
Maybe the concern is valid.

> Thanks for the feedback.
>
> I agree that it is hard to support if we allow BPF to change any internal
> state as a rule.  That is why it is an RFC. Would you consider it on a
> case-by-case basis?
> The kernel panic is bad, and the first patch is actually very small. I can
> also change it to generate warnings if we detect the inconsistent state.

panic and warns because of a bpf prog?!
bpf infra takes every precaution to make sure that bpf progs
can never cause such damage.

>
> How about the second (clean up) patch or Kemeng's clean up version? I can
> modify it to take out the pcp->count check if the verdict is just not
> supporting BPF changing internal state at all. I do wish to get rid of
> the pindex_min and pindex_max.
>
> Thanks
>
> Chris



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22  1:27   ` Kemeng Shi
@ 2023-08-22 21:14     ` Chris Li
  2023-08-22 21:19       ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Li @ 2023-08-22 21:14 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: Mel Gorman, Andrew Morton, baolin.wang, Michal Hocko, david,
	Matthew Wilcox, linux-mm, Namhyung Kim, Greg Thelen, LKML,
	John Sperbeck, Huang Ying, Alexei Starovoitov

Hi Kemeng,

On Mon, Aug 21, 2023 at 6:27 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> >
> Agreed. We assume pcp->count is protected by pcp->lock. Instead of making the
> code work in case pcp->count is changed without the lock held, it is more
> reasonable to modify pcp->count with pcp->lock held in the BPF program.

The lock is held when pcp->count is modified; it goes through the kernel
page allocation API. The issue is the nested memory allocation inside
spin_lock() introduced by BPF.

The execution sequence is like this:

       count = min(pcp->count, count);

        /* Ensure requested pindex is drained first. */
        pindex = pindex - 1;
        bpf_injected_spin_lock_irqsave {
                 alloc_page();
                 original spin_lock_irqsave(&zone->lock, flags) ;
        }

Chris


* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 21:14     ` Chris Li
@ 2023-08-22 21:19       ` Alexei Starovoitov
  2023-08-22 21:29         ` Chris Li
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 21:19 UTC (permalink / raw)
  To: Chris Li
  Cc: Kemeng Shi, Mel Gorman, Andrew Morton, baolin.wang, Michal Hocko,
	David Hildenbrand, Matthew Wilcox, linux-mm, Namhyung Kim,
	Greg Thelen, LKML, John Sperbeck, Huang Ying, Alexei Starovoitov

On Tue, Aug 22, 2023 at 2:15 PM Chris Li <chrisl@kernel.org> wrote:
>
> Hi Kemeng,
>
> On Mon, Aug 21, 2023 at 6:27 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> > >
> > Agreed. We assume pcp->count is protected by pcp->lock. Instead of making the
> > code work in case pcp->count is changed without the lock held, it is more
> > reasonable to modify pcp->count with pcp->lock held in the BPF program.
>
> The lock is held when pcp->count is modified; it goes through the kernel
> page allocation API. The issue is the nested memory allocation inside
> spin_lock() introduced by BPF.
>
> The execution sequence is like this:
>
>        count = min(pcp->count, count);
>
>         /* Ensure requested pindex is drained first. */
>         pindex = pindex - 1;
>         bpf_injected_spin_lock_irqsave {
>                  alloc_page();
>                  original spin_lock_irqsave(&zone->lock, flags) ;
>         }

bpf doesn't call into alloc_page() or slab alloc or pcpu alloc from
tracing progs.
All memory is preallocated.
Can you reproduce the issue on the latest upstream kernel?



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 21:19       ` Alexei Starovoitov
@ 2023-08-22 21:29         ` Chris Li
  2023-08-22 21:35           ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Li @ 2023-08-22 21:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kemeng Shi, Mel Gorman, Andrew Morton, baolin.wang, Michal Hocko,
	David Hildenbrand, Matthew Wilcox, linux-mm, Namhyung Kim,
	Greg Thelen, LKML, John Sperbeck, Huang Ying, Alexei Starovoitov

On Tue, Aug 22, 2023 at 2:19 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> >
> > The execution sequence is like this:
> >
> >        count = min(pcp->count, count);
> >
> >         /* Ensure requested pindex is drained first. */
> >         pindex = pindex - 1;
> >         bpf_injected_spin_lock_irqsave {
> >                  alloc_page();
> >                  original spin_lock_irqsave(&zone->lock, flags) ;
> >         }
>
> bpf doesn't call into alloc_page() or slab alloc or pcpu alloc from
> tracing progs.
> All memory is preallocated.

Here is the other patch submission thread, which has more detail on
how to reproduce it:
https://lore.kernel.org/linux-mm/20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org/

It is on an older version of the kernel.
> Can you reproduce the issue on the latest upstream kernel?

Hopefully not; the fix on the BPF side went in as commit c66a36af7ba3a628.
I am not aware of other cases.

It seems the consensus so far is that we don't support BPF doing
nested allocation inside spin locks.
That implicates any function called under the spinlocks as well.

Do we care about adding more warnings on this kind of allocation at all?

Chris



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 18:57     ` Alexei Starovoitov
@ 2023-08-22 21:34       ` Chris Li
  2023-08-22 21:37         ` Alexei Starovoitov
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Li @ 2023-08-22 21:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mel Gorman, Andrew Morton, Kemeng Shi, baolin.wang, Michal Hocko,
	David Hildenbrand, Matthew Wilcox, linux-mm, Namhyung Kim,
	Greg Thelen, LKML, John Sperbeck, Huang Ying, Alexei Starovoitov

Hi Alexei,

On Tue, Aug 22, 2023 at 11:57 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 10:48 AM Chris Li <chrisl@kernel.org> wrote:
> >
> > Hi Mel,
> >
> > Adding Alexei to the discussion.
> >
> > On Mon, Aug 21, 2023 at 3:32 AM Mel Gorman <mgorman@techsingularity.net> wrote:
> > >
> > > On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> > > > In this patch series I want to safeguard
> > > > free_pcppages_bulk() against changes to pcp->count
> > > > outside of this function, e.g. by a BPF program
> > > > injected on the function tracepoint.
> > > >
> > > > I break up the patches into two separate patches
> > > > for the safeguard and the clean up.
> > > >
> > > > Hopefully that is easier to review.
> > > >
> > > > Signed-off-by: Chris Li <chrisl@kernel.org>
> > >
> > > This sounds like a maintenance nightmare if internal state can be arbitrarily
> > > modified by a BPF program and still be expected to work properly in all cases.
> > > Every review would have to take into account "what if a BPF script modifies
> > > state behind our back?"
>
> Where did this concern come from?
> Since when can BPF modify arbitrary state?
>
> But I wasn't cc-ed on the original patch, so not sure what it attempts to do.
> Maybe the concern is valid.

Sorry I did not CC you on the original patch submission.  I should have.

Here is the link for the 1/2 patch, which has the steps to reproduce:

https://lore.kernel.org/linux-mm/20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org/

It is using an older version of the BPF program. That spinlock
allocation was fixed in commit c66a36af7ba3a628.

Chris


>
> > Thanks for the feedback.
> >
> > I agree that it is hard to support if we allow BPF to change any internal
> > state as a rule.  That is why it is an RFC. Would you consider it on a
> > case-by-case basis?
> > The kernel panic is bad, and the first patch is actually very small. I can
> > also change it to generate warnings if we detect the inconsistent state.
>
> panic and warns because of a bpf prog?!


> bpf infra takes every precaution to make sure that bpf progs
> can never cause such damage.
>
> >
> > How about the second (clean up) patch or Kemeng's clean up version? I can
> > modify it to take out the pcp->count check if the verdict is just not
> > supporting BPF changing internal state at all. I do wish to get rid of
> > the pindex_min and pindex_max.
> >
> > Thanks
> >
> > Chris
>



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 21:29         ` Chris Li
@ 2023-08-22 21:35           ` Alexei Starovoitov
  2023-08-22 21:46             ` Chris Li
  0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 21:35 UTC (permalink / raw)
  To: Chris Li
  Cc: Kemeng Shi, Mel Gorman, Andrew Morton, baolin.wang, Michal Hocko,
	David Hildenbrand, Matthew Wilcox, linux-mm, Namhyung Kim,
	Greg Thelen, LKML, John Sperbeck, Huang Ying, Alexei Starovoitov

On Tue, Aug 22, 2023 at 2:29 PM Chris Li <chrisl@kernel.org> wrote:
>
> On Tue, Aug 22, 2023 at 2:19 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > >
> > > The execution sequence is like this:
> > >
> > >        count = min(pcp->count, count);
> > >
> > >         /* Ensure requested pindex is drained first. */
> > >         pindex = pindex - 1;
> > >         bpf_injected_spin_lock_irqsave {
> > >                  alloc_page();
> > >                  original spin_lock_irqsave(&zone->lock, flags) ;
> > >         }
> >
> > bpf doesn't call into alloc_page() or slab alloc or pcpu alloc from
> > tracing progs.
> > All memory is preallocated.
>
> Here is the other patch submission thread, which has more detail on
> how to reproduce it:
> https://lore.kernel.org/linux-mm/20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org/
>
> It is on an older version of the kernel.

Please demonstrate the issue on the latest kernel.
It's an unnecessary time sink for everyone to review patches
targeting an issue in the old kernel.

> > Can you reproduce the issue on the latest upstream kernel?
>
> Hopefully not; the fix on the BPF side went in as commit c66a36af7ba3a628.
> I am not aware of other cases.

That was a temporary workaround on the perf side.
bpf task local storage was properly fixed later.

> It seems the consensus so far is that we don't support BPF doing
> nested allocation inside spin locks.
> That implicates any function called under the spinlocks as well.

We're still talking past each other. bpf uses preallocated memory.
It might look like bpf prog is allocating, but it's actually
not calling into slab.

> Do we care about adding more warnings on this kind of allocation at all?

bpf doesn't mess with mm state.
If you somehow managed to cause an mm splat with a bpf prog, talk to bpf folks first.
It's a bug somewhere in bpf, not with mm.



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 21:34       ` Chris Li
@ 2023-08-22 21:37         ` Alexei Starovoitov
  0 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 21:37 UTC (permalink / raw)
  To: Chris Li
  Cc: Mel Gorman, Andrew Morton, Kemeng Shi, baolin.wang, Michal Hocko,
	David Hildenbrand, Matthew Wilcox, linux-mm, Namhyung Kim,
	Greg Thelen, LKML, John Sperbeck, Huang Ying, Alexei Starovoitov

On Tue, Aug 22, 2023 at 2:34 PM Chris Li <chrisl@kernel.org> wrote:
>
> Hi Alexei,
>
> On Tue, Aug 22, 2023 at 11:57 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Aug 22, 2023 at 10:48 AM Chris Li <chrisl@kernel.org> wrote:
> > >
> > > Hi Mel,
> > >
> > > Adding Alexei to the discussion.
> > >
> > > On Mon, Aug 21, 2023 at 3:32 AM Mel Gorman <mgorman@techsingularity.net> wrote:
> > > >
> > > > On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> > > > > In this patch series I want to safeguard
> > > > > free_pcppages_bulk() against changes to pcp->count
> > > > > outside of this function, e.g. by a BPF program
> > > > > injected on the function tracepoint.
> > > > >
> > > > > I break up the patches into two separate patches
> > > > > for the safeguard and the clean up.
> > > > >
> > > > > Hopefully that is easier to review.
> > > > >
> > > > > Signed-off-by: Chris Li <chrisl@kernel.org>
> > > >
> > > > This sounds like a maintenance nightmare if internal state can be arbitrarily
> > > > modified by a BPF program and still be expected to work properly in all cases.
> > > > Every review would have to take into account "what if a BPF script modifies
> > > > state behind our back?"
> >
> > Where did this concern come from?
> > Since when can BPF modify arbitrary state?
> >
> > But I wasn't cc-ed on the original patch, so not sure what it attempts to do.
> > Maybe the concern is valid.
>
> Sorry I did not CC you on the original patch submission.  I should have.
>
> Here is the link for the 1/2 patch, which has the steps to reproduce:
>
> https://lore.kernel.org/linux-mm/20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org/
>
> It is using an older version of the BPF program. That spinlock
> allocation was fixed in commit c66a36af7ba3a628.

No. It was a temp workaround. It was fixed on the bpf local storage side later.



* Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard
  2023-08-22 21:35           ` Alexei Starovoitov
@ 2023-08-22 21:46             ` Chris Li
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Li @ 2023-08-22 21:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kemeng Shi, Mel Gorman, Andrew Morton, baolin.wang, Michal Hocko,
	David Hildenbrand, Matthew Wilcox, linux-mm, Namhyung Kim,
	Greg Thelen, LKML, John Sperbeck, Huang Ying, Alexei Starovoitov

On Tue, Aug 22, 2023 at 2:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 2:29 PM Chris Li <chrisl@kernel.org> wrote:
> >
> > On Tue, Aug 22, 2023 at 2:19 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > The execution sequence is like this:
> > > >
> > > >        count = min(pcp->count, count);
> > > >
> > > >         /* Ensure requested pindex is drained first. */
> > > >         pindex = pindex - 1;
> > > >         bpf_injected_spin_lock_irqsave {
> > > >                  alloc_page();
> > > >                  original spin_lock_irqsave(&zone->lock, flags) ;
> > > >         }
> > >
> > > bpf doesn't call into alloc_page() or slab alloc or pcpu alloc from
> > > tracing progs.
> > > All memory is preallocated.

That is good to know. Thanks.

> >
> > Here is the other patch submission thread, which has more detail on
> > how to reproduce it:
> > https://lore.kernel.org/linux-mm/20230817-free_pcppages_bulk-v1-1-c14574a9f80c@kernel.org/
> >
> > It is on an older version of the kernel.
>
> Please demonstrate the issue on the latest kernel.
> It's an unnecessary time sink for everyone to review patches
> targeting an issue in the old kernel.

Thanks, that is the answer I was looking for. That is why I tagged it
as RFC.

>
> > > Can you reproduce the issue on the latest upstream kernel?
> >
> > Hopefully not; the fix on the BPF side went in as commit c66a36af7ba3a628.
> > I am not aware of other cases.
>
> That was a temporary workaround on perf side.
> bpf task local storage was properly fixed later.

Ack.

> > It seems the consensus so far is that we don't support BPF doing
> > nested allocation under spin locks.
> > That would implicate any function called under the spinlocks as well.
>
> We're still talking past each other. bpf uses preallocated memory.
> It might look like bpf prog is allocating, but it's actually
> not calling into slab.

Ack.


> > Do we care about adding more warnings on this kind of allocation at all?
>
> bpf doesn't mess with mm state.
> If you somehow managed to cause mm splat with bpf prog talk to bpf folks first.
> It's a bug somewhere in bpf. Not with mm.

Noted. It started as an MM clean up patch. I should have included you earlier.

I will split out part 2 of the patch as a clean up without touching
pcp->count then.

Chris



* Re: [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up
  2023-08-18  6:05 ` [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up Chris Li
@ 2023-08-24  6:28   ` kernel test robot
  2023-08-24 15:25     ` Chris Li
  0 siblings, 1 reply; 17+ messages in thread
From: kernel test robot @ 2023-08-24  6:28 UTC (permalink / raw)
  To: Chris Li
  Cc: oe-lkp, lkp, linux-mm, Andrew Morton, Kemeng Shi, baolin.wang,
	mgorman, Michal Hocko, david, willy, Namhyung Kim, Greg Thelen,
	linux-kernel, Chris Li, oliver.sang


hi, Chris Li,

we noticed "This patch does not have functional change" in the commit message.
however, the issue keeps happening randomly while we run the test up to 100 times.
at the same time, the parent commit stays clean.

55aea7978bd8df28 3373e582e78e8aaaf8977b42bc8
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :100         12%          12:100   dmesg.BUG:kernel_NULL_pointer_dereference,address
           :100          3%           3:100   dmesg.BUG:unable_to_handle_page_fault_for_address
           :100         16%          16:100   dmesg.EIP:free_pcppages_bulk
           :100         15%          15:100   dmesg.Kernel_panic-not_syncing:Fatal_exception
           :100          1%           1:100   dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
           :100         16%          16:100   dmesg.Oops:#[##]
           :100         16%          16:100   dmesg.boot_failures

and since there is
[   15.898250][    C0] EIP: free_pcppages_bulk+0x7d/0x200
and free_pcppages_bulk() is changed in this commit, we just report this to you.
FYI


Hello,

kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:

commit: 3373e582e78e8aaaf8977b42bc8edd8487310033 ("[PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up")
url: https://github.com/intel-lab-lkp/linux/commits/Chris-Li/mm-page_alloc-safeguard-free_pcppages_bulk/20230818-140815
patch link: https://lore.kernel.org/all/20230817-free_pcppages_bulk-v1-2-c14574a9f80c@kernel.org/
patch subject: [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308241221.cc5ac84a-oliver.sang@intel.com



[   15.890907][    C0] BUG: kernel NULL pointer dereference, address: 00000005
[   15.891555][    C0] #PF: supervisor read access in kernel mode
[   15.892037][    C0] #PF: error_code(0x0000) - not-present page
[   15.893161][    C0] *pdpt = 000000002c9e7001 *pde = 0000000000000000
[   15.894410][    C0] Oops: 0000 [#1] SMP PTI
[   15.895385][    C0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E      6.5.0-rc4-00242-g3373e582e78e #1
[   15.896801][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   15.898250][    C0] EIP: free_pcppages_bulk+0x7d/0x200
[   15.899281][    C0] Code: 8d 34 c6 89 75 e4 83 f8 0c 0f 84 ee 00 00 00 ba ab aa aa aa f7 e2 b8 01 00 00 00 89 d1 d1 e9 d3 e0 89 45 e0 8b 45 e4 8b 50 04 <8b> 72 04
 8d 5a fc 83 ee 04 39 c2 0f 84 0a 01 00 00 89 4d e8 eb 28
[   15.914130][    C0] EAX: e4c7520c EBX: e4c9f510 ECX: 55555555 EDX: 00000001
[   15.915477][    C0] ESI: e4c7520c EDI: e4c75200 EBP: c1819d98 ESP: c1819d60
[   15.916755][    C0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210006
[   15.918067][    C0] CR0: 80050033 CR2: 00000005 CR3: 2ca6a000 CR4: 000406f0
[   15.919278][    C0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   15.920468][    C0] DR6: fffe0ff0 DR7: 00000400
[   15.921446][    C0] Call Trace:
[   15.922298][    C0]  <SOFTIRQ>
[   15.923149][    C0]  ? show_regs+0x55/0x80
[   15.924069][    C0]  ? __die+0x1d/0x80
[   15.924934][    C0]  ? page_fault_oops+0x65/0xc0
[   15.925890][    C0]  ? kernelmode_fixup_or_oops+0x73/0x100
[   15.926986][    C0]  ? __bad_area_nosemaphore+0xdc/0x1c0
[   15.928086][    C0]  ? bad_area_nosemaphore+0xf/0x40
[   15.929061][    C0]  ? do_user_addr_fault+0x1ac/0x3c0
[   15.930036][    C0]  ? exc_page_fault+0x51/0x140
[   15.930957][    C0]  ? pvclock_clocksource_read_nowd+0x140/0x140
[   15.931997][    C0]  ? handle_exception+0x133/0x133
[   15.932938][    C0]  ? pmd_clear_huge+0x7b/0x80
[   15.934885][    C0]  ? pvclock_clocksource_read_nowd+0x140/0x140
[   15.935963][    C0]  ? free_pcppages_bulk+0x7d/0x200
[   15.936915][    C0]  ? pvclock_clocksource_read_nowd+0x140/0x140
[   15.937954][    C0]  ? free_pcppages_bulk+0x7d/0x200
[   15.938876][    C0]  free_unref_page_commit+0x120/0x180
[   15.939826][    C0]  free_unref_page+0xe7/0x100
[   15.940688][    C0]  __free_pages+0x87/0xc0
[   15.941519][    C0]  __free_slab+0xa1/0x100
[   15.942383][    C0]  free_slab+0x27/0xc0
[   15.943187][    C0]  discard_slab+0x38/0x40
[   15.944014][    C0]  __unfreeze_partials+0x20c/0x240
[   15.946439][    C0]  put_cpu_partial+0x5b/0x80
[   15.947347][    C0]  __slab_free+0x287/0x380
[   15.948207][    C0]  ? __mod_memcg_lruvec_state+0x3e/0x80
[   15.949138][    C0]  kmem_cache_free+0x329/0x340
[   15.950003][    C0]  ? mt_free_rcu+0x10/0x40
[   15.950805][    C0]  ? free_task+0x4d/0x80
[   15.951594][    C0]  ? mt_free_rcu+0x10/0x40
[   15.952390][    C0]  mt_free_rcu+0x10/0x40
[   15.953181][    C0]  rcu_do_batch+0x158/0x440
[   15.953998][    C0]  rcu_core+0xce/0x1c0
[   15.954751][    C0]  rcu_core_si+0xd/0x40
[   15.955512][    C0]  __do_softirq+0xad/0x233
[   15.956293][    C0]  ? __lock_text_end+0x3/0x3
[   15.957084][    C0]  call_on_stack+0x45/0x80
[   15.957871][    C0]  </SOFTIRQ>
[   15.958525][    C0]  ? irq_exit_rcu+0x6a/0xc0
[   15.959284][    C0]  ? sysvec_apic_timer_interrupt+0x27/0x40
[   15.960136][    C0]  ? handle_exception+0x133/0x133
[   15.960900][    C0]  ? alarm_handle_timer+0xfb/0x100
[   15.961669][    C0]  ? sysvec_call_function_single+0x40/0x40
[   15.962542][    C0]  ? default_idle+0xb/0x40
[   15.963234][    C0]  ? sysvec_call_function_single+0x40/0x40
[   15.964026][    C0]  ? default_idle+0xb/0x40
[   15.964703][    C0]  ? arch_cpu_idle+0x8/0x40
[   15.965379][    C0]  ? default_idle_call+0x2a/0xc0
[   15.967867][    C0]  ? cpuidle_idle_call+0x122/0x180
[   15.968680][    C0]  ? do_idle+0x79/0xc0
[   15.969357][    C0]  ? cpu_startup_entry+0x25/0x40
[   15.970106][    C0]  ? rest_init+0x96/0xc0
[   15.970770][    C0]  ? arch_call_rest_init+0xd/0x80
[   15.971505][    C0]  ? start_kernel+0x347/0x480
[   15.972209][    C0]  ? early_idt_handler_common+0x44/0x44
[   15.972969][    C0]  ? i386_start_kernel+0x48/0x80
[   15.973678][    C0]  ? startup_32_smp+0x156/0x158
[   15.974397][    C0] Modules linked in: intel_rapl_msr(E) intel_rapl_common(E) ata_generic(E) ppdev(E) crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) ipmi_devintf(
E) ipmi_msghandler(E) crypto_simd(E) ata_piix(E) cryptd(E) rapl(E) i2c_piix4(E) psmouse(E) evdev(E) serio_raw(E) bochs(E) drm_vram_helper(E) drm_kms_helper(E) drm_tt
m_helper(E) ttm(E) libata(E) parport_pc(E) floppy(E) parport(E) qemu_fw_cfg(E) button(E) drm(E) configfs(E) fuse(E) autofs4(E)
[   15.978905][    C0] CR2: 0000000000000005
[   15.979621][    C0] ---[ end trace 0000000000000000 ]---
[   15.980419][    C0] EIP: free_pcppages_bulk+0x7d/0x200
[   15.981222][    C0] Code: 8d 34 c6 89 75 e4 83 f8 0c 0f 84 ee 00 00 00 ba ab aa aa aa f7 e2 b8 01 00 00 00 89 d1 d1 e9 d3 e0 89 45 e0 8b 45 e4 8b 50 04 <8b> 72 04
 8d 5a fc 83 ee 04 39 c2 0f 84 0a 01 00 00 89 4d e8 eb 28
[   15.983663][    C0] EAX: e4c7520c EBX: e4c9f510 ECX: 55555555 EDX: 00000001
[   15.984705][    C0] ESI: e4c7520c EDI: e4c75200 EBP: c1819d98 ESP: c1819d60
[   15.985738][    C0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210006
[   15.986822][    C0] CR0: 80050033 CR2: 00000005 CR3: 2ca6a000 CR4: 000406f0
[   15.987832][    C0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   15.988839][    C0] DR6: fffe0ff0 DR7: 00000400
[   15.989679][    C0] Kernel panic - not syncing: Fatal exception in interrupt
[   15.996118][    C0] Kernel Offset: disabled



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230824/202308241221.cc5ac84a-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up
  2023-08-24  6:28   ` kernel test robot
@ 2023-08-24 15:25     ` Chris Li
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Li @ 2023-08-24 15:25 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-mm, Andrew Morton, Kemeng Shi, baolin.wang,
	Mel Gorman, Michal Hocko, David Hildenbrand, Matthew Wilcox,
	Namhyung Kim, Greg Thelen, LKML

Hi Oliver,

Indeed, that is my bad. Thanks for reporting it.

The patch has been sitting in my tree for a very long time.
When I adapted it for the later kernel, I accidentally dropped this chunk:

-                       if (!list_empty(list))
-                               break;

I think that is what is missing. I will address that in the V2 and do
more testing before I send it out.
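
To make the missing check concrete, here is a tiny userspace sketch of the
draining loop (illustrative only; the names and the round-robin structure are
simplified stand-ins, not the real mm/page_alloc.c code). The emptiness test
marked below is the role the dropped chunk played: without it, the loop pops
from an empty pindex list and dereferences a bogus pointer.

```c
/* Userspace sketch (NOT kernel code) of a bulk-free loop draining
 * several per-pindex page lists, showing where the emptiness check
 * belongs. All names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define NR_LISTS 4

struct page { struct page *next; };

/* Pop one page off a list head, or NULL if the list is empty. */
static struct page *pop(struct page **list)
{
	struct page *p = *list;

	if (p)
		*list = p->next;
	return p;
}

/* Free up to 'count' pages, advancing past empty pindex lists.
 * Returns the number actually freed. */
static int drain_bulk(struct page *lists[NR_LISTS], int count)
{
	int freed = 0, pindex = 0;

	while (freed < count) {
		int scanned = 0;

		/* Advance to the next non-empty list. This emptiness
		 * check is what the dropped chunk provided. */
		while (scanned < NR_LISTS && !lists[pindex]) {
			pindex = (pindex + 1) % NR_LISTS;
			scanned++;
		}
		if (!lists[pindex])
			break;		/* every list empty: stop early */

		assert(pop(&lists[pindex]) != NULL);
		freed++;
	}
	return freed;
}
```

With the check in place, asking for more pages than exist simply stops early
instead of popping from an empty list.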

On Wed, Aug 23, 202

>
> in testcase: boot
>
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)

The kernel test robot is very interesting.
I see that this is how to start the qemu system.

How do I invoke the test once I have qemu up and running with my testing kernel?

I want to replicate it before I send out the V2 version.

Chris




end of thread, other threads:[~2023-08-24 15:25 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-18  6:05 [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Chris Li
2023-08-18  6:05 ` [PATCH RFC 1/2] mm/page_alloc: safeguard free_pcppages_bulk Chris Li
2023-08-18  6:05 ` [PATCH RFC 2/2] mm/page_alloc: free_pcppages_bulk clean up Chris Li
2023-08-24  6:28   ` kernel test robot
2023-08-24 15:25     ` Chris Li
2023-08-21 10:32 ` [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard Mel Gorman
2023-08-22  1:27   ` Kemeng Shi
2023-08-22 21:14     ` Chris Li
2023-08-22 21:19       ` Alexei Starovoitov
2023-08-22 21:29         ` Chris Li
2023-08-22 21:35           ` Alexei Starovoitov
2023-08-22 21:46             ` Chris Li
2023-08-22 17:48   ` Chris Li
2023-08-22 18:28     ` Matthew Wilcox
2023-08-22 18:57     ` Alexei Starovoitov
2023-08-22 21:34       ` Chris Li
2023-08-22 21:37         ` Alexei Starovoitov
