* [PATCH splitout] mm: page_reporting: allow driver to set batch capacity
@ 2026-06-09 15:53 Michael S. Tsirkin
2026-06-09 17:44 ` Gregory Price
0 siblings, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2026-06-09 15:53 UTC (permalink / raw)
To: linux-kernel
Cc: Miaohe Lin, David Hildenbrand (Arm), Jason Wang, Xuan Zhuo,
Eugenio Pérez, Muchun Song, Oscar Salvador, Andrew Morton,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi,
Alexander Duyck
At the moment, if a virtio balloon device has a page reporting vq but
its size is < PAGE_REPORTING_CAPACITY (32), the balloon driver fails
probe.
But, there's no way for host to know this value, so it can easily
create a smaller vq and suddenly adding the reporting capability
to the device makes all of the driver fail. Not pretty.
Add a capacity field to page_reporting_dev_info so drivers can
control the maximum number of pages per report batch.
In virtio-balloon, set the capacity to the reporting virtqueue size,
letting page_reporting adapt to whatever the device provides.
Fixes: b0c504f15471 ("virtio-balloon: add support for providing free page reports to host")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Assisted-by: Claude:claude-opus-4-6
---
drivers/virtio/virtio_balloon.c | 5 +----
include/linux/page_reporting.h | 3 +++
mm/page_reporting.c | 25 ++++++++++++++-----------
3 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f6c2dff33f8a..6a1a610c2cb1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -1017,10 +1017,6 @@ static int virtballoon_probe(struct virtio_device *vdev)
unsigned int capacity;
capacity = virtqueue_get_vring_size(vb->reporting_vq);
- if (capacity < PAGE_REPORTING_CAPACITY) {
- err = -ENOSPC;
- goto out_unregister_oom;
- }
vb->pr_dev_info.order = PAGE_REPORTING_ORDER_UNSPECIFIED;
@@ -1041,6 +1037,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->pr_dev_info.order = 5;
#endif
+ vb->pr_dev_info.capacity = capacity;
err = page_reporting_register(&vb->pr_dev_info);
if (err)
goto out_unregister_oom;
diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index 9d4ca5c218a0..5ab5be02fa15 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -22,6 +22,9 @@ struct page_reporting_dev_info {
/* Minimal order of page reporting */
unsigned int order;
+
+ /* Max pages per report batch (default PAGE_REPORTING_CAPACITY) */
+ unsigned int capacity;
};
/* Tear-down and bring-up for page reporting devices */
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 7418f2e500bb..5b6b17f67131 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -174,10 +174,10 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
* list processed. This should result in us reporting all pages on
* an idle system in about 30 seconds.
*
- * The division here should be cheap since PAGE_REPORTING_CAPACITY
- * should always be a power of 2.
+ * The division here uses integer division; capacity need
+ * not be a power of 2.
*/
- budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16);
+ budget = DIV_ROUND_UP(area->nr_free, prdev->capacity * 16);
/* loop through free list adding unreported pages to sg list */
list_for_each_entry_safe(page, next, list, lru) {
@@ -222,10 +222,10 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
spin_unlock_irq(&zone->lock);
/* begin processing pages in local list */
- err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY);
+ err = prdev->report(prdev, sgl, prdev->capacity);
/* reset offset since the full list was reported */
- *offset = PAGE_REPORTING_CAPACITY;
+ *offset = prdev->capacity;
/* update budget to reflect call to report function */
budget--;
@@ -234,7 +234,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
spin_lock_irq(&zone->lock);
/* flush reported pages from the sg list */
- page_reporting_drain(prdev, sgl, PAGE_REPORTING_CAPACITY, !err);
+ page_reporting_drain(prdev, sgl, prdev->capacity, !err);
/*
* Reset next to first entry, the old next isn't valid
@@ -260,13 +260,13 @@ static int
page_reporting_process_zone(struct page_reporting_dev_info *prdev,
struct scatterlist *sgl, struct zone *zone)
{
- unsigned int order, mt, leftover, offset = PAGE_REPORTING_CAPACITY;
+ unsigned int order, mt, leftover, offset = prdev->capacity;
unsigned long watermark;
int err = 0;
/* Generate minimum watermark to be able to guarantee progress */
watermark = low_wmark_pages(zone) +
- (PAGE_REPORTING_CAPACITY << page_reporting_order);
+ (prdev->capacity << page_reporting_order);
/*
* Cancel request if insufficient free memory or if we failed
@@ -290,7 +290,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
}
/* report the leftover pages before going idle */
- leftover = PAGE_REPORTING_CAPACITY - offset;
+ leftover = prdev->capacity - offset;
if (leftover) {
sgl = &sgl[offset];
err = prdev->report(prdev, sgl, leftover);
@@ -322,11 +322,11 @@ static void page_reporting_process(struct work_struct *work)
atomic_set(&prdev->state, state);
/* allocate scatterlist to store pages being reported on */
- sgl = kmalloc_objs(*sgl, PAGE_REPORTING_CAPACITY);
+ sgl = kmalloc_objs(*sgl, prdev->capacity);
if (!sgl)
goto err_out;
- sg_init_table(sgl, PAGE_REPORTING_CAPACITY);
+ sg_init_table(sgl, prdev->capacity);
for_each_zone(zone) {
err = page_reporting_process_zone(prdev, sgl, zone);
@@ -377,6 +377,9 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
page_reporting_order = pageblock_order;
}
+ if (!prdev->capacity || prdev->capacity > PAGE_REPORTING_CAPACITY)
+ prdev->capacity = PAGE_REPORTING_CAPACITY;
+
/* initialize state and work structures */
atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
--
MST
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH splitout] mm: page_reporting: allow driver to set batch capacity
2026-06-09 15:53 [PATCH splitout] mm: page_reporting: allow driver to set batch capacity Michael S. Tsirkin
@ 2026-06-09 17:44 ` Gregory Price
2026-06-09 20:08 ` Michael S. Tsirkin
0 siblings, 1 reply; 4+ messages in thread
From: Gregory Price @ 2026-06-09 17:44 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Miaohe Lin, David Hildenbrand (Arm), Jason Wang,
Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador,
Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi,
Alexander Duyck
On Tue, Jun 09, 2026 at 11:53:20AM -0400, Michael S. Tsirkin wrote:
> --- a/mm/page_reporting.c
> +++ b/mm/page_reporting.c
> @@ -174,10 +174,10 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
> * list processed. This should result in us reporting all pages on
> * an idle system in about 30 seconds.
> *
> - * The division here should be cheap since PAGE_REPORTING_CAPACITY
> - * should always be a power of 2.
> + * The division here uses integer division; capacity need
> + * not be a power of 2.
> */
> - budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16);
> + budget = DIV_ROUND_UP(area->nr_free, prdev->capacity * 16);
>
Initial look - is there a div-by-0 here? I noticed the old check
prevents this from being (0 * 16), but i don't see (on first pass)
the same check anywhere.
Unless this line below always forces the above to be a
PAGE_REPORTING_CAPCAITY if it's set to 0.
> + if (!prdev->capacity || prdev->capacity > PAGE_REPORTING_CAPACITY)
> + prdev->capacity = PAGE_REPORTING_CAPACITY;
> +
It's worth making this corner condition a little more obvious.
The code intends for
if (capacity == 0)
capacity = PAGE_REPORTING_CAPACITY
but that's not reflected in the changelog as a default value.
When happens if a driver sets (capacity=0) either on purpose (???) or
because there's a bug (???) and then page_reporting.c forces it up to
32?
There's something to improve here.
~Gregory
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH splitout] mm: page_reporting: allow driver to set batch capacity
2026-06-09 17:44 ` Gregory Price
@ 2026-06-09 20:08 ` Michael S. Tsirkin
2026-06-09 21:44 ` Gregory Price
0 siblings, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2026-06-09 20:08 UTC (permalink / raw)
To: Gregory Price
Cc: linux-kernel, Miaohe Lin, David Hildenbrand (Arm), Jason Wang,
Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador,
Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi,
Alexander Duyck
On Tue, Jun 09, 2026 at 01:44:37PM -0400, Gregory Price wrote:
> On Tue, Jun 09, 2026 at 11:53:20AM -0400, Michael S. Tsirkin wrote:
> > --- a/mm/page_reporting.c
> > +++ b/mm/page_reporting.c
> > @@ -174,10 +174,10 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
> > * list processed. This should result in us reporting all pages on
> > * an idle system in about 30 seconds.
> > *
> > - * The division here should be cheap since PAGE_REPORTING_CAPACITY
> > - * should always be a power of 2.
> > + * The division here uses integer division; capacity need
> > + * not be a power of 2.
> > */
> > - budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16);
> > + budget = DIV_ROUND_UP(area->nr_free, prdev->capacity * 16);
> >
>
> Initial look - is there a div-by-0 here? I noticed the old check
> prevents this from being (0 * 16), but i don't see (on first pass)
> the same check anywhere.
>
> Unless this line below always forces the above to be a
> PAGE_REPORTING_CAPCAITY if it's set to 0.
It does, does it not?
> > + if (!prdev->capacity || prdev->capacity > PAGE_REPORTING_CAPACITY)
> > + prdev->capacity = PAGE_REPORTING_CAPACITY;
> > +
>
> It's worth making this corner condition a little more obvious.
>
> The code intends for
>
> if (capacity == 0)
> capacity = PAGE_REPORTING_CAPACITY
>
> but that's not reflected in the changelog as a default value.
>
> When happens if a driver sets (capacity=0) either on purpose (???)
what would the purpose be? if you don't want reporting do not register.
> or
> because there's a bug (???)
exactly ??? since where are we practicing defensive programming in kernel
APIs?
> and then page_reporting.c forces it up to
> 32?
>
> There's something to improve here.
>
> ~Gregory
So I'll update the commit log to mention PAGE_REPORTING_CAPACITY.
And maybe a comment near capacity field?
Should be enough?
--
MST
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH splitout] mm: page_reporting: allow driver to set batch capacity
2026-06-09 20:08 ` Michael S. Tsirkin
@ 2026-06-09 21:44 ` Gregory Price
0 siblings, 0 replies; 4+ messages in thread
From: Gregory Price @ 2026-06-09 21:44 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: linux-kernel, Miaohe Lin, David Hildenbrand (Arm), Jason Wang,
Xuan Zhuo, Eugenio Pérez, Muchun Song, Oscar Salvador,
Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
Johannes Weiner, Zi Yan, Baolin Wang, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Hugh Dickins, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Ying Huang,
Alistair Popple, Christoph Lameter, David Rientjes,
Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi,
Alexander Duyck
On Tue, Jun 09, 2026 at 04:08:08PM -0400, Michael S. Tsirkin wrote:
> On Tue, Jun 09, 2026 at 01:44:37PM -0400, Gregory Price wrote:
> > On Tue, Jun 09, 2026 at 11:53:20AM -0400, Michael S. Tsirkin wrote:
> > > --- a/mm/page_reporting.c
> > > +++ b/mm/page_reporting.c
> > > @@ -174,10 +174,10 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone,
> > > * list processed. This should result in us reporting all pages on
> > > * an idle system in about 30 seconds.
> > > *
> > > - * The division here should be cheap since PAGE_REPORTING_CAPACITY
> > > - * should always be a power of 2.
> > > + * The division here uses integer division; capacity need
> > > + * not be a power of 2.
> > > */
> > > - budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16);
> > > + budget = DIV_ROUND_UP(area->nr_free, prdev->capacity * 16);
> > >
> >
> > Initial look - is there a div-by-0 here? I noticed the old check
> > prevents this from being (0 * 16), but i don't see (on first pass)
> > the same check anywhere.
> >
> > Unless this line below always forces the above to be a
> > PAGE_REPORTING_CAPCAITY if it's set to 0.
>
> It does, does it not?
>
> > > + if (!prdev->capacity || prdev->capacity > PAGE_REPORTING_CAPACITY)
> > > + prdev->capacity = PAGE_REPORTING_CAPACITY;
> > > +
> >
> > It's worth making this corner condition a little more obvious.
> >
> > The code intends for
> >
> > if (capacity == 0)
> > capacity = PAGE_REPORTING_CAPACITY
> >
> > but that's not reflected in the changelog as a default value.
> >
> > When happens if a driver sets (capacity=0) either on purpose (???)
>
> what would the purpose be? if you don't want reporting do not register.
>
> > or
> > because there's a bug (???)
>
> exactly ??? since where are we practicing defensive programming in kernel
> APIs?
>
> > and then page_reporting.c forces it up to
> > 32?
> >
> > There's something to improve here.
> >
> > ~Gregory
>
>
> So I'll update the commit log to mention PAGE_REPORTING_CAPACITY.
> And maybe a comment near capacity field?
> Should be enough?
I suppose the question is whether capacity=0 should cause a WARN (i.e.
only a bug explains that value), or if capacity=0 means something
special (i.e. use the default) and therefore that should be documented.
I don't know which of these is the case, but if it's the latter than
that deserves a comment yes.
~Gregory
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-06-09 21:44 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09 15:53 [PATCH splitout] mm: page_reporting: allow driver to set batch capacity Michael S. Tsirkin
2026-06-09 17:44 ` Gregory Price
2026-06-09 20:08 ` Michael S. Tsirkin
2026-06-09 21:44 ` Gregory Price
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox