* [PATCH] block: integrity: Do not call set_page_dirty_lock()
@ 2025-04-16 20:04 Martin K. Petersen
2025-04-16 20:15 ` Keith Busch
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Martin K. Petersen @ 2025-04-16 20:04 UTC (permalink / raw)
To: Keith Busch
Cc: linux-block, axboe, Matthew Wilcox, Liam R. Howlett, Anuj Gupta
Placing multiple protection information buffers inside the same page
can lead to oopses because set_page_dirty_lock() can't be called from
interrupt context.
Since a protection information buffer is not backed by a file there is
no point in setting its page dirty, there is nothing to synchronize.
Drop the call to set_page_dirty_lock() and remove the last argument to
bio_integrity_unpin_bvec().
Cc: stable@vger.kernel.org
Fixes: 492c5d455969 ("block: bio-integrity: directly map user buffers")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 608594a154a5..43ef6bd06c85 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -66,16 +66,12 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
}
EXPORT_SYMBOL(bio_integrity_alloc);
-static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs,
- bool dirty)
+static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs)
{
int i;
- for (i = 0; i < nr_vecs; i++) {
- if (dirty && !PageCompound(bv[i].bv_page))
- set_page_dirty_lock(bv[i].bv_page);
+ for (i = 0; i < nr_vecs; i++)
unpin_user_page(bv[i].bv_page);
- }
}
static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
@@ -91,7 +87,7 @@ static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
ret = copy_to_iter(bvec_virt(bounce_bvec), bytes, &orig_iter);
WARN_ON_ONCE(ret != bytes);
- bio_integrity_unpin_bvec(orig_bvecs, orig_nr_vecs, true);
+ bio_integrity_unpin_bvec(orig_bvecs, orig_nr_vecs);
}
/**
@@ -111,8 +107,7 @@ void bio_integrity_unmap_user(struct bio *bio)
return;
}
- bio_integrity_unpin_bvec(bip->bip_vec, bip->bip_max_vcnt,
- bio_data_dir(bio) == READ);
+ bio_integrity_unpin_bvec(bip->bip_vec, bip->bip_max_vcnt);
}
/**
@@ -198,7 +193,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
}
if (write)
- bio_integrity_unpin_bvec(bvec, nr_vecs, false);
+ bio_integrity_unpin_bvec(bvec, nr_vecs);
else
memcpy(&bip->bip_vec[1], bvec, nr_vecs * sizeof(*bvec));
@@ -319,7 +314,7 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
return 0;
release_pages:
- bio_integrity_unpin_bvec(bvec, nr_bvecs, false);
+ bio_integrity_unpin_bvec(bvec, nr_bvecs);
free_bvec:
if (bvec != stack_vec)
kfree(bvec);
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-16 20:04 [PATCH] block: integrity: Do not call set_page_dirty_lock() Martin K. Petersen
@ 2025-04-16 20:15 ` Keith Busch
2025-04-16 22:01 ` Martin K. Petersen
2025-04-16 20:17 ` Jens Axboe
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2025-04-16 20:15 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Keith Busch, linux-block, axboe, Matthew Wilcox, Liam R. Howlett,
Anuj Gupta
On Wed, Apr 16, 2025 at 04:04:10PM -0400, Martin K. Petersen wrote:
>
> Placing multiple protection information buffers inside the same page
> can lead to oopses because set_page_dirty_lock() can't be called from
> interrupt context.
>
> Since a protection information buffer is not backed by a file there is
> no point in setting its page dirty, there is nothing to synchronize.
> Drop the call to set_page_dirty_lock() and remove the last argument to
> bio_integrity_unpin_bvec().
Thanks! I had just posted a test for this scenario earlier today for
liburing:
https://lore.kernel.org/io-uring/20250416162802.3614051-1-kbusch@meta.com/T/#u
I was wondering why it didn't blow up.
Anyway, patch makes sense. Looks like I got carried away copying this
from __bio_release_pages.
Reviewed-by: Keith Busch <kbusch@kernel.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-16 20:04 [PATCH] block: integrity: Do not call set_page_dirty_lock() Martin K. Petersen
2025-04-16 20:15 ` Keith Busch
@ 2025-04-16 20:17 ` Jens Axboe
2025-04-17 6:16 ` Christoph Hellwig
2025-04-17 14:36 ` Matthew Wilcox
3 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2025-04-16 20:17 UTC (permalink / raw)
To: Keith Busch, Martin K. Petersen
Cc: linux-block, Matthew Wilcox, Liam R. Howlett, Anuj Gupta
On Wed, 16 Apr 2025 16:04:10 -0400, Martin K. Petersen wrote:
> Placing multiple protection information buffers inside the same page
> can lead to oopses because set_page_dirty_lock() can't be called from
> interrupt context.
>
> Since a protection information buffer is not backed by a file there is
> no point in setting its page dirty, there is nothing to synchronize.
> Drop the call to set_page_dirty_lock() and remove the last argument to
> bio_integrity_unpin_bvec().
>
> [...]
Applied, thanks!
[1/1] block: integrity: Do not call set_page_dirty_lock()
(no commit info)
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-16 20:15 ` Keith Busch
@ 2025-04-16 22:01 ` Martin K. Petersen
0 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2025-04-16 22:01 UTC (permalink / raw)
To: Keith Busch
Cc: Martin K. Petersen, Keith Busch, linux-block, axboe,
Matthew Wilcox, Liam R. Howlett, Anuj Gupta
Hi Keith!
> Thanks! I had just posted a test for this scenario earlier today for
> liburing:
>
> https://lore.kernel.org/io-uring/20250416162802.3614051-1-kbusch@meta.com/T/#u
>
> I was wondering why it didn't blow up.
It triggered on our end with a heavily threaded workload. Each thread is
reading fixed-size 8KB blocks. The small block size results in needing
small PI allocations. And with thousands of I/Os in flight at any point
in time, we are fairly likely to end up with PI allocations sharing a
page.
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-16 20:04 [PATCH] block: integrity: Do not call set_page_dirty_lock() Martin K. Petersen
2025-04-16 20:15 ` Keith Busch
2025-04-16 20:17 ` Jens Axboe
@ 2025-04-17 6:16 ` Christoph Hellwig
2025-04-17 14:05 ` Martin K. Petersen
2025-04-17 14:36 ` Matthew Wilcox
3 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2025-04-17 6:16 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Keith Busch, linux-block, axboe, Matthew Wilcox, Liam R. Howlett,
Anuj Gupta
On Wed, Apr 16, 2025 at 04:04:10PM -0400, Martin K. Petersen wrote:
>
> Placing multiple protection information buffers inside the same page
> can lead to oopses because set_page_dirty_lock() can't be called from
> interrupt context.
>
> Since a protection information buffer is not backed by a file there is
> no point in setting its page dirty, there is nothing to synchronize.
> Drop the call to set_page_dirty_lock() and remove the last argument to
> bio_integrity_unpin_bvec().
Who sais it it not backed by a file? Which that would be a very unusual
use case, there is nothing limiting us from using integity or metadata
from a file mapping. Instead we'll need to do the same thing for the
data path and defer the unmapping to a user context when needed.
(Assuming we can get rid of this entirely, about which we have another
discussion I don't have time to follow up on as I'm on vacation)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-17 6:16 ` Christoph Hellwig
@ 2025-04-17 14:05 ` Martin K. Petersen
2025-04-21 5:12 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: Martin K. Petersen @ 2025-04-17 14:05 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K. Petersen, Keith Busch, linux-block, axboe,
Matthew Wilcox, Liam R. Howlett, Anuj Gupta
Hi Christoph!
> Who sais it it not backed by a file? Which that would be a very
> unusual use case, there is nothing limiting us from using integity or
> metadata from a file mapping. Instead we'll need to do the same thing
> for the data path and defer the unmapping to a user context when
> needed.
If you have a use case for file-backed then supporting deferred dirtying
is perfectly fine with me.
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-16 20:04 [PATCH] block: integrity: Do not call set_page_dirty_lock() Martin K. Petersen
` (2 preceding siblings ...)
2025-04-17 6:16 ` Christoph Hellwig
@ 2025-04-17 14:36 ` Matthew Wilcox
2025-04-21 12:06 ` Christoph Hellwig
3 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2025-04-17 14:36 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Keith Busch, linux-block, axboe, Liam R. Howlett, Anuj Gupta
On Wed, Apr 16, 2025 at 04:04:10PM -0400, Martin K. Petersen wrote:
> Placing multiple protection information buffers inside the same page
> can lead to oopses because set_page_dirty_lock() can't be called from
> interrupt context.
>
> Since a protection information buffer is not backed by a file there is
> no point in setting its page dirty, there is nothing to synchronize.
> Drop the call to set_page_dirty_lock() and remove the last argument to
> bio_integrity_unpin_bvec().
I think the patch is right, but the commit message is wrong.
Let's suppose we're allocating the PI buffer in anonymous memory.
Also, we're under memory pressure. We've already swapped out the page
containing the PI buffer once, so it's in the swap cache and marked
as clean. We do a READ from the device, and the new metadata is written
to the page. Then a new round of memory reclaim happens and this page
is chosen. If it's still clean, the new contents will not be written
to swap and the page will simply be discarded. When we go to validate
the PI data, the page will be swapped back in, but it will have old PI
information in it so the verification will fail.
What we need to do is mark the folio dirty at pin time. I believe
O_DIRECT does this properly, and I'm not sure whether this code does it
properly or not.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-17 14:05 ` Martin K. Petersen
@ 2025-04-21 5:12 ` Christoph Hellwig
2025-04-22 2:14 ` Martin K. Petersen
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2025-04-21 5:12 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, Keith Busch, linux-block, axboe,
Matthew Wilcox, Liam R. Howlett, Anuj Gupta
On Thu, Apr 17, 2025 at 10:05:11AM -0400, Martin K. Petersen wrote:
>
> Hi Christoph!
>
> > Who sais it it not backed by a file? Which that would be a very
> > unusual use case, there is nothing limiting us from using integity or
> > metadata from a file mapping. Instead we'll need to do the same thing
> > for the data path and defer the unmapping to a user context when
> > needed.
>
> If you have a use case for file-backed then supporting deferred dirtying
> is perfectly fine with me.
I do not personally have a use case. But we support using file backed
memory right now and have since adding these user interfaces. Suddenly
removing the dirtying will cause silent data corruptions for these use
cases if they exist, which is not a good change.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-17 14:36 ` Matthew Wilcox
@ 2025-04-21 12:06 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2025-04-21 12:06 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Martin K. Petersen, Keith Busch, linux-block, axboe,
Liam R. Howlett, Anuj Gupta
On Thu, Apr 17, 2025 at 03:36:43PM +0100, Matthew Wilcox wrote:
> Let's suppose we're allocating the PI buffer in anonymous memory.
> Also, we're under memory pressure. We've already swapped out the page
> containing the PI buffer once, so it's in the swap cache and marked
> as clean. We do a READ from the device, and the new metadata is written
> to the page. Then a new round of memory reclaim happens and this page
> is chosen. If it's still clean, the new contents will not be written
> to swap and the page will simply be discarded. When we go to validate
> the PI data, the page will be swapped back in, but it will have old PI
> information in it so the verification will fail.
>
> What we need to do is mark the folio dirty at pin time. I believe
> O_DIRECT does this properly, and I'm not sure whether this code does it
> properly or not.
O_DIRECT reads dirty right after pinning and then check if the dirty
bit has been cleared in the I/O completion handler and redirty from a
workqueue if so. We're currently trying to figure out if we still need
that redirtying with proper pinning.
Either way metadata should follow the data behavior 1:1 here and
preferably also share the code for that as much as possible.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-21 5:12 ` Christoph Hellwig
@ 2025-04-22 2:14 ` Martin K. Petersen
2025-04-22 6:13 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: Martin K. Petersen @ 2025-04-22 2:14 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Martin K. Petersen, Keith Busch, linux-block, axboe,
Matthew Wilcox, Liam R. Howlett, Anuj Gupta
Christoph,
> I do not personally have a use case. But we support using file backed
> memory right now and have since adding these user interfaces. Suddenly
> removing the dirtying will cause silent data corruptions for these use
> cases if they exist, which is not a good change.
Oopsing in the common use case isn't desirable either, though :(
In any case I'll work some more on this tomorrow unless you beat me to
it.
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] block: integrity: Do not call set_page_dirty_lock()
2025-04-22 2:14 ` Martin K. Petersen
@ 2025-04-22 6:13 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2025-04-22 6:13 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, Keith Busch, linux-block, axboe,
Matthew Wilcox, Liam R. Howlett, Anuj Gupta
On Mon, Apr 21, 2025 at 10:14:27PM -0400, Martin K. Petersen wrote:
> > I do not personally have a use case. But we support using file backed
> > memory right now and have since adding these user interfaces. Suddenly
> > removing the dirtying will cause silent data corruptions for these use
> > cases if they exist, which is not a good change.
>
> Oopsing in the common use case isn't desirable either, though :(
Agreed.
> In any case I'll work some more on this tomorrow unless you beat me to
> it.
I'm not going to get to this today, I'm still in holiday catchup mode
with a huge unread inbox.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-04-22 6:13 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-16 20:04 [PATCH] block: integrity: Do not call set_page_dirty_lock() Martin K. Petersen
2025-04-16 20:15 ` Keith Busch
2025-04-16 22:01 ` Martin K. Petersen
2025-04-16 20:17 ` Jens Axboe
2025-04-17 6:16 ` Christoph Hellwig
2025-04-17 14:05 ` Martin K. Petersen
2025-04-21 5:12 ` Christoph Hellwig
2025-04-22 2:14 ` Martin K. Petersen
2025-04-22 6:13 ` Christoph Hellwig
2025-04-17 14:36 ` Matthew Wilcox
2025-04-21 12:06 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox