* [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way
@ 2022-10-28 18:02 Al Viro
2022-10-28 18:51 ` Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Al Viro @ 2022-10-28 18:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block
We have this:
static inline bool blk_queue_may_bounce(struct request_queue *q)
{
return IS_ENABLED(CONFIG_BOUNCE) &&
q->limits.bounce == BLK_BOUNCE_HIGH &&
max_low_pfn >= max_pfn;
}
static inline struct bio *blk_queue_bounce(struct bio *bio,
struct request_queue *q)
{
if (unlikely(blk_queue_may_bounce(q) && bio_has_data(bio)))
return __blk_queue_bounce(bio, q);
return bio;
}
Now, the last term in expression in blk_queue_may_bounce() is
true only on the boxen where max_pfn is no greater than max_low_pfn.
Unless I'm very confused, that's the boxen where we don't *have*
any highmem pages.
What's more, consider this:
static __init int init_emergency_pool(void)
{
int ret;
#ifndef CONFIG_MEMORY_HOTPLUG
if (max_pfn <= max_low_pfn)
return 0;
#endif
ret = mempool_init_page_pool(&page_pool, POOL_SIZE, 0);
BUG_ON(ret);
pr_info("pool size: %d pages\n", POOL_SIZE);
init_bounce_bioset();
return 0;
}
On the same boxen (assuming we've not hotplug) page_pool won't be set up
at all, so we wouldn't be able to bounce any highmem page if we ever
ran into one.
AFAICS, this condition is backwards - it should be
static inline bool blk_queue_may_bounce(struct request_queue *q)
{
return IS_ENABLED(CONFIG_BOUNCE) &&
q->limits.bounce == BLK_BOUNCE_HIGH &&
max_low_pfn < max_pfn;
}
What am I missing here?
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-28 18:02 [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way Al Viro @ 2022-10-28 18:51 ` Jens Axboe 2022-10-28 19:21 ` Al Viro 2022-10-28 19:14 ` Al Viro 2022-10-30 7:45 ` Christoph Hellwig 2 siblings, 1 reply; 10+ messages in thread From: Jens Axboe @ 2022-10-28 18:51 UTC (permalink / raw) To: Al Viro, Christoph Hellwig; +Cc: linux-block On 10/28/22 12:02 PM, Al Viro wrote: > We have this: > > static inline bool blk_queue_may_bounce(struct request_queue *q) > { > return IS_ENABLED(CONFIG_BOUNCE) && > q->limits.bounce == BLK_BOUNCE_HIGH && > max_low_pfn >= max_pfn; > } > > static inline struct bio *blk_queue_bounce(struct bio *bio, > struct request_queue *q) > { > if (unlikely(blk_queue_may_bounce(q) && bio_has_data(bio))) > return __blk_queue_bounce(bio, q); > return bio; > } > > Now, the last term in expression in blk_queue_may_bounce() is > true only on the boxen where max_pfn is no greater than max_low_pfn. > > Unless I'm very confused, that's the boxen where we don't *have* > any highmem pages. > > What's more, consider this: > > static __init int init_emergency_pool(void) > { > int ret; > > #ifndef CONFIG_MEMORY_HOTPLUG > if (max_pfn <= max_low_pfn) > return 0; > #endif > > ret = mempool_init_page_pool(&page_pool, POOL_SIZE, 0); > BUG_ON(ret); > pr_info("pool size: %d pages\n", POOL_SIZE); > > init_bounce_bioset(); > return 0; > } > > On the same boxen (assuming we've not hotplug) page_pool won't be set up > at all, so we wouldn't be able to bounce any highmem page if we ever > ran into one. > > AFAICS, this condition is backwards - it should be > > static inline bool blk_queue_may_bounce(struct request_queue *q) > { > return IS_ENABLED(CONFIG_BOUNCE) && > q->limits.bounce == BLK_BOUNCE_HIGH && > max_low_pfn < max_pfn; > } > > What am I missing here? I don't think you're missing anything, the case we need it for is when max_pfn > max_low_pfn. I wonder when this got botched? Because some of those statements date back probably about 20 years when we started allowing highmem pages to do IO. -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-28 18:51 ` Jens Axboe @ 2022-10-28 19:21 ` Al Viro 2022-10-29 16:45 ` Al Viro 2022-10-30 7:48 ` Christoph Hellwig 0 siblings, 2 replies; 10+ messages in thread From: Al Viro @ 2022-10-28 19:21 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, linux-block On Fri, Oct 28, 2022 at 12:51:00PM -0600, Jens Axboe wrote: > On 10/28/22 12:02 PM, Al Viro wrote: > > We have this: > > > > static inline bool blk_queue_may_bounce(struct request_queue *q) > > { > > return IS_ENABLED(CONFIG_BOUNCE) && > > q->limits.bounce == BLK_BOUNCE_HIGH && > > max_low_pfn >= max_pfn; > > } > > > > static inline struct bio *blk_queue_bounce(struct bio *bio, > > struct request_queue *q) > > { > > if (unlikely(blk_queue_may_bounce(q) && bio_has_data(bio))) > > return __blk_queue_bounce(bio, q); > > return bio; > > } > > > > Now, the last term in expression in blk_queue_may_bounce() is > > true only on the boxen where max_pfn is no greater than max_low_pfn. > > > > Unless I'm very confused, that's the boxen where we don't *have* > > any highmem pages. > > > > What's more, consider this: > > > > static __init int init_emergency_pool(void) > > { > > int ret; > > > > #ifndef CONFIG_MEMORY_HOTPLUG > > if (max_pfn <= max_low_pfn) > > return 0; > > #endif > > > > ret = mempool_init_page_pool(&page_pool, POOL_SIZE, 0); > > BUG_ON(ret); > > pr_info("pool size: %d pages\n", POOL_SIZE); > > > > init_bounce_bioset(); > > return 0; > > } > > > > On the same boxen (assuming we've not hotplug) page_pool won't be set up > > at all, so we wouldn't be able to bounce any highmem page if we ever > > ran into one. > > > > AFAICS, this condition is backwards - it should be > > > > static inline bool blk_queue_may_bounce(struct request_queue *q) > > { > > return IS_ENABLED(CONFIG_BOUNCE) && > > q->limits.bounce == BLK_BOUNCE_HIGH && > > max_low_pfn < max_pfn; > > } > > > > What am I missing here? > > I don't think you're missing anything, the case we need it for is when > max_pfn > max_low_pfn. I wonder when this got botched? Because some > of those statements date back probably about 20 years when we started > allowing highmem pages to do IO. 9bb33f24abbd "block: refactor the bounce buffering code" about a year ago. But fixing the test alone is not going to be enough - just a quick look through __blk_queue_bounce() catches an obvious bug on write (we copy the part of highmem page we covered by bio into the beginning of the bounce page - and leave ->bv_offset as-is) as well as a possible bug in ->bi_status handling (if we really can run into bio_split() there). I wonder if we ought to simply add "depends on BROKEN" for CONFIG_BOUNCE... ;-/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-28 19:21 ` Al Viro @ 2022-10-29 16:45 ` Al Viro 2022-10-30 7:50 ` Christoph Hellwig 2022-10-30 7:48 ` Christoph Hellwig 1 sibling, 1 reply; 10+ messages in thread From: Al Viro @ 2022-10-29 16:45 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, linux-block On Fri, Oct 28, 2022 at 08:21:55PM +0100, Al Viro wrote: > On Fri, Oct 28, 2022 at 12:51:00PM -0600, Jens Axboe wrote: > > On 10/28/22 12:02 PM, Al Viro wrote: > > > We have this: > > > > > > static inline bool blk_queue_may_bounce(struct request_queue *q) > > > { > > > return IS_ENABLED(CONFIG_BOUNCE) && > > > q->limits.bounce == BLK_BOUNCE_HIGH && > > > max_low_pfn >= max_pfn; > > > } > > > > > > static inline struct bio *blk_queue_bounce(struct bio *bio, > > > struct request_queue *q) > > > { > > > if (unlikely(blk_queue_may_bounce(q) && bio_has_data(bio))) > > > return __blk_queue_bounce(bio, q); > > > return bio; > > > } > > > > > > Now, the last term in expression in blk_queue_may_bounce() is > > > true only on the boxen where max_pfn is no greater than max_low_pfn. > > > > > > Unless I'm very confused, that's the boxen where we don't *have* > > > any highmem pages. > > > > > > What's more, consider this: > > > > > > static __init int init_emergency_pool(void) > > > { > > > int ret; > > > > > > #ifndef CONFIG_MEMORY_HOTPLUG > > > if (max_pfn <= max_low_pfn) > > > return 0; > > > #endif > > > > > > ret = mempool_init_page_pool(&page_pool, POOL_SIZE, 0); > > > BUG_ON(ret); > > > pr_info("pool size: %d pages\n", POOL_SIZE); > > > > > > init_bounce_bioset(); > > > return 0; > > > } > > > > > > On the same boxen (assuming we've not hotplug) page_pool won't be set up > > > at all, so we wouldn't be able to bounce any highmem page if we ever > > > ran into one. > > > > > > AFAICS, this condition is backwards - it should be > > > > > > static inline bool blk_queue_may_bounce(struct request_queue *q) > > > { > > > return IS_ENABLED(CONFIG_BOUNCE) && > > > q->limits.bounce == BLK_BOUNCE_HIGH && > > > max_low_pfn < max_pfn; > > > } > > > > > > What am I missing here? > > > > I don't think you're missing anything, the case we need it for is when > > max_pfn > max_low_pfn. I wonder when this got botched? Because some > > of those statements date back probably about 20 years when we started > > allowing highmem pages to do IO. > > 9bb33f24abbd "block: refactor the bounce buffering code" about a year ago. > > But fixing the test alone is not going to be enough - just a quick look > through __blk_queue_bounce() catches an obvious bug on write (we copy > the part of highmem page we covered by bio into the beginning of the > bounce page - and leave ->bv_offset as-is) as well as a possible bug in > ->bi_status handling (if we really can run into bio_split() there). > > I wonder if we ought to simply add "depends on BROKEN" for CONFIG_BOUNCE... ;-/ BTW, ->bi_status propagation looks brittle in general. Are there any places other than bio_init() where we would want to store 0 in ->bi_status of anything? Look at e.g. null_blk; AFAICS, we store to ->bi_status on any request completion (in NULL_Q_BIO case, at least). What happens if request gets split and split-off part finishes first with an error? AFAICS, its ->bi_status will be copied to parent (original bio, the one that covers the tail). Now the IO on the original bio is over as well and we hit drivers/block/null_blk/main.c:end_cmd(). Suppose this part succeeds; won't we end up overwriting ->bi_status with zero and assuming that the entire thing had succeeded, despite the (now lost) error on the split-off part? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-29 16:45 ` Al Viro @ 2022-10-30 7:50 ` Christoph Hellwig 2022-10-30 18:47 ` Al Viro 0 siblings, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2022-10-30 7:50 UTC (permalink / raw) To: Al Viro; +Cc: Jens Axboe, Christoph Hellwig, linux-block On Sat, Oct 29, 2022 at 05:45:45PM +0100, Al Viro wrote: > completion (in NULL_Q_BIO case, at least). What happens if request > gets split and split-off part finishes first with an error? AFAICS, > its ->bi_status will be copied to parent (original bio, the one that > covers the tail). Now the IO on the original bio is over as well > and we hit drivers/block/null_blk/main.c:end_cmd(). Suppose this > part succeeds; won't we end up overwriting ->bi_status with zero > and assuming that the entire thing had succeeded, despite the > (now lost) error on the split-off part? As a rule of thumb drives should never set bi_status to 0, so null_blk here has a bug. What is missing everywhere is proper memory barriers, though. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-30 7:50 ` Christoph Hellwig @ 2022-10-30 18:47 ` Al Viro 2022-11-01 11:11 ` Christoph Hellwig 0 siblings, 1 reply; 10+ messages in thread From: Al Viro @ 2022-10-30 18:47 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, linux-block On Sun, Oct 30, 2022 at 08:50:32AM +0100, Christoph Hellwig wrote: > On Sat, Oct 29, 2022 at 05:45:45PM +0100, Al Viro wrote: > > completion (in NULL_Q_BIO case, at least). What happens if request > > gets split and split-off part finishes first with an error? AFAICS, > > its ->bi_status will be copied to parent (original bio, the one that > > covers the tail). Now the IO on the original bio is over as well > > and we hit drivers/block/null_blk/main.c:end_cmd(). Suppose this > > part succeeds; won't we end up overwriting ->bi_status with zero > > and assuming that the entire thing had succeeded, despite the > > (now lost) error on the split-off part? > > As a rule of thumb drives should never set bi_status to 0, so null_blk > here has a bug. What is missing everywhere is proper memory barriers, > though. Something like static inline void set_bio_status(struct bio *bio, blk_status_t status) { if (unlikely(status)) cmpxchg(&bio->bi_status, 0, status); } with e.g. if (bio->bi_status && !dio->bio.bi_status) dio->bio.bi_status = bio->bi_status; in blkdev_bio_end_io() replaced with set_bio_status(&dio->bio, bio->bi_status); perhaps? That would probably do for almost all users, but... what about e.g. drivers/md/raid1.c:fix_sync_read_error()? Looks like it really intends non-zero -> zero change; I'm not familiar enough with the guts of that sucker to tell if it is guaranteed to get no propagation from another bio... ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-30 18:47 ` Al Viro @ 2022-11-01 11:11 ` Christoph Hellwig 0 siblings, 0 replies; 10+ messages in thread From: Christoph Hellwig @ 2022-11-01 11:11 UTC (permalink / raw) To: Al Viro; +Cc: Christoph Hellwig, Jens Axboe, linux-block On Sun, Oct 30, 2022 at 06:47:33PM +0000, Al Viro wrote: > static inline void set_bio_status(struct bio *bio, blk_status_t status) > { > if (unlikely(status)) > cmpxchg(&bio->bi_status, 0, status); > } > > with e.g. > if (bio->bi_status && !dio->bio.bi_status) > dio->bio.bi_status = bio->bi_status; > in blkdev_bio_end_io() replaced with > set_bio_status(&dio->bio, bio->bi_status); > > perhaps? I think so, yes. > That would probably do for almost all users, but... what about > e.g. drivers/md/raid1.c:fix_sync_read_error()? Looks like it really > intends non-zero -> zero change; I'm not familiar enough with the guts > of that sucker to tell if it is guaranteed to get no propagation from > another bio... I think the clearing to zero here is intentional as the bio failed, but it manages to get the data from the other leg of the mirror. But the md code is really hard to follow, and any change to this code needs careful review from the maintainer and the linux-raid list. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-28 19:21 ` Al Viro 2022-10-29 16:45 ` Al Viro @ 2022-10-30 7:48 ` Christoph Hellwig 1 sibling, 0 replies; 10+ messages in thread From: Christoph Hellwig @ 2022-10-30 7:48 UTC (permalink / raw) To: Al Viro; +Cc: Jens Axboe, Christoph Hellwig, linux-block On Fri, Oct 28, 2022 at 08:21:54PM +0100, Al Viro wrote: > I wonder if we ought to simply add "depends on BROKEN" for CONFIG_BOUNCE... ;-/ Depends on BROKEN does not help anyone, we might better remove it. Users are: paride drivers drivers/mmc/ for a few drivers 3 SCSI LLDDs (two of those parport attached, one ISA) usb-storage The thing that most matters is usb-storage that can't be used on highmem systems, and maybe one or two of the mmc drivers might exist on arm highmem systems. In general nothing should use these, for MMIO/PIO based drivers they should kmap, and for DMA drivers nothing should use the kernel virtual address, except that the USB interfaces are a mess that seems to partially require one. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-28 18:02 [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way Al Viro 2022-10-28 18:51 ` Jens Axboe @ 2022-10-28 19:14 ` Al Viro 2022-10-30 7:45 ` Christoph Hellwig 2 siblings, 0 replies; 10+ messages in thread From: Al Viro @ 2022-10-28 19:14 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-block On Fri, Oct 28, 2022 at 07:02:54PM +0100, Al Viro wrote: > AFAICS, this condition is backwards - it should be > > static inline bool blk_queue_may_bounce(struct request_queue *q) > { > return IS_ENABLED(CONFIG_BOUNCE) && > q->limits.bounce == BLK_BOUNCE_HIGH && > max_low_pfn < max_pfn; > } > > What am I missing here? More fun in that area: /* * Bvec table can't be updated by bio_for_each_segment_all(), * so retrieve bvec from the table directly. This way is safe * because the 'bio' is single-page bvec. */ for (i = 0, to = bio->bi_io_vec; i < bio->bi_vcnt; to++, i++) { struct page *bounce_page; if (!PageHighMem(to->bv_page)) continue; bounce_page = mempool_alloc(&page_pool, GFP_NOIO); inc_zone_page_state(bounce_page, NR_BOUNCE); if (rw == WRITE) { flush_dcache_page(to->bv_page); memcpy_from_bvec(page_address(bounce_page), to); ^^^^^^^^^^^^^^^^^^^^^^^^^ } to->bv_page = bounce_page; } Consider the case when highmem page comes in bio_vec that covers the second half of it. We * allocate a bounce page * copy the second half of old page into the first half of new one * point the bio_vec to the second half of the new page. * submit the mangled bio. While we are at it, the logics above that re splitting the bio before bothering with bounces also looks somewhat fishy; if it triggers (which needs > 256 elements in the original vector) we get bio split and parts chained, then, assuming we run into a highmem page in each half, we end up with bounce bio 1: has ->bi_private pointing to bio 1, ->bi_end_io bounce_end_io_{read,write}(). Queued. bounce bio 2: has ->bi_private pointing to bio 2, ->bi_end_io bounce_end_io_{read,write}(). Queued. bio 1: original, covers the tail of original range. bio 2: covers the beginning of original range, ->bi_private points to bio 1, ->bi_end_io is bio_chain_endio(). Suppose the IO on bounce bio 2 fails. We get ->bi_status of that sucker set to non-zero. ->bi_end_io() is called, leading to bounce_end_io(), which will copy that status to bio 2 and call bio_endio() on bio 2. Which will check ->bi_status on bio 1, see it zero and propagate the error to bio 1. Now bounce bio 1 completes without an error. We still have zero in its ->bi_status and the call of bounce_end_io() hits bio_orig->bi_status = bio->bi_status; bio_endio(bio_orig); copying that zero to bio 1. Oops - we'd just lost the error reported by the IO on the other half... ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way 2022-10-28 18:02 [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way Al Viro 2022-10-28 18:51 ` Jens Axboe 2022-10-28 19:14 ` Al Viro @ 2022-10-30 7:45 ` Christoph Hellwig 2 siblings, 0 replies; 10+ messages in thread From: Christoph Hellwig @ 2022-10-30 7:45 UTC (permalink / raw) To: Al Viro; +Cc: Christoph Hellwig, linux-block On Fri, Oct 28, 2022 at 07:02:54PM +0100, Al Viro wrote: > Now, the last term in expression in blk_queue_may_bounce() is > true only on the boxen where max_pfn is no greater than max_low_pfn. > > Unless I'm very confused, that's the boxen where we don't *have* > any highmem pages. Yes, this go mixed up. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-11-01 11:11 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-10-28 18:02 [bug?] blk_queue_may_bounce() has the comparison max_low_pfn and max_pfn wrong way Al Viro 2022-10-28 18:51 ` Jens Axboe 2022-10-28 19:21 ` Al Viro 2022-10-29 16:45 ` Al Viro 2022-10-30 7:50 ` Christoph Hellwig 2022-10-30 18:47 ` Al Viro 2022-11-01 11:11 ` Christoph Hellwig 2022-10-30 7:48 ` Christoph Hellwig 2022-10-28 19:14 ` Al Viro 2022-10-30 7:45 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox