* Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling
@ 2023-12-22 18:31 Ira Weiny
2023-12-22 18:57 ` Ira Weiny
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Ira Weiny @ 2023-12-22 18:31 UTC (permalink / raw)
To: Coly Li
Cc: Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke,
NeilBrown, Vishal L Verma, linux-block, nvdimm, linux-kernel
Coly,
Yesterday I noticed that a few of our nvdimm tests were failing. I bisected
the problem to the following commit.
aa511ff8218b ("badblocks: switch to the improved badblock handling code")
Reverting this patch fixed our tests.
I've also dug into the code a bit and I believe the algorithm for
badblocks_check() is broken (not yet sure about the other calls). At the
very least I see the bb->p pointer being indexed with '-1'. :-(
I did notice that this work was due to a bug report in badblock_set().
Therefore, I'm not sure of that severity of that fix is vs a revert. But
at this point I'm not seeing an easy fix so I'm in favor of a revert.
Thanks,
Ira
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-22 18:31 Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling Ira Weiny @ 2023-12-22 18:57 ` Ira Weiny 2023-12-23 0:24 ` Ira Weiny 2023-12-23 6:52 ` Coly Li 2023-12-23 8:35 ` Linux regression tracking #adding (Thorsten Leemhuis) 2 siblings, 1 reply; 9+ messages in thread From: Ira Weiny @ 2023-12-22 18:57 UTC (permalink / raw) To: Ira Weiny, Coly Li Cc: Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke, NeilBrown, Vishal L Verma, linux-block, nvdimm, linux-kernel Ira Weiny wrote: > Coly, > > Yesterday I noticed that a few of our nvdimm tests were failing. I bisected > the problem to the following commit. > > aa511ff8218b ("badblocks: switch to the improved badblock handling code") > > Reverting this patch fixed our tests. > > I've also dug into the code a bit and I believe the algorithm for > badblocks_check() is broken (not yet sure about the other calls). At the > very least I see the bb->p pointer being indexed with '-1'. :-( > > I did notice that this work was due to a bug report in badblock_set(). > Therefore, I'm not sure of that severity of that fix is vs a revert. But > at this point I'm not seeing an easy fix so I'm in favor of a revert. > Dan and I were discussing this and it occurs to us that it may be easy for you to stand up the test environment I'm using. For CXL we have a run_qemu.sh project[1] which stands up a qemu environment with the ndctl[2] tests in them. Clone ndctl to ~/git/ndctl so run_qemu.sh can find it. Then start run_qemu.sh in a kernel tree like this: $ <path_to_run_qemu>/run_qemu.sh --cxl --nfit-test --nfit-debug [-r img] [-r img] is optional but useful if you have changed the ndctl tests. Once booted you can run the test suite with meson: $ cd ndctl && meson test -C build I've been running just our clear.sh test which shows the error. $ cd ndctl/build && meson test clear.sh Hope this helps, Ira [1] https://github.com/pmem/run_qemu [2] https://github.com/pmem/ndctl ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-22 18:57 ` Ira Weiny @ 2023-12-23 0:24 ` Ira Weiny 2023-12-23 9:39 ` Coly Li 0 siblings, 1 reply; 9+ messages in thread From: Ira Weiny @ 2023-12-23 0:24 UTC (permalink / raw) To: Ira Weiny, Coly Li Cc: Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke, NeilBrown, Vishal L Verma, linux-block, nvdimm, linux-kernel Ira Weiny wrote: > Ira Weiny wrote: > > Coly, > > > > Yesterday I noticed that a few of our nvdimm tests were failing. I bisected > > the problem to the following commit. > > > > aa511ff8218b ("badblocks: switch to the improved badblock handling code") > > > > Reverting this patch fixed our tests. > > [snip] I added some prints[1] to try and see what is happening. Perhaps this will help you. ... [ 99.919237] IKW set_badblock 00000000aa44c55d 8000 1 [ 99.921448] IKW set_badblock 00000000aa44c55d 8001 1 [ 99.924051] IKW set_badblock 00000000aa44c55d 8002 1 [ 99.926135] IKW set_badblock 00000000aa44c55d 8003 1 [ 99.928516] IKW set_badblock 00000000aa44c55d 8004 1 [ 99.930491] IKW set_badblock 00000000aa44c55d 8005 1 [ 99.932894] IKW set_badblock 00000000aa44c55d 8006 1 [ 99.936638] IKW set_badblock 00000000aa44c55d 8007 1 [ 100.999297] IKW _badblocks_check() 00000000aa44c55d s 8000 num 1 [ 101.000027] IKW table count 1 shift 0 [ 101.000644] IKW 0: off 8000 end 8008 [ 101.001271] IKW prev 0, cnt 1 [ 101.002481] IKW start 8000, len 1 [ 101.003464] IKW front overlap 0 [ 101.004256] IKW rv 1 ... ^^^^^^^^^ <This is a valid failure as part of the test> ... [ 101.148783] IKW set_badblock 00000000721b4f3d 8000 1 [ 101.150629] IKW set_badblock 00000000721b4f3d 8001 1 [ 101.152315] IKW set_badblock 00000000721b4f3d 8002 1 [ 101.154544] IKW set_badblock 00000000721b4f3d 8003 1 [ 101.156238] IKW set_badblock 00000000721b4f3d 8004 1 [ 101.158310] IKW set_badblock 00000000721b4f3d 8005 1 [ 101.160196] IKW set_badblock 00000000721b4f3d 8006 1 [ 101.162158] IKW set_badblock 00000000721b4f3d 8007 1 [ 101.163543] IKW _badblocks_check() 00000000721b4f3d s 0 num 8 [ 101.164427] IKW table count 1 shift 0 [ 101.165310] IKW 0: off 8000 end 8008 [ 101.166398] IKW prev -1, cnt 1 [ 101.167178] IKW start 0, len 8 [ 101.168107] IKW rv 0 [ 101.168858] IKW _badblocks_check() 00000000721b4f3d s 8 num 8 [ 101.169814] IKW table count 1 shift 0 [ 101.170547] IKW 0: off 8000 end 8008 [ 101.171238] IKW prev -1, cnt 1 [ 101.171985] IKW start 8, len 8 [ 101.173007] IKW front overlap -1 <== this is prev which is used to index bb->pages [ 101.174157] IKW prev -1, cnt 1 [ 101.175268] IKW start 9, len 7 [ 101.176557] IKW rv -1 ... ^^^^^^^^^ This is where the failure occurs. ... I think overlap_front() is not working correctly in this case. And from my reading of the code I don't know how it would. But overlap_front() is used elsewhere and I'm not confident in making the change. Hope this helps, Ira [1] diff --git a/block/badblocks.c b/block/badblocks.c index fc92d4e18aa3..21e22ee576e5 100644 --- a/block/badblocks.c +++ b/block/badblocks.c @@ -1280,6 +1280,16 @@ static int _badblocks_check(struct badblocks *bb, sector_t s, int sectors, unsigned int seq; int len, rv; u64 *p; + int i; + + printk(KERN_CRIT "IKW %s() %p s %llx num %d\n", __func__, + bb, s, sectors); + + printk(KERN_CRIT " IKW table count %d shift %d\n", bb->count, bb->shift); + for (i = 0; i < bb->count; i++) { + printk(KERN_CRIT " IKW %d: off %llx end %llx\n", i, + BB_OFFSET(bb->page[i]), BB_END(bb->page[i])); + } WARN_ON(bb->shift < 0 || sectors == 0); @@ -1311,6 +1321,9 @@ static int _badblocks_check(struct badblocks *bb, sector_t s, int sectors, prev = prev_badblocks(bb, &bad, hint); + printk(KERN_CRIT " IKW prev %d, cnt %d\n", prev, bb->count); + printk(KERN_CRIT " IKW start %llx, len %llx\n", bad.start, bad.len); + /* start after all badblocks */ if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) { len = sectors; @@ -1318,6 +1331,7 @@ static int _badblocks_check(struct badblocks *bb, sector_t s, int sectors, } if (overlap_front(bb, prev, &bad)) { + printk(KERN_CRIT " IKW front overlap %d\n", prev); if (BB_ACK(p[prev])) acked_badblocks++; else @@ -1365,6 +1379,7 @@ static int _badblocks_check(struct badblocks *bb, sector_t s, int sectors, if (read_seqretry(&bb->lock, seq)) goto retry; + printk(KERN_CRIT "IKW rv %d\n", rv); return rv; } diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c index a002ea6fdd84..93ffd189bc75 100644 --- a/drivers/nvdimm/badrange.c +++ b/drivers/nvdimm/badrange.c @@ -167,6 +167,7 @@ static void set_badblock(struct badblocks *bb, sector_t s, int num) dev_dbg(bb->dev, "Found a bad range (0x%llx, 0x%llx)\n", (u64) s * 512, (u64) num * 512); /* this isn't an error as the hardware will still throw an exception */ + printk(KERN_CRIT "IKW %s %p %llx %x\n", __func__, bb, s, num); if (badblocks_set(bb, s, num, 1)) dev_info_once(bb->dev, "%s: failed for sector %llx\n", __func__, (u64) s); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-23 0:24 ` Ira Weiny @ 2023-12-23 9:39 ` Coly Li 2023-12-23 17:13 ` Ira Weiny 0 siblings, 1 reply; 9+ messages in thread From: Coly Li @ 2023-12-23 9:39 UTC (permalink / raw) To: Ira Weiny Cc: Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke, NeilBrown, Vishal L Verma, linux-block, nvdimm, linux-kernel [-- Attachment #1: Type: text/plain, Size: 5471 bytes --] On Fri, Dec 22, 2023 at 04:24:16PM -0800, Ira Weiny wrote: > Ira Weiny wrote: > > Ira Weiny wrote: > > > Coly, > > > > > > Yesterday I noticed that a few of our nvdimm tests were failing. I bisected > > > the problem to the following commit. > > > > > > aa511ff8218b ("badblocks: switch to the improved badblock handling code") > > > > > > Reverting this patch fixed our tests. > > > > > [snip] > > I added some prints[1] to try and see what is happening. Perhaps this will > help you. > > ... > [ 99.919237] IKW set_badblock 00000000aa44c55d 8000 1 > [ 99.921448] IKW set_badblock 00000000aa44c55d 8001 1 > [ 99.924051] IKW set_badblock 00000000aa44c55d 8002 1 > [ 99.926135] IKW set_badblock 00000000aa44c55d 8003 1 > [ 99.928516] IKW set_badblock 00000000aa44c55d 8004 1 > [ 99.930491] IKW set_badblock 00000000aa44c55d 8005 1 > [ 99.932894] IKW set_badblock 00000000aa44c55d 8006 1 > [ 99.936638] IKW set_badblock 00000000aa44c55d 8007 1 > [ 100.999297] IKW _badblocks_check() 00000000aa44c55d s 8000 num 1 > [ 101.000027] IKW table count 1 shift 0 > [ 101.000644] IKW 0: off 8000 end 8008 > [ 101.001271] IKW prev 0, cnt 1 > [ 101.002481] IKW start 8000, len 1 > [ 101.003464] IKW front overlap 0 > [ 101.004256] IKW rv 1 > > ... ^^^^^^^^^ > <This is a valid failure as part of the test> > ... > > [ 101.148783] IKW set_badblock 00000000721b4f3d 8000 1 > [ 101.150629] IKW set_badblock 00000000721b4f3d 8001 1 > [ 101.152315] IKW set_badblock 00000000721b4f3d 8002 1 > [ 101.154544] IKW set_badblock 00000000721b4f3d 8003 1 > [ 101.156238] IKW set_badblock 00000000721b4f3d 8004 1 > [ 101.158310] IKW set_badblock 00000000721b4f3d 8005 1 > [ 101.160196] IKW set_badblock 00000000721b4f3d 8006 1 > [ 101.162158] IKW set_badblock 00000000721b4f3d 8007 1 > [ 101.163543] IKW _badblocks_check() 00000000721b4f3d s 0 num 8 > [ 101.164427] IKW table count 1 shift 0 > [ 101.165310] IKW 0: off 8000 end 8008 > [ 101.166398] IKW prev -1, cnt 1 > [ 101.167178] IKW start 0, len 8 > [ 101.168107] IKW rv 0 > [ 101.168858] IKW _badblocks_check() 00000000721b4f3d s 8 num 8 > [ 101.169814] IKW table count 1 shift 0 > [ 101.170547] IKW 0: off 8000 end 8008 > [ 101.171238] IKW prev -1, cnt 1 > [ 101.171985] IKW start 8, len 8 > [ 101.173007] IKW front overlap -1 <== this is prev which is used to index bb->pages > [ 101.174157] IKW prev -1, cnt 1 > [ 101.175268] IKW start 9, len 7 > [ 101.176557] IKW rv -1 > > ... ^^^^^^^^^ > This is where the failure occurs. > ... > > I think overlap_front() is not working correctly in this case. And from > my reading of the code I don't know how it would. But overlap_front() is > used elsewhere and I'm not confident in making the change. > > Hope this helps, Hi Ira, The above information is accurate and very helpful, thank you! From __badblocks_check(), the problematic code block is, 1303 re_check: 1304 bad.start = s; 1305 bad.len = sectors; 1306 1307 if (badblocks_empty(bb)) { 1308 len = sectors; 1309 goto update_sectors; 1310 } 1311 1312 prev = prev_badblocks(bb, &bad, hint); 1313 1314 /* start after all badblocks */ 1315 if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) { 1316 len = sectors; 1317 goto update_sectors; 1318 } 1319 1320 if (overlap_front(bb, prev, &bad)) { 1321 if (BB_ACK(p[prev])) 1322 acked_badblocks++; 1323 else 1324 unacked_badblocks++; 1325 1326 if (BB_END(p[prev]) >= (s + sectors)) 1327 len = sectors; 1328 else 1329 len = BB_END(p[prev]) - s; 1330 1331 if (set == 0) { 1332 *first_bad = BB_OFFSET(p[prev]); 1333 *bad_sectors = BB_LEN(p[prev]); 1334 set = 1; 1335 } 1336 goto update_sectors; 1337 } 1338 1339 /* Not front overlap, but behind overlap */ 1340 if ((prev + 1) < bb->count && overlap_behind(bb, &bad, prev + 1)) { 1341 len = BB_OFFSET(p[prev + 1]) - bad.start; 1342 hint = prev + 1; 1343 goto update_sectors; 1344 } 1345 1346 /* not cover any badblocks range in the table */ 1347 len = sectors; 1348 1349 update_sectors: If the checking range is before all badblocks records in the badblocks table, value -1 is returned from prev_badblock(). Code blocks between line 1314 and line 1337 doesn't hanle the implicit '-1' value properly. Then counter unacked_badblocks is increased at line 1324 mistakenly. So the value prev should be checked and make sure '>= 0' before comparing the checking range with a badblock record returned by prev_badblocks(). Other wise it dones't make sense. For badblocks_set() and badblocks_clear(), 'prev < 0' is explicitly checked, value '-1' doesn't go though into following code. Could you please apply and try the attached patch? Hope it may help a bit. And now it is weekend time, you may be out of office and not able to access the testing hardware. I will do more testing from myside and update more info if necessary. Thanks for the report and debug! Coly Li [debug patch snipped] [-- Attachment #2: 0001-badblocks-avoid-checking-invalid-range-in-badblocks_.patch --] [-- Type: text/plain, Size: 1445 bytes --] From 0f81e197404653a4631f5d5857c63510e0534430 Mon Sep 17 00:00:00 2001 From: Coly Li <colyli@suse.de> Date: Sat, 23 Dec 2023 17:24:51 +0800 Subject: [PATCH] badblocks: avoid checking invalid range in badblocks_check() If prev_badblocks() returns '-1', it means no valid badblocks record before the checking range. It doesn't make sense to check whether the input checking range is overlapped with the non-existed invalid front range. This patch checkes whether 'prev >= 0' is true before calling overlap_front(), to void such invalid operations. Reported-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Coly Li <colyli@suse.de> --- block/badblocks.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/block/badblocks.c b/block/badblocks.c index fc92d4e18aa3..db4ec8b9b2a8 100644 --- a/block/badblocks.c +++ b/block/badblocks.c @@ -1312,12 +1312,14 @@ static int _badblocks_check(struct badblocks *bb, sector_t s, int sectors, prev = prev_badblocks(bb, &bad, hint); /* start after all badblocks */ - if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) { + if ((prev >= 0) && + ((prev + 1) >= bb->count) && !overlap_front(bb, prev, &bad)) { len = sectors; goto update_sectors; } - if (overlap_front(bb, prev, &bad)) { + /* Overlapped with front badblocks record */ + if ((prev >= 0) && overlap_front(bb, prev, &bad)) { if (BB_ACK(p[prev])) acked_badblocks++; else -- 2.35.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-23 9:39 ` Coly Li @ 2023-12-23 17:13 ` Ira Weiny 2023-12-24 0:18 ` Coly Li 0 siblings, 1 reply; 9+ messages in thread From: Ira Weiny @ 2023-12-23 17:13 UTC (permalink / raw) To: Coly Li, Ira Weiny, linan666 Cc: Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke, NeilBrown, Vishal L Verma, linux-block, nvdimm, linux-kernel Coly Li wrote: [snip] > > Hi Ira, > > The above information is accurate and very helpful, thank you! > > From __badblocks_check(), the problematic code block is, > 1303 re_check: > 1304 bad.start = s; > 1305 bad.len = sectors; > 1306 > 1307 if (badblocks_empty(bb)) { > 1308 len = sectors; > 1309 goto update_sectors; > 1310 } > 1311 > 1312 prev = prev_badblocks(bb, &bad, hint); > 1313 > 1314 /* start after all badblocks */ > 1315 if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) { > 1316 len = sectors; > 1317 goto update_sectors; > 1318 } > 1319 > 1320 if (overlap_front(bb, prev, &bad)) { > 1321 if (BB_ACK(p[prev])) > 1322 acked_badblocks++; > 1323 else > 1324 unacked_badblocks++; > 1325 > 1326 if (BB_END(p[prev]) >= (s + sectors)) > 1327 len = sectors; > 1328 else > 1329 len = BB_END(p[prev]) - s; > 1330 > 1331 if (set == 0) { > 1332 *first_bad = BB_OFFSET(p[prev]); > 1333 *bad_sectors = BB_LEN(p[prev]); > 1334 set = 1; > 1335 } > 1336 goto update_sectors; > 1337 } > 1338 > 1339 /* Not front overlap, but behind overlap */ > 1340 if ((prev + 1) < bb->count && overlap_behind(bb, &bad, prev + 1)) { > 1341 len = BB_OFFSET(p[prev + 1]) - bad.start; > 1342 hint = prev + 1; > 1343 goto update_sectors; > 1344 } > 1345 > 1346 /* not cover any badblocks range in the table */ > 1347 len = sectors; > 1348 > 1349 update_sectors: > > If the checking range is before all badblocks records in the badblocks table, > value -1 is returned from prev_badblock(). Code blocks between line 1314 and > line 1337 doesn't hanle the implicit '-1' value properly. Then counter > unacked_badblocks is increased at line 1324 mistakenly. > > So the value prev should be checked and make sure '>= 0' before comparing > the checking range with a badblock record returned by prev_badblocks(). Other > wise it dones't make sense. > > For badblocks_set() and badblocks_clear(), 'prev < 0' is explicitly checked, > value '-1' doesn't go though into following code. > > Could you please apply and try the attached patch? Hope it may help a bit. > > And now it is weekend time, you may be out of office and not able to access > the testing hardware. I will do more testing from myside and update more info > if necessary. > > Thanks for the report and debug! > > Coly Li > > [debug patch snipped] This debug patch does fix our tests. Thanks! But Nan has submitted a series to fix this as well.[1] I'm going to test his series as well. Thanks! Ira [1] https://lore.kernel.org/linux-block/20231223063728.3229446-1-linan666@huaweicloud.com/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-23 17:13 ` Ira Weiny @ 2023-12-24 0:18 ` Coly Li 0 siblings, 0 replies; 9+ messages in thread From: Coly Li @ 2023-12-24 0:18 UTC (permalink / raw) To: Ira Weiny Cc: Li Nan, Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke, NeilBrown, Vishal L Verma, Linux Block Devices, nvdimm, Linux Kernel Mailing List > 2023年12月24日 01:13,Ira Weiny <ira.weiny@intel.com> 写道: > > Coly Li wrote: > > [snip] > >> >> Hi Ira, >> >> The above information is accurate and very helpful, thank you! >> >> From __badblocks_check(), the problematic code block is, >> 1303 re_check: >> 1304 bad.start = s; >> 1305 bad.len = sectors; >> 1306 >> 1307 if (badblocks_empty(bb)) { >> 1308 len = sectors; >> 1309 goto update_sectors; >> 1310 } >> 1311 >> 1312 prev = prev_badblocks(bb, &bad, hint); >> 1313 >> 1314 /* start after all badblocks */ >> 1315 if ((prev + 1) >= bb->count && !overlap_front(bb, prev, &bad)) { >> 1316 len = sectors; >> 1317 goto update_sectors; >> 1318 } >> 1319 >> 1320 if (overlap_front(bb, prev, &bad)) { >> 1321 if (BB_ACK(p[prev])) >> 1322 acked_badblocks++; >> 1323 else >> 1324 unacked_badblocks++; >> 1325 >> 1326 if (BB_END(p[prev]) >= (s + sectors)) >> 1327 len = sectors; >> 1328 else >> 1329 len = BB_END(p[prev]) - s; >> 1330 >> 1331 if (set == 0) { >> 1332 *first_bad = BB_OFFSET(p[prev]); >> 1333 *bad_sectors = BB_LEN(p[prev]); >> 1334 set = 1; >> 1335 } >> 1336 goto update_sectors; >> 1337 } >> 1338 >> 1339 /* Not front overlap, but behind overlap */ >> 1340 if ((prev + 1) < bb->count && overlap_behind(bb, &bad, prev + 1)) { >> 1341 len = BB_OFFSET(p[prev + 1]) - bad.start; >> 1342 hint = prev + 1; >> 1343 goto update_sectors; >> 1344 } >> 1345 >> 1346 /* not cover any badblocks range in the table */ >> 1347 len = sectors; >> 1348 >> 1349 update_sectors: >> >> If the checking range is before all badblocks records in the badblocks table, >> value -1 is returned from prev_badblock(). Code blocks between line 1314 and >> line 1337 doesn't hanle the implicit '-1' value properly. Then counter >> unacked_badblocks is increased at line 1324 mistakenly. >> >> So the value prev should be checked and make sure '>= 0' before comparing >> the checking range with a badblock record returned by prev_badblocks(). Other >> wise it dones't make sense. >> >> For badblocks_set() and badblocks_clear(), 'prev < 0' is explicitly checked, >> value '-1' doesn't go though into following code. >> >> Could you please apply and try the attached patch? Hope it may help a bit. >> >> And now it is weekend time, you may be out of office and not able to access >> the testing hardware. I will do more testing from myside and update more info >> if necessary. >> >> Thanks for the report and debug! >> >> Coly Li >> >> [debug patch snipped] > > This debug patch does fix our tests. Thanks! > > But Nan has submitted a series to fix this as well.[1] > > I'm going to test his series as well. Hi Ira, Thanks for the very quick response, and the positive result. Now I compose a official patch and submit to Jens. Coly Li ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-22 18:31 Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling Ira Weiny 2023-12-22 18:57 ` Ira Weiny @ 2023-12-23 6:52 ` Coly Li 2023-12-23 8:35 ` Linux regression tracking #adding (Thorsten Leemhuis) 2 siblings, 0 replies; 9+ messages in thread From: Coly Li @ 2023-12-23 6:52 UTC (permalink / raw) To: Ira Weiny Cc: Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke, NeilBrown, Vishal L Verma, linux-block, nvdimm, linux-kernel > 2023年12月23日 02:31,Ira Weiny <ira.weiny@intel.com> 写道: > > Coly, > > Yesterday I noticed that a few of our nvdimm tests were failing. I bisected > the problem to the following commit. > > aa511ff8218b ("badblocks: switch to the improved badblock handling code") > > Reverting this patch fixed our tests. > > I've also dug into the code a bit and I believe the algorithm for > badblocks_check() is broken (not yet sure about the other calls). At the > very least I see the bb->p pointer being indexed with '-1'. :-( > > I did notice that this work was due to a bug report in badblock_set(). > Therefore, I'm not sure of that severity of that fix is vs a revert. But > at this point I'm not seeing an easy fix so I'm in favor of a revert. Hi Ira, Thanks for the information reported. Let me look into the situation now. Coly Li ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-22 18:31 Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling Ira Weiny 2023-12-22 18:57 ` Ira Weiny 2023-12-23 6:52 ` Coly Li @ 2023-12-23 8:35 ` Linux regression tracking #adding (Thorsten Leemhuis) 2024-01-07 8:48 ` Linux regression tracking #update (Thorsten Leemhuis) 2 siblings, 1 reply; 9+ messages in thread From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-12-23 8:35 UTC (permalink / raw) To: Ira Weiny, Coly Li Cc: Dan Williams, Jens Axboe, Xiao Ni, Geliang Tang, Hannes Reinecke, NeilBrown, Vishal L Verma, linux-block, nvdimm, linux-kernel [TLDR: I'm adding this report to the list of tracked Linux kernel regressions; the text you find below is based on a few templates paragraphs you might have encountered already in similar form. See link in footer if these mails annoy you.] On 22.12.23 19:31, Ira Weiny wrote: > Coly, > > Yesterday I noticed that a few of our nvdimm tests were failing. I bisected > the problem to the following commit. > > aa511ff8218b ("badblocks: switch to the improved badblock handling code") > > Reverting this patch fixed our tests. > > I've also dug into the code a bit and I believe the algorithm for > badblocks_check() is broken (not yet sure about the other calls). At the > very least I see the bb->p pointer being indexed with '-1'. :-( > > I did notice that this work was due to a bug report in badblock_set(). > Therefore, I'm not sure of that severity of that fix is vs a revert. But > at this point I'm not seeing an easy fix so I'm in favor of a revert. Thanks for the report. To be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot: #regzbot ^introduced aa511ff8218b #regzbot title badblocks: nvdimm tests were failing after switch to impoved code #regzbot ignore-activity This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply and tell me -- ideally while also telling regzbot about it, as explained by the page listed in the footer of this mail. Developers: When fixing the issue, remember to add 'Link:' tags pointing to the report (the parent of this mail). See page linked in footer for details. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr That page also explains what to do if mails like this annoy you. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling 2023-12-23 8:35 ` Linux regression tracking #adding (Thorsten Leemhuis) @ 2024-01-07 8:48 ` Linux regression tracking #update (Thorsten Leemhuis) 0 siblings, 0 replies; 9+ messages in thread From: Linux regression tracking #update (Thorsten Leemhuis) @ 2024-01-07 8:48 UTC (permalink / raw) To: Linux kernel regressions list; +Cc: linux-block, nvdimm, linux-kernel On 23.12.23 09:35, Linux regression tracking #adding (Thorsten Leemhuis) wrote: > On 22.12.23 19:31, Ira Weiny wrote: >> Coly, >> >> Yesterday I noticed that a few of our nvdimm tests were failing. I bisected >> the problem to the following commit. >> >> aa511ff8218b ("badblocks: switch to the improved badblock handling code") >> >> Reverting this patch fixed our tests. > > #regzbot ^introduced aa511ff8218b #regzbot introduced 3ea3354cb9f0 #regzbot fix: 146e843f6b0927 #regzbot ignore-activity Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr That page also explains what to do if mails like this annoy you. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-01-07 8:48 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-22 18:31 Bug in commit aa511ff8218b ("badblocks: switch to the improved badblock handling Ira Weiny
2023-12-22 18:57 ` Ira Weiny
2023-12-23 0:24 ` Ira Weiny
2023-12-23 9:39 ` Coly Li
2023-12-23 17:13 ` Ira Weiny
2023-12-24 0:18 ` Coly Li
2023-12-23 6:52 ` Coly Li
2023-12-23 8:35 ` Linux regression tracking #adding (Thorsten Leemhuis)
2024-01-07 8:48 ` Linux regression tracking #update (Thorsten Leemhuis)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox