linux-block.vger.kernel.org archive mirror

* [PATCH] block:added printing when bio->bi_status fails
@ 2024-08-07  9:33 824731276
  2024-08-07 19:55 ` kernel test robot
  2024-08-07 20:05 ` kernel test robot
  0 siblings, 2 replies; 6+ messages in thread
From: 824731276 @ 2024-08-07  9:33 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-block, baiguo

From: baiguo <baiguo@kylinos.cn>

When ftrace is not enabled and a bio completes with an error status,
the system does not record which disk failed.
Add an error message to bio_endio() to report it.

Signed-off-by: baiguo <baiguo@kylinos.cn>
---
 block/bio.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index c4053d496..29ae86c21 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1617,6 +1617,11 @@ void bio_endio(struct bio *bio)
 		bio_clear_flag(bio, BIO_TRACE_COMPLETION);
 	}
 
+	if (bio->bi_status && bio->bi_disk)
+		printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\
+				__func__, bio->bi_status, bio->bi_disk->major,\
+				bio->bi_disk->first_minor);
+
 	/*
 	 * Need to have a real endio function for chained bios, otherwise
 	 * various corner cases will break (like stacking block devices that
-- 
2.33.0



* Re: [PATCH] block:added printing when bio->bi_status fails
  2024-08-07  9:33 824731276
@ 2024-08-07 19:55 ` kernel test robot
  2024-08-07 20:05 ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-08-07 19:55 UTC (permalink / raw)
  To: 824731276, axboe; +Cc: oe-kbuild-all, linux-kernel, linux-block, baiguo

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.11-rc2 next-20240807]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/824731276-qq-com/block-added-printing-when-bio-bi_status-fails/20240807-174005
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/tencent_F71A15579D1E52ED0B58EF2F3607AA883308%40qq.com
patch subject: [PATCH] block:added printing when bio->bi_status fails
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20240808/202408080303.bwOWkFK1-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240808/202408080303.bwOWkFK1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408080303.bwOWkFK1-lkp@intel.com/

All errors (new ones prefixed by >>):

   block/bio.c: In function 'bio_endio':
>> block/bio.c:1620:34: error: 'struct bio' has no member named 'bi_disk'
    1620 |         if (bio->bi_status && bio->bi_disk)
         |                                  ^~
   In file included from include/asm-generic/bug.h:22,
                    from arch/openrisc/include/asm/bug.h:5,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from block/bio.c:5:
   block/bio.c:1622:62: error: 'struct bio' has no member named 'bi_disk'
    1622 |                                 __func__, bio->bi_status, bio->bi_disk->major,\
         |                                                              ^~
   include/linux/printk.h:437:33: note: in definition of macro 'printk_index_wrap'
     437 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                                 ^~~~~~~~~~~
   block/bio.c:1621:17: note: in expansion of macro 'printk'
    1621 |                 printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\
         |                 ^~~~~~
   block/bio.c:1623:36: error: 'struct bio' has no member named 'bi_disk'
    1623 |                                 bio->bi_disk->first_minor);
         |                                    ^~
   include/linux/printk.h:437:33: note: in definition of macro 'printk_index_wrap'
     437 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                                 ^~~~~~~~~~~
   block/bio.c:1621:17: note: in expansion of macro 'printk'
    1621 |                 printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\
         |                 ^~~~~~


vim +1620 block/bio.c

  1589	
  1590	/**
  1591	 * bio_endio - end I/O on a bio
  1592	 * @bio:	bio
  1593	 *
  1594	 * Description:
  1595	 *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
  1596	 *   way to end I/O on a bio. No one should call bi_end_io() directly on a
  1597	 *   bio unless they own it and thus know that it has an end_io function.
  1598	 *
  1599	 *   bio_endio() can be called several times on a bio that has been chained
  1600	 *   using bio_chain().  The ->bi_end_io() function will only be called the
  1601	 *   last time.
  1602	 **/
  1603	void bio_endio(struct bio *bio)
  1604	{
  1605	again:
  1606		if (!bio_remaining_done(bio))
  1607			return;
  1608		if (!bio_integrity_endio(bio))
  1609			return;
  1610	
  1611		blk_zone_bio_endio(bio);
  1612	
  1613		rq_qos_done_bio(bio);
  1614	
  1615		if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
  1616			trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio);
  1617			bio_clear_flag(bio, BIO_TRACE_COMPLETION);
  1618		}
  1619	
> 1620		if (bio->bi_status && bio->bi_disk)
  1621			printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\
  1622					__func__, bio->bi_status, bio->bi_disk->major,\
  1623					bio->bi_disk->first_minor);
  1624	
  1625		/*
  1626		 * Need to have a real endio function for chained bios, otherwise
  1627		 * various corner cases will break (like stacking block devices that
  1628		 * save/restore bi_end_io) - however, we want to avoid unbounded
  1629		 * recursion and blowing the stack. Tail call optimization would
  1630		 * handle this, but compiling with frame pointers also disables
  1631		 * gcc's sibling call optimization.
  1632		 */
  1633		if (bio->bi_end_io == bio_chain_endio) {
  1634			bio = __bio_chain_endio(bio);
  1635			goto again;
  1636		}
  1637	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH] block:added printing when bio->bi_status fails
  2024-08-07  9:33 824731276
  2024-08-07 19:55 ` kernel test robot
@ 2024-08-07 20:05 ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-08-07 20:05 UTC (permalink / raw)
  To: 824731276, axboe; +Cc: llvm, oe-kbuild-all, linux-kernel, linux-block, baiguo

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on linus/master v6.11-rc2 next-20240807]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/824731276-qq-com/block-added-printing-when-bio-bi_status-fails/20240807-174005
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/tencent_F71A15579D1E52ED0B58EF2F3607AA883308%40qq.com
patch subject: [PATCH] block:added printing when bio->bi_status fails
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20240808/202408080348.jL0uiVq7-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240808/202408080348.jL0uiVq7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408080348.jL0uiVq7-lkp@intel.com/

All errors (new ones prefixed by >>):

>> block/bio.c:1620:29: error: no member named 'bi_disk' in 'struct bio'
    1620 |         if (bio->bi_status && bio->bi_disk)
         |                               ~~~  ^
   block/bio.c:1622:36: error: no member named 'bi_disk' in 'struct bio'
    1622 |                                 __func__, bio->bi_status, bio->bi_disk->major,\
         |                                                           ~~~  ^
   include/linux/printk.h:465:60: note: expanded from macro 'printk'
     465 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                            ^~~~~~~~~~~
   include/linux/printk.h:437:19: note: expanded from macro 'printk_index_wrap'
     437 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                                 ^~~~~~~~~~~
   block/bio.c:1623:10: error: no member named 'bi_disk' in 'struct bio'
    1623 |                                 bio->bi_disk->first_minor);
         |                                 ~~~  ^
   include/linux/printk.h:465:60: note: expanded from macro 'printk'
     465 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                            ^~~~~~~~~~~
   include/linux/printk.h:437:19: note: expanded from macro 'printk_index_wrap'
     437 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                                 ^~~~~~~~~~~
   3 errors generated.


vim +1620 block/bio.c

  1589	
  1590	/**
  1591	 * bio_endio - end I/O on a bio
  1592	 * @bio:	bio
  1593	 *
  1594	 * Description:
  1595	 *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
  1596	 *   way to end I/O on a bio. No one should call bi_end_io() directly on a
  1597	 *   bio unless they own it and thus know that it has an end_io function.
  1598	 *
  1599	 *   bio_endio() can be called several times on a bio that has been chained
  1600	 *   using bio_chain().  The ->bi_end_io() function will only be called the
  1601	 *   last time.
  1602	 **/
  1603	void bio_endio(struct bio *bio)
  1604	{
  1605	again:
  1606		if (!bio_remaining_done(bio))
  1607			return;
  1608		if (!bio_integrity_endio(bio))
  1609			return;
  1610	
  1611		blk_zone_bio_endio(bio);
  1612	
  1613		rq_qos_done_bio(bio);
  1614	
  1615		if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
  1616			trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio);
  1617			bio_clear_flag(bio, BIO_TRACE_COMPLETION);
  1618		}
  1619	
> 1620		if (bio->bi_status && bio->bi_disk)
  1621			printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\
  1622					__func__, bio->bi_status, bio->bi_disk->major,\
  1623					bio->bi_disk->first_minor);
  1624	
  1625		/*
  1626		 * Need to have a real endio function for chained bios, otherwise
  1627		 * various corner cases will break (like stacking block devices that
  1628		 * save/restore bi_end_io) - however, we want to avoid unbounded
  1629		 * recursion and blowing the stack. Tail call optimization would
  1630		 * handle this, but compiling with frame pointers also disables
  1631		 * gcc's sibling call optimization.
  1632		 */
  1633		if (bio->bi_end_io == bio_chain_endio) {
  1634			bio = __bio_chain_endio(bio);
  1635			goto again;
  1636		}
  1637	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* [PATCH] block:added printing when bio->bi_status fails
@ 2024-08-08  9:54 824731276
  2024-08-16  4:04 ` kernel test robot
  2024-08-16  7:45 ` Yu Kuai
  0 siblings, 2 replies; 6+ messages in thread
From: 824731276 @ 2024-08-08  9:54 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, linux-block, baiguo

From: baiguo <baiguo@kylinos.cn>

When ftrace is not enabled and a bio completes with an error status,
the system does not record which disk failed.
Add an error message to bio_endio() to report it.

Signed-off-by: baiguo <baiguo@kylinos.cn>
---
 block/bio.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index c4053d496..fb07589c8 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1617,6 +1617,11 @@ void bio_endio(struct bio *bio)
 		bio_clear_flag(bio, BIO_TRACE_COMPLETION);
 	}
 
+	if (bio->bi_status && bio->bi_bdev)
+		printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\
+				__func__, bio->bi_status, bio->bi_bdev->bd_disk->major,\
+				bio->bi_bdev->bd_disk->first_minor);
+
 	/*
 	 * Need to have a real endio function for chained bios, otherwise
 	 * various corner cases will break (like stacking block devices that
-- 
2.33.0



* Re: [PATCH] block:added printing when bio->bi_status fails
  2024-08-08  9:54 [PATCH] block:added printing when bio->bi_status fails 824731276
@ 2024-08-16  4:04 ` kernel test robot
  2024-08-16  7:45 ` Yu Kuai
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-08-16  4:04 UTC (permalink / raw)
  To: 824731276
  Cc: oe-lkp, lkp, linux-block, axboe, linux-kernel, baiguo,
	oliver.sang



Hello,

kernel test robot noticed "WARNING:at_fs/buffer.c:#mark_buffer_dirty" on:

commit: 0824beb1d430c30731166484b8c26e37147d4dbb ("[PATCH] block:added printing when bio->bi_status fails")
url: https://github.com/intel-lab-lkp/linux/commits/824731276-qq-com/block-added-printing-when-bio-bi_status-fails/20240808-181758
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/tencent_9A3345EA79C1EE9DC4464BB576C6A602A105@qq.com/
patch subject: [PATCH] block:added printing when bio->bi_status fails

in testcase: xfstests
version: xfstests-x86_64-f5ada754-1_20240812
with following parameters:

	disk: 4HDD
	fs: udf
	test: generic-081



compiler: gcc-12
test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Skylake) with 28G memory

(Please refer to the attached dmesg/kmsg for the entire log/backtrace.)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202408161114.dfe9cc49-oliver.sang@intel.com


We see many messages like the following:

[   68.449409][   T89] bio: bio_endio status is 10, disk[253:2]
[   68.454169][ T2095] bio: bio_endio status is 10, disk[253:3]
[   68.455863][   T89] bio: bio_endio status is 10, disk[253:2]
[   68.466598][   T89] bio: bio_endio status is 10, disk[253:2]
[   68.472314][   T89] bio: bio_endio status is 10, disk[253:2]

...

[   74.216172][   T89] bio: bio_endio status is 10, disk[253:2]
[   74.221903][   T89] bio: bio_endio status is 10, disk[253:2]
[   74.2:2]
[   74.348274][   T89] bio: bio_endio status is 10, disk[253:3]
[   74.356178][ T2096] ------------[ cut here ]------------
[   74.361531][ T2096] WARNING: CPU: 0 PID: 2096 at fs/buffer.c:1181 mark_buffer_dirty+0x1e6/0x240


Then we see the WARNING below:


[   72.605562][ T2097] ------------[ cut here ]------------
[ 72.610907][ T2097] WARNING: CPU: 7 PID: 2097 at fs/buffer.c:1181 mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) 
[   72.619661][ T2097] Modules linked in: dm_snapshot dm_bufio udf crc_itu_t cdrom btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp sd_mod ipmi_devintf sg ipmi_msghandler kvm_intel i915 kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_buddy ahci intel_gtt rapl mei_wdt drm_display_helper libahci intel_cstate wmi_bmof ttm mei_me i2c_i801 intel_uncore libata drm_kms_helper i2c_smbus mei intel_pch_thermal video wmi acpi_pad binfmt_misc loop fuse drm dm_mod ip_tables
[   72.671579][ T2097] CPU: 7 UID: 0 PID: 2097 Comm: xfs_io Not tainted 6.11.0-rc1-00021-g0824beb1d430 #1
[   72.680936][ T2097] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016
[ 72.689071][ T2097] RIP: 0010:mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) 
[ 72.694676][ T2097] Code: 58 c6 ff 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 75 60 48 8b 7d 00 5b be 04 00 00 00 5d e9 3a fc fc ff <0f> 0b e9 34 fe ff ff 48 89 df e8 5b 23 e5 ff e9 54 fe ff ff 48 89
All code
========
   0:	58                   	pop    %rax
   1:	c6                   	(bad)  
   2:	ff 48 89             	decl   -0x77(%rax)
   5:	ea                   	(bad)  
   6:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
   d:	fc ff df 
  10:	48 c1 ea 03          	shr    $0x3,%rdx
  14:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
  18:	75 60                	jne    0x7a
  1a:	48 8b 7d 00          	mov    0x0(%rbp),%rdi
  1e:	5b                   	pop    %rbx
  1f:	be 04 00 00 00       	mov    $0x4,%esi
  24:	5d                   	pop    %rbp
  25:	e9 3a fc fc ff       	jmpq   0xfffffffffffcfc64
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	e9 34 fe ff ff       	jmpq   0xfffffffffffffe65
  31:	48 89 df             	mov    %rbx,%rdi
  34:	e8 5b 23 e5 ff       	callq  0xffffffffffe52394
  39:	e9 54 fe ff ff       	jmpq   0xfffffffffffffe92
  3e:	48                   	rex.W
  3f:	89                   	.byte 0x89

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	e9 34 fe ff ff       	jmpq   0xfffffffffffffe3b
   7:	48 89 df             	mov    %rbx,%rdi
   a:	e8 5b 23 e5 ff       	callq  0xffffffffffe5236a
   f:	e9 54 fe ff ff       	jmpq   0xfffffffffffffe68
  14:	48                   	rex.W
  15:	89                   	.byte 0x89
[   72.714238][ T2097] RSP: 0018:ffffc900033df8a0 EFLAGS: 00010246
[   72.720206][ T2097] RAX: 0000000000000001 RBX: ffff888120d219d8 RCX: ffffffff81c2b878
[   72.728093][ T2097] RDX: ffffed10241a433c RSI: 0000000000000008 RDI: ffff888120d219d8
[   72.735965][ T2097] RBP: ffff888120d219d8 R08: 0000000000000000 R09: ffffed10241a433b
[   72.743867][ T2097] R10: ffff888120d219df R11: 0000000000000008 R12: ffffed10241a4340
[   72.751752][ T2097] R13: 0000000000000004 R14: ffff8887472e8000 R15: 0000000000000948
[   72.759638][ T2097] FS:  0000000000000000(0000) GS:ffff888634180000(0000) knlGS:0000000000000000
[   72.768470][ T2097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.774945][ T2097] CR2: 000055d09397fc48 CR3: 000000075685c004 CR4: 00000000003706f0
[   72.782843][ T2097] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   72.790713][ T2097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   72.798583][ T2097] Call Trace:
[   72.801752][ T2097]  <TASK>
[ 72.804567][ T2097] ? __warn (kernel/panic.c:735) 
[ 72.808515][ T2097] ? mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) 
[ 72.813514][ T2097] ? report_bug (lib/bug.c:180 lib/bug.c:219) 
[ 72.817908][ T2097] ? handle_bug (arch/x86/kernel/traps.c:239) 
[ 72.822125][ T2097] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) 
[ 72.826692][ T2097] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) 
[ 72.831612][ T2097] ? mark_buffer_dirty (arch/x86/include/asm/bitops.h:213 arch/x86/include/asm/bitops.h:245 include/asm-generic/bitops/instrumented-non-atomic.h:154 include/linux/buffer_head.h:171 fs/buffer.c:1181) 
[ 72.836529][ T2097] ? mark_buffer_dirty (fs/buffer.c:1181 (discriminator 1)) 
[ 72.841536][ T2097] udf_bitmap_free_blocks (fs/udf/balloc.c:164) udf
[ 72.847326][ T2097] udf_free_blocks (fs/udf/balloc.c:662) udf
[ 72.852530][ T2097] udf_discard_prealloc (fs/udf/truncate.c:147) udf
[ 72.858161][ T2097] ? __pfx_udf_discard_prealloc (fs/udf/truncate.c:118) udf
[ 72.864303][ T2097] ? __pfx_down_write (kernel/locking/rwsem.c:1577) 
[ 72.869051][ T2097] ? __pfx_locks_remove_file (fs/locks.c:2687) 
[ 72.874412][ T2097] udf_release_file (fs/udf/file.c:185 fs/udf/file.c:174) udf
[ 72.879584][ T2097] ? security_file_release (security/security.c:2754 (discriminator 11)) 
[ 72.884757][ T2097] __fput (fs/file_table.c:422) 
[ 72.888638][ T2097] task_work_run (kernel/task_work.c:222 (discriminator 1)) 
[ 72.893108][ T2097] ? __pfx_task_work_run (kernel/task_work.c:190) 
[ 72.898101][ T2097] do_exit (kernel/exit.c:883) 
[ 72.902048][ T2097] ? __pfx_do_exit (kernel/exit.c:821) 
[ 72.906518][ T2097] ? _raw_spin_lock_irq (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:120 kernel/locking/spinlock.c:170) 
[ 72.911423][ T2097] do_group_exit (kernel/exit.c:1012) 
[ 72.915818][ T2097] get_signal (include/linux/signal.h:78 kernel/signal.c:2751) 
[ 72.920215][ T2097] ? finish_task_switch+0x495/0x750 
[ 72.925907][ T2097] ? __switch_to (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 include/linux/thread_info.h:89 include/linux/sched.h:1945 arch/x86/include/asm/fpu/sched.h:68 arch/x86/kernel/process_64.c:674) 
[ 72.930377][ T2097] ? __pfx_get_signal (kernel/signal.c:2682) 
[ 72.935108][ T2097] ? __schedule (kernel/sched/core.c:6399) 
[ 72.939576][ T2097] arch_do_signal_or_restart (arch/x86/kernel/signal.c:310) 
[ 72.945005][ T2097] ? __pfx_arch_do_signal_or_restart (arch/x86/kernel/signal.c:307) 
[ 72.951046][ T2097] syscall_exit_to_user_mode (kernel/entry/common.c:111 include/linux/entry-common.h:328 kernel/entry/common.c:207 kernel/entry/common.c:218) 
[ 72.956561][ T2097] do_syscall_64 (arch/x86/entry/common.c:102) 
[ 72.960942][ T2097] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
[   72.966720][ T2097] RIP: 0033:0x7f896a5efd32
[ 72.971012][ T2097] Code: Unable to access opcode bytes at 0x7f896a5efd08.

Code starting with the faulting instruction
===========================================
[   72.977922][ T2097] RSP: 002b:00007f896a1ffdb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000022
[   72.986232][ T2097] RAX: fffffffffffffdfe RBX: 00007f896a2006c0 RCX: 00007f896a5efd32
[   72.994101][ T2097] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
[   73.001969][ T2097] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffdb5bc88c7
[   73.009844][ T2097] R10: 00007f896a535f80 R11: 0000000000000293 R12: ffffffffffffff80
[   73.017712][ T2097] R13: 0000000000000002 R14: 00007ffdb5bc87d0 R15: 00007f8969a00000
[   73.025582][ T2097]  </TASK>
[   73.028478][ T2097] ---[ end trace 0000000000000000 ]---
[   73.034147][   T66] bio: bio_endio status is 10, disk[253:3]
[   73.034167][ T2097] bio: bio_endio status is 10, disk[253:3]
[   73.039846][   T66] Buffer I/O error on dev dm-3, logical block 259, lost async page write
[   73.045565][ T2097] Buffer I/O error on dev dm-3, logical block 128, lost async page write
[   73.053894][   T66] bio: bio_endio status is 10, disk[253:3]
[   73.067887][   T66] Buffer I/O error on dev dm-3, logical block 387, lost async page write
[   73.076268][   T66] bio: bio_endio status is 10, disk[253:3]
[   73.081976][   T66] Buffer I/O error on dev dm-3, logical block 388, lost async page write
[   73.180045][ T2097] bio: bio_endio status is 10, disk[253:3]
[   73.185743][ T2097] Buffer I/O error on dev dm-3, logical block 128, lost sync page write



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240816/202408161114.dfe9cc49-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH] block:added printing when bio->bi_status fails
  2024-08-08  9:54 [PATCH] block:added printing when bio->bi_status fails 824731276
  2024-08-16  4:04 ` kernel test robot
@ 2024-08-16  7:45 ` Yu Kuai
  1 sibling, 0 replies; 6+ messages in thread
From: Yu Kuai @ 2024-08-16  7:45 UTC (permalink / raw)
  To: 824731276, axboe; +Cc: linux-kernel, linux-block, baiguo, yukuai (C)

Hi,

On 2024/08/08 17:54, 824731276@qq.com wrote:
> From: baiguo <baiguo@kylinos.cn>
> 
> When ftrace is not enabled and a bio completes with an error status,
> the system does not record which disk failed.
> Add an error message to bio_endio() to report it.
> 
> Signed-off-by: baiguo <baiguo@kylinos.cn>
> ---
>   block/bio.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/block/bio.c b/block/bio.c
> index c4053d496..fb07589c8 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1617,6 +1617,11 @@ void bio_endio(struct bio *bio)
>   		bio_clear_flag(bio, BIO_TRACE_COMPLETION);
>   	}
>   
> +	if (bio->bi_status && bio->bi_bdev)
> +		printk(KERN_ERR "bio: %s status is %d, disk[%d:%d]\n",\
> +				__func__, bio->bi_status, bio->bi_bdev->bd_disk->major,\
> +				bio->bi_bdev->bd_disk->first_minor);

I don't understand why you need this. bio_endio() is also called for
unsupported bios rejected by submit_bio_noacct(), even when the disk is
fine.

For real disks, blk_print_req_error() already prints a message for failed
I/O that was submitted to the disk.

Thanks,
Kuai

> +
>   	/*
>   	 * Need to have a real endio function for chained bios, otherwise
>   	 * various corner cases will break (like stacking block devices that
> 



Thread overview: 6+ messages
2024-08-08  9:54 [PATCH] block:added printing when bio->bi_status fails 824731276
2024-08-16  4:04 ` kernel test robot
2024-08-16  7:45 ` Yu Kuai
  -- strict thread matches above, loose matches on Subject: below --
2024-08-07  9:33 824731276
2024-08-07 19:55 ` kernel test robot
2024-08-07 20:05 ` kernel test robot
