From: Sumit Saxena <sumit.saxena@broadcom.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: RE: Application stops due to ext4 filesytsem IO error
Date: Tue, 6 Jun 2017 21:04:57 +0530 [thread overview]
Message-ID: <bf603f1c2f3873f58717101abb3d9c83@mail.gmail.com> (raw)
In-Reply-To: 3e25920f0068797bd74e5ea37a2dc3dc@mail.gmail.com
Gentle ping..
>-----Original Message-----
>From: Sumit Saxena [mailto:sumit.saxena@broadcom.com]
>Sent: Monday, June 05, 2017 12:59 PM
>To: 'Jens Axboe'
>Cc: 'linux-block@vger.kernel.org'; 'linux-scsi@vger.kernel.org'
>Subject: Application stops due to ext4 filesytsem IO error
>
>Jens,
>
>We are observing application stops while running ext4 filesystem IOs along
>with target resets in parallel.
>We suspect this behavior can be attributed to the Linux block layer. See below
>for details.
>
>Problem statement - "Application stops due to IO error from file system
>buffered IO. (Note - It is always a FS metadata read failure)"
>Issue is reproducible - "Yes. It is consistently reproducible."
>Brief about setup -
>Latest 4.11 kernel. Issue hits irrespective of whether SCSI MQ is enabled or
>disabled. use_blk_mq=Y and use_blk_mq=N show the same issue.
>Four direct-attached SAS/SATA drives connected to a MegaRAID Invader
>controller.
>
>Reproduction steps -
>-Create an ext4 FS on each of 4 JBODs (non-RAID volumes) behind the MegaRAID
>SAS controller.
>-Start a data integrity test on all four mounted ext4 partitions. (The tool
>should be configured to send buffered FS IO.)
>-Send a target reset to each JBOD to simulate an error condition
>(sg_reset -d /dev/sdX). Leave some delay between resets to allow some IO
>on the device.
>
>End result -
>The combination of target resets and FS IOs in parallel causes the application
>to halt with an ext4 filesystem IO error.
>We are able to restart the application without cleaning or unmounting the
>filesystem.
>Below are the error logs at the time of the application stop:
>
>--------------------------
>sd 0:0:53:0: target reset called for scmd(ffff88003cf25148)
>sd 0:0:53:0: attempting target reset! scmd(ffff88003cf25148) tm_dev_handle 0xb
>sd 0:0:53:0: [sde] tag#519 BRCM Debug: request->cmd_flags: 0x80700 bio->bi_flags: 0x2 bio->bi_opf: 0x3000 rq_flags 0x20e3
>..
>sd 0:0:53:0: [sde] tag#519 CDB: Read(10) 28 00 15 00 11 10 00 00 f8 00
>EXT4-fs error (device sde): __ext4_get_inode_loc:4465: inode #11018287:
>block 44040738: comm chaos: unable to read itable block
>-----------------------
>
>We debugged further to understand what is happening above the LLD. See below.
>
>During a target reset, IOs may come back from the target with CHECK
>CONDITION and the following sense information:
>Sense Key : Aborted Command [current]
>Add. Sense: No additional sense information
>
>Such aborted commands should be retried by the SML/block layer. The SML does
>retry them, except for FS metadata reads.
>From driver-level debug, we found that IOs with the REQ_FAILFAST_DEV bit set in
>scmd->request->cmd_flags are not retried by the SML, which is also as
>expected.
>
>Below is the code in scsi_error.c (function scsi_noretry_cmd) that causes
>IOs with REQ_FAILFAST_DEV set to be completed back to the upper layer
>instead of being retried:
>--------
>	/*
>	 * assume caller has checked sense and determined
>	 * the check condition was retryable.
>	 */
>	if (scmd->request->cmd_flags & REQ_FAILFAST_DEV ||
>	    scmd->request->cmd_type == REQ_TYPE_BLOCK_PC)
>		return 1;
>	else
>		return 0;
>--------
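For readers without a kernel tree at hand, the decision quoted above can be modeled in plain userspace C. This is only an illustrative sketch, not kernel code: struct model_request and model_noretry() are hypothetical names, and the REQ_FAILFAST_DEV bit position is an assumption based on a 4.11-era include/linux/blk_types.h that should be checked against the tree in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit position (4.11-era blk_types.h); verify in your own tree. */
#define REQ_FAILFAST_DEV (1u << 8)

/* Hypothetical stand-in for the two struct request fields used here. */
struct model_request {
	unsigned int cmd_flags;	/* REQ_* flags */
	bool is_passthrough;	/* models cmd_type == REQ_TYPE_BLOCK_PC */
};

/*
 * Models only the tail of scsi_noretry_cmd(): the caller has already
 * decided that the CHECK CONDITION is retryable.
 * Returns true when the command is NOT retried and is completed back
 * to the upper layer instead.
 */
static bool model_noretry(const struct model_request *rq)
{
	assert(rq != NULL);
	return (rq->cmd_flags & REQ_FAILFAST_DEV) || rq->is_passthrough;
}
```

Under these assumptions, a request carrying the cmd_flags value from the log above (0x80700) takes the no-retry path, while an ordinary filesystem read with no failfast bit is retried.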
>
>1. The IO which causes the application to stop has REQ_FAILFAST_DEV set in
>"scmd->request->cmd_flags". We noticed that this bit is set for
>filesystem readahead metadata IOs. To confirm this, we
>mounted with the option inode_readahead_blks=0 to disable ext4's inode table
>readahead algorithm and did not observe the issue. The issue does not hit with
>direct IOs, only with cached/buffered IOs.
>
>2. From driver-level debug prints, we also noticed that many IO
>failures with REQ_FAILFAST_DEV are handled gracefully by the filesystem.
>The application-level failure happens only if the IO has RQF_MIXED_MERGE set.
>If IO merging is disabled through the sysfs parameter for the SCSI device in
>question (nomerges set to 2), we do not see the issue.
>
>3. We added a few prints in the driver to dump "scmd->request->cmd_flags" and
>"scmd->request->rq_flags" for IOs completed with CHECK CONDITION. The
>culprit IOs have all of these bits set: REQ_FAILFAST_DEV and REQ_RAHEAD in
>"scmd->request->cmd_flags", and RQF_MIXED_MERGE in
>"scmd->request->rq_flags". Not every IO with these
>three bits set causes the issue, but whenever the issue hits, these three bits
>are set for the IO causing the failure.
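The three-bit signature described in item 3 can be checked against the values from the debug print (cmd_flags 0x80700, rq_flags 0x20e3) with a small userspace helper. This is a sketch under stated assumptions: has_suspect_bits() is a hypothetical name, and the bit positions are what I believe a 4.11-era include/linux/blk_types.h and include/linux/blkdev.h define; they are not taken from the message itself and should be verified in the tree being debugged.

```c
#include <stdbool.h>

/* Assumed bit positions for a 4.11-era kernel; verify in your own tree. */
#define REQ_FAILFAST_DEV	(1u << 8)	/* in request->cmd_flags */
#define REQ_RAHEAD		(1u << 19)	/* in request->cmd_flags */
#define RQF_MIXED_MERGE		(1u << 5)	/* in request->rq_flags */

/*
 * Returns true when a completed IO carries all three bits that were
 * always present on the failing IOs: fail-fast on device errors,
 * readahead, and mixed merge.
 */
static bool has_suspect_bits(unsigned int cmd_flags, unsigned int rq_flags)
{
	return (cmd_flags & REQ_FAILFAST_DEV) &&
	       (cmd_flags & REQ_RAHEAD) &&
	       (rq_flags & RQF_MIXED_MERGE);
}
```

With these assumed values, has_suspect_bits(0x80700, 0x20e3) is true, which matches the logged request. As noted above, the signature is necessary but not sufficient: an IO can carry all three bits without triggering the failure.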
>
>
>In summary,
>The FS mechanism of using readahead for metadata works fine (in case of IO
>failure) if there is no mixed merge at the block layer.
>The same mechanism has a corner case that is not handled properly (in case
>of IO failure) if there was a mixed merge at the block layer.
>The megaraid_sas driver's behavior seems correct here. The aborted IO goes to
>the SML with CHECK CONDITION status, and the SML decides to fail the IO fast,
>as was requested.
>
>Query - Is this a block layer (page cache) issue? What would be the ideal fix?
>
>Thanks,
>Sumit