From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"ming.lei@redhat.com" <ming.lei@redhat.com>
Subject: Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
Date: Sat, 21 Oct 2017 19:21:59 +0200
Message-ID: <3006499.SSja2RFHbQ@natalenko.name>
In-Reply-To: <1508340432.2540.4.camel@wdc.com>

Well,

I've cherry-picked this series onto the current upstream/master branch and got
this on another suspend attempt:

===
[   62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
[   62.421150] xfsaild/dm-7    D    0   289      2 0x80000000
[   62.425800] Call Trace:
[   62.428902]  __schedule+0x239/0x870
[   62.431834]  schedule+0x33/0x90
[   62.434156]  _xfs_log_force+0x143/0x280 [xfs]
[   62.438767]  ? schedule_timeout+0x188/0x390
[   62.443592]  ? wake_up_q+0x80/0x80
[   62.446545]  ? xfsaild+0x18d/0x780 [xfs]
[   62.449702]  xfs_log_force+0x2c/0x90 [xfs]
[   62.453217]  xfsaild+0x18d/0x780 [xfs]
[   62.456717]  kthread+0x124/0x140
[   62.459237]  ? kthread+0x124/0x140
[   62.461818]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[   62.465146]  ? kthread_create_on_node+0x70/0x70
[   62.467331]  ret_from_fork+0x25/0x30
[   62.474386] Restarting kernel threads ... done.
===

After this it looks like the system tried to freeze anyway:

===
[   62.478290] OOM killer enabled.
[   62.481711] Restarting tasks ... done.
[   62.488931] PM: suspend exit
[   62.491497] PM: suspend entry (s2idle)
[   62.493445] PM: Syncing filesystems ... done.
[   63.774220] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   63.782707] OOM killer disabled.
[   63.785226] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   63.861548] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[   63.868153] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   63.868463] sd 1:0:0:0: [sdb] Stopping disk
[   63.873216] sd 0:0:0:0: [sda] Stopping disk
===

but then it hung completely. After some time, a hung task was detected:

===
[  247.531069] INFO: task systemd-sleep:663 blocked for more than 120 seconds.
[  247.535307]       Not tainted 4.14.0-pf0 #1
[  247.537820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  247.541015] systemd-sleep   D    0   663      1 0x00000000
[  247.542706] Call Trace:
[  247.543386]  __schedule+0x239/0x870
[  247.544351]  schedule+0x33/0x90
[  247.545197]  suspend_devices_and_enter+0x61b/0x890
[  247.546539]  ? wait_woken+0x80/0x80
[  247.547517]  pm_suspend+0x340/0x3b0
[  247.548550]  state_store+0x5a/0x90
[  247.549646]  kobj_attr_store+0xf/0x20
[  247.550649]  sysfs_kf_write+0x37/0x40
[  247.551640]  kernfs_fop_write+0x11c/0x1a0
[  247.552708]  __vfs_write+0x37/0x150
[  247.553641]  ? SYSC_newfstat+0x44/0x70
[  247.554628]  vfs_write+0xb1/0x1a0
[  247.555509]  SyS_write+0x55/0xc0
[  247.556366]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  247.557667] RIP: 0033:0x7f56b74ec8d4
[  247.558616] RSP: 002b:00007fff141c7738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  247.560667] RAX: ffffffffffffffda RBX: 000055dd61863290 RCX: 00007f56b74ec8d4
[  247.562639] RDX: 0000000000000007 RSI: 000055dd61864eb0 RDI: 0000000000000004
[  247.564874] RBP: 00007f56b77b3240 R08: 000055dd61863370 R09: 00007f56b79c88c0
[  247.566875] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000000
[  247.569213] R13: 000055dd61863290 R14: 000055dd61863d08 R15: 00000000ffffffea
===

P.S. Ming's current series is enough for 4.13 to avoid any issues like this.

On Wednesday, 18 October 2017 17:27:14 CEST, Bart Van Assche wrote:
> I think this version (v10) has significant advantages over the most recent
> patch series posted by Ming Lei to address suspend, resume and SPI domain
> validation. So it would be appreciated if you could switch to this series
> for testing suspend and resume.

Thread overview: 25+ messages
2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 02/10] md: Introduce md_stop_all_writes() Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 04/10] block: Make q_usage_counter also track legacy requests Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 05/10] block: Introduce blk_get_request_flags() Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 06/10] block: Introduce BLK_MQ_REQ_PREEMPT Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 10/10] block, nvme: Introduce blk_mq_req_flags_t Bart Van Assche
2017-10-17 23:28 ` [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Jens Axboe
2017-10-17 23:40   ` Bart Van Assche
2017-10-17 23:40     ` Bart Van Assche
2017-10-18  1:47     ` Jens Axboe
2017-10-18  5:02   ` Oleksandr Natalenko
2017-10-18  5:02     ` Oleksandr Natalenko
2017-10-18 15:27     ` Bart Van Assche
2017-10-18 15:27       ` Bart Van Assche
2017-10-21 17:21       ` Oleksandr Natalenko [this message]
2017-10-21 17:21         ` Oleksandr Natalenko
2017-10-21 17:59         ` Bart Van Assche
2017-10-21 17:59           ` Bart Van Assche
2017-10-21 18:31           ` Bart Van Assche
2017-10-21 18:31             ` Bart Van Assche