[Bug 200835] New: XFS hangs in xfs_reclaim

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [Bug 200835] New: XFS hangs in xfs_reclaim_inode()
@ 2018-08-16 16:03 bugzilla-daemon
  2018-08-16 16:04 ` [Bug 200835] " bugzilla-daemon
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-16 16:03 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

            Bug ID: 200835
           Summary: XFS hangs in xfs_reclaim_inode()
           Product: File System
           Version: 2.5
    Kernel Version: 4.14.62
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: XFS
          Assignee: filesystem_xfs@kernel-bugs.kernel.org
          Reporter: peter.klotz99@gmail.com
        Regression: No

Created attachment 277895
  --> https://bugzilla.kernel.org/attachment.cgi?id=277895&action=edit
Hang of kernel 4.14.62

The attached file shows backtraces that occur in quick succession and
ultimately lead to a complete server hang.

The backtraces seem related to inode reclamation.

For example the first backtrace:

Aug 16 02:33:30 hpmicroserver kernel: INFO: task khugepaged:28 blocked for more
than 120 seconds.
Aug 16 02:33:30 hpmicroserver kernel:       Not tainted 4.14.62-1-lts #1
Aug 16 02:33:30 hpmicroserver kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 16 02:33:30 hpmicroserver kernel: khugepaged      D    0    28      2
0x80000000
Aug 16 02:33:30 hpmicroserver kernel: Call Trace:
Aug 16 02:33:30 hpmicroserver kernel:  ? __schedule+0x284/0x860
Aug 16 02:33:30 hpmicroserver kernel:  schedule+0x28/0x80
Aug 16 02:33:30 hpmicroserver kernel:  schedule_timeout+0x292/0x370
Aug 16 02:33:30 hpmicroserver kernel:  ? check_preempt_curr+0x62/0x90
Aug 16 02:33:30 hpmicroserver kernel:  wait_for_completion+0xaf/0x140
Aug 16 02:33:30 hpmicroserver kernel:  ? wake_up_q+0x70/0x70
Aug 16 02:33:30 hpmicroserver kernel:  flush_work+0x116/0x1d0
Aug 16 02:33:30 hpmicroserver kernel:  ? worker_detach_from_pool+0xa0/0xa0
Aug 16 02:33:30 hpmicroserver kernel:  xlog_cil_force_lsn+0x78/0x210 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  ? enqueue_task_fair+0x5a/0x500
Aug 16 02:33:30 hpmicroserver kernel:  ? native_sched_clock+0x37/0x90
Aug 16 02:33:30 hpmicroserver kernel:  ? __switch_to_asm+0x40/0x70
Aug 16 02:33:30 hpmicroserver kernel:  _xfs_log_force_lsn+0x71/0x340 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  ? try_to_wake_up+0x54/0x4b0
Aug 16 02:33:30 hpmicroserver kernel:  ? update_group_capacity+0x27/0x1e0
Aug 16 02:33:30 hpmicroserver kernel:  ? xfs_reclaim_inode+0xe3/0x340 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  __xfs_iunpin_wait+0xa7/0x160 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  ? bit_waitqueue+0x30/0x30
Aug 16 02:33:30 hpmicroserver kernel:  xfs_reclaim_inode+0xe3/0x340 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  xfs_reclaim_inodes_ag+0x1b1/0x300 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  super_cache_scan+0x152/0x1a0
Aug 16 02:33:30 hpmicroserver kernel:  shrink_slab.part.45+0x1e8/0x3c0
Aug 16 02:33:30 hpmicroserver kernel:  shrink_node+0x123/0x310
Aug 16 02:33:30 hpmicroserver kernel:  do_try_to_free_pages+0xc3/0x330
Aug 16 02:33:30 hpmicroserver kernel:  try_to_free_pages+0xf4/0x1b0
Aug 16 02:33:30 hpmicroserver kernel:  __alloc_pages_slowpath+0x3e4/0xd80
Aug 16 02:33:30 hpmicroserver kernel:  ? __switch_to+0x170/0x4b0
Aug 16 02:33:30 hpmicroserver kernel:  ? __switch_to_asm+0x34/0x70
Aug 16 02:33:30 hpmicroserver kernel:  ? __switch_to_asm+0x34/0x70
Aug 16 02:33:30 hpmicroserver kernel:  ? __switch_to_asm+0x40/0x70
Aug 16 02:33:30 hpmicroserver kernel:  __alloc_pages_nodemask+0x226/0x240
Aug 16 02:33:30 hpmicroserver kernel:  khugepaged_alloc_page+0x17/0x50
Aug 16 02:33:30 hpmicroserver kernel:  khugepaged+0xbcb/0x2120
Aug 16 02:33:30 hpmicroserver kernel:  ? wait_woken+0x80/0x80
Aug 16 02:33:30 hpmicroserver kernel:  ? collapse_shmem+0xb90/0xb90
Aug 16 02:33:30 hpmicroserver kernel:  kthread+0x119/0x130
Aug 16 02:33:30 hpmicroserver kernel:  ? __kthread_parkme+0xa0/0xa0
Aug 16 02:33:30 hpmicroserver kernel:  ret_from_fork+0x22/0x40


The problem has occurred twice so far. Once in kernel 4.14.52 and once in
4.14.62. I am using the Arch Linux LTS kernel so it is more or less the vanilla
kernel.

The server has two XFS filesystems. One is a 30TB XFS/LUKS/RAID setup, the
other one is a plain 4TB XFS volume.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
@ 2018-08-16 16:04 ` bugzilla-daemon
  2018-08-17  2:36 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-16 16:04 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

--- Comment #1 from Peter Klotz (peter.klotz99@gmail.com) ---
Created attachment 277897
  --> https://bugzilla.kernel.org/attachment.cgi?id=277897&action=edit
Hang of kernel 4.14.52

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
  2018-08-16 16:04 ` [Bug 200835] " bugzilla-daemon
@ 2018-08-17  2:36 ` bugzilla-daemon
  2018-08-17  5:32 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17  2:36 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

Dave Chinner (david@fromorbit.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david@fromorbit.com

--- Comment #2 from Dave Chinner (david@fromorbit.com) ---
Memory reclaim is stuck waiting for log forces to complete. The log force is
stuck in both cases waiting for iclog IO completion here:

Aug 04 04:33:45 hpmicroserver kernel: INFO: task kworker/1:2:30142 blocked for
more than 120 seconds.
Aug 04 04:33:45 hpmicroserver kernel:       Not tainted 4.14.52-1-lts #1
Aug 04 04:33:45 hpmicroserver kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 04 04:33:45 hpmicroserver kernel: kworker/1:2     D    0 30142      2
0x80000000
Aug 04 04:33:45 hpmicroserver kernel: Workqueue: xfs-cil/dm-4
xlog_cil_push_work [xfs]
Aug 04 04:33:45 hpmicroserver kernel: Call Trace:
Aug 04 04:33:45 hpmicroserver kernel:  ? __schedule+0x284/0x870
Aug 04 04:33:45 hpmicroserver kernel:  ? wake_up_q+0x70/0x70
Aug 04 04:33:45 hpmicroserver kernel:  schedule+0x28/0x80
Aug 04 04:33:45 hpmicroserver kernel:  xlog_state_get_iclog_space+0x105/0x2c0
[xfs]
Aug 04 04:33:45 hpmicroserver kernel:  ? wake_up_q+0x70/0x70
Aug 04 04:33:45 hpmicroserver kernel:  xlog_write+0x163/0x6e0 [xfs]
Aug 04 04:33:45 hpmicroserver kernel:  xlog_cil_push+0x21d/0x3c0 [xfs]
Aug 04 04:33:45 hpmicroserver kernel:  process_one_work+0x1e9/0x3b0
Aug 04 04:33:45 hpmicroserver kernel:  worker_thread+0x21f/0x3f0
Aug 04 04:33:45 hpmicroserver kernel:  ? pwq_unbound_release_workfn+0xc0/0xc0
Aug 04 04:33:45 hpmicroserver kernel:  kthread+0x119/0x130
Aug 04 04:33:45 hpmicroserver kernel:  ? __kthread_parkme+0xa0/0xa0
Aug 04 04:33:45 hpmicroserver kernel:  ret_from_fork+0x22/0x40

This indicates the log force is in progress, and it is waiting for log buffer
space to become available to continue writing the in-memory journal to disk.
This means that all log buffers are currently under IO, and the journal is
waiting for IO completion to occur before it can issue more IO to the journal
to make further progress.

This typically occurs when there is an IO problem in the layers below XFS.
Without seeing the full logs, or knowing any other details about your system,
hardware or storage I can't say any more than this.

-Dave.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
  2018-08-16 16:04 ` [Bug 200835] " bugzilla-daemon
  2018-08-17  2:36 ` bugzilla-daemon
@ 2018-08-17  5:32 ` bugzilla-daemon
  2018-08-17  5:34 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17  5:32 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

--- Comment #3 from Peter Klotz (peter.klotz99@gmail.com) ---
Hello Dave

Thanks for the reply. I will upload a complete journald leading to one of the
crashes.

It seems the problem now occurs on a daily basis. The trigger is the deletion
of old backups.

I have a directory structure of this kind:

Backups
 Machine1
  Day1
  Day2
  ...
 Machine2
  Day1
  Day2
  ...

The files in the daily directories are hardlinked, so a minimum amount of space
is wasted. However subdirectories still consume inodes, so lots of them are in
use.

[root@hpmicroserver ~]# df -i
Filesystem                Inodes    IUsed      IFree IUse% Mounted on
/dev/mapper/halvar     390701504 12805954  377895550    4% /mnt/halvar
/dev/mapper/faxe      2929890048 24276605 2905613443    1% /mnt/faxe

The only unusual thing I had to configure is "libata.force=noncq" since the
SATA controller had otherwise problems when talking to the SSD.

Any help (maybe a workaround) is appreciated.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
                   ` (2 preceding siblings ...)
  2018-08-17  5:32 ` bugzilla-daemon
@ 2018-08-17  5:34 ` bugzilla-daemon
  2018-08-17  8:43 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17  5:34 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

--- Comment #4 from Peter Klotz (peter.klotz99@gmail.com) ---
Created attachment 277905
  --> https://bugzilla.kernel.org/attachment.cgi?id=277905&action=edit
journald output leading to the 4.14.62 hang

The output covers everything from booting the server to the XFS backtraces.

The delete_backups.sh script is called 3 minutes before the server is stuck.
There was no other activity on the server.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
                   ` (3 preceding siblings ...)
  2018-08-17  5:34 ` bugzilla-daemon
@ 2018-08-17  8:43 ` bugzilla-daemon
  2018-08-17 11:55 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17  8:43 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

--- Comment #5 from Dave Chinner (david@fromorbit.com) ---
Ok, so the hung task warnings should up 3m30s after the delete script starts,
then there's a second, smaller set almost exactly 120s after the first which is
a repeat of some of the warnings from the first set that had not resolved
themselves. 

The thing I note is that the log push that is "hung" waiting for journal buffer
space is reported in the first set of warnings but not the second set, andthe
second set only contains 2 tasks, not the 7 that are in the first set. Further,
I note that kcryptd (i.e. dm-crypt) is one of the tasks that is hung, so
there's an encrypted filesystem configured - is it the XFS filesystem files are
being deleted from?

Finally, after the second set of warnings, there are no more warnings, so
whatever is occurred is temporary and the filesystem is not actually hung. i.e.
there's no direct evidence in that trace that there was a complete system hang.
However, there is evidence of a potential problem if your XFS filesystem is
hosted on dm-crypt volumes.

i.e. this:

Aug 16 02:33:30 hpmicroserver kernel: Workqueue: kcryptd kcryptd_crypt
[dm_crypt]
Aug 16 02:33:30 hpmicroserver kernel: Call Trace:
Aug 16 02:33:30 hpmicroserver kernel:  ? __schedule+0x284/0x860
Aug 16 02:33:30 hpmicroserver kernel:  schedule+0x28/0x80
Aug 16 02:33:30 hpmicroserver kernel:  schedule_timeout+0x292/0x370
Aug 16 02:33:30 hpmicroserver kernel:  ? check_preempt_curr+0x62/0x90
Aug 16 02:33:30 hpmicroserver kernel:  wait_for_completion+0xaf/0x140
Aug 16 02:33:30 hpmicroserver kernel:  ? wake_up_q+0x70/0x70
Aug 16 02:33:30 hpmicroserver kernel:  flush_work+0x116/0x1d0
Aug 16 02:33:30 hpmicroserver kernel:  ? worker_detach_from_pool+0xa0/0xa0
Aug 16 02:33:30 hpmicroserver kernel:  xlog_cil_force_lsn+0x78/0x210 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  _xfs_log_force_lsn+0x71/0x340 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  ? xfs_reclaim_inode+0xe3/0x340 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  __xfs_iunpin_wait+0xa7/0x160 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  ? bit_waitqueue+0x30/0x30
Aug 16 02:33:30 hpmicroserver kernel:  xfs_reclaim_inode+0xe3/0x340 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  xfs_reclaim_inodes_ag+0x1b1/0x300 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
Aug 16 02:33:30 hpmicroserver kernel:  super_cache_scan+0x152/0x1a0
Aug 16 02:33:30 hpmicroserver kernel:  shrink_slab.part.45+0x1e8/0x3c0
Aug 16 02:33:30 hpmicroserver kernel:  shrink_node+0x123/0x310
Aug 16 02:33:30 hpmicroserver kernel:  do_try_to_free_pages+0xc3/0x330
Aug 16 02:33:30 hpmicroserver kernel:  try_to_free_pages+0xf4/0x1b0
Aug 16 02:33:30 hpmicroserver kernel:  __alloc_pages_slowpath+0x3e4/0xd80
Aug 16 02:33:30 hpmicroserver kernel:  __alloc_pages_nodemask+0x226/0x240
Aug 16 02:33:30 hpmicroserver kernel:  new_slab+0x2f3/0x620
Aug 16 02:33:30 hpmicroserver kernel:  ___slab_alloc+0x322/0x4a0
Aug 16 02:33:30 hpmicroserver kernel:  ? __alloc_pages_slowpath+0xd4d/0xd80
Aug 16 02:33:30 hpmicroserver kernel:  ? init_crypt+0x7f/0xd0 [xts]
Aug 16 02:33:30 hpmicroserver kernel:  __slab_alloc+0x1c/0x30
Aug 16 02:33:30 hpmicroserver kernel:  __kmalloc+0x18e/0x1f0
Aug 16 02:33:30 hpmicroserver kernel:  init_crypt+0x7f/0xd0 [xts]
Aug 16 02:33:30 hpmicroserver kernel:  encrypt+0x15/0x20 [xts]
Aug 16 02:33:30 hpmicroserver kernel:  crypt_convert+0x954/0xec0 [dm_crypt]
Aug 16 02:33:30 hpmicroserver kernel:  ? bio_alloc_bioset+0x132/0x1e0
Aug 16 02:33:30 hpmicroserver kernel:  kcryptd_crypt+0x2b8/0x370 [dm_crypt]
Aug 16 02:33:30 hpmicroserver kernel:  process_one_work+0x1e9/0x3b0
Aug 16 02:33:30 hpmicroserver kernel:  worker_thread+0x2b/0x3f0
Aug 16 02:33:30 hpmicroserver kernel:  ? pwq_unbound_release_workfn+0xc0/0xc0
Aug 16 02:33:30 hpmicroserver kernel:  kthread+0x119/0x130
Aug 16 02:33:30 hpmicroserver kernel:  ? __kthread_parkme+0xa0/0xa0
Au

This appears to be a potential deadlock via incorrect memory allocation
contexts in dm-crypt. i.e. the crypto code it uses is doing GFP_KERNEL
allocations while setting up the encryption context which allows it to get
stuck in a filesystem that can't make progress until the encryption completes.
. i.e. the dm-crypt/crypto allocation context should probably be GFP_NOIO to
prevent memory reclaim recursion into contexts that might be already be
dependent on dm-crypt making progress (i.e. filesystems)....

This isn't really looking like an XFS issue at this point....

-Dave.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
                   ` (4 preceding siblings ...)
  2018-08-17  8:43 ` bugzilla-daemon
@ 2018-08-17 11:55 ` bugzilla-daemon
  2018-08-17 13:06 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17 11:55 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

--- Comment #6 from Peter Klotz (peter.klotz99@gmail.com) ---
Hello Dave

You are right! Both XFS filesystems are on top of dm-crypt:

 * 30TB: XFS/dm-crypt/dm-raid
 *  4TB: XFS/dm-crypt

So everytime a hang occurred, dm-crypt was definitely involved.

Do you know any dm-crypt developers that could be CCed so that the
investigation can continue? Or should I change Product/Component of the bug?

Regards, Peter.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
                   ` (5 preceding siblings ...)
  2018-08-17 11:55 ` bugzilla-daemon
@ 2018-08-17 13:06 ` bugzilla-daemon
  2018-08-17 13:25 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17 13:06 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

Eric Sandeen (sandeen@sandeen.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandeen@sandeen.net

--- Comment #7 from Eric Sandeen (sandeen@sandeen.net) ---
I'll poke some dm people here @ red hat.

Thanks,
-Eric

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
                   ` (6 preceding siblings ...)
  2018-08-17 13:06 ` bugzilla-daemon
@ 2018-08-17 13:25 ` bugzilla-daemon
  2018-08-17 14:04 ` bugzilla-daemon
  2018-08-17 14:06 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17 13:25 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

Mike Snitzer (snitzer@redhat.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |snitzer@redhat.com

--- Comment #8 from Mike Snitzer (snitzer@redhat.com) ---
(In reply to Dave Chinner from comment #5)

> Finally, after the second set of warnings, there are no more warnings, so
> whatever is occurred is temporary and the filesystem is not actually hung.
> i.e. there's no direct evidence in that trace that there was a complete
> system hang. However, there is evidence of a potential problem if your XFS
> filesystem is hosted on dm-crypt volumes.
> 
> i.e. this:
> 
> Aug 16 02:33:30 hpmicroserver kernel: Workqueue: kcryptd kcryptd_crypt
> [dm_crypt]
> Aug 16 02:33:30 hpmicroserver kernel: Call Trace:
...
> Aug 16 02:33:30 hpmicroserver kernel:  ? init_crypt+0x7f/0xd0 [xts]
> Aug 16 02:33:30 hpmicroserver kernel:  __slab_alloc+0x1c/0x30
> Aug 16 02:33:30 hpmicroserver kernel:  __kmalloc+0x18e/0x1f0
> Aug 16 02:33:30 hpmicroserver kernel:  init_crypt+0x7f/0xd0 [xts]
> Aug 16 02:33:30 hpmicroserver kernel:  encrypt+0x15/0x20 [xts]
> Aug 16 02:33:30 hpmicroserver kernel:  crypt_convert+0x954/0xec0 [dm_crypt]
> Aug 16 02:33:30 hpmicroserver kernel:  ? bio_alloc_bioset+0x132/0x1e0
> Aug 16 02:33:30 hpmicroserver kernel:  kcryptd_crypt+0x2b8/0x370 [dm_crypt]
> Aug 16 02:33:30 hpmicroserver kernel:  process_one_work+0x1e9/0x3b0
> Aug 16 02:33:30 hpmicroserver kernel:  worker_thread+0x2b/0x3f0
> Aug 16 02:33:30 hpmicroserver kernel:  ? pwq_unbound_release_workfn+0xc0/0xc0
> Aug 16 02:33:30 hpmicroserver kernel:  kthread+0x119/0x130
> Aug 16 02:33:30 hpmicroserver kernel:  ? __kthread_parkme+0xa0/0xa0
> Au
> 
> This appears to be a potential deadlock via incorrect memory allocation
> contexts in dm-crypt. i.e. the crypto code it uses is doing GFP_KERNEL
> allocations while setting up the encryption context which allows it to get
> stuck in a filesystem that can't make progress until the encryption
> completes. . i.e. the dm-crypt/crypto allocation context should probably be
> GFP_NOIO to prevent memory reclaim recursion into contexts that might be
> already be dependent on dm-crypt making progress (i.e. filesystems)....

So problematic call chain is:
crypt_convert -> encrypt -> init_crypt -> __kmalloc

crypto:xts.c:encrypt is:

static int encrypt(struct skcipher_request *req)
{
        return do_encrypt(req, init_crypt(req, encrypt_done));
}

There are no gfp flags passed in.  SO yes, to be able to work for all callers
init_crypt() should be changed from GFP_KERNEL to GFO_NOIO.

init_crypt() does the allocation with:
        gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ? GFP_KERNEL :
                                                           GFP_ATOMIC;

So another option would be for DM-crypt to _not_ set CRYPTO_TFM_REQ_MAY_SLEEP
in struct skcipher_request *req's base.flags (I wonder if it defaults to
setting it?)

The crypto code gets opaque quite quickly.. not yet sure where the relevant
dm-crypt code is that would be able to ensure CRYPTO_TFM_REQ_MAY_SLEEP is _not_
set in skcipher_request req->base.flags

In any case, it really does seem to make sense to change xts.c:init_crypt() to
use GDP_NOIO instead of GFP_KERNEL.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
                   ` (7 preceding siblings ...)
  2018-08-17 13:25 ` bugzilla-daemon
@ 2018-08-17 14:04 ` bugzilla-daemon
  2018-08-17 14:06 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17 14:04 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

--- Comment #9 from Mike Snitzer (snitzer@redhat.com) ---
(In reply to Mike Snitzer from comment #8)

> init_crypt() does the allocation with:
>         gfp = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ? GFP_KERNEL :
>                                                            GFP_ATOMIC;
> 
> So another option would be for DM-crypt to _not_ set
> CRYPTO_TFM_REQ_MAY_SLEEP in struct skcipher_request *req's base.flags (I
> wonder if it defaults to setting it?)
> 
> The crypto code gets opaque quite quickly.. not yet sure where the relevant
> dm-crypt code is that would be able to ensure CRYPTO_TFM_REQ_MAY_SLEEP is
> _not_ set in skcipher_request req->base.flags

drivers/md/dm-crypt.c:crypt_alloc_req_skcipher() is calling
skcipher_request_set_callback() with CRYPTO_TFM_REQ_MAY_SLEEP set.

But there are quite a few other modes that also set CRYPTO_TFM_REQ_MAY_SLEEP. 
And most crypto code uses GFP_KERNEL if that is set.

In any case, it would seem commit f1c131b45410a ("crypto: xts - Convert to
skcipher") is what ushered in the use of GFP_KERNEL underneath dm-crypt's IO
path.

> In any case, it really does seem to make sense to change xts.c:init_crypt()
> to use GDP_NOIO instead of GFP_KERNEL.

I still think switching to GFP_NOIO here needs further thought and review by
the crypto/XTS developers.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 200835] XFS hangs in xfs_reclaim_inode()
  2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
                   ` (8 preceding siblings ...)
  2018-08-17 14:04 ` bugzilla-daemon
@ 2018-08-17 14:06 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2018-08-17 14:06 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=200835

Mike Snitzer (snitzer@redhat.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gmazyland@gmail.com,
                   |                            |mpatocka@redhat.com
          Component|XFS                         |LVM2/DM
           Assignee|filesystem_xfs@kernel-bugs. |agk@redhat.com
                   |kernel.org                  |
            Product|File System                 |IO/Storage

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-08-17 17:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-16 16:03 [Bug 200835] New: XFS hangs in xfs_reclaim_inode() bugzilla-daemon
2018-08-16 16:04 ` [Bug 200835] " bugzilla-daemon
2018-08-17  2:36 ` bugzilla-daemon
2018-08-17  5:32 ` bugzilla-daemon
2018-08-17  5:34 ` bugzilla-daemon
2018-08-17  8:43 ` bugzilla-daemon
2018-08-17 11:55 ` bugzilla-daemon
2018-08-17 13:06 ` bugzilla-daemon
2018-08-17 13:25 ` bugzilla-daemon
2018-08-17 14:04 ` bugzilla-daemon
2018-08-17 14:06 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).