* recent issues with heavy deletes causing soft lockups
@ 2018-10-27 18:40 Thomas Fjellstrom
2018-10-27 19:20 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Fjellstrom @ 2018-10-27 18:40 UTC (permalink / raw)
To: linux-block
Hi
For the past few months or so I've been dealing with my workstation locking
up for upwards of minutes at a time when deleting a large directory tree. I
don't recall this being a problem before.

The current setup is 3 SATA SSDs in an LVM VG; most space is allocated to an
ext4 /home where my work projects live.
The main use case causing problems is deleting the "out" directory of an
Android AOSP build tree. It can be upwards of 95GB in size with 240k or more
files. If I run a `rm -fr out` or `make clean`, it will lock up anything
attempting to use the disk (e.g. Plasma, IntelliJ, Android Studio, Chrome, etc.),
sometimes for minutes.
I have tried different block scheduler settings, including none, mq-deadline,
kyber, and bfq, none of which seem to improve things much at all.
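For reference, the per-device scheduler can be inspected and switched at
runtime through sysfs. A minimal sketch (paths assume sysfs mounted at /sys;
the `parse_active` helper is just for illustration):

```shell
# The active scheduler is the bracketed entry in
# /sys/block/<dev>/queue/scheduler, e.g. "[mq-deadline] kyber bfq none".
parse_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Show the active scheduler for each block device, where sysfs is available.
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue
    dev=${f#/sys/block/}
    printf '%s: %s\n' "${dev%%/*}" "$(parse_active "$(cat "$f")")"
done

# Switching is a plain write (as root), e.g.:
#   echo kyber > /sys/block/sda/queue/scheduler
```

A change made this way only lasts until reboot, which makes it convenient for
A/B-testing schedulers against the same delete workload.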
It may be worth noting that disk space is starting to run low; perhaps there's
some interaction going on with free space handling or SSD wear leveling.
That said, it seems to have started happening (or at least gotten worse) some
time around when mq was made the default and only implementation for SATA.
If it helps, my system specs are:
Kernel: Debian Sid's 4.18.0-2-amd64 (4.18.10-2)
CPU: AMD FX-8320 OCed to 4.4GHz
RAM: 32GB DDR3 1866
MB: Asus 970 Aura Pro Gaming
Storage: Kingston HyperX 3K 240G + Samsung 850 Evo 250G + SanDisk X300 500G
I'm thinking of testing with a different or older kernel; what would be the
best one to test with?
Thanks for any assistance.
--
Thomas Fjellstrom
thomas@fjellstrom.ca
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: recent issues with heavy deletes causing soft lockups
2018-10-27 18:40 recent issues with heavy deletes causing soft lockups Thomas Fjellstrom
@ 2018-10-27 19:20 ` Jens Axboe
2018-11-02 18:25 ` Thomas Fjellstrom
2018-11-02 20:32 ` Thomas Fjellstrom
0 siblings, 2 replies; 6+ messages in thread
From: Jens Axboe @ 2018-10-27 19:20 UTC (permalink / raw)
To: Thomas Fjellstrom; +Cc: linux-block
On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom <thomas@fjellstrom.ca> wrote:
>
> Hi
>
> As of the past few months or so I've been dealing with my workstation locking
> up for upwards of minutes at a time when deleting a large directory tree. I
> don't recall this being a problem before.
>
> Current setup is 3 SATA SSDs in an lvm vg. most space is allocated to an ext4
> /home where my work projects live.
>
> The main use case causing problems is deleting the "out" directory of an
> android AOSP build tree. It can be upwards of 95GB in size with 240k or more
> files. If I run a `rm -fr out` or `make clean` it will lock up anything
> attempting to use the disk (eg: plasma, intellij, android studio, chrome, etc)
> for sometimes minutes.
>
> I have tried different block scheduler settings including none, mq-deadline,
> kyber and bfq none of which seem to improve things much at all.
>
> It may be worth noting that disk space is starting to run low, perhaps there's
> some interaction going on with free space handling or ssd wear leveling...
>
> That said, it seems to have started happening (or at least made worse) some
> time around when mq was made the default and only implementation for sata.
>
> if it helps, my system specs are:
>
> Kernel: Debian Sid's 4.18.0-2-amd64 (4.18.10-2)
> CPU: AMD FX-8320 OCed to 4.4Ghz
> RAM: 32GB DDR3 1866
> MB: Asus 970 Aura Pro Gaming
> Storage: Kingston HyperX 3K 240G + Samsung 850 Evo 250G + SanDisk X300 500G
>
> I'm thinking of testing with a different or older kernel, what would be the
> best to test with?
Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
around requeue conditions, which SATA is the most likely to hit.

Jens
* Re: recent issues with heavy deletes causing soft lockups
2018-10-27 19:20 ` Jens Axboe
@ 2018-11-02 18:25 ` Thomas Fjellstrom
2018-11-02 20:32 ` Thomas Fjellstrom
1 sibling, 0 replies; 6+ messages in thread
From: Thomas Fjellstrom @ 2018-11-02 18:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block
On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom <thomas@fjellstrom.ca> wrote:
> > Hi
[snip explanation of problem]
>
> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
> around requeue conditions, which SATA is the one to most often hit.
Gave it a shot with the vanilla kernel from git linux-stable/v4.19. It was a
bit of a pain, as the amdgpu driver seems to be broken for my R9 390 on many
kernels, including 4.19. I had to reconfigure to the radeon driver, which I must
say seems to work a lot better than it used to.

At any rate, it doesn't seem to have helped a lot so far. I did end up adding
"scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0" to the default kernel boot command
line in grub. It seems to have helped a little, but I haven't tested fully
with a full delete of the build directory; I haven't had time to sit and wait
the 40+ minutes it takes to rebuild the entire thing. And I'm low enough on
disk space that I can't easily make a copy of the 109GB build folder; I've got
about 25GB free out of 780GB. I'll try to test some more soon.
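For what it's worth, whether those boot options took effect can be checked
without guessing: on kernels of this era the settings are exposed as Y/N
module parameters under /sys/module. A sketch (paths assumed; the `mq_label`
helper is just for readability):

```shell
# Translate the Y/N module parameter into the I/O path it selects.
mq_label() {
    case "$1" in
        Y) echo "blk-mq" ;;
        N) echo "legacy" ;;
        *) echo "unknown" ;;
    esac
}

# scsi_mod.use_blk_mq and dm_mod.use_blk_mq appear here when the modules
# (or built-ins) are loaded.
for p in /sys/module/scsi_mod/parameters/use_blk_mq \
         /sys/module/dm_mod/parameters/use_blk_mq; do
    if [ -e "$p" ]; then
        printf '%s: %s\n' "$p" "$(mq_label "$(cat "$p")")"
    fi
done
```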
> Jens
--
Thomas Fjellstrom
thomas@fjellstrom.ca
* Re: recent issues with heavy deletes causing soft lockups
2018-10-27 19:20 ` Jens Axboe
2018-11-02 18:25 ` Thomas Fjellstrom
@ 2018-11-02 20:32 ` Thomas Fjellstrom
2018-11-02 20:37 ` Jens Axboe
1 sibling, 1 reply; 6+ messages in thread
From: Thomas Fjellstrom @ 2018-11-02 20:32 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block
On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom <thomas@fjellstrom.ca>
[snip]
>
> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
> around requeue conditions, which SATA is the one to most often hit.
>
> Jens
I just had to do a clean, and I have the mq kernel options I mentioned in my
previous mail enabled (mq should be disabled), and it appears to still be
causing issues. The current I/O scheduler appears to be cfq, and that "make
clean" took about 4 minutes; a lot of that time was spent with Plasma, IntelliJ,
and Chrome all starved of I/O.
I did switch to a terminal and checked iostat -d 1, and it showed very little
actual I/O for the time I was looking at it.
I have no idea what's going on.
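One way to confirm whether tasks really are blocked on I/O during a stall
(rather than spending their time elsewhere) is to look for processes in
uninterruptible sleep. A rough sketch; `d_state` is just an illustrative
filter over `ps` output:

```shell
# Tasks stuck waiting on I/O usually show as "D" (uninterruptible sleep)
# in ps output; filter "STATE PID COMM" lines down to those entries.
d_state() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Run this during the stall; an empty result suggests the time is going
# somewhere other than blocked I/O.
ps -eo state=,pid=,comm= 2>/dev/null | d_state || true
```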
--
Thomas Fjellstrom
thomas@fjellstrom.ca
* Re: recent issues with heavy deletes causing soft lockups
2018-11-02 20:32 ` Thomas Fjellstrom
@ 2018-11-02 20:37 ` Jens Axboe
2018-11-21 21:25 ` Thomas Fjellstrom
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2018-11-02 20:37 UTC (permalink / raw)
To: Thomas Fjellstrom; +Cc: linux-block
On 11/2/18 2:32 PM, Thomas Fjellstrom wrote:
> On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
>> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom <thomas@fjellstrom.ca>
> [snip]
>>
>> Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue
>> around requeue conditions, which SATA is the one to most often hit.
>>
>> Jens
>
> I just had to do a clean, and I have the mq kernel options I mentioned in my
> previous mail enabled. (mq should be disabled) and it appears to still be
> causing issues. current io scheduler appears to be cfq, and it took that "make
> clean" about 4 minutes, a lot of that time was spent with plasma, intelij, and
> chrome all starved of IO.
>
> I did switch to a terminal and checked iostat -d 1, and it showed very little
> actual io for the time I was looking at it.
>
> I have no idea what's going on.
If you're using cfq, then it's not using mq at all. Maybe do something like:

# perf record -ag -- sleep 10

while the slowdown is happening, and then do

# perf report -g --no-children

and see if that yields anything interesting. It sounds like time is being
spent elsewhere and you aren't actually waiting on I/O.
--
Jens Axboe
* Re: recent issues with heavy deletes causing soft lockups
2018-11-02 20:37 ` Jens Axboe
@ 2018-11-21 21:25 ` Thomas Fjellstrom
0 siblings, 0 replies; 6+ messages in thread
From: Thomas Fjellstrom @ 2018-11-21 21:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block
On Friday, November 2, 2018 2:37:08 PM MST Jens Axboe wrote:
> On 11/2/18 2:32 PM, Thomas Fjellstrom wrote:
> > On Saturday, October 27, 2018 1:20:10 PM MDT Jens Axboe wrote:
> >> On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom <thomas@fjellstrom.ca>
> >
> > [snip]
> >
> >> Can you try 4.19? A patch went in since 4.18 that fixes a starvation
> >> issue
> >> around requeue conditions, which SATA is the one to most often hit.
> >>
> >> Jens
> >
> > I just had to do a clean, and I have the mq kernel options I mentioned in
> > my previous mail enabled. (mq should be disabled) and it appears to still
> > be causing issues. current io scheduler appears to be cfq, and it took
> > that "make clean" about 4 minutes, a lot of that time was spent with
> > plasma, intelij, and chrome all starved of IO.
> >
> > I did switch to a terminal and checked iostat -d 1, and it showed very
> > little actual io for the time I was looking at it.
> >
> > I have no idea what's going on.
>
> If you're using cfq, then it's not using mq at all. Maybe do something ala:
Yeah, I switched off mq to test. I mentioned it in a previous mail.
> # perf record -ag -- sleep 10
>
> while the slowdown is happening and then do perf report -g --no-children and
> see if that yields anything interesting. Sounds like time is being spent
> elsewhere and you aren't actually waiting on IO.
OK, with the 4.19.1 kernel from linux-stable I've managed to catch the issue
during real use, rather than just a dd command.

I should note that I have swap turned off, so I'm not sure what the "swapper"
process in the log below is doing. I also see the problem with swap enabled,
but right now I'd rather certain apps die than have the entire system slow
down.

I also have a perf report -t log if that'd be helpful. It shows a lot of "use"
in do_idle/acpi_idle_do_entry, though I presume that's real idle time, not
actual use. The next most eye-catching item in the -t log is chrome spending
17% of its time in glibc's free function.

(the top ~100 lines from perf report -g)
# Total Lost Samples: 0
#
# Samples: 456K of event 'cycles'
# Event count (approx.): 136347735217
#
# Overhead Command Shared Object Symbol
# ........  ...............  ......................................  ......
#
25.64% swapper [kernel.kallsyms] [k] acpi_idle_do_entry
|
---0xffffffffa16000d4
|
|--22.23%--start_secondary
| cpu_startup_entry
| do_idle
| cpuidle_enter_state
| acpi_idle_enter
| acpi_idle_do_entry
|
--3.41%--start_kernel
cpu_startup_entry
do_idle
cpuidle_enter_state
acpi_idle_enter
acpi_idle_do_entry
0.61% swapper [kernel.kallsyms] [k] apic_timer_interrupt
|
---0xffffffffa16000d4
|
--0.52%--start_secondary
cpu_startup_entry
do_idle
|
--0.52%--cpuidle_enter_state
0.54% chrome chrome [.] _fini
0.42% swapper [kernel.kallsyms] [k] native_sched_clock
0.41% swapper [kernel.kallsyms] [k] menu_select
0.40% swapper [kernel.kallsyms] [k] check_preemption_disabled
0.35% http.so libQt5Core.so.5.11.2 [.] QTranslatorPrivate::do_translate
0.35% swapper [kernel.kallsyms] [k] x86_pmu_disable_all
0.32% TaskSchedulerFo [kernel.kallsyms] [k] osq_lock
0.31% Chrome_IOThread chrome [.] _fini
0.30% chrome libpthread-2.27.so [.] __pthread_mutex_lock
0.29% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
0.28% swapper [kernel.kallsyms] [k] read_tsc
0.26% chrome libpthread-2.27.so [.] __pthread_mutex_unlock_usercnt
0.26% swapper [kernel.kallsyms] [k] reschedule_interrupt
0.24% swapper [kernel.kallsyms] [k] _raw_spin_lock
0.24% swapper [kernel.kallsyms] [k] __sched_text_start
0.24% swapper [kernel.kallsyms] [k] native_load_gs_index
0.23% swapper [kernel.kallsyms] [k] __switch_to
0.22% swapper [kernel.kallsyms] [k] do_idle
0.21% TaskSchedulerFo [kernel.kallsyms] [k] mutex_lock
0.21% swapper [kernel.kallsyms] [k] cpuidle_enter_state
0.21% TaskSchedulerFo chrome [.] 0x000000000306c000
0.20% chrome [kernel.kallsyms] [k] native_sched_clock
0.20% TaskSchedulerFo [kernel.kallsyms] [k] mutex_unlock
0.18% chrome [kernel.kallsyms] [k] entry_SYSCALL_64
0.18% thumbnail.so ld-2.27.so [.] do_lookup_x
0.17% Xorg [kernel.kallsyms] [k] delay_tsc
0.17% rm [ext4] [k] ext4_mark_iloc_dirty
0.16% swapper [kernel.kallsyms] [k] update_blocked_averages
0.16% chrome [kernel.kallsyms] [k] check_preemption_disabled
0.15% swapper [kernel.kallsyms] [k] update_load_avg
0.15% swapper [kernel.kallsyms] [k] interrupt_entry
0.15% swapper [kernel.kallsyms] [k] ktime_get
0.15% swapper [kernel.kallsyms] [k] switch_mm_irqs_off
0.15% TaskSchedulerFo [kernel.kallsyms] [k] __mutex_lock.isra.5
0.14% rm [kernel.kallsyms] [k] check_preemption_disabled
0.14% TaskSchedulerFo chrome [.] 0x000000000306c009
0.13% swapper [kernel.kallsyms] [k] __update_load_avg_se
0.13% chrome libc-2.27.so [.] __memcpy_ssse3
0.13% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq
0.12% http.so libQt5Core.so.5.11.2 [.] QCoreApplicationPrivate::sendPostedEvents
0.12% rm [kernel.kallsyms] [k] __find_get_block
0.12% swapper [kernel.kallsyms] [k] timerqueue_add
0.12% swapper [kernel.kallsyms] [k] acpi_idle_enter
0.12% apt-cache libz.so.1.2.11 [.] adler32_z
0.12% swapper [kernel.kallsyms] [k] rcu_dynticks_eqs_exit
0.12% Xorg [radeon] [k] cail_reg_read
0.12% swapper [kernel.kallsyms] [k] trace_hardirqs_off
0.11% swapper [kernel.kallsyms] [k] set_next_entity
0.11% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.11% http.so libQt5Core.so.5.11.2 [.] QCoreApplication::translate
0.11% http.so [kernel.kallsyms] [k] __switch_to
0.11% Chrome_ChildIOT chrome [.] _fini
0.11% chrome [kernel.kallsyms] [k] __fget
0.10% swapper [kernel.kallsyms] [k] __hrtimer_next_event_base
0.10% http.so [kernel.kallsyms] [k] native_load_gs_index
0.10% swapper [kernel.kallsyms] [k] rcu_check_callbacks
0.10% drkonqi ld-2.27.so [.] do_lookup_x
0.10% TaskSchedulerFo chrome [.] 0x000000000306e42b
0.10% http.so [kernel.kallsyms] [k] native_sched_clock
0.10% swapper [kernel.kallsyms] [k] x86_pmu_enable_all
0.10% swapper [kernel.kallsyms] [k] find_busiest_group
0.10% radeon_cs:0 [kernel.kallsyms] [k] refcount_sub_and_test_checked
0.10% http.so [vdso] [.] 0x00000000000008d9
Thanks,
--
Thomas Fjellstrom
thomas@fjellstrom.ca
end of thread, other threads:[~2018-11-21 21:26 UTC | newest]
Thread overview: 6+ messages
2018-10-27 18:40 recent issues with heavy deletes causing soft lockups Thomas Fjellstrom
2018-10-27 19:20 ` Jens Axboe
2018-11-02 18:25 ` Thomas Fjellstrom
2018-11-02 20:32 ` Thomas Fjellstrom
2018-11-02 20:37 ` Jens Axboe
2018-11-21 21:25 ` Thomas Fjellstrom