* potential riscv special bug maybe found
@ 2023-03-09 15:18 Bo YU
2023-08-09 14:47 ` Aurelien Jarno
0 siblings, 1 reply; 4+ messages in thread
From: Bo YU @ 2023-03-09 15:18 UTC (permalink / raw)
To: tsu.yubo, linux-riscv; +Cc: Andreas Gruenbacher
[-- Attachment #1.1: Type: text/plain, Size: 1301 bytes --]
Hi,
I am sorry if this is noise.
Some days ago I noticed that strace 6.2 failed to build on riscv64 because of
failing test cases[0]. One program from the strace test suite reproduces it:
```
./tests/read-write
```
It hangs.
In fact, the issue has existed since 5.18. I ran `git bisect` and
found that the issue was introduced by this commit[1]:
commit 631f871f071746789e9242e514ab0f49067fa97a
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date: Tue Nov 9 12:56:06 2021 +0100
fs/iomap: Fix buffered write page prefaulting
I do not think the commit itself is wrong, because it does not
affect any arch except riscv. After I reverted it, all strace test
cases pass (one case still fails on qemu, but
that is another story).
I tried to debug this but did not get anywhere.
Any help would be appreciated.
PS:
This is the output of `cat /proc/${read-write-pid}/stack` while it hangs:
[<0>] generic_perform_write+0x12e/0x1ec
[<0>] ext4_buffered_write_iter+0x5e/0xe6
[<0>] ext4_file_write_iter+0xb4/0x67c
[<0>] vfs_write+0x1d2/0x308
[<0>] ksys_write+0x56/0xc6
[<0>] sys_write+0xe/0x16
[<0>] check_syscall_nr+0x3c/0x3c
[0]: https://github.com/strace/strace/issues/242
[1]: https://lkml.org/lkml/2021/11/23/641
--
Regards,
--
Bo YU
[-- Attachment #2: Type: text/plain, Size: 161 bytes --]
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: potential riscv special bug maybe found
2023-03-09 15:18 potential riscv special bug maybe found Bo YU
@ 2023-08-09 14:47 ` Aurelien Jarno
2023-08-10 11:09 ` Alexandre Ghiti
0 siblings, 1 reply; 4+ messages in thread
From: Aurelien Jarno @ 2023-08-09 14:47 UTC (permalink / raw)
To: linux-riscv; +Cc: Bo YU, Andreas Gruenbacher
[-- Attachment #1.1.1: Type: text/plain, Size: 1316 bytes --]
Hi,
On 2023-03-09 23:18, Bo YU wrote:
> Hi,
>
> I am sorry if this is noise.
>
> Some days ago I noticed that strace 6.2 failed to build on riscv64 because of
> failing test cases[0]. One program from the strace test suite reproduces it:
>
> ```
> ./tests/read-write
> ```
>
> It hangs.
>
> In fact, the issue has existed since 5.18. I ran `git bisect` and
> found that the issue was introduced by this commit[1]:
>
> commit 631f871f071746789e9242e514ab0f49067fa97a
> Author: Andreas Gruenbacher <agruenba@redhat.com>
> Date: Tue Nov 9 12:56:06 2021 +0100
>
> fs/iomap: Fix buffered write page prefaulting
>
> I do not think the commit itself is wrong, because it does not
> affect any arch except riscv. After I reverted it, all strace test
> cases pass (one case still fails on qemu, but
> that is another story).
>
> I tried to debug this but did not get anywhere.
> Any help would be appreciated.
Please find attached a simpler reproducer extracted from strace, which
should make the issue easier to reproduce. It hangs on riscv64 and needs
to be killed with -9, while it works fine on amd64.
Regards
Aurelien
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://aurel32.net
[-- Attachment #1.1.2: read-write.c --]
[-- Type: text/x-csrc, Size: 2277 bytes --]
/*
 * Check decoding and dumping of read and write syscalls.
 *
 * Copyright (c) 2016 Dmitry V. Levin <ldv@strace.io>
 * Copyright (c) 2016-2021 The strace developers.
 * All rights reserved.
 *
 * SPDX-License-Identifier: GPL-2.0-or-later
 */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>

# define LENGTH_OF(arg) ((unsigned int) sizeof(arg) - 1)
# define ARRAY_SIZE(a_) (sizeof(a_) / sizeof((a_)[0]))

static void *
tail_alloc(const size_t size)
{
	const size_t page_size = sysconf(_SC_PAGESIZE);
	const size_t len = (size + page_size - 1) & -page_size;
	const size_t alloc_size = len + 6 * page_size;

	void *p = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (MAP_FAILED == p) {
		perror("mmap");
		exit(1);
	}

	void *start_work = p + 3 * page_size;
	void *tail_guard = start_work + len;

	if (munmap(p, page_size) ||
	    munmap(p + 2 * page_size, page_size) ||
	    munmap(tail_guard, page_size) ||
	    munmap(tail_guard + 2 * page_size, page_size)) {
		perror("munmap");
		exit(1);
	}

	memset(start_work, 0xff, len);
	return tail_guard - size;
}

static void
fill_memory_ex(void *ptr, size_t size, unsigned char start,
	       unsigned int period)
{
	unsigned char *p = ptr;

	for (typeof(size) i = 0; i < size; ++i) {
		p[i] = start + i % period;
	}
}

int
main(void)
{
	static const char tmp[] = "read-write-tmpfile";
	long rc;
	long fdr;
	long fdw;

	unlink(tmp);
	fdr = open(tmp, O_CREAT|O_EXCL|O_RDONLY, 0600);
	if (fdr < 0) {
		perror("create");
		exit(1);
	}

	fdw = open(tmp, O_TRUNC|O_WRONLY);
	if (fdw < 0) {
		perror("open");
		exit(1);
	}

	static const char w_c[] = "0123456789abcde";
	const unsigned int w_len = LENGTH_OF(w_c);

	rc = write(fdw, w_c, w_len);
	if (rc != (int) w_len) {
		perror("write");
		exit(1);
	}

	static const size_t six_wide_size = 1 << 20;
	static const size_t fetch_size = 1 << 16;
	const size_t buf_size = six_wide_size + fetch_size;
	const size_t sizes[] = {
		buf_size,
		buf_size + 1,
	};
	char *big_buf = tail_alloc(buf_size);

	fill_memory_ex(big_buf, buf_size, 0, 0x100);

	for (size_t i = 0; i < ARRAY_SIZE(sizes); i++) {
		write(fdw, big_buf, sizes[i]);
	}

	close(fdw);
	return 0;
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: potential riscv special bug maybe found
2023-08-09 14:47 ` Aurelien Jarno
@ 2023-08-10 11:09 ` Alexandre Ghiti
2023-08-11 9:32 ` Alexandre Ghiti
0 siblings, 1 reply; 4+ messages in thread
From: Alexandre Ghiti @ 2023-08-10 11:09 UTC (permalink / raw)
To: Aurelien Jarno, linux-riscv; +Cc: Bo YU, Andreas Gruenbacher
Hi Aurélien, Bo,
On 09/08/2023 16:47, Aurelien Jarno wrote:
> Please find attached a simpler reproducer extracted from strace, which
> should make the issue easier to reproduce. It hangs on riscv64 and needs
> to be killed with -9, while it works fine on amd64.
Thanks for the reproducer. I was able to reproduce the problem: the
kernel is stuck trying to copy data from user space. I'm looking into it
right now, as this seems very weird. Note that I will be on vacation at
the end of the week; if I don't have time to fix this, I'll post my
findings here.
Thanks again,
Alex
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: potential riscv special bug maybe found
2023-08-10 11:09 ` Alexandre Ghiti
@ 2023-08-11 9:32 ` Alexandre Ghiti
0 siblings, 0 replies; 4+ messages in thread
From: Alexandre Ghiti @ 2023-08-11 9:32 UTC (permalink / raw)
To: Aurelien Jarno, linux-riscv; +Cc: Bo YU, Andreas Gruenbacher
Hi Aurélien, Bo,
On 10/08/2023 13:09, Alexandre Ghiti wrote:
> Thanks for the reproducer, I was able to reproduce the problem: the
> kernel is stuck trying to copy data from user, I'm looking into it
> right now as this seems very weird. Note that I will be on vacation at
> the end of the week, if I don't have time to fix this, I'll post my
> findings here.
So I was able to find the root cause and I'm about to send a fix. In a
nutshell, our copy_[from|to]_user and clear_user routines do not return
the number of bytes effectively written when a "fixup exception"
happens, which causes the hang you both observed.
Thank you very much Aurélien for the reproducer, that really helps! And
thanks Bo for the initial report!
Alex
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-08-11 9:32 UTC | newest]
Thread overview: 4+ messages
2023-03-09 15:18 potential riscv special bug maybe found Bo YU
2023-08-09 14:47 ` Aurelien Jarno
2023-08-10 11:09 ` Alexandre Ghiti
2023-08-11 9:32 ` Alexandre Ghiti