* potential riscv special bug maybe found
@ 2023-03-09 15:18 Bo YU
2023-08-09 14:47 ` Aurelien Jarno
0 siblings, 1 reply; 4+ messages in thread
From: Bo YU @ 2023-03-09 15:18 UTC (permalink / raw)
To: tsu.yubo, linux-riscv; +Cc: Andreas Gruenbacher
[-- Attachment #1.1: Type: text/plain, Size: 1301 bytes --]
Hi,
I am sorry if this is noise.
Some days ago I noticed that strace 6.2 failed to build on riscv64 because of
failing test cases[0]. One program from the strace test suite reproduces it:
```
./tests/read-write
```
It hangs.
In fact, the issue has existed since 5.18. I ran `git bisect` and found
that the issue was introduced by commit[1]:
commit 631f871f071746789e9242e514ab0f49067fa97a
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date: Tue Nov 9 12:56:06 2021 +0100
fs/iomap: Fix buffered write page prefaulting
I do not think the commit itself is at fault, because it does not
affect any architecture other than riscv. After I reverted it, all
strace test cases passed (one case still fails on qemu, but that is
another story).
I tried to debug this but did not get anywhere.
Any help would be appreciated.
PS:
This is the output of `cat /proc/${read-write-pid}/stack` during the hang:
[<0>] generic_perform_write+0x12e/0x1ec
[<0>] ext4_buffered_write_iter+0x5e/0xe6
[<0>] ext4_file_write_iter+0xb4/0x67c
[<0>] vfs_write+0x1d2/0x308
[<0>] ksys_write+0x56/0xc6
[<0>] sys_write+0xe/0x16
[<0>] check_syscall_nr+0x3c/0x3c
[0]: https://github.com/strace/strace/issues/242
[1]: https://lkml.org/lkml/2021/11/23/641
--
Regards,
--
Bo YU
[-- Attachment #2: Type: text/plain, Size: 161 bytes --]
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* Re: potential riscv special bug maybe found
2023-03-09 15:18 potential riscv special bug maybe found Bo YU
@ 2023-08-09 14:47 ` Aurelien Jarno
2023-08-10 11:09 ` Alexandre Ghiti
0 siblings, 1 reply; 4+ messages in thread
From: Aurelien Jarno @ 2023-08-09 14:47 UTC (permalink / raw)
To: linux-riscv; +Cc: Bo YU, Andreas Gruenbacher
[-- Attachment #1.1.1: Type: text/plain, Size: 1316 bytes --]
Hi,
On 2023-03-09 23:18, Bo YU wrote:
> Hi,
>
> I am sorry if this is noise.
>
> Some days ago I noticed strace 6.2 was built failed on riscv64 due to
> test cases[0]. There is one program from strace can reproduce it:
>
> ```
> ./tests/read-write
> ```
>
> It will be hang.
>
> In fact, the issue has existed since 5.18. I `git bisect` and finally
> found out the issue was introduced by the commit[1]:
>
> commit 631f871f071746789e9242e514ab0f49067fa97a
> Author: Andreas Gruenbacher <agruenba@redhat.com>
> Date: Tue Nov 9 12:56:06 2021 +0100
>
> fs/iomap: Fix buffered write page prefaulting
>
> I do not think there is a problem with this commit, because it does not
> affect others arch expect riscv and after I reverted it, it will pass
> all test cases from strace(There is still one case failed on qemu, but
> this is another store).
>
> I try to debug something but failed.
> Would be appreciated it any help.
Please find attached a simpler reproducer extracted from strace, which
should make the issue easier to reproduce. It hangs on riscv64 and needs
to be killed with -9, while it works fine on amd64.
Regards
Aurelien
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://aurel32.net
[-- Attachment #1.1.2: read-write.c --]
[-- Type: text/x-csrc, Size: 2277 bytes --]
/*
 * Check decoding and dumping of read and write syscalls.
 *
 * Copyright (c) 2016 Dmitry V. Levin <ldv@strace.io>
 * Copyright (c) 2016-2021 The strace developers.
 * All rights reserved.
 *
 * SPDX-License-Identifier: GPL-2.0-or-later
 */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>

# define LENGTH_OF(arg) ((unsigned int) sizeof(arg) - 1)
# define ARRAY_SIZE(a_) (sizeof(a_) / sizeof((a_)[0]))

static void *
tail_alloc(const size_t size)
{
	const size_t page_size = sysconf(_SC_PAGESIZE);
	const size_t len = (size + page_size - 1) & -page_size;
	const size_t alloc_size = len + 6 * page_size;

	void *p = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (MAP_FAILED == p) {
		perror("mmap");
		exit(1);
	}

	void *start_work = p + 3 * page_size;
	void *tail_guard = start_work + len;

	if (munmap(p, page_size) ||
	    munmap(p + 2 * page_size, page_size) ||
	    munmap(tail_guard, page_size) ||
	    munmap(tail_guard + 2 * page_size, page_size)) {
		perror("munmap");
		exit(1);
	}

	memset(start_work, 0xff, len);
	return tail_guard - size;
}

static void
fill_memory_ex(void *ptr, size_t size, unsigned char start,
	       unsigned int period)
{
	unsigned char *p = ptr;

	for (typeof(size) i = 0; i < size; ++i) {
		p[i] = start + i % period;
	}
}

int
main(void)
{
	static const char tmp[] = "read-write-tmpfile";
	long rc;
	long fdr;
	long fdw;

	unlink(tmp);
	fdr = open(tmp, O_CREAT|O_EXCL|O_RDONLY, 0600);
	if (fdr < 0) {
		perror("create");
		exit(1);
	}

	fdw = open(tmp, O_TRUNC|O_WRONLY);
	if (fdw < 0) {
		perror("open");
		exit(1);
	}

	static const char w_c[] = "0123456789abcde";
	const unsigned int w_len = LENGTH_OF(w_c);

	rc = write(fdw, w_c, w_len);
	if (rc != (int) w_len) {
		perror("write");
		exit(1);
	}

	static const size_t six_wide_size = 1 << 20;
	static const size_t fetch_size = 1 << 16;
	const size_t buf_size = six_wide_size + fetch_size;
	const size_t sizes[] = {
		buf_size,
		buf_size + 1,
	};

	char *big_buf = tail_alloc(buf_size);
	fill_memory_ex(big_buf, buf_size, 0, 0x100);

	for (size_t i = 0; i < ARRAY_SIZE(sizes); i++) {
		write(fdw, big_buf, sizes[i]);
	}

	close(fdw);
	return 0;
}
* Re: potential riscv special bug maybe found
2023-08-09 14:47 ` Aurelien Jarno
@ 2023-08-10 11:09 ` Alexandre Ghiti
2023-08-11 9:32 ` Alexandre Ghiti
0 siblings, 1 reply; 4+ messages in thread
From: Alexandre Ghiti @ 2023-08-10 11:09 UTC (permalink / raw)
To: Aurelien Jarno, linux-riscv; +Cc: Bo YU, Andreas Gruenbacher
Hi Aurélien, Bo,
On 09/08/2023 16:47, Aurelien Jarno wrote:
> Hi,
>
> Please find attached a simpler reproducer extracted from strace, which
> should make the issue easier to reproduce. It hangs on riscv64 and needs
> to be killed with -9, while it works fine on amd64.
Thanks for the reproducer. I was able to reproduce the problem: the
kernel is stuck trying to copy data from user space. I'm looking into it
right now, as this seems very weird. Note that I will be on vacation at
the end of the week; if I don't have time to fix this, I'll post my
findings here.
Thanks again,
Alex
* Re: potential riscv special bug maybe found
2023-08-10 11:09 ` Alexandre Ghiti
@ 2023-08-11 9:32 ` Alexandre Ghiti
0 siblings, 0 replies; 4+ messages in thread
From: Alexandre Ghiti @ 2023-08-11 9:32 UTC (permalink / raw)
To: Aurelien Jarno, linux-riscv; +Cc: Bo YU, Andreas Gruenbacher
Hi Aurélien, Bo,
On 10/08/2023 13:09, Alexandre Ghiti wrote:
> Hi Aurélien, Bo,
>
>
> On 09/08/2023 16:47, Aurelien Jarno wrote:
>> Hi,
>>
>> Please find attached a simpler reproducer extracted from strace, which
>> should make the issue easier to reproduce. It hangs on riscv64 and needs
>> to be killed with -9, while it works fine on amd64.
>
>
> Thanks for the reproducer, I was able to reproduce the problem: the
> kernel is stuck trying to copy data from user, I'm looking into it
> right now as this seems very weird. Note that I will be on vacation at
> the end of the week, if I don't have time to fix this, I'll post my
> findings here.
>
So I was able to find the root cause and I'm about to send a fix. In
a nutshell, our copy_[from|to]_user and clear_user routines do not
return the number of bytes actually written when a "fixup exception"
happens, which causes the hang you both observed.
Thank you very much Aurélien for the reproducer, that really helps!
And thanks Bo for the initial report!
Alex