* [sparc64] sigbus in e2fsck
@ 2016-08-30 12:42 Anatoly Pugachev
2016-08-30 12:56 ` John Paul Adrian Glaubitz
0 siblings, 1 reply; 10+ messages in thread
From: Anatoly Pugachev @ 2016-08-30 12:42 UTC (permalink / raw)
To: linux-ext4; +Cc: debian-sparc
Hello!
I'm getting SIGBUS (unaligned access?) in fsck.ext4 / e2fsck running
on debian sparc64 sid/unstable linux:
root@ttip:/home/mator/e2fsprogs# git describe
v1.43.1-14-g9a23fa8
root@ttip:/home/mator/e2fsprogs# git remote -v
origin git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git (fetch)
root@ttip:/home/mator/e2fsprogs/build/e2fsck# gdb -q
(gdb) file ./e2fsck
Reading symbols from ./e2fsck...done.
(gdb) set args /dev/vdiskc2
(gdb) run
Starting program: /home/mator/e2fsprogs/build/e2fsck/e2fsck /dev/vdiskc2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
e2fsck 1.43.1 (08-Jun-2016)
/dev/vdiskc2: recovering journal
Program received signal SIGBUS, Bus error.
0x0000000000142f54 in scan_revoke_records (journal=0x2decd0,
bh=0x2e9b70, sequence=36855, info=0x7feffffef68) at
../../e2fsck/recovery.c:866
866 blocknr = ext2fs_be64_to_cpu(* ((__u64
*) (bh->b_data+offset)));
(gdb) bt
#0 0x0000000000142f54 in scan_revoke_records (journal=0x2decd0,
bh=0x2e9b70, sequence=36855, info=0x7feffffef68) at
../../e2fsck/recovery.c:866
#1 0x0000000000142bf8 in do_one_pass (journal=0x2decd0,
info=0x7feffffef68, pass=PASS_REVOKE) at ../../e2fsck/recovery.c:767
#2 0x0000000000141c54 in journal_recover (journal=0x2decd0) at
../../e2fsck/recovery.c:273
#3 0x0000000000139570 in recover_ext3_journal (ctx=0x2d1000) at
../../e2fsck/journal.c:940
#4 0x0000000000139750 in e2fsck_run_ext3_journal (ctx=0x2d1000) at
../../e2fsck/journal.c:978
#5 0x00000000001152d0 in main (argc=2, argv=0x7fefffff6e8) at
../../e2fsck/unix.c:1678
(gdb)
It should be the same sigbus which I have on debian boot (log cut):
[952000.246726] sunvnet: eth0: PORT ( remote-mac 00:14:4f:f8:38:39 )
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.28.1
[/sbin/fsck.ext4 (1) -- /dev/vdiska2] fsck.ext4 -a -C0 /dev/vdiska2
/dev/vdiska2: recovering journal
Signal (10) SIGBUS si_code=BUS_ADRALN fault addr=0x10000164a8c
fsck.ext4(+0x313f0)[0x100000313f0]
fsck exited with status code 8
done.
Warning: File system check failed but did not detect errors
[952005.583237] EXT4-fs (vdiska2): INFO: recovery required on readonly
filesystem
[952005.583350] EXT4-fs (vdiska2): write access will be enabled during recovery
[952005.666515] EXT4-fs (vdiska2): recovery complete
[952005.680055] EXT4-fs (vdiska2): mounted filesystem with ordered
data mode. Opts: (null)
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[952005.743934] ip_tables: (C) 2000-2006 Netfilter Core Team
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 12:42 [sparc64] sigbus in e2fsck Anatoly Pugachev
@ 2016-08-30 12:56 ` John Paul Adrian Glaubitz
2016-08-30 14:58 ` Theodore Ts'o
0 siblings, 1 reply; 10+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-08-30 12:56 UTC (permalink / raw)
To: Anatoly Pugachev, linux-ext4; +Cc: debian-sparc
On 08/30/2016 02:42 PM, Anatoly Pugachev wrote:
> ../../e2fsck/recovery.c:866
> 866 blocknr = ext2fs_be64_to_cpu(* ((__u64
> *) (bh->b_data+offset)));
The reason is that this expression is casting "char * b_data" [1] into u64 [2]
which provokes unaligned access. Since such expression are often inevitable,
it's probably best to modify the conversion macros in bitops.h [3] to be
safe against unaligned accesses.
An example for that can be seen in the systemd code base [4].
Thanks,
Adrian
> [1] http://lxr.free-electrons.com/source/include/linux/buffer_head.h#L69
> [2] http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/e2fsck/recovery.c#n866
> [3] http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/lib/ext2fs/bitops.h
> [4] https://github.com/systemd/systemd/blob/master/src/basic/unaligned.h
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - glaubitz@debian.org
`. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 12:56 ` John Paul Adrian Glaubitz
@ 2016-08-30 14:58 ` Theodore Ts'o
2016-08-30 15:03 ` John Paul Adrian Glaubitz
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-08-30 14:58 UTC (permalink / raw)
To: John Paul Adrian Glaubitz; +Cc: Anatoly Pugachev, linux-ext4, debian-sparc
On Tue, Aug 30, 2016 at 02:56:45PM +0200, John Paul Adrian Glaubitz wrote:
> On 08/30/2016 02:42 PM, Anatoly Pugachev wrote:
> > ../../e2fsck/recovery.c:866
> > 866 blocknr = ext2fs_be64_to_cpu(* ((__u64
> > *) (bh->b_data+offset)));
>
> The reason is that this expression is casting "char * b_data" [1] into u64 [2]
> which provokes unaligned access. Since such expression are often inevitable,
> it's probably best to modify the conversion macros in bitops.h [3] to be
> safe against unaligned accesses.
I don't think that's it. b_data is a 4k buffer should be 8 byte
aligned. For a file system with 64-bit blocks (which you presumably
have since we're on the be64 path as shown in your debugger output)
the offset is initially set to 16, and is incremented in chunks of 8
bytes. So there shouldn't be any unaligned access.
Since you are able to provke this in a debugger, can you have gdb
print out the value of bh->b_data and offset, so we can be sure what's
going on?
I do see a potential problem which is that we're defining
journal_revoke_header_t structure in a non-portable way:
typedef struct journal_header_s
{
__u32 h_magic;
__u32 h_blocktype;
__u32 h_sequence;
} journal_header_t;
typedef struct journal_revoke_header_s
{
journal_header_t r_header;
int r_count; /* Count of bytes used in the block */
} journal_revoke_header_t;
The "int" above should be a __u32. What is the size of int on
sparc64? Is it by any chance 8 bytes? If that's the problem then I'm
kind of surprised we haven't run into massive problems earlier, as
this looks like a long standing bug.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 14:58 ` Theodore Ts'o
@ 2016-08-30 15:03 ` John Paul Adrian Glaubitz
2016-08-30 15:12 ` Anatoly Pugachev
2016-08-30 16:45 ` Jose E. Marchesi
2 siblings, 0 replies; 10+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-08-30 15:03 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Anatoly Pugachev, linux-ext4, debian-sparc
Hi Ted!
On 08/30/2016 04:58 PM, Theodore Ts'o wrote:
> Since you are able to provke this in a debugger, can you have gdb
> print out the value of bh->b_data and offset, so we can be sure what's
> going on?
Since you're a Debian Developer yourself, you have access to the sparc64
porterbox I have set up. If you're interested, please feel free to use
notker [1] for any tests and debugging sessions.
Although I'm sure that Anatoly will follow up with your questions anyway.
Thanks,
Adrian
> [1] https://db.debian.org/machines.cgi?host=notker
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - glaubitz@debian.org
`. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 14:58 ` Theodore Ts'o
2016-08-30 15:03 ` John Paul Adrian Glaubitz
@ 2016-08-30 15:12 ` Anatoly Pugachev
2016-08-30 19:16 ` Theodore Ts'o
2016-08-30 16:45 ` Jose E. Marchesi
2 siblings, 1 reply; 10+ messages in thread
From: Anatoly Pugachev @ 2016-08-30 15:12 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4, debian-sparc
On Tue, Aug 30, 2016 at 5:58 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Tue, Aug 30, 2016 at 02:56:45PM +0200, John Paul Adrian Glaubitz wrote:
>> On 08/30/2016 02:42 PM, Anatoly Pugachev wrote:
>> > ../../e2fsck/recovery.c:866
>> > 866 blocknr = ext2fs_be64_to_cpu(* ((__u64
>> > *) (bh->b_data+offset)));
>>
>> The reason is that this expression is casting "char * b_data" [1] into u64 [2]
>> which provokes unaligned access. Since such expression are often inevitable,
>> it's probably best to modify the conversion macros in bitops.h [3] to be
>> safe against unaligned accesses.
>
> I don't think that's it. b_data is a 4k buffer should be 8 byte
> aligned. For a file system with 64-bit blocks (which you presumably
> have since we're on the be64 path as shown in your debugger output)
> the offset is initially set to 16, and is incremented in chunks of 8
> bytes. So there shouldn't be any unaligned access.
>
> Since you are able to provke this in a debugger, can you have gdb
> print out the value of bh->b_data and offset, so we can be sure what's
> going on?
(gdb) p bh->b_data
$1 = "\300;9\230\000\000\000\005\000\000\253\204\000\000\000\070\000\000\000\000\000\000$\022\000\000\000\000\000\000$<\000\000\000\000\000\000$\270\000\000\000\000\000\000$]\000\000\000\000\000\000$\024",
'\000' <repeats 967 times>
(gdb) p offset
$2 = 16
(gdb) p *bh->b_data
$3 = -64 '\300'
(gdb) p *(bh->b_data+offset)
$6 = 0 '\000'
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 14:58 ` Theodore Ts'o
2016-08-30 15:03 ` John Paul Adrian Glaubitz
2016-08-30 15:12 ` Anatoly Pugachev
@ 2016-08-30 16:45 ` Jose E. Marchesi
2 siblings, 0 replies; 10+ messages in thread
From: Jose E. Marchesi @ 2016-08-30 16:45 UTC (permalink / raw)
To: Theodore Ts'o
Cc: John Paul Adrian Glaubitz, Anatoly Pugachev, linux-ext4,
debian-sparc
typedef struct journal_revoke_header_s
{
journal_header_t r_header;
int r_count; /* Count of bytes used in the block */
} journal_revoke_header_t;
The "int" above should be a __u32. What is the size of int on
sparc64? Is it by any chance 8 bytes? If that's the problem then I'm
kind of surprised we haven't run into massive problems earlier, as
this looks like a long standing bug.
sizeof(int) == 4 in both sparc64-linux-gnu and sparcv9-linux-gnu (32bit)
targets.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 15:12 ` Anatoly Pugachev
@ 2016-08-30 19:16 ` Theodore Ts'o
2016-08-30 20:17 ` Anatoly Pugachev
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2016-08-30 19:16 UTC (permalink / raw)
To: Anatoly Pugachev; +Cc: linux-ext4, debian-sparc
On Tue, Aug 30, 2016 at 06:12:39PM +0300, Anatoly Pugachev wrote:
>
> (gdb) p bh->b_data
> $1 = "\300;9\230\000\000\000\005\000\000\253\204\000\000\000\070\000\000\000\000\000\000$\022\000\000\000\000\000\000$<\000\000\000\000\000\000$\270\000\000\000\000\000\000$]\000\000\000\000\000\000$\024",
> '\000' <repeats 967 times>
> (gdb) p offset
> $2 = 16
> (gdb) p *bh->b_data
> $3 = -64 '\300'
> (gdb) p *(bh->b_data+offset)
> $6 = 0 '\000'
Can you give us "p &bh->b_data" (so we can get the starting address of
b_data to make sure it's aligned) and "p offset" (so we can check and
make sure offset is sane)?
Thanks,
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 19:16 ` Theodore Ts'o
@ 2016-08-30 20:17 ` Anatoly Pugachev
2016-08-30 20:25 ` Darrick J. Wong
0 siblings, 1 reply; 10+ messages in thread
From: Anatoly Pugachev @ 2016-08-30 20:17 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4, debian-sparc
On Tue, Aug 30, 2016 at 10:16 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Tue, Aug 30, 2016 at 06:12:39PM +0300, Anatoly Pugachev wrote:
>>
>> (gdb) p bh->b_data
>> $1 = "\300;9\230\000\000\000\005\000\000\253\204\000\000\000\070\000\000\000\000\000\000$\022\000\000\000\000\000\000$<\000\000\000\000\000\000$\270\000\000\000\000\000\000$]\000\000\000\000\000\000$\024",
>> '\000' <repeats 967 times>
>> (gdb) p offset
>> $2 = 16
>> (gdb) p *bh->b_data
>> $3 = -64 '\300'
>> (gdb) p *(bh->b_data+offset)
>> $6 = 0 '\000'
>
> Can you give us "p &bh->b_data" (so we can get the starting address of
> b_data to make sure it's aligned) and "p offset" (so we can check and
> make sure offset is sane)?
(gdb) p &bh->b_data
$7 = (char (*)[1024]) 0x2e9b9c
(gdb) p offset
$8 = 16
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 20:17 ` Anatoly Pugachev
@ 2016-08-30 20:25 ` Darrick J. Wong
2016-09-01 1:26 ` Theodore Ts'o
0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-30 20:25 UTC (permalink / raw)
To: Anatoly Pugachev; +Cc: Theodore Ts'o, linux-ext4, debian-sparc
On Tue, Aug 30, 2016 at 11:17:07PM +0300, Anatoly Pugachev wrote:
> On Tue, Aug 30, 2016 at 10:16 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> > On Tue, Aug 30, 2016 at 06:12:39PM +0300, Anatoly Pugachev wrote:
> >>
> >> (gdb) p bh->b_data
> >> $1 = "\300;9\230\000\000\000\005\000\000\253\204\000\000\000\070\000\000\000\000\000\000$\022\000\000\000\000\000\000$<\000\000\000\000\000\000$\270\000\000\000\000\000\000$]\000\000\000\000\000\000$\024",
> >> '\000' <repeats 967 times>
> >> (gdb) p offset
> >> $2 = 16
> >> (gdb) p *bh->b_data
> >> $3 = -64 '\300'
> >> (gdb) p *(bh->b_data+offset)
> >> $6 = 0 '\000'
> >
> > Can you give us "p &bh->b_data" (so we can get the starting address of
> > b_data to make sure it's aligned) and "p offset" (so we can check and
> > make sure offset is sane)?
>
> (gdb) p &bh->b_data
> $7 = (char (*)[1024]) 0x2e9b9c
> (gdb) p offset
> $8 = 16
AFAICT, each bh is malloc'd via e2fsck_allocate_memory and nothing seems
to guarantee that the char b_data[1024] will be aligned to a multiple of
8 (it certainly isn't on x64), so I guess this isn't much of a surprise.
We could change b_data to a pointer and then posix_memalign it.
--D
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [sparc64] sigbus in e2fsck
2016-08-30 20:25 ` Darrick J. Wong
@ 2016-09-01 1:26 ` Theodore Ts'o
0 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2016-09-01 1:26 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Anatoly Pugachev, linux-ext4, debian-sparc
On Tue, Aug 30, 2016 at 01:25:10PM -0700, Darrick J. Wong wrote:
>
> AFAICT, each bh is malloc'd via e2fsck_allocate_memory and nothing seems
> to guarantee that the char b_data[1024] will be aligned to a multiple of
> 8 (it certainly isn't on x64), so I guess this isn't much of a surprise.
>
> We could change b_data to a pointer and then posix_memalign it.
Actually, all we need to do is to rearrange the structure elements so
it looks like this:
unsigned long long b_blocknr;
char b_data[1024];
};
Since b_blocknr will need to be eight byte aligned, this will also
ensure that b_data will also be aligned correctly.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-09-02 6:02 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-30 12:42 [sparc64] sigbus in e2fsck Anatoly Pugachev
2016-08-30 12:56 ` John Paul Adrian Glaubitz
2016-08-30 14:58 ` Theodore Ts'o
2016-08-30 15:03 ` John Paul Adrian Glaubitz
2016-08-30 15:12 ` Anatoly Pugachev
2016-08-30 19:16 ` Theodore Ts'o
2016-08-30 20:17 ` Anatoly Pugachev
2016-08-30 20:25 ` Darrick J. Wong
2016-09-01 1:26 ` Theodore Ts'o
2016-08-30 16:45 ` Jose E. Marchesi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox