* Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
@ 2017-11-03 14:00 Raphael Hertzog
2017-11-04 10:00 ` Amir Goldstein
0 siblings, 1 reply; 11+ messages in thread
From: Raphael Hertzog @ 2017-11-03 14:00 UTC (permalink / raw)
To: linux-unionfs
Hello,
the Kali live image contains PostgreSQL and with our latest image[1]
PostgreSQL is no longer able to create a new database (/ is an overlayfs
merging a RO squashfs filesystem and an empty RW tmpfs).
[1] http://cdimage.kali.org/kali-weekly/kali-linux-2017-W43-amd64.iso
(if 404 look up the parent directory and pick the currently available
weekly image)
You get messages like this while running createdb:
createdb: database creation failed: ERROR: checkpoint request failed
HINT: Consult recent messages in the server log for details.
And in the server log you have this:
ERROR: could not fsync file "pg_commit_ts": Invalid argument
This was working with Linux 4.12.6 and seems to be broken in Linux 4.13.4
that we currently have in Kali Linux.
Does this ring a bell to anyone ? I saw a few patches in 4.13.6 but their
description does not seem to match the error I'm getting.
What else can I provide to help you diagnose this issue ?
This bug report has been filed against Kali Linux in the first place:
https://bugs.kali.org/view.php?id=4332
Cheers,
--
Raphaël Hertzog ◈ Writer/Consultant ◈ Debian Developer
Discover the Debian Administrator's Handbook:
→ https://debian-handbook.info/get/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-03 14:00 Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL Raphael Hertzog
@ 2017-11-04 10:00 ` Amir Goldstein
2017-11-07 10:11 ` Raphael Hertzog
0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-11-04 10:00 UTC (permalink / raw)
To: Raphael Hertzog; +Cc: overlayfs
On Fri, Nov 3, 2017 at 4:00 PM, Raphael Hertzog <raphael@ouaza.com> wrote:
> Hello,
>
> the Kali live image contains PostgreSQL and with our latest image[1]
> PostgreSQL is no longer able to create a new database (/ is an overlayfs
> merging a RO squashfs filesystem and an empty RW tmpfs).
>
> [1] http://cdimage.kali.org/kali-weekly/kali-linux-2017-W43-amd64.iso
> (if 404 look up the parent directory and pick the currently available
> weekly image)
>
> You get messages like this while running createdb:
> createdb: database creation failed: ERROR: checkpoint request failed
> HINT: Consult recent messages in the server log for details.
>
> And in the server log you have this:
> ERROR: could not fsync file "pg_commit_ts": Invalid argument
>
> This was working with Linux 4.12.6 and seems to be broken in Linux 4.13.4
> that we currently have in Kali Linux.
>
> Does this ring a bell to anyone ?
Not really there was a regression with squashfs+overlayfs but it is
fixed in 4.13.4.
(fix false positive ESTALE on lookup)
I tries mounting squashfs+overlayfs to /var/lib/postgresql and create
db on Ubuntu
and it seemed ok.
> I saw a few patches in 4.13.6 but their
> description does not seem to match the error I'm getting.
I don't think those fixes are relevant.
>
> What else can I provide to help you diagnose this issue ?
>
> This bug report has been filed against Kali Linux in the first place:
> https://bugs.kali.org/view.php?id=4332
>
- strace of the failing postgresql db create
- dmesg from that time, specifically "overlayfs" messages
- output of cat /proc/self/mountinfo (or better postgres pid instead of self)
Thanks,
Amir.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-04 10:00 ` Amir Goldstein
@ 2017-11-07 10:11 ` Raphael Hertzog
2017-11-07 11:01 ` Amir Goldstein
0 siblings, 1 reply; 11+ messages in thread
From: Raphael Hertzog @ 2017-11-07 10:11 UTC (permalink / raw)
To: Amir Goldstein; +Cc: overlayfs
Hello Amir,
Le samedi 04 novembre 2017, Amir Goldstein a écrit :
> I tries mounting squashfs+overlayfs to /var/lib/postgresql and create
> db on Ubuntu and it seemed ok.
FWIW, in my failing case, it uses PostgreSQL 10.0 as in Debian
Testing/Unstable. In Ubuntu, it's only available in Bionic Beaver (development
release).
> > This bug report has been filed against Kali Linux in the first place:
> > https://bugs.kali.org/view.php?id=4332
>
> - strace of the failing postgresql db create
> - dmesg from that time, specifically "overlayfs" messages
> - output of cat /proc/self/mountinfo (or better postgres pid instead of self)
Here's the relevant output, there's nothing appearing in kernel messages
at the time of the error.
root@kali:~# ps aux|grep postgres
postgres 31597 2.7 1.2 294416 25648 ? S 09:32 0:00 /usr/lib/postgresql/10/bin/postgres -D /var/lib/postgresql/10/main -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres 31599 0.0 0.1 294416 3928 ? Ss 09:32 0:00 postgres: 10/main: checkpointer process
postgres 31600 0.0 0.1 294416 3928 ? Ss 09:32 0:00 postgres: 10/main: writer process
postgres 31601 0.0 0.1 294416 3928 ? Ss 09:32 0:00 postgres: 10/main: wal writer process
postgres 31602 0.0 0.3 294840 6624 ? Ss 09:32 0:00 postgres: 10/main: autovacuum launcher process
postgres 31603 0.0 0.1 149348 3352 ? Ss 09:32 0:00 postgres: 10/main: stats collector process
postgres 31604 0.0 0.2 294708 4880 ? Ss 09:32 0:00 postgres: 10/main: bgworker: logical replication launcher
root 31618 0.0 0.0 12720 964 pts/1 S+ 09:32 0:00 grep postgres
root@kali:~# cat /proc/31597/mountinfo
17 26 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:10 - sysfs sysfs rw
18 26 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:16 - proc proc rw
19 26 0:6 / /dev rw,nosuid,relatime shared:5 - devtmpfs udev rw,size=1005564k,nr_inodes=251391,mode=755
20 19 0:18 / /dev/pts rw,nosuid,noexec,relatime shared:6 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
21 26 0:19 / /run rw,nosuid,noexec,relatime shared:8 - tmpfs tmpfs rw,size=205196k,mode=755
22 26 11:0 / /lib/live/mount/medium ro,noatime shared:2 - iso9660 /dev/sr0 ro,nojoliet,check=s,map=n,blocksize=2048
23 26 7:0 / /lib/live/mount/rootfs/filesystem.squashfs ro,noatime shared:3 - squashfs /dev/loop0 ro
24 26 0:20 / /lib/live/mount/overlay rw,relatime shared:4 - tmpfs tmpfs rw
26 1 0:22 / / rw,noatime shared:1 - overlay overlay rw,lowerdir=//filesystem.squashfs/,upperdir=/live/overlay//rw,workdir=/live/overlay//work
25 17 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:11 - securityfs securityfs rw
29 19 0:23 / /dev/shm rw,nosuid,nodev shared:7 - tmpfs tmpfs rw
30 21 0:24 / /run/lock rw,nosuid,nodev,noexec,relatime shared:9 - tmpfs tmpfs rw,size=5120k
31 17 0:25 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:12 - tmpfs tmpfs ro,mode=755
32 31 0:26 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:13 - cgroup2 cgroup rw,nsdelegate
33 31 0:27 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,xattr,name=systemd
34 17 0:28 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:15 - pstore pstore rw
35 31 0:29 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,cpu,cpuacct
36 31 0:30 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,blkio
37 31 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,devices
38 31 0:32 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,net_cls,net_prio
39 31 0:33 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,pids
40 31 0:34 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,perf_event
41 31 0:35 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,memory
42 31 0:36 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,freezer
43 31 0:37 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,cpuset
44 18 0:38 / /proc/sys/fs/binfmt_misc rw,relatime shared:26 - autofs systemd-1 rw,fd=24,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11412
45 17 0:8 / /sys/kernel/debug rw,relatime shared:27 - debugfs debugfs rw
46 19 0:39 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,pagesize=2M
47 19 0:16 / /dev/mqueue rw,relatime shared:29 - mqueue mqueue rw
80 26 0:42 / /tmp rw,nosuid,nodev,relatime shared:30 - tmpfs tmpfs rw
117 44 0:44 / /proc/sys/fs/binfmt_misc rw,relatime shared:31 - binfmt_misc binfmt_misc rw
259 21 0:46 / /run/user/0 rw,nosuid,nodev,relatime shared:172 - tmpfs tmpfs rw,size=205192k,mode=700
266 259 0:47 / /run/user/0/gvfs rw,nosuid,nodev,relatime shared:177 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=0,group_id=0
274 17 0:48 / /sys/fs/fuse/connections rw,relatime shared:182 - fusectl fusectl rw
As for strace output, postgresql is split over multiple processes. The one that
generates the error in the log is 31599 (checkpointer process). I also attach
some file listing of the directories that it fails to fsync. strace looks like
this (in loop):
# strace -f -p 31599
select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
open("pg_xact", O_RDONLY) = 3
fsync(3) = 0
close(3) = 0
open("pg_commit_ts", O_RDONLY) = 3
fsync(3) = -1 EINVAL (Invalid argument)
close(3) = 0
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE KILL SEGV CONT STOP SYS RTMIN RT_1], NULL, 8) = 0
write(2, "2017-11-07 09:47:38.580 UTC [315"..., 98) = 98
select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
open("pg_xact", O_RDONLY) = 3
fsync(3) = 0
close(3) = 0
open("pg_commit_ts", O_RDONLY) = 3
fsync(3) = -1 EINVAL (Invalid argument)
close(3) = 0
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE KILL SEGV CONT STOP SYS RTMIN RT_1], NULL, 8) = 0
write(2, "2017-11-07 09:47:39.582 UTC [315"..., 98) = 98
# cat /proc/31599/mountinfo
17 26 0:17 / /sys rw,nosuid,nodev,noexec,relatime shared:10 - sysfs sysfs rw
18 26 0:5 / /proc rw,nosuid,nodev,noexec,relatime shared:16 - proc proc rw
19 26 0:6 / /dev rw,nosuid,relatime shared:5 - devtmpfs udev rw,size=1005564k,nr_inodes=251391,mode=755
20 19 0:18 / /dev/pts rw,nosuid,noexec,relatime shared:6 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
21 26 0:19 / /run rw,nosuid,noexec,relatime shared:8 - tmpfs tmpfs rw,size=205196k,mode=755
22 26 11:0 / /lib/live/mount/medium ro,noatime shared:2 - iso9660 /dev/sr0 ro,nojoliet,check=s,map=n,blocksize=2048
23 26 7:0 / /lib/live/mount/rootfs/filesystem.squashfs ro,noatime shared:3 - squashfs /dev/loop0 ro
24 26 0:20 / /lib/live/mount/overlay rw,relatime shared:4 - tmpfs tmpfs rw
26 1 0:22 / / rw,noatime shared:1 - overlay overlay rw,lowerdir=//filesystem.squashfs/,upperdir=/live/overlay//rw,workdir=/live/overlay//work
25 17 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:11 - securityfs securityfs rw
29 19 0:23 / /dev/shm rw,nosuid,nodev shared:7 - tmpfs tmpfs rw
30 21 0:24 / /run/lock rw,nosuid,nodev,noexec,relatime shared:9 - tmpfs tmpfs rw,size=5120k
31 17 0:25 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:12 - tmpfs tmpfs ro,mode=755
32 31 0:26 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:13 - cgroup2 cgroup rw,nsdelegate
33 31 0:27 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,xattr,name=systemd
34 17 0:28 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:15 - pstore pstore rw
35 31 0:29 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,cpu,cpuacct
36 31 0:30 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,blkio
37 31 0:31 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,devices
38 31 0:32 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,net_cls,net_prio
39 31 0:33 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,pids
40 31 0:34 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,perf_event
41 31 0:35 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,memory
42 31 0:36 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,freezer
43 31 0:37 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,cpuset
44 18 0:38 / /proc/sys/fs/binfmt_misc rw,relatime shared:26 - autofs systemd-1 rw,fd=24,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11412
45 17 0:8 / /sys/kernel/debug rw,relatime shared:27 - debugfs debugfs rw
46 19 0:39 / /dev/hugepages rw,relatime shared:28 - hugetlbfs hugetlbfs rw,pagesize=2M
47 19 0:16 / /dev/mqueue rw,relatime shared:29 - mqueue mqueue rw
80 26 0:42 / /tmp rw,nosuid,nodev,relatime shared:30 - tmpfs tmpfs rw
117 44 0:44 / /proc/sys/fs/binfmt_misc rw,relatime shared:31 - binfmt_misc binfmt_misc rw
259 21 0:46 / /run/user/0 rw,nosuid,nodev,relatime shared:172 - tmpfs tmpfs rw,size=205192k,mode=700
266 259 0:47 / /run/user/0/gvfs rw,nosuid,nodev,relatime shared:177 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=0,group_id=0
274 17 0:48 / /sys/fs/fuse/connections rw,relatime shared:182 - fusectl fusectl rw
# ls -al /proc/31599/cwd
lrwxrwxrwx 1 postgres postgres 0 Nov 7 09:48 /proc/31599/cwd -> /var/lib/postgresql/10/main
root@kali:~# ls -al /var/lib/postgresql/10/main
total 9
drwx------ 1 postgres postgres 200 Nov 7 09:32 .
drwxr-xr-x 1 postgres postgres 60 Oct 30 13:21 ..
drwx------ 1 postgres postgres 60 Oct 30 13:21 base
drwx------ 1 postgres postgres 360 Nov 7 09:32 global
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_commit_ts
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_dynshmem
drwx------ 4 postgres postgres 77 Oct 30 13:21 pg_logical
drwx------ 1 postgres postgres 60 Oct 30 13:21 pg_multixact
drwx------ 1 postgres postgres 60 Nov 7 09:32 pg_notify
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_replslot
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_serial
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_snapshots
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_stat
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_stat_tmp
drwx------ 2 postgres postgres 27 Oct 30 13:21 pg_subtrans
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_tblspc
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_twophase
-rw------- 1 postgres postgres 3 Oct 30 13:21 PG_VERSION
drwx------ 1 postgres postgres 60 Oct 30 13:21 pg_wal
drwx------ 1 postgres postgres 60 Oct 30 13:21 pg_xact
-rw------- 1 postgres postgres 88 Oct 30 13:21 postgresql.auto.conf
-rw------- 1 postgres postgres 130 Nov 7 09:32 postmaster.opts
-rw------- 1 postgres postgres 109 Nov 7 09:32 postmaster.pid
root@kali:~# ls -al /var/lib/postgresql/10/main/pg_commit_ts/
total 0
drwx------ 2 postgres postgres 3 Oct 30 13:21 .
drwx------ 1 postgres postgres 200 Nov 7 09:32 ..
root@kali:~# ls -al /var/lib/postgresql/10/main/pg_xact/
total 8
drwx------ 1 postgres postgres 60 Oct 30 13:21 .
drwx------ 1 postgres postgres 200 Nov 7 09:32 ..
-rw------- 1 postgres postgres 8192 Nov 7 09:42 0000
And here are the files in the underlying squashfs:
root@kali:~# ls -al /lib/live/mount/rootfs/filesystem.squashfs/var/lib/postgresql/10/main/
total 1
drwx------ 19 postgres postgres 367 Oct 30 13:21 .
drwxr-xr-x 3 postgres postgres 27 Oct 30 13:21 ..
drwx------ 5 postgres postgres 74 Oct 30 13:21 base
drwx------ 2 postgres postgres 822 Oct 30 13:21 global
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_commit_ts
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_dynshmem
drwx------ 4 postgres postgres 77 Oct 30 13:21 pg_logical
drwx------ 4 postgres postgres 45 Oct 30 13:21 pg_multixact
drwx------ 2 postgres postgres 27 Oct 30 13:21 pg_notify
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_replslot
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_serial
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_snapshots
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_stat
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_stat_tmp
drwx------ 2 postgres postgres 27 Oct 30 13:21 pg_subtrans
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_tblspc
drwx------ 2 postgres postgres 3 Oct 30 13:21 pg_twophase
-rw------- 1 postgres postgres 3 Oct 30 13:21 PG_VERSION
drwx------ 3 postgres postgres 69 Oct 30 13:21 pg_wal
drwx------ 2 postgres postgres 27 Oct 30 13:21 pg_xact
-rw------- 1 postgres postgres 88 Oct 30 13:21 postgresql.auto.conf
root@kali:~# ls -al /lib/live/mount/rootfs/filesystem.squashfs/var/lib/postgresql/10/main/pg_xact/
total 8
drwx------ 2 postgres postgres 27 Oct 30 13:21 .
drwx------ 19 postgres postgres 367 Oct 30 13:21 ..
-rw------- 1 postgres postgres 8192 Oct 30 13:21 0000
root@kali:~# ls -al /lib/live/mount/rootfs/filesystem.squashfs/var/lib/postgresql/10/main/pg_commit_ts/
total 0
drwx------ 2 postgres postgres 3 Oct 30 13:21 .
drwx------ 19 postgres postgres 367 Oct 30 13:21 ..
Cheers,
--
Raphaël Hertzog ◈ Writer/Consultant ◈ Debian Developer
Discover the Debian Administrator's Handbook:
→ https://debian-handbook.info/get/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 10:11 ` Raphael Hertzog
@ 2017-11-07 11:01 ` Amir Goldstein
2017-11-07 11:54 ` Jeff Layton
2017-11-07 12:59 ` Raphael Hertzog
0 siblings, 2 replies; 11+ messages in thread
From: Amir Goldstein @ 2017-11-07 11:01 UTC (permalink / raw)
To: Raphael Hertzog; +Cc: overlayfs, Jeff Layton
On Tue, Nov 7, 2017 at 12:11 PM, Raphael Hertzog <raphael@ouaza.com> wrote:
> Hello Amir,
>
> Le samedi 04 novembre 2017, Amir Goldstein a écrit :
>> I tries mounting squashfs+overlayfs to /var/lib/postgresql and create
>> db on Ubuntu and it seemed ok.
>
> FWIW, in my failing case, it uses PostgreSQL 10.0 as in Debian
> Testing/Unstable. In Ubuntu, it's only available in Bionic Beaver (development
> release).
And is this the same PostgreSQL version that worked with kernel v4.12.6?
[...]
> As for strace output, postgresql is split over multiple processes. The one that
> generates the error in the log is 31599 (checkpointer process). I also attach
> some file listing of the directories that it fails to fsync. strace looks like
> this (in loop):
>
> # strace -f -p 31599
> select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> open("pg_xact", O_RDONLY) = 3
> fsync(3) = 0
> close(3) = 0
> open("pg_commit_ts", O_RDONLY) = 3
> fsync(3) = -1 EINVAL (Invalid argument)
The reason for the error is quite straight forward.
open O_RDONLY gets an open file on lower read-only squashfs
that doesn't have an fsync operation, so fsync returns EINVAL as per
the man page documentation:
EROFS, EINVAL
fd is bound to a special file which does not support
synchronization.
The question is why do you see this error only now.
There is nothing I know of that changed in overlayfs which could explain
why this behavior has not been the same forever.
CC'ing Jeff, because his fsync error reporting infrastructure was merged to
kernel v4.13 where you report the alleged regression, but I didn't find how
his changes could have changed this behavior, as EINVAL is returned right
in the entry of the fsync system call.
int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
{
struct inode *inode = file->f_mapping->host;
if (!file->f_op->fsync)
return -EINVAL;
Perhaps this a PostgreSQL regression (when storing db on overlayfs/squashfs)
that should be reported to PostgreSQL developers?
Working around this in kernel is possible, but requires overriding
file_operations
for all overlay regular files. Not a small change, but we might want
to do it for
other reasons anyway.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 11:01 ` Amir Goldstein
@ 2017-11-07 11:54 ` Jeff Layton
2017-11-07 12:16 ` Amir Goldstein
2017-11-07 12:59 ` Raphael Hertzog
1 sibling, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2017-11-07 11:54 UTC (permalink / raw)
To: Amir Goldstein, Raphael Hertzog; +Cc: overlayfs
On Tue, 2017-11-07 at 13:01 +0200, Amir Goldstein wrote:
> On Tue, Nov 7, 2017 at 12:11 PM, Raphael Hertzog <raphael@ouaza.com> wrote:
> > Hello Amir,
> >
> > Le samedi 04 novembre 2017, Amir Goldstein a écrit :
> > > I tries mounting squashfs+overlayfs to /var/lib/postgresql and create
> > > db on Ubuntu and it seemed ok.
> >
> > FWIW, in my failing case, it uses PostgreSQL 10.0 as in Debian
> > Testing/Unstable. In Ubuntu, it's only available in Bionic Beaver (development
> > release).
>
> And is this the same PostgreSQL version that worked with kernel v4.12.6?
>
> [...]
>
> > As for strace output, postgresql is split over multiple processes. The one that
> > generates the error in the log is 31599 (checkpointer process). I also attach
> > some file listing of the directories that it fails to fsync. strace looks like
> > this (in loop):
> >
> > # strace -f -p 31599
> > select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> > open("pg_xact", O_RDONLY) = 3
> > fsync(3) = 0
> > close(3) = 0
> > open("pg_commit_ts", O_RDONLY) = 3
> > fsync(3) = -1 EINVAL (Invalid argument)
>
> The reason for the error is quite straight forward.
> open O_RDONLY gets an open file on lower read-only squashfs
> that doesn't have an fsync operation, so fsync returns EINVAL as per
> the man page documentation:
>
> EROFS, EINVAL
> fd is bound to a special file which does not support
> synchronization.
>
If that's the case, then why didn't the fsync(3) call not return
EINVAL? Was it because it was copied up first?
If so, then maybe something changed in v4.13 to cause the pg_commit_ts
file to not have been be copied up here, when it would have before?
> The question is why do you see this error only now.
> There is nothing I know of that changed in overlayfs which could explain
> why this behavior has not been the same forever.
>
> CC'ing Jeff, because his fsync error reporting infrastructure was merged to
> kernel v4.13 where you report the alleged regression, but I didn't find how
> his changes could have changed this behavior, as EINVAL is returned right
> in the entry of the fsync system call.
>
> int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
> {
> struct inode *inode = file->f_mapping->host;
>
> if (!file->f_op->fsync)
> return -EINVAL;
>
> Perhaps this a PostgreSQL regression (when storing db on overlayfs/squashfs)
> that should be reported to PostgreSQL developers?
>
> Working around this in kernel is possible, but requires overriding
> file_operations
> for all overlay regular files. Not a small change, but we might want
> to do it for
> other reasons anyway.
It's also possible that something in the writable layer recorded a
EINVAL error during writeback, and that's what you're seeing here. That
seems less likely though.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 11:54 ` Jeff Layton
@ 2017-11-07 12:16 ` Amir Goldstein
2017-11-07 14:32 ` Raphael Hertzog
2017-11-07 16:54 ` Jeff Layton
0 siblings, 2 replies; 11+ messages in thread
From: Amir Goldstein @ 2017-11-07 12:16 UTC (permalink / raw)
To: Jeff Layton; +Cc: Raphael Hertzog, overlayfs
On Tue, Nov 7, 2017 at 1:54 PM, Jeff Layton <jlayton@kernel.org> wrote:
> On Tue, 2017-11-07 at 13:01 +0200, Amir Goldstein wrote:
>> On Tue, Nov 7, 2017 at 12:11 PM, Raphael Hertzog <raphael@ouaza.com> wrote:
>> > Hello Amir,
>> >
>> > Le samedi 04 novembre 2017, Amir Goldstein a écrit :
>> > > I tries mounting squashfs+overlayfs to /var/lib/postgresql and create
>> > > db on Ubuntu and it seemed ok.
>> >
>> > FWIW, in my failing case, it uses PostgreSQL 10.0 as in Debian
>> > Testing/Unstable. In Ubuntu, it's only available in Bionic Beaver (development
>> > release).
>>
>> And is this the same PostgreSQL version that worked with kernel v4.12.6?
>>
>> [...]
>>
>> > As for strace output, postgresql is split over multiple processes. The one that
>> > generates the error in the log is 31599 (checkpointer process). I also attach
>> > some file listing of the directories that it fails to fsync. strace looks like
>> > this (in loop):
>> >
>> > # strace -f -p 31599
>> > select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
>> > rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>> > open("pg_xact", O_RDONLY) = 3
>> > fsync(3) = 0
>> > close(3) = 0
>> > open("pg_commit_ts", O_RDONLY) = 3
>> > fsync(3) = -1 EINVAL (Invalid argument)
>>
>> The reason for the error is quite straight forward.
>> open O_RDONLY gets an open file on lower read-only squashfs
>> that doesn't have an fsync operation, so fsync returns EINVAL as per
>> the man page documentation:
>>
>> EROFS, EINVAL
>> fd is bound to a special file which does not support
>> synchronization.
>>
>
> If that's the case, then why didn't the fsync(3) call not return
> EINVAL? Was it because it was copied up first?
Allegedly yes.
We see in ls -l at the end of report that file 0000 inside pg_xact
mtime (Nov 7) is newer than squashfs mtime (Oct 30).
>
> If so, then maybe something changed in v4.13 to cause the pg_commit_ts
> file
Wait, I misread the information in the report and I wrongly assumed that
pg_commit_ts is a file. It is not. it's a directory in which case, the
inode is an
overlay inode and it does have fsync f_op.
But in the case of a lower directory that has no been copied up (which seems
to be the case with pg_commit_ts) overlayfs will simple vfs_fsync_range the
lower dir, so we are back to EINVAL.
> to not have been be copied up here, when it would have before?
>
That is possible, but I would need more information about all the previous
access to directory pg_commit_ts by postgresql to figure it out.
Are there any aspects of fsync error reporting for directory fsync that
we need to consider as leads to investigate?
Amir.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 11:01 ` Amir Goldstein
2017-11-07 11:54 ` Jeff Layton
@ 2017-11-07 12:59 ` Raphael Hertzog
1 sibling, 0 replies; 11+ messages in thread
From: Raphael Hertzog @ 2017-11-07 12:59 UTC (permalink / raw)
To: Amir Goldstein; +Cc: overlayfs, Jeff Layton
Hi,
Le mardi 07 novembre 2017, Amir Goldstein a écrit :
> > FWIW, in my failing case, it uses PostgreSQL 10.0 as in Debian
> > Testing/Unstable. In Ubuntu, it's only available in Bionic Beaver (development
> > release).
>
> And is this the same PostgreSQL version that worked with kernel v4.12.6?
No. Postgresql 10 became the default in Debian Testing on October 11th
and we switched to Linux 3.13.4 on October 9th, so at about the same
time...
And it seems likely that this fsync() call has been added as part of
PostgreSQL 10 in that commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1b02be21f271db6bd3cd43abb23fa596fcb6bac3
> Perhaps this a PostgreSQL regression (when storing db on overlayfs/squashfs)
> that should be reported to PostgreSQL developers?
I will do that in parallel but it seems awkward to have a writable
filesystem where fsync() is not working... I can understand the desire
to delegate to the underlying filesystem but that's one more case where we
might want to hide the fact that we have a read-only layer below.
Cheers,
--
Raphaël Hertzog ◈ Writer/Consultant ◈ Debian Developer
Discover the Debian Administrator's Handbook:
→ https://debian-handbook.info/get/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 12:16 ` Amir Goldstein
@ 2017-11-07 14:32 ` Raphael Hertzog
2017-11-07 15:14 ` Amir Goldstein
2017-11-07 16:54 ` Jeff Layton
1 sibling, 1 reply; 11+ messages in thread
From: Raphael Hertzog @ 2017-11-07 14:32 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jeff Layton, overlayfs
Hello,
> Wait, I misread the information in the report and I wrongly assumed that
> pg_commit_ts is a file. It is not. it's a directory in which case, the
> inode is an overlay inode and it does have fsync f_op.
> But in the case of a lower directory that has no been copied up (which seems
> to be the case with pg_commit_ts) overlayfs will simple vfs_fsync_range the
> lower dir, so we are back to EINVAL.
Thanks for the explanation, with this understanding I can at least create
an easy work-around (touch a file in the directory, and drop it again,
just before starting postgresql).
> > to not have been be copied up here, when it would have before?
>
> That is possible, but I would need more information about all the previous
> access to directory pg_commit_ts by postgresql to figure it out.
FWIW I reported this fsync() on a non-modified directory to PostgreSQL
developers here:
https://www.postgresql.org/message-id/20171107135454.lbelbbvfgadljmuj%40home.ouaza.com
Cheers,
--
Raphaël Hertzog ◈ Writer/Consultant ◈ Debian Developer
Discover the Debian Administrator's Handbook:
→ https://debian-handbook.info/get/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 14:32 ` Raphael Hertzog
@ 2017-11-07 15:14 ` Amir Goldstein
2017-11-07 16:30 ` Raphael Hertzog
0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-11-07 15:14 UTC (permalink / raw)
To: Raphael Hertzog; +Cc: Jeff Layton, overlayfs
On Tue, Nov 7, 2017 at 4:32 PM, Raphael Hertzog <raphael@ouaza.com> wrote:
> Hello,
>
>> Wait, I misread the information in the report and I wrongly assumed that
>> pg_commit_ts is a file. It is not. it's a directory in which case, the
>> inode is an overlay inode and it does have fsync f_op.
>> But in the case of a lower directory that has no been copied up (which seems
>> to be the case with pg_commit_ts) overlayfs will simple vfs_fsync_range the
>> lower dir, so we are back to EINVAL.
>
> Thanks for the explanation, with this understanding I can at least create
> an easy work-around (touch a file in the directory, and drop it again,
> just before starting postgresql).
And you won't be the first to apply that sort of workaround:
http://man7.org/linux/man-pages/man1/yum-ovl.1.html
>
>> > to not have been be copied up here, when it would have before?
>>
>> That is possible, but I would need more information about all the previous
>> access to directory pg_commit_ts by postgresql to figure it out.
>
> FWIW I reported this fsync() on a non-modified directory to PostgreSQL
> developers here:
> https://www.postgresql.org/message-id/20171107135454.lbelbbvfgadljmuj%40home.ouaza.com
>
I guess regardless of what overlayfs behavior should be, PostgreSQL
developers better fix this somehow otherwise postgress upgrade is going to
break for users with stable kernel.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 15:14 ` Amir Goldstein
@ 2017-11-07 16:30 ` Raphael Hertzog
0 siblings, 0 replies; 11+ messages in thread
From: Raphael Hertzog @ 2017-11-07 16:30 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Jeff Layton, overlayfs
Le mardi 07 novembre 2017, Amir Goldstein a écrit :
> > FWIW I reported this fsync() on a non-modified directory to PostgreSQL
> > developers here:
> > https://www.postgresql.org/message-id/20171107135454.lbelbbvfgadljmuj%40home.ouaza.com
>
> I guess regardless of what overlayfs behavior should be, PostgreSQL
> developers better fix this somehow otherwise postgress upgrade is going to
> break for users with stable kernel.
Given the current set of answers, it seems unlikely that PostgreSQL will
stop doing those fsync() calls on unmodified directories because it's hard
for them to track whether the directory is dirty or not but it might start
to ignore EINVAL/EROFS errors on those fsync calls.
That said I don't think that the number of persons using PostgreSQL on top
of overlayfs is so important that they are worried about this breakage
pattern on upgrade...
Cheers,
--
Raphaël Hertzog ◈ Writer/Consultant ◈ Debian Developer
Discover the Debian Administrator's Handbook:
→ https://debian-handbook.info/get/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL
2017-11-07 12:16 ` Amir Goldstein
2017-11-07 14:32 ` Raphael Hertzog
@ 2017-11-07 16:54 ` Jeff Layton
1 sibling, 0 replies; 11+ messages in thread
From: Jeff Layton @ 2017-11-07 16:54 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Raphael Hertzog, overlayfs
On Tue, 2017-11-07 at 14:16 +0200, Amir Goldstein wrote:
> On Tue, Nov 7, 2017 at 1:54 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > On Tue, 2017-11-07 at 13:01 +0200, Amir Goldstein wrote:
> > > On Tue, Nov 7, 2017 at 12:11 PM, Raphael Hertzog <raphael@ouaza.com> wrote:
> > > > Hello Amir,
> > > >
> > > > Le samedi 04 novembre 2017, Amir Goldstein a écrit :
> > > > > I tries mounting squashfs+overlayfs to /var/lib/postgresql and create
> > > > > db on Ubuntu and it seemed ok.
> > > >
> > > > FWIW, in my failing case, it uses PostgreSQL 10.0 as in Debian
> > > > Testing/Unstable. In Ubuntu, it's only available in Bionic Beaver (development
> > > > release).
> > >
> > > And is this the same PostgreSQL version that worked with kernel v4.12.6?
> > >
> > > [...]
> > >
> > > > As for strace output, postgresql is split over multiple processes. The one that
> > > > generates the error in the log is 31599 (checkpointer process). I also attach
> > > > some file listing of the directories that it fails to fsync. strace looks like
> > > > this (in loop):
> > > >
> > > > # strace -f -p 31599
> > > > select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
> > > > rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> > > > open("pg_xact", O_RDONLY) = 3
> > > > fsync(3) = 0
> > > > close(3) = 0
> > > > open("pg_commit_ts", O_RDONLY) = 3
> > > > fsync(3) = -1 EINVAL (Invalid argument)
> > >
> > > The reason for the error is quite straight forward.
> > > open O_RDONLY gets an open file on lower read-only squashfs
> > > that doesn't have an fsync operation, so fsync returns EINVAL as per
> > > the man page documentation:
> > >
> > > EROFS, EINVAL
> > > fd is bound to a special file which does not support
> > > synchronization.
> > >
> >
> > If that's the case, then why didn't the fsync(3) call not return
> > EINVAL? Was it because it was copied up first?
>
> Allegedly yes.
> We see in ls -l at the end of report that file 0000 inside pg_xact
> mtime (Nov 7) is newer than squashfs mtime (Oct 30).
>
> >
> > If so, then maybe something changed in v4.13 to cause the pg_commit_ts
> > file
>
> Wait, I misread the information in the report and I wrongly assumed that
> pg_commit_ts is a file. It is not. it's a directory in which case, the
> inode is an
> overlay inode and it does have fsync f_op.
> But in the case of a lower directory that has no been copied up (which seems
> to be the case with pg_commit_ts) overlayfs will simple vfs_fsync_range the
> lower dir, so we are back to EINVAL.
>
> > to not have been be copied up here, when it would have before?
> >
>
> That is possible, but I would need more information about all the previous
> access to directory pg_commit_ts by postgresql to figure it out.
>
> Are there any aspects of fsync error reporting for directory fsync that
> we need to consider as leads to investigate?
>
> Amir.
At the VFS layer, we don't really make a big distinction between file
and dir inodes with fsync. If it has dirty data, it'll get synced out
either way.
If you think that the -EINVAL is getting stored and reported via the
inode's errseq_t, you can try enabling the file_check_and_advance_wb_err
and filemap_set_wb_err tracepoints to catch it. Those only fire when an
error is reported or recorded via that subsystem.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-11-07 16:54 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-03 14:00 Regression in overlayfs in 4.13: "could not fsync file" error by PostgreSQL Raphael Hertzog
2017-11-04 10:00 ` Amir Goldstein
2017-11-07 10:11 ` Raphael Hertzog
2017-11-07 11:01 ` Amir Goldstein
2017-11-07 11:54 ` Jeff Layton
2017-11-07 12:16 ` Amir Goldstein
2017-11-07 14:32 ` Raphael Hertzog
2017-11-07 15:14 ` Amir Goldstein
2017-11-07 16:30 ` Raphael Hertzog
2017-11-07 16:54 ` Jeff Layton
2017-11-07 12:59 ` Raphael Hertzog
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.