linux-nilfs.vger.kernel.org archive mirror
* Strange "flush" process behaviour
@ 2012-11-12 20:19 fwdnilfsml.to.11df-2pMamKoQTv4
       [not found] ` <50A159BC.9060600-3TRaYzfGuuI@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: fwdnilfsml.to.11df-2pMamKoQTv4 @ 2012-11-12 20:19 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hello!
Some time ago I noticed that the "flush" kernel process associated with 
a device running a nilfs2 filesystem generates a huge amount of I/O 
activity (using 100% CPU time). The "iotop" utility shows rates of about 
5-15 GB/s for that "flush" process, but no actual disk activity happens.
This condition lasts about 1-2 minutes, and then it stops for some time 
(usually 10-20 minutes).

Is this normal behavior? I'm using kernel 3.6, and nilfs runs on 
LUKS-encrypted partitions that belong to a software RAID-1 array.
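
For reference, the activity can be captured like this (a sketch using
standard iotop options; run as root):

  iotop -o -b -d 5 -n 24   # batch mode, only active processes, 24 samples 5 s apart

The flush-<major>:<minor> thread shows multi-GB/s "disk write" rates
there even though the disk itself stays quiet.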

//Sorry for my English

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found] ` <50A159BC.9060600-3TRaYzfGuuI@public.gmane.org>
@ 2012-11-13  6:13   ` Vyacheslav Dubeyko
  2012-11-13 13:04     ` Strange "flush" process behaviour (fwdnilfsml: message 2 of 20) Александр
  2012-11-13 15:28   ` Piotr Szymaniak
  1 sibling, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-13  6:13 UTC (permalink / raw)
  To: fwdnilfsml.to.11df-2pMamKoQTv4; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Tue, 2012-11-13 at 00:19 +0400, fwdnilfsml.to.11df-2pMamKoQTv4@public.gmane.org wrote:
> Hello!
> Some time ago I noticed that the "flush" kernel process associated with 
> a device running a nilfs2 filesystem generates a huge amount of I/O 
> activity (using 100% CPU time). The "iotop" utility shows rates of about 
> 5-15 GB/s for that "flush" process, but no actual disk activity happens.

Sorry, but I am a little confused. On the one hand, you say that the 
filesystem generates I/O activity, but on the other hand there is no 
disk activity. Does that mean that you create files but they are not on 
the partition after unmount?

> This condition lasts about 1-2 minutes, and then it stops for some time 
> (usually 10-20 minutes).
> 
> Is this normal behavior? I'm using kernel 3.6, and nilfs runs on 
> LUKS-encrypted partitions that belong to a software RAID-1 array.
> 

Could you share more details about your configuration and situation?

Please share (a short sketch of collecting all of this follows below):
1. System log content related to the problem (NILFS2, I/O subsystem).
2. The output of "top" and "ps ax" while the issue is occurring.
3. A description of your partition and disk configuration ("mount"
output, "cat /proc/partitions" output).
4. The output of "lscp" and "lssu".
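
For example, something like this (a minimal sketch; <device> stands for
the affected nilfs2 device, and lscp/lssu come from nilfs-utils):

  dmesg | grep -i nilfs > log.txt        # NILFS-related kernel messages
  top -b -n 1 > top.txt                  # CPU snapshot
  ps ax > ps.txt                         # process list
  mount > mount.txt                      # mounted filesystems
  cat /proc/partitions > partitions.txt  # partition layout
  lscp <device> > lscp.txt               # checkpoint list
  lssu <device> > lssu.txt               # segment usage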

With the best regards,
Vyacheslav Dubeyko.

> //Sorry for my English
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour (fwdnilfsml: message 2 of 20)
  2012-11-13  6:13   ` Vyacheslav Dubeyko
@ 2012-11-13 13:04     ` Александр
       [not found]       ` <50A24564.7000704-3TRaYzfGuuI@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Александр @ 2012-11-13 13:04 UTC (permalink / raw)
  To: Vyacheslav Dubeyko - slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

> On the one hand, you say that the 
> filesystem generates I/O activity, but on the other hand there is no 
> disk activity.
Yes, iotop shows about 5-15 GB/s of activity on the flush process, but 
my HDD does not write anything at this time.


> Does that mean that you create files but they are not on 
> the partition after unmount?

No, there are no problems with data loss or corruption. The high I/O 
activity on the flush process happens when I do not modify anything. 
Most often it happens 1-2 minutes after OS boot. I found that the 
frequency of the described behaviour is somehow related to my access to 
files on that filesystem: when I read or write files, the frequency of 
the described behaviour increases, but when the system is idle, the 
flush process stops its weird I/O activity after some time.

> Could you share more details about your configuration and situation?
Sorry, I don't have access to my PC right now, but I've captured the 
"ps ax" output while the flush processes were working. I'll also try to 
reproduce this strange behavior in a QEMU virtual machine. Maybe there 
is something wrong with my configuration...

These processes are related to the devices that run nilfs2 filesystems:
474  ?        R      0:09 [flush-253:0]
2388 ?        R      1:30 [flush-253:5]


   PID TTY      STAT   TIME COMMAND
     1 ?        Ss     0:01 /sbin/init
     2 ?        S      0:00 [kthreadd]
     3 ?        S      0:00 [ksoftirqd/0]
     4 ?        S      0:00 [kworker/0:0]
     5 ?        S<     0:00 [kworker/0:0H]
     6 ?        S      0:00 [kworker/u:0]
     7 ?        S<     0:00 [kworker/u:0H]
     8 ?        S      0:00 [migration/0]
     9 ?        S      0:00 [rcuc/0]
    10 ?        S      0:00 [rcun/0]
    11 ?        S      0:00 [rcub/0]
    12 ?        S      0:00 [rcun/1]
    13 ?        S      0:00 [rcub/1]
    14 ?        S      0:00 [watchdog/0]
    15 ?        S      0:00 [migration/1]
    16 ?        S      0:00 [kworker/1:0]
    17 ?        S<     0:00 [kworker/1:0H]
    18 ?        S      0:00 [rcuc/1]
    19 ?        S      0:00 [ksoftirqd/1]
    20 ?        S      0:00 [watchdog/1]
    21 ?        S      0:00 [migration/2]
    22 ?        S      0:00 [kworker/2:0]
    23 ?        S<     0:00 [kworker/2:0H]
    24 ?        S      0:00 [rcuc/2]
    25 ?        S      0:00 [ksoftirqd/2]
    26 ?        S      0:00 [watchdog/2]
    27 ?        S      0:00 [migration/3]
    28 ?        S      0:00 [kworker/3:0]
    29 ?        S<     0:00 [kworker/3:0H]
    30 ?        S      0:00 [rcuc/3]
    31 ?        S      0:00 [ksoftirqd/3]
    32 ?        S      0:00 [watchdog/3]
    33 ?        S<     0:00 [cpuset]
    34 ?        S<     0:00 [khelper]
    35 ?        S      0:00 [kdevtmpfs]
    36 ?        S<     0:00 [netns]
    37 ?        S      0:00 [bdi-default]
    38 ?        S<     0:00 [kintegrityd]
    39 ?        S<     0:00 [kblockd]
    40 ?        S<     0:00 [ata_sff]
    41 ?        S      0:00 [khubd]
    42 ?        S<     0:00 [md]
    43 ?        S      0:00 [kworker/0:1]
    44 ?        S      0:00 [kworker/2:1]
    45 ?        S      0:00 [khungtaskd]
    46 ?        S      0:00 [kswapd0]
    47 ?        SN     0:00 [ksmd]
    48 ?        SN     0:00 [khugepaged]
    49 ?        S      0:00 [fsnotify_mark]
    50 ?        S<     0:00 [crypto]
    55 ?        S<     0:00 [kthrotld]
    56 ?        S      0:00 [scsi_eh_0]
    57 ?        S      0:00 [scsi_eh_1]
    58 ?        S      0:00 [scsi_eh_2]
    59 ?        S      0:00 [scsi_eh_3]
    60 ?        S      0:00 [scsi_eh_4]
    61 ?        S      0:00 [scsi_eh_5]
    62 ?        S      0:00 [kworker/u:1]
    63 ?        S      0:00 [kworker/u:2]
    64 ?        S      0:00 [kworker/u:3]
    65 ?        S      0:00 [kworker/u:4]
    66 ?        S      0:00 [kworker/u:5]
    67 ?        S      0:00 [kworker/u:6]
    68 ?        S<     0:00 [kpsmoused]
    69 ?        S<     0:00 [deferwq]
    70 ?        S      0:00 [kworker/u:7]
    71 ?        S      0:00 [kworker/3:1]
    72 ?        S      0:00 [kworker/u:8]
    73 ?        S<     0:00 [kworker/0:1H]
    76 ?        S      0:00 [kworker/1:1]
   106 ?        S      0:00 [scsi_eh_6]
   107 ?        S      0:00 [scsi_eh_7]
   130 ?        S<     0:00 [xfsalloc]
   131 ?        S<     0:00 [xfs_mru_cache]
   132 ?        S<     0:00 [xfslogd]
   147 ?        S<     0:00 [kmpath_rdacd]
   303 ?        S<     0:00 [kworker/2:1H]
   306 ?        S      0:00 [md0_raid1]
   405 ?        S<     0:00 [kdmflush]
   406 ?        S<     0:00 [kcryptd_io]
   409 ?        S<     0:00 [kcryptd]
   437 ?        S      0:00 [segctord]
   438 ?        S<     0:00 [kworker/3:1H]
   447 ?        S      0:01 [kworker/0:2]
   474 ?        R      0:09 [flush-253:0]
   488 ?        S      0:00 [kauditd]
   489 ?        Ss     0:00 /lib/systemd/systemd-stdout-syslog-bridge
   490 ?        S      0:00 [kworker/2:2]
   497 ?        Ss     0:00 /sbin/udevd
   512 ?        S      0:00 [kworker/3:2]
   520 ?        S<     0:00 [kworker/1:1H]
   532 ?        S      0:00 [kworker/1:2]
   716 ?        S<     0:00 [kvm-irqfd-clean]
   733 ?        S      0:00 [irq/42-mei]
   751 ?        S<     0:00 [hd-audio0]
   860 ?        S      0:00 [md200_raid1]
   865 ?        S      0:00 [md250_raid1]
   878 ?        S<     0:00 [kdmflush]
   879 ?        S<     0:00 [kcryptd_io]
   880 ?        S<     0:00 [kcryptd]
   882 ?        S<     0:00 [kdmflush]
   885 ?        S<     0:00 [kcryptd_io]
   886 ?        S<     0:00 [kcryptd]
   931 ?        S<     0:00 [kdmflush]
   933 ?        S<     0:00 [kcryptd_io]
   934 ?        S<     0:00 [kcryptd]
   938 ?        S<     0:00 [kdmflush]
   939 ?        S<     0:00 [kcryptd_io]
   940 ?        S<     0:00 [kcryptd]
   952 ?        S<     0:00 [xfs-data/dm-4]
   953 ?        S<     0:00 [xfs-conv/dm-4]
   954 ?        S<     0:00 [xfs-cil/dm-4]
   955 ?        S      0:00 [xfsaild/dm-4]
   972 ?        S<     0:00 [kdmflush]
   973 ?        S<     0:00 [kcryptd_io]
   974 ?        S<     0:00 [kcryptd]
   982 ?        S<     0:00 [xfs-data/dm-3]
   983 ?        S<     0:00 [xfs-conv/dm-3]
   984 ?        S<     0:00 [xfs-cil/dm-3]
   985 ?        S      0:00 [xfsaild/dm-3]
  1010 ?        S      0:00 [kjournald]
  1019 ?        S      0:00 [segctord]
  1020 ?        Ss     0:00 /sbin/nilfs_cleanerd -n /dev/mapper/raid_new 
/mnt/raid
  1079 ?        Ss     0:00 /lib/systemd/systemd-logind
  1092 ?        Ss     0:00 /usr/bin/vmware-usbarbitrator
  1112 ?        Sl     0:00 /sbin/rsyslogd -c 5 -f /etc/rsyslog.conf
  1127 ?        Ssl    0:00 /usr/sbin/NetworkManager --no-daemon
  1130 ?        Ss     0:00 /sbin/haveged -w 1024 -v 1
  1131 ?        Ss     0:00 /sbin/acpid
  1136 ?        Ss     0:00 /bin/dbus-daemon --system --address=systemd: 
--nofork --systemd-activation
  1137 ?        Ss     0:00 /usr/sbin/nscd
  1174 ?        S      0:00 [flush-8:0]
  1175 ?        S      0:00 [flush-253:1]
  1176 ?        S      0:00 [flush-253:2]
  1223 ?        Sl     0:00 /usr/lib/polkit-1/polkitd --no-debug
  1237 ?        Ss     0:00 /sbin/nilfs_cleanerd -c 
/etc/nilfs_cleanerd_rootfs.conf /dev/mapper/system
  1241 ?        S      0:00 /usr/bin/kdm
  1243 ?        S      0:00 /usr/sbin/modem-manager
  1261 ?        Ss     0:00 /usr/sbin/cron -n
  1288 ?        Ss     0:00 /usr/sbin/cupsd -C /etc/cups/cupsd.conf
  1323 ?        Ss     0:00 /usr/sbin/mcelog --daemon --pidfile 
/var/run/mcelog.pid
  1357 tty1     Ss+    0:00 /sbin/agetty tty1 38400
  1434 ?        S      0:00 /sbin/dhclient -d -4 -sf 
/usr/lib/nm-dhcp-client.action -pf /var/run/dhclient-eth0.pid -lf 
/var/lib/dhcp/dhclient-aa5261fd-d435-41e1-ae57-64a820a9b479-eth0.lease 
-cf /var/run/nm-dhclie
  1497 tty7     Ss+    0:01 /usr/bin/Xorg -br :0 vt7 -nolisten tcp -auth 
/var/lib/xdm/authdir/authfiles/A:0-tVZebb
  1498 ?        Ssl    0:00 /usr/lib/vmware/bin/vmware-vmblock-fuse -o 
subtype=vmware-vmblock,default_permissions,allow_other /var/run/vmblock-fuse
  1509 ?        S      0:00 /sbin/udevd
  1511 ?        S      0:00 /sbin/udevd
  1552 ?        Ss     0:00 /usr/sbin/vmware-authdlauncher
  1779 ?        S      0:00 -:0
  1830 ?        Ssl    0:00 /usr/sbin/console-kit-daemon --no-daemon
  1906 ?        Ss     0:00 /bin/sh /usr/bin/startkde
  1993 ?        Ss     0:00 /usr/bin/gpg-agent --sh --daemon 
--write-env-file /home/user/.gnupg/agent.info /usr/bin/ssh-agent 
/etc/X11/xinit/xinitrc
  1994 ?        Ss     0:00 /usr/bin/ssh-agent /etc/X11/xinit/xinitrc
  2034 ?        S      0:00 dbus-launch --sh-syntax --exit-with-session
  2035 ?        Ss     0:00 /bin/dbus-daemon --fork --print-pid 5 
--print-address 7 --session
  2042 ?        S      0:00 /usr/lib64/kde4/libexec/start_kdeinit 
+kcminit_startup
  2043 ?        Ss     0:00 kdeinit4: kdeinit4 Running...
  2044 ?        S      0:00 kdeinit4: klauncher [kdeinit] --fd=9
  2046 ?        Sl     0:00 kdeinit4: kded4 [kdeinit]
  2053 ?        S      0:00 kdeinit4: kglobalaccel [kdeinit]
  2057 ?        S      0:00 /usr/bin/kactivitymanagerd
  2059 ?        Sl     0:00 /usr/lib/upower/upowerd
  2062 ?        S      0:00 kwrapper4 ksmserver
  2063 ?        Sl     0:00 kdeinit4: ksmserver [kdeinit]
  2067 ?        S      0:00 kwin
  2126 ?        Sl     0:00 /usr/bin/knotify4
  2138 ?        Sl     0:00 /usr/lib/udisks/udisks-daemon
  2140 ?        S      0:00 udisks-daemon: not polling any devices
  2147 ?        Sl     0:03 kdeinit4: plasma-desktop [kdeinit]
  2151 ?        S      0:00 /usr/bin/kuiserver
  2153 ?        S      0:00 kdeinit4: kio_desktop [kdeinit] desktop 
local:/tmp/ksocket-war
  2159 ?        S      0:00 kdeinit4: kio_file [kdeinit] file 
local:/tmp/ksocket-user
  2171 ?        Sl     0:00 kdeinit4: krunner [kdeinit]
  2196 ?        Sl     0:00 /usr/bin/pulseaudio --start --log-target=syslog
  2198 ?        SNl    0:00 /usr/lib/rtkit/rtkit-daemon
  2204 ?        Sl     0:00 /usr/lib/tracker-store
  2205 ?        Sl     0:00 
/usr/lib64/kde4/libexec/polkit-kde-authentication-agent-1
  2207 ?        Sl     0:00 kdeinit4: kmix [kdeinit]
  2213 ?        S      0:00 kdeinit4: klipper [kdeinit]
  2216 ?        S      0:00 /usr/lib/pulse/gconf-helper
  2218 ?        S      0:00 /usr/lib/GConf/2/gconfd-2
  2223 ?        Ssl    0:03 /home/user/.dropbox-dist/dropbox
  2240 ?        S      0:00 /usr/lib/gvfs/gvfsd
  2245 ?        Ssl    0:00 /usr/lib/gvfs//gvfs-fuse-daemon /home/user/.gvfs
  2295 ?        Rl     0:01 kdeinit4: konsole [kdeinit]
  2298 pts/1    Ss     0:00 /bin/bash
  2337 ?        Sl     0:00 kdeinit4: kio_trash [kdeinit] trash 
local:/tmp/ksocket-user
  2367 ?        Ssl    0:00 mpd
  2388 ?        R      1:30 [flush-253:5]
  2437 pts/1    R+     0:00 ps ax




--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found]       ` <50A24564.7000704-3TRaYzfGuuI@public.gmane.org>
@ 2012-11-13 13:41         ` Vyacheslav Dubeyko
  2012-11-13 14:01           ` fwdnilfsml.to.11df-2pMamKoQTv4
  2012-11-15 22:58           ` fwdnilfsml.to.11df-2pMamKoQTv4
  0 siblings, 2 replies; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-13 13:41 UTC (permalink / raw)
  To: fwdnilfsml.to.11df-2pMamKoQTv4; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2012-11-13 at 17:04 +0400, fwdnilfsml.to.11df-2pMamKoQTv4@public.gmane.org wrote:
> > On the one hand, you say that the 
> > filesystem generates I/O activity, but on the other hand there is no 
> > disk activity.
> Yes, iotop shows about 5-15 GB/s of activity on the flush process, but 
> my HDD does not write anything at this time.
> 
> 
> > Does that mean that you create files but they are not on 
> > the partition after unmount?
> 
> No, there are no problems with data loss or corruption. The high I/O 
> activity on the flush process happens when I do not modify anything. 
> Most often it happens 1-2 minutes after OS boot. I found that the 
> frequency of the described behaviour is somehow related to my access to 
> files on that filesystem: when I read or write files, the frequency of 
> the described behaviour increases, but when the system is idle, the 
> flush process stops its weird I/O activity after some time.
> 
> > Could you share more details about your configuration and situation?
> Sorry, I don't have access to my PC right now, but I've captured the 
> "ps ax" output while the flush processes were working. I'll also try to 
> reproduce this strange behavior in a QEMU virtual machine. Maybe there 
> is something wrong with my configuration...
> 
> These processes are related to the devices that run nilfs2 filesystems:
> 474  ?        R      0:09 [flush-253:0]
> 2388 ?        R      1:30 [flush-253:5]
> 
> 
[snip]
>   1020 ?        Ss     0:00 /sbin/nilfs_cleanerd -n /dev/mapper/raid_new 
> /mnt/raid

[snip]

>   1237 ?        Ss     0:00 /sbin/nilfs_cleanerd -c 
> /etc/nilfs_cleanerd_rootfs.conf /dev/mapper/system


I can see that you have two running nilfs_cleanerd daemons. The second
one (1237) is started with a special configuration file. Moreover, I
understand that the /dev/mapper/raid_new device is mounted at /mnt/raid.
But what is the /dev/mapper/system device, and where is it mounted? What
does /etc/nilfs_cleanerd_rootfs.conf contain? What is the difference
between /dev/mapper/raid_new and /dev/mapper/system?

With the best regards,
Vyacheslav Dubeyko.

[snip]


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-13 13:41         ` Strange "flush" process behaviour Vyacheslav Dubeyko
@ 2012-11-13 14:01           ` fwdnilfsml.to.11df-2pMamKoQTv4
  2012-11-15 22:58           ` fwdnilfsml.to.11df-2pMamKoQTv4
  1 sibling, 0 replies; 29+ messages in thread
From: fwdnilfsml.to.11df-2pMamKoQTv4 @ 2012-11-13 14:01 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

> I can see that you have two running nilfs_cleanerd daemons. The second
> one (1237) is started with a special configuration file. Moreover, I
> understand that the /dev/mapper/raid_new device is mounted at /mnt/raid.
> But what is the /dev/mapper/system device, and where is it mounted? What
> does /etc/nilfs_cleanerd_rootfs.conf contain? What is the difference
> between /dev/mapper/raid_new and /dev/mapper/system?
>
> With the best regards,
> Vyacheslav Dubeyko.

I have two different nilfs2 partitions. /dev/mapper/system is the 
rootfs, about 32 GB in size. /dev/mapper/raid_new is a data partition, 
1.8 TB in size. Yes, there are two different config files for 
nilfs_cleanerd, because I need a different GC policy for the rootfs.
I'll send the contents of the nilfs_cleanerd*.conf files when I get to 
my PC.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found] ` <50A159BC.9060600-3TRaYzfGuuI@public.gmane.org>
  2012-11-13  6:13   ` Vyacheslav Dubeyko
@ 2012-11-13 15:28   ` Piotr Szymaniak
  2012-11-13 18:24     ` Vyacheslav Dubeyko
  1 sibling, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2012-11-13 15:28 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 912 bytes --]

On Tue, Nov 13, 2012 at 12:19:08AM +0400, fwdnilfsml.to.11df-2pMamKoQTv4@public.gmane.org wrote:
> Hello!
> Some time ago I noticed that the "flush" kernel process associated with 
> a device running a nilfs2 filesystem generates a huge amount of I/O 
> activity (using 100% CPU time). The "iotop" utility shows rates of about 
> 5-15 GB/s for that "flush" process, but no actual disk activity happens.
> This condition lasts about 1-2 minutes, and then it stops for some time 
> (usually 10-20 minutes).

I can confirm this (but never bothered to check what caused it). A few
days ago I saw a huge load on one CPU core that stopped immediately
after running "sync". This could be the same issue as mentioned here.

And I just checked that machine, and it is eating my CPU. "ps ax" output attached.


Piotr Szymaniak.
-- 
 - You know nothing about woman!
 - Of course not, I'm a man, so how could I know?
  -- Berserk

[-- Attachment #1.2: ps.out --]
[-- Type: text/plain, Size: 6974 bytes --]

  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 init [3]  
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:00 [ksoftirqd/0]
    4 ?        S      0:10 [kworker/0:0]
    5 ?        S<     0:00 [kworker/0:0H]
    7 ?        S<     0:00 [kworker/u:0H]
    8 ?        S      0:00 [migration/0]
    9 ?        S      0:00 [migration/1]
   11 ?        S<     0:00 [kworker/1:0H]
   12 ?        S      0:00 [ksoftirqd/1]
   13 ?        S<     0:00 [khelper]
   14 ?        S      0:00 [kdevtmpfs]
  149 ?        S      0:00 [bdi-default]
  151 ?        S<     0:00 [kblockd]
  274 ?        S<     0:00 [ata_sff]
  281 ?        S<     0:00 [md]
  397 ?        S      0:00 [kswapd0]
  398 ?        S      0:00 [fsnotify_mark]
  400 ?        S<     0:00 [crypto]
  441 ?        S<     0:00 [ttm_swap]
  472 ?        S      0:00 [scsi_eh_0]
  475 ?        S      0:00 [scsi_eh_1]
  478 ?        S      0:00 [scsi_eh_2]
  481 ?        S      0:00 [scsi_eh_3]
  484 ?        S      0:00 [scsi_eh_4]
  487 ?        S      0:00 [scsi_eh_5]
  491 ?        S      0:00 [kworker/u:3]
  492 ?        S      0:00 [kworker/u:4]
  508 ?        S<     0:00 [kpsmoused]
  522 ?        S<     0:00 [deferwq]
  567 ?        S      0:00 [md5_raid1]
  573 ?        S      0:00 [md1_raid1]
  577 ?        S      0:00 [md0_raid1]
  693 ?        Ss     0:00 /usr/lib/systemd/systemd-udevd --daemon
  757 ?        S      0:00 [scsi_eh_6]
  758 ?        S      0:00 [scsi_eh_7]
  760 ?        S      0:00 [khubd]
  767 ?        S<     0:00 [kworker/0:1H]
  793 ?        S<     0:00 [kworker/1:1H]
 1030 ?        S<     0:00 [iprt]
 1106 ?        S<     0:00 [hci0]
 1107 ?        S<     0:00 [kworker/u:1H]
 1130 ?        S      0:02 [segctord]
 1193 ?        S      0:00 [jbd2/md5-8]
 1194 ?        S<     0:00 [ext4-dio-unwrit]
 1195 ?        S<     0:00 [loop0]
 1202 ?        Ss     0:00 /usr/sbin/mount.ntfs-3g /dev/sdd1 /mnt/ce -o rw,locale=pl_PL.utf8
 1205 ?        Ss     0:00 /usr/sbin/mount.ntfs-3g /dev/sdd5 /mnt/de -o rw,locale=pl_PL.utf8
 1208 ?        Ss     0:00 /usr/sbin/mount.ntfs-3g /dev/sde1 /mnt/ef -o rw,locale=pl_PL.utf8
 1607 ?        Ss     0:00 /usr/bin/dbus-daemon --system
 1621 ?        Ss     0:00 metalog [MASTER]                                            
 1622 ?        S      0:00 metalog [KERNEL]                                            
 1637 ?        Ssl    0:00 /usr/sbin/console-kit-daemon
 1712 ?        Sl     0:01 /usr/lib/polkit-1/polkitd --no-debug
 1920 ?        Ss     0:00 /usr/bin/kdm
 1924 ?        R    111:40 [flush-8:0]
 1926 tty7     Ss+    8:59 /usr/bin/X -br -novtswitch -quiet :0 vt7 -nolisten tcp -auth /var/run/xauth/A:0-c2i4Cb
 1950 ?        S      0:00 -:0         
 1973 ?        SNs    0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --port 3632 --log-file /var/log/distccd.log --log-level critical --allow 192.168.1.1 --allow 10.0.0.0/24 -N 15
 1975 ?        SN     0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --port 3632 --log-file /var/log/distccd.log --log-level critical --allow 192.168.1.1 --allow 10.0.0.0/24 -N 15
 2025 ?        Ss     0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid
 2109 ?        Ss     0:00 //usr/libexec/postfix/master
 2114 ?        S      0:00 qmgr -l -t fifo -u
 2125 ?        Ss     0:01 /usr/sbin/privoxy --pidfile /var/run/privoxy.pid --user privoxy.privoxy /etc/privoxy/config
 2126 ?        SN     0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --port 3632 --log-file /var/log/distccd.log --log-level critical --allow 192.168.1.1 --allow 10.0.0.0/24 -N 15
 2140 ?        SN     0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --port 3632 --log-file /var/log/distccd.log --log-level critical --allow 192.168.1.1 --allow 10.0.0.0/24 -N 15
 2141 ?        SN     0:00 /usr/bin/distccd --daemon --pid-file /var/run/distccd/distccd.pid --port 3632 --log-file /var/log/distccd.log --log-level critical --allow 192.168.1.1 --allow 10.0.0.0/24 -N 15
 2143 ?        S      0:00 /usr/sbin/smartd -p /var/run/smartd.pid
 2159 ?        Ss     0:00 /usr/sbin/sshd
 2175 ?        Ss     0:00 /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd_rootfs.conf /dev/disk/by-uuid/1aa9e6fb-cf7d-45bd-bbfb-08110a8840b7
 2183 tty1     Ss+    0:00 /sbin/agetty 38400 tty1 linux
 2184 tty2     Ss+    0:00 /sbin/agetty 38400 tty2 linux
 2185 tty3     Ss+    0:00 /sbin/agetty 38400 tty3 linux
 2186 tty4     Ss+    0:00 /sbin/agetty 38400 tty4 linux
 2187 tty5     Ss+    0:00 /sbin/agetty 38400 tty5 linux
 2188 tty6     Ss+    0:00 /sbin/agetty 38400 tty6 linux
 2198 ?        Ss     0:00 /bin/sh /usr/bin/startkde
 2210 ?        S      0:00 /usr/bin/dbus-launch --sh-syntax --exit-with-session
 2211 ?        Ss     0:00 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
 2260 ?        Ss     0:00 kdeinit4: kdeinit4 Running...     
 2261 ?        S      0:00 kdeinit4: klauncher [kdeinit] --fd=8
 2263 ?        Sl     0:01 kdeinit4: kded4 [kdeinit]         
 2265 ?        S      0:00 /usr/libexec/gam_server
 2269 ?        S      0:00 kdeinit4: kglobalaccel [kdeinit]  
 2273 ?        Sl     0:00 /usr/libexec/upowerd
 2294 ?        Sl     0:02 /usr/libexec/udisks-daemon
 2299 ?        S      0:00 kwrapper4 ksmserver
 2301 ?        S      0:00 udisks-daemon: not polling any devices
 2302 ?        Sl     0:00 kdeinit4: ksmserver [kdeinit]     
 2323 ?        Sl     1:53 kwin -session 10e6cf737a000133274663900000025370000_1352712750_234757
 2327 ?        S      0:00 /usr/bin/kactivitymanagerd
 2342 ?        Sl     0:00 /usr/bin/knotify4
 2365 ?        Sl     4:05 kdeinit4: plasma-desktop [kdeinit]
 2370 ?        S      0:00 /usr/bin/kuiserver
 2385 ?        Sl     0:01 kdeinit4: krunner [kdeinit]       
 2391 ?        Sl     0:00 kdeinit4: kmix [kdeinit] -session 10e6cf737a000133274664200000025370007_1352712749_218942
 2396 ?        Sl     7:28 /usr/lib/opera/opera -session 10e6cf737a000135271066800000023000009_1352712749_219369 -sd /usr/share/opera/ -pd /home/foobar/.opera/
 2403 ?        S      0:00 kdeinit4: klipper [kdeinit]       
 2404 ?        Sl     0:00 /usr/lib/kde4/libexec/polkit-kde-authentication-agent-1
 2406 ?        S      0:02 /usr/bin/urxvt
 2408 pts/1    Ss     0:00 -bash
 2429 pts/1    S+     0:00 ssh wloczykij
 4342 ?        S      0:00 [flush-9:5]
 4354 ?        Sl     0:26 /usr/lib/opera/pluginwrapper/operapluginwrapper-native -multiproc 2,0.1.659,10,75,97 -logfolder /home/foobar/.opera/logs/ -type native
 4432 ?        S      0:00 pickup -l -t fifo -u
 4806 ?        S      0:00 [kworker/0:1]
 5112 ?        S      0:00 [kworker/1:0]
 5171 ?        S      0:01 [kworker/1:1]
 5206 ?        Ss     0:00 sshd: foobar [priv]
 5215 ?        S      0:00 sshd: foobar@pts/2
 5216 pts/2    Ss     0:00 -bash
 5235 pts/2    R+     0:00 ps ax

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found]       ` <6B9E40B0-90F8-4130-9798-87253FEDDA78-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2012-11-13 17:49         ` fwdnilfsml.to.11df-2pMamKoQTv4
  0 siblings, 0 replies; 29+ messages in thread
From: fwdnilfsml.to.11df-2pMamKoQTv4 @ 2012-11-13 17:49 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

> So, is the reported issue reproducible with the default way of mounting, or in the non-rootfs case?
Yes, I have this problem both for the rootfs and for a non-rootfs filesystem.
I'll try to reproduce this problem in a virtual machine...

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-13 15:28   ` Piotr Szymaniak
@ 2012-11-13 18:24     ` Vyacheslav Dubeyko
       [not found]       ` <6B9E40B0-90F8-4130-9798-87253FEDDA78-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-13 18:24 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Nov 13, 2012, at 6:28 PM, Piotr Szymaniak wrote:

> On Tue, Nov 13, 2012 at 12:19:08AM +0400, fwdnilfsml.to.11df-2pMamKoQTv4@public.gmane.org wrote:
>> Hello!
>> Some time ago I noticed that the "flush" kernel process associated with 
>> a device running a nilfs2 filesystem generates a huge amount of I/O 
>> activity (using 100% CPU time). The "iotop" utility shows rates of about 
>> 5-15 GB/s for that "flush" process, but no actual disk activity happens.
>> This condition lasts about 1-2 minutes, and then it stops for some time 
>> (usually 10-20 minutes).
> 
> I can confirm this (but never bothered to check what caused it). A few
> days ago I saw a huge load on one CPU core that stopped immediately
> after running "sync". This could be the same issue as mentioned here.
> here.
> 

What operations were performed on the file system before the issue appeared? I simply need to understand how to reproduce the issue on my side.

> And I just checked that machine, and it is eating my CPU. "ps ax" output attached.
> 

I see some similarities between these two cases. You also mount the rootfs with a special configuration file specified:

2175 ?        Ss     0:00 /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd_rootfs.conf /dev/disk/by-uuid/1aa9e6fb-cf7d-45bd-bbfb-08110a8840b7

So, is the reported issue reproducible with the default way of mounting, or in the non-rootfs case?

With the best regards,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.
> -- 
> - You know nothing about woman!
> - Of course not, I'm a man, so how could I know?
>  -- Berserk
> <ps.out>

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-13 13:41         ` Strange "flush" process behaviour Vyacheslav Dubeyko
  2012-11-13 14:01           ` fwdnilfsml.to.11df-2pMamKoQTv4
@ 2012-11-15 22:58           ` fwdnilfsml.to.11df-2pMamKoQTv4
       [not found]             ` <50A5737C.2040805-3TRaYzfGuuI@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: fwdnilfsml.to.11df-2pMamKoQTv4 @ 2012-11-15 22:58 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Still trying to reproduce this issue.
So far, I have found a workaround that dramatically reduces the flush 
process activity: using the nilfs-tune utility, set the "commit interval 
of segment" parameter back to its default value of zero.
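
A minimal sketch of the workaround (the device path is the rootfs device
from earlier in this thread; check the current value first):

  nilfs-tune -l /dev/mapper/system | grep -i interval  # show the current setting
  nilfs-tune -i 0 /dev/mapper/system                   # reset the commit interval to 0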

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found]             ` <50A5737C.2040805-3TRaYzfGuuI@public.gmane.org>
@ 2012-11-16  3:13               ` Ryusuke Konishi
       [not found]                 ` <201211160313.AA04185-ZdTO5nnmHvkOizVVqyxoihMFgDP4sedm@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Ryusuke Konishi @ 2012-11-16  3:13 UTC (permalink / raw)
  To: fwdnilfsml.to.11df-2pMamKoQTv4; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

>Still trying to reproduce this issue.
>So far, I have found a workaround that dramatically reduces the flush 
>process activity: using the nilfs-tune utility, set the "commit interval 
>of segment" parameter back to its default value of zero.

Hmm, the default value of this parameter is zero.
Have you changed it with the nilfs-tune tool by any chance?

Regards,
Ryusuke Konishi

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found]                 ` <201211160313.AA04185-ZdTO5nnmHvkOizVVqyxoihMFgDP4sedm@public.gmane.org>
@ 2012-11-16 15:02                   ` fwdnilfsml.to.11df-2pMamKoQTv4
       [not found]                     ` <50A65588.3000205-3TRaYzfGuuI@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: fwdnilfsml.to.11df-2pMamKoQTv4 @ 2012-11-16 15:02 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

> Hmm, the default value of this parameter is zero.
> Have you changed it with the nilfs-tune tool by any chance?
Yes, I changed it to 60 seconds for the rootfs partition and 180 
seconds for the data partition. It worked flawlessly with this setup on 
the root partition for the last 14 months...

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found]                     ` <50A65588.3000205-3TRaYzfGuuI@public.gmane.org>
@ 2012-11-17  1:27                       ` Ryusuke Konishi
       [not found]                         ` <CAKFNMomO6ebYkXk=A9H2Wieswvogt6dj16Otj0u74M4aHOOPBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Ryusuke Konishi @ 2012-11-17  1:27 UTC (permalink / raw)
  To: fwdnilfsml.to.11df-2pMamKoQTv4; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sat, Nov 17, 2012 at 12:02 AM,  <fwdnilfsml.to.11df-2pMamKoQTv4@public.gmane.org> wrote:
>> Hmm, the default value of this parameter is zero.
>> Have you changed it with the nilfs-tune tool by any chance?
>
> Yes, I changed it to 60 seconds for the rootfs partition and 180 seconds
> for the data partition. It worked flawlessly with this setup on the root
> partition for the last 14 months...

OK, please confirm the versions of all your kernels, including the one in your initrd.

The interval parameter was broken in old NILFS2 kernel
modules due to a bug.  It was fixed in kernel 3.0 as follows:

> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -2573,7 +2573,7 @@ static struct nilfs_sc_info *nilfs_segctor_new(struct super_block *sb,
>
>         if (nilfs->ns_interval)
> -               sci->sc_interval = nilfs->ns_interval;
> +               sci->sc_interval = HZ * nilfs->ns_interval;
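
Without the fix, the on-disk interval (stored in seconds) was assigned
to sc_interval unconverted, although the kernel interprets sc_interval
in jiffies. As a rough illustration (HZ=250 is just an example value):
an interval of 60 would then mean 60 jiffies = 0.24 seconds instead of
60 seconds, so the segment constructor would fire about 250 times more
often than intended.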

If you have switched back to an older kernel, this may be the
cause of your problem.

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found]                         ` <CAKFNMomO6ebYkXk=A9H2Wieswvogt6dj16Otj0u74M4aHOOPBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-11-27 15:19                           ` Piotr Szymaniak
  2012-11-27 17:43                             ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2012-11-27 15:19 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: fwdnilfsml.to.11df-2pMamKoQTv4,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1451 bytes --]

On Sat, Nov 17, 2012 at 10:27:36AM +0900, Ryusuke Konishi wrote:
> On Sat, Nov 17, 2012 at 12:02 AM,  <fwdnilfsml.to.11df-2pMamKoQTv4@public.gmane.org> wrote:
> >> Hmm, the default value of this parameter is zero.
> >> Have you changed it with the nilfs-tune tool by any chance?
> >
> > Yes, I changed it to 60 seconds for the rootfs partition and 180 seconds
> > for the data partition. It worked flawlessly with this setup on the root
> > partition for the last 14 months...
> 
> OK, please confirm the versions of all your kernels, including the one in your initrd.

As I just hit this issue, here's some info
(the nilfs_cleanerd_rootfs.conf that I'm using for the rootfs):
protection_period       3600
min_clean_segments      15%
max_clean_segments      25%
clean_check_interval    10
selection_policy        timestamp       # timestamp in ascend order
nsegments_per_clean     2
mc_nsegments_per_clean  4
cleaning_interval       5
mc_cleaning_interval    1
retry_interval          60
use_mmap
log_priority            info

maszn ~ (: uname -srm
Linux 3.6.4 i686

maszn ~ (: eix nilfs-utils -c
[I] sys-fs/nilfs-utils (2.1.4-bwmKdHsBrCaKN0S5hHIosQ@public.gmane.org): *snip*
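
For completeness, the on-disk commit interval discussed earlier can be
checked like this (a sketch; nilfs-tune -l should list the superblock
parameters, and the by-uuid path is my rootfs device from the earlier
ps output):

maszn ~ (: nilfs-tune -l /dev/disk/by-uuid/1aa9e6fb-cf7d-45bd-bbfb-08110a8840b7 | grep -i interval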

What's the next step to debug this issue?

Piotr Szymaniak.
-- 
 One of Zaphod's heads looked to the side. The other immediately did the
same, since it wanted to see what the first one was looking at; it was
nothing in particular, though.
  -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-27 15:19                           ` Piotr Szymaniak
@ 2012-11-27 17:43                             ` Vyacheslav Dubeyko
       [not found]                               ` <295B3FE7-A44E-4D57-9A6A-B725EF1BA9EB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-27 17:43 UTC (permalink / raw)
  To: Piotr Szymaniak
  Cc: Ryusuke Konishi, fwdnilfsml.to.11df-2pMamKoQTv4,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Nov 27, 2012, at 6:19 PM, Piotr Szymaniak wrote:

> On Sat, Nov 17, 2012 at 10:27:36AM +0900, Ryusuke Konishi wrote:
>> On Sat, Nov 17, 2012 at 12:02 AM,  <fwdnilfsml.to.11df-2pMamKoQTv4@public.gmane.org> wrote:
>>>> Hmm, the default value of this parameter is zero.
>>>> Have you changed it with the nilfs-tune tool by any chance?
>>> 
>>> Yes, I changed it to 60 seconds for the rootfs partition and 180 seconds
>>> for the data partition. It worked flawlessly with this setup on the root
>>> partition for the last 14 months...
>> 
>> OK, please confirm the versions of all your kernels, including the one in your initrd.
> 
> As I just hit this issue, here's some info
> (the nilfs_cleanerd_rootfs.conf that I'm using for the rootfs):
> protection_period       3600
> min_clean_segments      15%
> max_clean_segments      25%
> clean_check_interval    10
> selection_policy        timestamp       # timestamp in ascend order
> nsegments_per_clean     2
> mc_nsegments_per_clean  4
> cleaning_interval       5
> mc_cleaning_interval    1
> retry_interval          60
> use_mmap
> log_priority            info
> 
> maszn ~ (: uname -srm
> Linux 3.6.4 i686
> 
> maszn ~ (: eix nilfs-utils -c
> [I] sys-fs/nilfs-utils (2.1.4-bwmKdHsBrCaKN0S5hHIosQ@public.gmane.org): *snip*
> 
> What's the next step to debug this issue?
> 

Could you describe how the issue is reproduced? What filesystem operations took place shortly before the issue?

With the best regards,
Vyacheslav Dubeyko.

> Piotr Szymaniak.
> -- 
> One of Zaphod's heads looked to the side. The other immediately did the
> same, since it wanted to see what the first one was looking at; it was
> nothing in particular, though.
>  -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
       [not found]                               ` <295B3FE7-A44E-4D57-9A6A-B725EF1BA9EB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2012-11-28 14:21                                 ` Piotr Szymaniak
  2012-11-28 14:39                                   ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2012-11-28 14:21 UTC (permalink / raw)
  To: Vyacheslav Dubeyko
  Cc: Piotr Szymaniak, Ryusuke Konishi, fwdnilfsml.to.11df-2pMamKoQTv4,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

On Tue, Nov 27, 2012 at 08:43:01PM +0300, Vyacheslav Dubeyko wrote:
> Could you describe how the issue is reproduced? What filesystem
> operations took place shortly before the issue?

Reproduce: use a computer with nilfs2? No idea... I notice this issue
from time to time. I suppose it's most visible when the system is idle
and I then notice a lot of CPU load without any reason (also, as I wrote
before, it stops immediately after running "sync").
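
For example, something along these lines shows the effect (a sketch; the
exact flush-<major>:<minor> thread name depends on the device):

  top -b -n 1 | grep 'flush-'   # flusher burning CPU while the system is idle
  sync                          # write out dirty data
  top -b -n 1 | grep 'flush-'   # the load is gone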


Piotr Szymaniak.
-- 
Wanted
Dead or Alive

Schrödinger's Cat

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-28 14:21                                 ` Piotr Szymaniak
@ 2012-11-28 14:39                                   ` Vyacheslav Dubeyko
  2012-11-28 20:59                                     ` Piotr Szymaniak
  0 siblings, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-28 14:39 UTC (permalink / raw)
  To: Piotr Szymaniak
  Cc: Ryusuke Konishi, fwdnilfsml.to.11df-2pMamKoQTv4,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2012-11-28 at 15:21 +0100, Piotr Szymaniak wrote:
> On Tue, Nov 27, 2012 at 08:43:01PM +0300, Vyacheslav Dubeyko wrote:
> > Could you describe how the issue is reproduced? What filesystem
> > operations took place shortly before the issue?
> 
> Reproduce: use a computer with nilfs2? No idea... I notice this issue
> from time to time. I suppose it's most visible when the system is idle
> and I then notice a lot of CPU load without any reason (also, as I wrote
> before, it stops immediately after running "sync").
> 

OK. Could you share the output of "top" and "ps ax" while the issue is
occurring? These outputs can be helpful to begin the analysis.

With the best regards,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-28 14:39                                   ` Vyacheslav Dubeyko
@ 2012-11-28 20:59                                     ` Piotr Szymaniak
  2012-11-29  7:00                                       ` Vyacheslav Dubeyko
  2013-01-08 13:22                                       ` Vyacheslav Dubeyko
  0 siblings, 2 replies; 29+ messages in thread
From: Piotr Szymaniak @ 2012-11-28 20:59 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

On Wed, Nov 28, 2012 at 06:39:20PM +0400, Vyacheslav Dubeyko wrote:
> On Wed, 2012-11-28 at 15:21 +0100, Piotr Szymaniak wrote:
> > On Tue, Nov 27, 2012 at 08:43:01PM +0300, Vyacheslav Dubeyko wrote:
> > > Could you describe how the issue is reproduced? What filesystem
> > > operations took place shortly before the issue?
> > 
> > Reproduce: use a computer with nilfs2? No idea... I notice this issue
> > from time to time. I suppose it's most visible when the system is idle
> > and I then notice a lot of CPU load without any reason (also, as I wrote
> > before, it stops immediately after running "sync").
> > 
> 
> > OK. Could you share the output of "top" and "ps ax" while the issue is
> > occurring? These outputs can be helpful to begin the analysis.

I already attached the "ps ax" output earlier [1]. I don't know whether
top will show anything useful here, but I can try to "catch" it.

[1] http://www.spinics.net/lists/linux-nilfs/msg01481.html

Piotr Szymaniak.
-- 
 My only god is vodka. My saints are marks, francs, and dollars. The
only temple I visit is a woman's vagina. And my sacrament is fucking.
  -- Nelson DeMille, "The Lion's Game"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-28 20:59                                     ` Piotr Szymaniak
@ 2012-11-29  7:00                                       ` Vyacheslav Dubeyko
  2013-03-28 12:07                                         ` Piotr Szymaniak
  2013-01-08 13:22                                       ` Vyacheslav Dubeyko
  1 sibling, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2012-11-29  7:00 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2012-11-28 at 21:59 +0100, Piotr Szymaniak wrote:
> On Wed, Nov 28, 2012 at 06:39:20PM +0400, Vyacheslav Dubeyko wrote:
> > On Wed, 2012-11-28 at 15:21 +0100, Piotr Szymaniak wrote:
> > > On Tue, Nov 27, 2012 at 08:43:01PM +0300, Vyacheslav Dubeyko wrote:
> > > > Could you describe how the issue is reproduced? What filesystem
> > > > operations took place shortly before the issue?
> > > 
> > > Reproduce: use a computer with nilfs2? No idea... I notice this issue
> > > from time to time. I suppose it's most visible when the system is idle
> > > and I then notice a lot of CPU load without any reason (also, as I wrote
> > > before, it stops immediately after running "sync").
> > > 
> > 
> > OK. Could you share the output of "top" and "ps ax" while the issue is
> > occurring? These outputs can be helpful to begin the analysis.
> 
> I already attached the "ps ax" output earlier [1]. I don't know whether
> top will show anything useful here, but I can try to "catch" it.
> 
> [1] http://www.spinics.net/lists/linux-nilfs/msg01481.html
> 

Thank you.

I think we need to use the following tools to investigate the issue (a
sketch of the commands follows below):
1. "echo t > /proc/sysrq-trigger" should tell us what the flusher is doing.
2. The /proc/meminfo and /proc/sys/vm/dirty_ratio output at the time of the issue can also be helpful.
3. It makes sense to use blktrace to get I/O traces during flushing, during sync execution, and shortly before the issue occurs. I/O traces can be useful for analysis, but the output can contain a huge amount of information. So I think the first step is to get info about what the flusher is doing while it is eating CPU time.
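
For example (a minimal sketch, run as root; /dev/sdc stands for the
affected disk, and blktrace/blkparse must be installed):

  echo t > /proc/sysrq-trigger         # dump all task stacks to the kernel log
  dmesg | tail -n 300 > tasks.txt      # capture the dump
  cat /proc/meminfo > mem.txt          # memory state at issue time
  cat /proc/sys/vm/dirty_ratio >> mem.txt
  blktrace -d /dev/sdc -o trace -w 60  # record block I/O for 60 seconds
  blkparse -i trace > trace.txt        # decode the recorded trace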

With the best regards,
Vyacheslav Dubeyko.


> Piotr Szymaniak.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process behaviour
  2012-11-28 20:59                                     ` Piotr Szymaniak
  2012-11-29  7:00                                       ` Vyacheslav Dubeyko
@ 2013-01-08 13:22                                       ` Vyacheslav Dubeyko
  1 sibling, 0 replies; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2013-01-08 13:22 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Nov 28, 2012, at 11:59 PM, Piotr Szymaniak wrote:

> On Wed, Nov 28, 2012 at 06:39:20PM +0400, Vyacheslav Dubeyko wrote:
>> On Wed, 2012-11-28 at 15:21 +0100, Piotr Szymaniak wrote:
>>> On Tue, Nov 27, 2012 at 08:43:01PM +0300, Vyacheslav Dubeyko wrote:
>>>> Could you describe how the issue can be reproduced? What filesystem
>>>> operations took place just before the issue?
>>> 
>>> Reproduce: Use a computer with nilfs2? No idea... I notice this issue
>>> from time to time. I suppose it's more visible when the system is idle
>>> and then I notice a lot of CPU load without any reason (also, as I wrote
>>> before, it stops immediately after running sync).
>>> 
>> 

I have reproduced the issue by chance. The issue reproduces 100% of the time.

The reproduction path is:
1. Generate a file of about 100 - 500 GB (for example, with "dd if=/dev/zero of=<file_name> bs=1048576 count=102400"). The size of the file determines how long the issue lasts.
2. Execute an "rm", "truncate" or "dd" command on the file. In the case of "dd", it must be used without the conv=notrunc option, so that the output file gets truncated (for example, "dd if=/dev/urandom of=<file_name> bs=1024 seek=51200 count=10").
3. As a result, you can see a flush kernel thread that uses 60 - 100% of CPU time for 5 - 40 minutes (a consolidated sketch follows after the "top" output below):

top - 23:48:53 up  6:11, 15 users,  load average: 1.71, 1.16, 1.81
Tasks: 226 total,   3 running, 223 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.3%us, 94.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7640788k total,  3991784k used,  3649004k free,    28196k buffers
Swap:  1952764k total,        0k used,  1952764k free,  3126224k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                      
12358 root      20   0 99484  668  544 R   97  0.0   0:55.27 dd                                                                                                                           
 3027 root      20   0     0    0    0 R   87  0.0  31:46.94 flush-8:32
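
Putting the steps together, a minimal reproduction sketch (the mount point, file name and sizes are examples only):

cd /mnt/nilfs2                                              # any nilfs2 mount with enough free space
dd if=/dev/zero of=bigfile bs=1048576 count=102400          # step 1: create a ~100 GB file
dd if=/dev/urandom of=bigfile bs=1024 seek=51200 count=10   # step 2: no conv=notrunc, so the file is truncated
# (or simply: rm bigfile) -- then watch the flush thread in "top"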

I suspect nilfs_mdt_write_page(), but I need more time to investigate the issue.

With the best regards,
Vyacheslav Dubeyko.

>> Ok. Could you share output of "top" and "ps ax" for the case of issue
>> reproducing? These outputs can be helpful for the beginning of the analysis.
> 
> I already attached [1] the "ps ax" output before. I don't know if top can
> show anything useful here, but I can try to "catch" it.
> 
> [1] http://www.spinics.net/lists/linux-nilfs/msg01481.html
> 
> Piotr Szymaniak.
> -- 
> My only god is vodka.  My saints are marks, francs and dollars.
> The only temple I visit is a woman's vagina. And my sacrament
> is fucking.
>  -- Nelson DeMille, "The Lion's Game"

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2012-11-29  7:00                                       ` Vyacheslav Dubeyko
@ 2013-03-28 12:07                                         ` Piotr Szymaniak
  2013-03-28 19:22                                           ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2013-03-28 12:07 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 4738 bytes --]

On Thu, Nov 29, 2012 at 11:00:59AM +0400, Vyacheslav Dubeyko wrote:
> Thank you.

I got this issue again (honestly, I'm a bit lost and don't remember -
was there a patch for this?)


> I think the following tools would help with investigating the issue:
> 1. "echo t > /proc/sysrq-trigger" should tell us what the flusher is
> doing.

Attached.


> 2. The /proc/meminfo and /proc/sys/vm/dirty_ratio output at the time of
> the issue can also be helpful.

MemTotal:        2074936 kB
MemFree:          696664 kB
Buffers:             940 kB
Cached:          1036912 kB
SwapCached:            0 kB
Active:           758376 kB
Inactive:         448196 kB
Active(anon):     168788 kB
Inactive(anon):      452 kB
Active(file):     589588 kB
Inactive(file):   447744 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       1187784 kB
HighFree:         535312 kB
LowTotal:         887152 kB
LowFree:          161352 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                24 kB
Writeback:             0 kB
AnonPages:        168732 kB
Mapped:            44004 kB
Shmem:               520 kB
Slab:             164796 kB
SReclaimable:     143208 kB
SUnreclaim:        21588 kB
KernelStack:         912 kB
PageTables:         1380 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1037468 kB
Committed_AS:     273168 kB
VmallocTotal:     122880 kB
VmallocUsed:        1136 kB
VmallocChunk:      92356 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:       12280 kB
DirectMap4M:      897024 kB


dirty_ratio is 20


> 3. It makes sense to use blktrace to capture I/O traces during
> flushing, around sync command execution, and shortly before the
> issue occurs. I/O traces can be useful for analysis, but the output
> can contain a huge amount of info. So, first of all, we need to find
> out what the flusher is doing while it is eating CPU time.

This one seems to be "not fixed by sync", so if you can elaborate on this,
I can test it.

Extra info:
wloczykij ~ (: grep -vE '^(#|$)' /etc/nilfs_cleanerd.conf
protection_period       3600
min_clean_segments      12%
max_clean_segments      30%
clean_check_interval    10
selection_policy        timestamp       # timestamp in ascend order
nsegments_per_clean     2
mc_nsegments_per_clean  4
cleaning_interval       10
mc_cleaning_interval    1
retry_interval          60
use_mmap
log_priority            info

I'm using vanilla kernel 3.6.10.

Current snapshot and checkpoint state:
wloczykij ~ # lscp -s
                 CNO        DATE     TIME  MODE  FLG     NBLKINC       ICNT
wloczykij ~ # lscp | head
                 CNO        DATE     TIME  MODE  FLG     NBLKINC       ICNT
              140088  2013-03-18 11:38:03   cp    -          119      48230
              140089  2013-03-18 11:48:05   cp    -          106      48230
              140090  2013-03-18 11:58:07   cp    -           85      48230
              140091  2013-03-18 12:05:43   cp    -          119      48231
              140092  2013-03-18 12:05:45   cp    -           40      48226
              140093  2013-03-18 12:15:46   cp    -          320      48325
              140094  2013-03-18 12:17:11   cp    -          139      48225
              140095  2013-03-18 12:17:15   cp    -           42      48225
              140096  2013-03-18 12:17:17   cp    -           21      48225
wloczykij ~ # lscp | tail
              142148  2013-03-28 12:59:38   cp    -           29      49204
              142149  2013-03-28 12:59:44   cp    -           29      49204
              142150  2013-03-28 13:00:20   cp    -          138      49208
              142151  2013-03-28 13:01:50   cp    -           40      49205
              142152  2013-03-28 13:02:23   cp    -           29      49205
              142153  2013-03-28 13:02:23   cp    -           29      49205
              142154  2013-03-28 13:02:27   cp    -           29      49205
              142155  2013-03-28 13:02:32   cp    -           29      49205
              142156  2013-03-28 13:03:02   cp    -           58      49205
              142157  2013-03-28 13:03:02   cp    -           28      49205



Piotr Szymaniak.
-- 
And that stench. The devil knows what was rotting in that meat, but it
seemed to Red that a hundred thousand broken stinking eggs poured over
a hundred thousand stinking fish heads and dead cats could not smell
the way that slime smelled.
 -- Arkady and Boris Strugatsky, "Roadside Picnic"

[-- Attachment #1.2: psax --]
[-- Type: text/plain, Size: 5565 bytes --]

  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:31 init [3]  
    2 ?        S      0:00 [kthreadd]
    3 ?        S      1:02 [ksoftirqd/0]
    5 ?        S<     0:00 [kworker/0:0H]
    6 ?        S      0:00 [kworker/u:0]
    7 ?        S<     0:00 [kworker/u:0H]
    8 ?        S<     0:00 [khelper]
    9 ?        S      0:00 [kdevtmpfs]
   10 ?        S      0:00 [kworker/u:1]
  158 ?        S      0:00 [bdi-default]
  160 ?        S<     0:00 [kblockd]
  239 ?        S<     0:00 [ata_sff]
  246 ?        S      0:00 [khubd]
  350 ?        S<     0:00 [rpciod]
  365 ?        S      0:32 [kswapd0]
  366 ?        S      0:00 [fsnotify_mark]
  368 ?        S<     0:00 [crypto]
  471 ?        S<     0:00 [deferwq]
  474 ?        S      0:00 [scsi_eh_0]
  475 ?        S      1:03 [usb-storage]
  488 ?        S<     0:24 [kworker/0:1H]
  497 ?        S      0:12 [segctord]
  589 ?        Ss     0:00 /sbin/udevd --daemon
  647 ?        S      0:00 [scsi_eh_1]
  657 ?        S      0:00 [scsi_eh_2]
  937 ?        S    49157:04 [flush-8:0]
 1046 ?        S<     0:00 [kdmflush]
 1066 ?        S<     0:00 [kcryptd_io]
 1067 ?        S<     0:00 [kcryptd]
 1094 ?        S<     0:00 [kdmflush]
 1095 ?        S<     0:00 [kcryptd_io]
 1096 ?        S<     0:00 [kcryptd]
 1225 ?        Ssl    2:42 /sbin/apcupsd -f /etc/apcupsd/apcupsd.conf -P /var/run/apcupsd.pid
 1240 ?        S      0:00 supervising syslog-ng                                                                       
 1241 ?        Ssl    0:30 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --pidfile /var/run/syslog-ng.pid
 1255 ?        Ss     0:11 /usr/sbin/crond
 1342 ?        Ss     0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
 1344 ?        S      1:16 nginx: worker process                   
 1345 ?        S      2:18 nginx: worker process                   
 1346 ?        S      2:15 nginx: worker process                   
 1347 ?        S      2:10 nginx: worker process                   
 1359 ?        Ss     2:54 /usr/sbin/ntpd -p /var/run/ntpd.pid
 1442 ?        Ss     0:15 /usr/libexec/postfix/master -w
 1445 ?        S      0:02 qmgr -l -t unix -u
 1460 ?        Ss     0:00 /usr/bin/rsync --daemon
 1468 ?        S      0:00 tlsmgr -l -t unix -u
 1520 ?        Ss     0:25 /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd.conf /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
 1525 ?        S      0:00 [btrfs-genwork-1]
 1526 ?        S      0:19 [btrfs-submit-1]
 1527 ?        S      0:00 [btrfs-delalloc-]
 1528 ?        S      0:00 [btrfs-fixup-1]
 1531 ?        S      0:00 [btrfs-endio-met]
 1533 ?        S      0:00 [btrfs-freespace]
 1535 ?        S      0:01 [btrfs-cache-1]
 1536 ?        S      0:00 [btrfs-readahead]
 1537 ?        S      0:00 [btrfs-cleaner]
 1538 ?        S      3:45 [btrfs-transacti]
 1541 tty1     Ss+    0:00 /sbin/agetty 38400 tty1 linux
 1542 tty2     Ss+    0:00 /sbin/agetty 38400 tty2 linux
 1543 tty3     Ss+    0:00 /sbin/agetty 38400 tty3 linux
 1544 tty4     Ss+    0:00 /sbin/agetty 38400 tty4 linux
 1545 tty5     Ss+    0:00 /sbin/agetty 38400 tty5 linux
 1546 tty6     Ss+    0:00 /sbin/agetty 38400 tty6 linux
 1600 ?        Sl   3227:31 /usr/bin/tor -f /etc/tor/torrc --runasdaemon 1 --PidFile /var/run/tor/tor.pid
 1669 ?        Ss     5:37 SCREEN
 1670 pts/1    Ssl+  22:23 irssi
 1671 pts/2    Ss+   35:30 ekg2 -m
 1672 pts/3    Ss+   10:51 mutt -y
 1834 ?        Ss     2:30 SCREEN
 1835 pts/4    Ss     0:00 -/bin/bash
 1839 pts/4    T      1:01 ekg2 -m
 1843 pts/5    Ss     0:00 -/bin/bash
 1847 pts/5    Sl+   25:16 irssi
 2086 pts/8    Ss     0:00 -/bin/bash
 2094 pts/8    S      0:00 su -
 2097 pts/8    S      0:01 -su
 2130 ?        Ss     0:00 /usr/bin/svnserve --foreground --daemon --root=/home/services/svn
 5330 ?        Ss     0:00 /usr/sbin/sshd
 5360 ?        Sl     1:49 /usr/bin/monit
 8080 ?        Ss     0:00 /usr/sbin/privoxy --pidfile /var/run/privoxy.pid --user privoxy.privoxy /etc/privoxy/config
12034 ?        S      0:00 [btrfs-endio-met]
12624 ?        Ss     0:19 /usr/sbin/unbound -c /etc/unbound/unbound.conf
14098 pts/4    S+    26:16 ekg2 -m
14977 ?        Ss     0:00 sshd: barbaz [priv]
14981 ?        S      0:00 sshd: barbaz@pts/0
14982 pts/0    Ss     0:00 -bash
14986 pts/0    S+     0:00 screen -r
15113 ?        S      0:00 [btrfs-worker-2]
15292 ?        S      0:00 [btrfs-endio-wri]
15754 ?        S      0:00 pickup -l -t unix -u
15980 ?        S      0:00 [kworker/0:2]
16006 ?        S      0:00 [kworker/0:0]
16022 ?        Ss     0:00 sshd: foobar [priv]
16025 ?        S      0:00 sshd: foobar@pts/6
16026 pts/6    Ss     0:00 -bash
16034 pts/6    S+     0:00 screen -r
16150 ?        Ss     0:00 /sbin/nilfs_cleanerd -n /dev/sda3 /
16160 ?        S      0:00 [kworker/0:1]
16164 ?        S      0:00 [btrfs-endio-wri]
16166 ?        S      0:00 smtpd -n smtp -t inet -u -o stress= -s 2
16168 ?        S      0:00 proxymap -t unix -u
16169 ?        S      0:00 trivial-rewrite -n rewrite -t unix -u
16170 ?        S      0:00 cleanup -z -t unix -u
16171 ?        S      0:00 smtp -t unix -u
16172 ?        S      0:00 bounce -z -n defer -t unix -u
16181 pts/8    R+     0:00 ps ax
19174 ?        Ss     0:00 ssh-agent
19200 ?        Ss     1:06 gpg-agent --daemon
21519 ?        S      0:07 [flush-btrfs-1]
27995 ?        S      0:07 [btrfs-endio-2]
29537 ?        S      0:00 [btrfs-delayed-m]

[-- Attachment #1.3: sysrq-trigger.bz2 --]
[-- Type: application/x-bzip2, Size: 16450 bytes --]

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2013-03-28 12:07                                         ` Piotr Szymaniak
@ 2013-03-28 19:22                                           ` Vyacheslav Dubeyko
       [not found]                                             ` <E24177B1-2EF0-4274-BD13-4642E3663D08-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-28 19:22 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Piotr,

On Mar 28, 2013, at 3:07 PM, Piotr Szymaniak wrote:

> On Thu, Nov 29, 2012 at 11:00:59AM +0400, Vyacheslav Dubeyko wrote:
>> Thank you.
> 
> I got this issue again (honestly, I'm a bit lost and don't remember -
> was there a patch for this?)
> 
> 
>> I think the following tools would help with investigating the issue:
>> 1. "echo t > /proc/sysrq-trigger" should tell us what the flusher is
>> doing.
> 
> Attached.
> 

Thank you for the additional details. Unfortunately, your sysrq-trigger output is not complete, so I can't draw a conclusion about which operation was the reason for the issue on your side. Could you send me the full sysrq-trigger log?

I can easily reproduce the issue by deleting or truncating a big file (100 - 500 GB). Please find the description at: http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01504.html.

This issue is not fixed yet. The issue has a complex nature. I described the reason for the issue in this e-mail: http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01547.html. I think the issue requires long-term implementation efforts. First of all, it makes sense to implement extents support. If that doesn't resolve the issue completely, then it will be necessary to implement a special technique for file deletion/truncation or to modify the technique for working with checkpoints.

Currently, I haven't started this implementation because I am investigating the "bad b-tree node" issue. I have a reproduction path for it (http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01558.html) and I am trying to localize its reason. But I worry that I can reproduce that issue with a 1 KB block size, so it may be a 1 KB block size related issue.

Thanks,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
       [not found]                                             ` <E24177B1-2EF0-4274-BD13-4642E3663D08-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-03-29  7:42                                               ` Piotr Szymaniak
  2013-03-29  8:15                                                 ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2013-03-29  7:42 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1747 bytes --]

On Thu, Mar 28, 2013 at 10:22:52PM +0300, Vyacheslav Dubeyko wrote:
> Thank you for the additional details. Unfortunately, your sysrq-trigger
> output is not complete, so I can't draw a conclusion about which
> operation was the reason for the issue on your side. Could you send me
> the full sysrq-trigger log?

How do I generate a more verbose log?

Btw, looking at the ps aux output, it seems that this flush has been
hanging there almost since the first boot:
root       937 88.2  0.0      0     0 ?        S    Feb17 50160:39 [flush-8:0]

And an uptime:
08:25:18 up 39 days, 10:57,  4 users,  load average: 0.72, 0.89, 0.98

Also, I don't know if this is some kind of regression. I've been using nilfs
for, well, some time now (looking at the fs creation date, since Aug 2011)
and didn't notice any strange behaviour before. I think I won't be able to
check this, but before that last boot I used (95% sure) vanilla kernel
3.4.4 and it was not flushing things, or I didn't notice. I could go to
some older LTS kernel on the other machine and check that.


> I can easily reproduce the issue by deleting or truncating a big file
> (100 - 500 GB). Please find the description at:
> http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01504.html.

If this brings something new - I'm not using huge files like that (the
flush above is on a 3.7G device). But if that reproduces the issue, it
could be related.  (:


Piotr Szymaniak.
-- 
Generally speaking,  it's a matter of differences in volume and pointedness.
When I look at the pictures in "National Geographic", I see that women's
tits don't differ much from what a goat or a cow has - except that they
are a bit less convenient.
  -- William Wharton, "Birdy"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2013-03-29  7:42                                               ` Piotr Szymaniak
@ 2013-03-29  8:15                                                 ` Vyacheslav Dubeyko
  2013-03-29 10:37                                                   ` Piotr Szymaniak
  0 siblings, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-29  8:15 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 2013-03-29 at 08:42 +0100, Piotr Szymaniak wrote:
> On Thu, Mar 28, 2013 at 10:22:52PM +0300, Vyacheslav Dubeyko wrote:
> > Thank you for the additional details. Unfortunately, your sysrq-trigger
> > output is not complete, so I can't draw a conclusion about which
> > operation was the reason for the issue on your side. Could you send me
> > the full sysrq-trigger log?
> 
> How do I generate a more verbose log?
> 

I meant that the sysrq-trigger output should end with a summary of the
runnable tasks and the locks held in the system (for example):

[ 8670.960040] runnable tasks:
[ 8670.960040]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[ 8670.960040] ----------------------------------------------------------------------------------------------------------
[ 8670.960040]      migration/1    14         0.002037      2223     0         0.002037      2308.610315         0.000000 /
[ 8670.960040]      kworker/1:1    29   2713578.453275     43727   120   2713578.453275      2098.177537   8659787.039783 /
[ 8670.960040] R           bash 11089        61.102602       142   120        61.102602        39.773082     46581.905519 /autogroup-171
[ 8670.960040] 
[ 8670.960040] 
[ 8670.960040] Showing all locks held in the system:
[ 8670.960040] 1 lock held by getty/892:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/901:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/927:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/930:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/937:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/1214:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/9572:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/9797:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/9879:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/10021:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/10224:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 2 locks held by bash/11089:
[ 8670.960040]  #0:  (sysrq_key_table_lock){......}, at: [<ffffffff814689c2>] __handle_sysrq+0x32/0x190
[ 8670.960040]  #1:  (tasklist_lock){.?.?.-}, at: [<ffffffff810bf4e4>] debug_show_all_locks+0x44/0x1e0
[ 8670.960040] 
[ 8670.960040] =============================================

So, I need the full sysrq-trigger output to understand the situation on
your side. Could you share it?


> Btw, looking at the ps aux output, it seems that this flush has been
> hanging there almost since the first boot:
> root       937 88.2  0.0      0     0 ?        S    Feb17 50160:39 [flush-8:0]
> 

Could you share the "cat /proc/partitions" and "mount" outputs? I need to
understand which partition is serviced by [flush-8:0].

> And an uptime:
> 08:25:18 up 39 days, 10:57,  4 users,  load average: 0.72, 0.89, 0.98
> 

Yes, this may help identify the reason for the issue on your side.

> Also, I don't know if this is some kind of regression. I've been using nilfs
> for, well, some time now (looking at the fs creation date, since Aug 2011)
> and didn't notice any strange behaviour before. I think I won't be able to
> check this, but before that last boot I used (95% sure) vanilla kernel
> 3.4.4 and it was not flushing things, or I didn't notice. I could go to
> some older LTS kernel on the other machine and check that.
> 
> 
> > I can easily reproduce the issue by deleting or truncating a big file
> > (100 - 500 GB). Please find the description at:
> > http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01504.html.
> 
> If this brings something new - I'm not using huge files like that (the
> flush above is on a 3.7G device). But if that reproduces the issue, it
> could be related.  (:
> 

Yes, it is really important to understand the situation on your side,
because you may have a different reason for the issue with similar symptoms.
So, we need to investigate your case more deeply, I think.

With the best regards,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2013-03-29  8:15                                                 ` Vyacheslav Dubeyko
@ 2013-03-29 10:37                                                   ` Piotr Szymaniak
  2013-03-29 11:29                                                     ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2013-03-29 10:37 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2927 bytes --]

On Fri, Mar 29, 2013 at 12:15:10PM +0400, Vyacheslav Dubeyko wrote:
> I meant that sysrq-trigger output ends with resume about runnable tasks
> and locks in the system (for example):
> *snip*
> So, I need in full sysrq-trigger output for understanding situation on
> your side. Could you share it?

If this should come from "echo t > /proc/sysrq-trigger", then I already
shared the full output. Maybe I'm missing some (debug?) features in the
kernel for more verbose output.


> > Btw, looking at the ps aux output, it seems that this flush has been
> > hanging there almost since the first boot:
> > root       937 88.2  0.0      0     0 ?        S    Feb17 50160:39 [flush-8:0]
> > 
> 
> Could you share the "cat /proc/partitions" and "mount" outputs? I need to
> understand which partition is serviced by [flush-8:0].

~ # cat /proc/partitions
major minor  #blocks  name

   8        0    3915776 sda
   8        1      32768 sda1
   8        2      65536 sda2
   8        3    3816448 sda3
   8       16  156290904 sdb
   8       17    4200966 sdb1
   8       18     273105 sdb2
   8       19  146810002 sdb3
   8       20    5004247 sdb4
 254        0      65536 dm-0
 254        1  146808974 dm-1


~ # mount
rootfs on / type rootfs (rw)
/dev/sda3 on / type nilfs2 (rw,noatime,nodiratime,gcpid=16150)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=10240k,nr_inodes=221714,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
/dev/mapper/chome on /home type btrfs (rw,noatime)


> > > I can easily reproduce the issue by deleting or truncating a big file
> > > (100 - 500 GB). Please find the description at:
> > > http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01504.html.
> > 
> > If this brings something new - I'm not using huge files like that (the
> > flush above is on a 3.7G device). But if that reproduces the issue, it
> > could be related.  (:
> > 
> 
> Yes, it is really important to understand the situation on your side,
> because you may have a different reason for the issue with similar symptoms.
> So, we need to investigate your case more deeply, I think.

I got this issue on two different machines, but neither of them uses nilfs2
on a device bigger than 60G (in fact, the partitions are, afair, around 30G
and the mentioned 3.7G).


Piotr Szymaniak.
-- 
And that stench. The devil knows what was rotting in that meat, but it
seemed to Red that a hundred thousand broken stinking eggs poured over
a hundred thousand stinking fish heads and dead cats could not smell
the way that slime smelled.
 -- Arkady and Boris Strugatsky, "Roadside Picnic"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2013-03-29 10:37                                                   ` Piotr Szymaniak
@ 2013-03-29 11:29                                                     ` Vyacheslav Dubeyko
  2013-03-29 13:57                                                       ` Piotr Szymaniak
  0 siblings, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-29 11:29 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 2013-03-29 at 11:37 +0100, Piotr Szymaniak wrote:
> On Fri, Mar 29, 2013 at 12:15:10PM +0400, Vyacheslav Dubeyko wrote:
> > I meant that the sysrq-trigger output should end with a summary of the
> > runnable tasks and the locks held in the system (for example):
> > *snip*
> > So, I need the full sysrq-trigger output to understand the situation on
> > your side. Could you share it?
> 
> If this should come from "echo t > /proc/sysrq-trigger", then I already
> shared the full output. Maybe I'm missing some (debug?) features in the
> kernel for more verbose output.
> 

I don't have any special additional configuration options for MagicSysRq in
my kernel configuration. It is simply enabled.

As I see (https://www.kernel.org/doc/Documentation/sysrq.txt), it is
possible to control which sysrq functions are allowed by setting a value
in /proc/sys/kernel/sysrq. I have "1" (enable all functions of sysrq) in
this file. Maybe you have another value on your side?
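
A quick check and capture sketch (assuming a fairly standard setup):

cat /proc/sys/kernel/sysrq      # should print 1
echo t > /proc/sysrq-trigger
dmesg > sysrq.txt               # save immediately; a small kernel ring buffer
                                # (CONFIG_LOG_BUF_SHIFT) can truncate the task dump

If the dump is cut off in dmesg, the missing part may still be available via syslog (e.g. /var/log/kern.log).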

> 
> > > Btw, looking at the ps aux output, it seems that this flush has been
> > > hanging there almost since the first boot:
> > > root       937 88.2  0.0      0     0 ?        S    Feb17 50160:39 [flush-8:0]
> > > 
> > 
> > Could you share the "cat /proc/partitions" and "mount" outputs? I need to
> > understand which partition is serviced by [flush-8:0].
> 
> ~ # cat /proc/partitions
> major minor  #blocks  name
> 
>    8        0    3915776 sda
>    8        1      32768 sda1
>    8        2      65536 sda2
>    8        3    3816448 sda3
>    8       16  156290904 sdb
>    8       17    4200966 sdb1
>    8       18     273105 sdb2
>    8       19  146810002 sdb3
>    8       20    5004247 sdb4
>  254        0      65536 dm-0
>  254        1  146808974 dm-1
> 
> 
> ~ # mount
> rootfs on / type rootfs (rw)
> /dev/sda3 on / type nilfs2 (rw,noatime,nodiratime,gcpid=16150)
> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
> udev on /dev type devtmpfs (rw,nosuid,relatime,size=10240k,nr_inodes=221714,mode=755)
> devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620)
> shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> /dev/mapper/chome on /home type btrfs (rw,noatime)
> 

I see two nilfs_cleanerd daemons in the "ps ax" output:

1520 ?        Ss     0:25 /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd.conf /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
16150 ?        Ss     0:00 /sbin/nilfs_cleanerd -n /dev/sda3 /

But I can't understand which partition is serviced by the nilfs_cleanerd
with pid #1520. Could you share more details about it?

Thanks,
Vyacheslav Dubeyko.

> 
> > > > I can easily reproduce the issue by deleting or truncating a big file
> > > > (100 - 500 GB). Please find the description at:
> > > > http://www.mail-archive.com/linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01504.html.
> > > 
> > > If this brings something new - I'm not using huge files like that (the
> > > flush above is on a 3.7G device). But if that reproduces the issue, it
> > > could be related.  (:
> > > 
> > 
> > Yes, it is really important to understand the situation on your side,
> > because you may have a different reason for the issue with similar symptoms.
> > So, we need to investigate your case more deeply, I think.
> 
> I got this issue on two different machines, but neither of them uses nilfs2
> on a device bigger than 60G (in fact, the partitions are, afair, around 30G
> and the mentioned 3.7G).
> 
> 
> Piotr Szymaniak.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2013-03-29 11:29                                                     ` Vyacheslav Dubeyko
@ 2013-03-29 13:57                                                       ` Piotr Szymaniak
  2013-03-31 11:14                                                         ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2013-03-29 13:57 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2664 bytes --]

On Fri, Mar 29, 2013 at 03:29:17PM +0400, Vyacheslav Dubeyko wrote:
> I don't have any special additional configuration options for MagicSysRq in
> my kernel configuration. It is simply enabled.
> 
> As I see (https://www.kernel.org/doc/Documentation/sysrq.txt), it is
> possible to control which sysrq functions are allowed by setting a value
> in /proc/sys/kernel/sysrq. I have "1" (enable all functions of sysrq) in
> this file. Maybe you have another value on your side?

Then this must depend on some different option, as I also have "1" in
/proc/sys/kernel/sysrq.


> > ~ # mount
> > rootfs on / type rootfs (rw)
> > /dev/sda3 on / type nilfs2 (rw,noatime,nodiratime,gcpid=16150)
> > proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> > tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
> > udev on /dev type devtmpfs (rw,nosuid,relatime,size=10240k,nr_inodes=221714,mode=755)
> > devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620)
> > shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
> > sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> > /dev/mapper/chome on /home type btrfs (rw,noatime)
> > 
> 
> I see two nilfs_cleanerd daemons in the "ps ax" output:
> 
> 1520 ?        Ss     0:25 /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd.conf /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
> 16150 ?        Ss     0:00 /sbin/nilfs_cleanerd -n /dev/sda3 /
> 
> But I can't understand which partition is serviced by the nilfs_cleanerd
> with pid #1520. Could you share more details about it?

~ # readlink /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
../../sda3

I suppose this is caused by manually (well, by a script, but it's not a
"remount") running nilfs_cleanerd on that device at system startup.
After that there was a remount that started the second cleaner. And it
seems that this was yesterday:
~ # ps aux | grep nilfs
root      1520  0.0  0.0   2572  1032 ?        Ss   Feb17   0:25
/sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd.conf
/dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
root     16150  0.0  0.0   2348   796 ?        Ss   Mar28   0:00
/sbin/nilfs_cleanerd -n /dev/sda3 /

I think that running cleanerd manually has already been changed to
"-o remount" on my second machine with nilfs, and I just forgot about this
one. I had some issues with that because of a missing /dev/root.


Piotr Szymaniak.
-- 
For every shooting that ends in someone's death,  there are  three
shootings without victims.  That is simply  impossible to accept in
America. Simply unacceptable. We have to do something about it.
  -- George W. Bush

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2013-03-29 13:57                                                       ` Piotr Szymaniak
@ 2013-03-31 11:14                                                         ` Vyacheslav Dubeyko
       [not found]                                                           ` <1D751B59-8E60-4D32-80CA-734BD25EE12C-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2013-03-31 11:14 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


On Mar 29, 2013, at 4:57 PM, Piotr Szymaniak wrote:

> On Fri, Mar 29, 2013 at 03:29:17PM +0400, Vyacheslav Dubeyko wrote:
>> I don't have any special additional configuration options for MagicSysRq in
>> my kernel configuration. It is simply enabled.
>> 
>> As I see (https://www.kernel.org/doc/Documentation/sysrq.txt), it is
>> possible to control which sysrq functions are allowed by setting a value
>> in /proc/sys/kernel/sysrq. I have "1" (enable all functions of sysrq) in
>> this file. Maybe you have another value on your side?
> 
> Then this must depend on some different option, as I also have "1" in
> /proc/sys/kernel/sysrq.
> 

Your sysrq-trigger output is really strange. The sysrq-trigger takes a momentary snapshot of the system state, so this output makes sense only while the issue is occurring. And I feel that this output was taken after the issue ended. Or, maybe, several attempts at taking the sysrq-trigger output need to be made while the issue is reproducing.
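
For example, a simple loop like this (just a sketch) would take several snapshots while the flusher is busy:

for i in 1 2 3 4 5; do
    echo t > /proc/sysrq-trigger
    sleep 30                    # spread the snapshots over the busy period
done
dmesg > sysrq-snapshots.txt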

I think so because the flusher is not doing anything NILFS2 related and, moreover, it is simply sleeping:

Mar 28 12:57:05 wloczykij kernel: [3338967.379023] flush-8:0       S f6370c34     0   937      2 0x00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  f57ad030 00000046 f6370c34 f6370c34 f57ad030 000040fd c104f550 f5777f40
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  00001d59 00000000 0000000e 0000001f f6370c34 00000001 00000000 00000012
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  c108085c 000040fd 000081fb f6370cac 00000000 c1481000 c700069f c1481000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023] Call Trace:
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c104f550>] ? bdi_dirty_limit+0x21/0xc3
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c108085c>] ? over_bground_thresh+0x3a/0x51
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c12ad2af>] ? schedule_timeout+0xc0/0xdd
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c10226ba>] ? usleep_range+0x38/0x38
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c1081475>] ? bdi_writeback_thread+0x9b/0xd2
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c10813da>] ? wb_do_writeback+0x134/0x134
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102ba9b>] ? kthread+0x67/0x6c
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102ba34>] ? kthread_worker_fn+0x7f/0x7f
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c12aee36>] ? kernel_thread_helper+0x6/0xd

The nilfs_cleanerd daemons are sleeping as well:

Mar 28 12:57:05 wloczykij kernel: [3338967.379023] nilfs_cleanerd  S ec179f24     0  1520      1 0x00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  f5037cb0 00000086 c13beb68 ec179f24 f5037cb0 c113e520 f57dff20 a56028b0
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  12ef93e5 00000286 00000000 00000000 00000286 00000000 12ef93e5 00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  00000000 00000078 c102e2e7 00000000 00000000 00000001 12ef93e5 00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023] Call Trace:
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c113e520>] ? timerqueue_add+0x72/0x92
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102e2e7>] ? hrtimer_start_range_ns+0xf/0x13
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c12ad81c>] ? schedule_hrtimeout_range_clock+0xb9/0xde
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102df7d>] ? update_rmtp+0x5e/0x5e
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c11196fd>] ? wq_sleep+0x84/0xd4
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c1119f66>] ? sys_mq_timedreceive+0x109/0x2f6
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c12ae46d>] ? syscall_call+0x7/0xb

Mar 28 12:57:05 wloczykij kernel: [3338967.379023] nilfs_cleanerd  S f57dff24     0 16150      1 0x00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  f569a670 00000082 c13beb68 f57dff24 f569a670 c113e520 ec179f20 a77736f0
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  12ef93e7 00000286 00000000 00000000 00000286 00000000 12ef93e7 00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  00000000 00000078 c102e2e7 00000000 00000000 00000001 12ef93e7 00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023] Call Trace:
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c113e520>] ? timerqueue_add+0x72/0x92
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102e2e7>] ? hrtimer_start_range_ns+0xf/0x13
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c12ad81c>] ? schedule_hrtimeout_range_clock+0xb9/0xde
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102df7d>] ? update_rmtp+0x5e/0x5e
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c11196fd>] ? wq_sleep+0x84/0xd4
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c1119f66>] ? sys_mq_timedreceive+0x109/0x2f6
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c12ae46d>] ? syscall_call+0x7/0xb

The segctor thread sleeps too:

Mar 28 12:57:05 wloczykij kernel: [3338967.379023] segctord        S f63f7fac     0   497      2 0x00000000
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  f6379990 00000046 c110e2ac f63f7fac f6379990 f63760b8 e59e5978 f6379990
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  c12b32b8 c1030882 f57f3670 00000046 00000001 00000000 00000046 f6343bc0
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  ec6bfe98 f6343c48 c110fde5 00000246 00000246 f6343c54 f6343bc0 f6343c5c
Mar 28 12:57:05 wloczykij kernel: [3338967.379023] Call Trace:
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c110e2ac>] ? nilfs_dispose_list+0xdb/0xf4
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c1030882>] ? check_preempt_curr+0x20/0x5b
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c110fde5>] ? nilfs_segctor_construct+0x12e/0x193
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c1110195>] ? nilfs_segctor_thread+0x260/0x265
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102bdfc>] ? finish_wait+0x33/0x33
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c110ff35>] ? nilfs_transaction_unlock.isra.27+0x54/0x54
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102ba9b>] ? kthread+0x67/0x6c
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c102ba34>] ? kthread_worker_fn+0x7f/0x7f
Mar 28 12:57:05 wloczykij kernel: [3338967.379023]  [<c12aee36>] ? kernel_thread_helper+0x6/0xd

Maybe I misunderstand something, but your sysrq-trigger output doesn't contain info about any NILFS2 related activity. So, I can't see any details about the kernel flusher thread issue in the shared sysrq-trigger output.

Thanks,
Vyacheslav Dubeyko.

> 
>>> ~ # mount
>>> rootfs on / type rootfs (rw)
>>> /dev/sda3 on / type nilfs2 (rw,noatime,nodiratime,gcpid=16150)
>>> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
>>> tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
>>> udev on /dev type devtmpfs (rw,nosuid,relatime,size=10240k,nr_inodes=221714,mode=755)
>>> devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620)
>>> shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
>>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
>>> /dev/mapper/chome on /home type btrfs (rw,noatime)
>>> 
>> 
>> I see two nilfs_cleanerd daemons in the "ps ax" output:
>> 
>> 1520 ?        Ss     0:25 /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd.conf /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
>> 16150 ?        Ss     0:00 /sbin/nilfs_cleanerd -n /dev/sda3 /
>> 
>> But I can't understand which partition is serviced by the nilfs_cleanerd
>> with pid #1520. Could you share more details about it?
> 
> ~ # readlink /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
> ../../sda3
> 
> I suppose this is caused by manually (well, by a script, but it's not a
> "remount") running nilfs_cleanerd on that device at system startup.
> After that there was a remount that started the second cleaner. And it
> seems that this was yesterday:
> ~ # ps aux | grep nilfs
> root      1520  0.0  0.0   2572  1032 ?        Ss   Feb17   0:25
> /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd.conf
> /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
> root     16150  0.0  0.0   2348   796 ?        Ss   Mar28   0:00
> /sbin/nilfs_cleanerd -n /dev/sda3 /
> 
> I think that running cleanerd manually has already been changed to
> "-o remount" on my second machine with nilfs, and I just forgot about this
> one. I had some issues with that because of a missing /dev/root.
> 
> 
> Piotr Szymaniak.
> -- 
> For every shooting that ends in someone's death,  there are  three
> shootings without victims.  That is simply  impossible to accept in
> America. Simply unacceptable. We have to do something about it.
>  -- George W. Bush

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
       [not found]                                                           ` <1D751B59-8E60-4D32-80CA-734BD25EE12C-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2013-03-31 13:26                                                             ` Piotr Szymaniak
  2013-04-01  6:06                                                               ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 29+ messages in thread
From: Piotr Szymaniak @ 2013-03-31 13:26 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1449 bytes --]

On Sun, Mar 31, 2013 at 03:14:00PM +0400, Vyacheslav Dubeyko wrote:
> Your sysrq-trigger output is really strange. The sysrq-trigger takes
> a momentary snapshot of the system state, so this output makes sense
> only while the issue is occurring. And I feel that this output was
> taken after the issue ended. Or, maybe, several attempts at taking the
> sysrq-trigger output need to be made while the issue is reproducing.
> 
> I think so because the flusher is not doing anything NILFS2 related and, moreover, it is simply sleeping:
> *snip*
> The nilfs_cleanerd daemons are sleeping as well:
> *snip*
> The segctor thread sleeps too:
> 
> Maybe I misunderstand something, but your sysrq-trigger output doesn't
> contain info about any NILFS2 related activity. So, I can't see any
> details about the kernel flusher thread issue in the shared
> sysrq-trigger output.

Hrm, maybe this is btrfs related then? But I don't think I have any
btrfs on the second machine with the flush issues.

I had a system freeze yesterday, and right after the reboot the flush was
back. So it looks like I can reproduce it (something triggers the flush
thing while booting; maybe I can catch some debug info there?). Right now
I'm going back to a 3.4.x series kernel.


Piotr Szymaniak.
-- 
I was locked up with (...) a guy who broke into a beer kiosk and spent
the whole night carrying beer home in buckets. He had a full bathtub
and an aquarium when they caught him in the morning.
  -- Jacek Hugo-Bader, "Egzamin na świra"

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Strange "flush" process bahaviour
  2013-03-31 13:26                                                             ` Piotr Szymaniak
@ 2013-04-01  6:06                                                               ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 29+ messages in thread
From: Vyacheslav Dubeyko @ 2013-04-01  6:06 UTC (permalink / raw)
  To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sun, 2013-03-31 at 15:26 +0200, Piotr Szymaniak wrote:

[snip]
> 
> Hrm, maybe this is btrfs related then? But I don't think I have any
> btrfs on the second machine with the flush issues.
> 

If you see the issue with the flush-8:0 kernel thread, then it is a NILFS2
related issue, as I understand it, because flush-8:0 services the /dev/sda
drive and you have a NILFS2 partition on /dev/sda3 (/dev/sda3 on / type
nilfs2 (rw,noatime,nodiratime,gcpid=16150)).
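
The numbers in the thread name are the major:minor pair of the backing device, which can be checked with, for example:

ls -l /dev/sda    # the "8, 0" fields in the output are the major and minor numbers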

> I had a system freeze yesterday, and right after the reboot the flush was
> back. So it looks like I can reproduce it (something triggers the flush
> thing while booting; maybe I can catch some debug info there?). Right now
> I'm going back to a 3.4.x series kernel.
> 

The visible symptom of the issue (the flush kernel thread's "abnormal"
activity) is not its cause. Such "abnormal" activity of the flush kernel
thread takes place because nilfs_mdt_write_page() and nilfs_writepage()
can be called with the wbc->for_kupdate or wbc->for_background flags set;
in that case they simply call redirty_page_for_writepage(), so the pages
stay dirty and the flusher keeps retrying them.

The real issue hides in another NILFS2 subsystem. And only a clear
understanding of the file system operation that initiates such "abnormal"
activity of the flush kernel thread can be a basis for further debugging
and investigation of the issue.
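
To help catch that operation, one option (a sketch; it assumes ftrace with the function tracer is available in the kernel and debugfs is mounted on /sys/kernel/debug) is to trace which NILFS2 functions are called while the flusher is busy:

cd /sys/kernel/debug/tracing
echo 'nilfs_*' > set_ftrace_filter    # limit tracing to nilfs2 functions
echo function > current_tracer
cat trace_pipe > nilfs-trace.txt      # capture while the flusher is eating CPU; stop with Ctrl-C
echo nop > current_tracer             # stop tracing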

With the best regards,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2013-04-01  6:06 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-12 20:19 Strange "flush" process bahaviour fwdnilfsml.to.11df-2pMamKoQTv4
     [not found] ` <50A159BC.9060600-3TRaYzfGuuI@public.gmane.org>
2012-11-13  6:13   ` Vyacheslav Dubeyko
2012-11-13 13:04     ` Strange "flush" process bahaviour (fwdnilfsml: message 2 of 20) Александр
     [not found]       ` <50A24564.7000704-3TRaYzfGuuI@public.gmane.org>
2012-11-13 13:41         ` Strange "flush" process bahaviour Vyacheslav Dubeyko
2012-11-13 14:01           ` fwdnilfsml.to.11df-2pMamKoQTv4
2012-11-15 22:58           ` fwdnilfsml.to.11df-2pMamKoQTv4
     [not found]             ` <50A5737C.2040805-3TRaYzfGuuI@public.gmane.org>
2012-11-16  3:13               ` Ryusuke Konishi
     [not found]                 ` <201211160313.AA04185-ZdTO5nnmHvkOizVVqyxoihMFgDP4sedm@public.gmane.org>
2012-11-16 15:02                   ` fwdnilfsml.to.11df-2pMamKoQTv4
     [not found]                     ` <50A65588.3000205-3TRaYzfGuuI@public.gmane.org>
2012-11-17  1:27                       ` Ryusuke Konishi
     [not found]                         ` <CAKFNMomO6ebYkXk=A9H2Wieswvogt6dj16Otj0u74M4aHOOPBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-11-27 15:19                           ` Piotr Szymaniak
2012-11-27 17:43                             ` Vyacheslav Dubeyko
     [not found]                               ` <295B3FE7-A44E-4D57-9A6A-B725EF1BA9EB-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2012-11-28 14:21                                 ` Piotr Szymaniak
2012-11-28 14:39                                   ` Vyacheslav Dubeyko
2012-11-28 20:59                                     ` Piotr Szymaniak
2012-11-29  7:00                                       ` Vyacheslav Dubeyko
2013-03-28 12:07                                         ` Piotr Szymaniak
2013-03-28 19:22                                           ` Vyacheslav Dubeyko
     [not found]                                             ` <E24177B1-2EF0-4274-BD13-4642E3663D08-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-03-29  7:42                                               ` Piotr Szymaniak
2013-03-29  8:15                                                 ` Vyacheslav Dubeyko
2013-03-29 10:37                                                   ` Piotr Szymaniak
2013-03-29 11:29                                                     ` Vyacheslav Dubeyko
2013-03-29 13:57                                                       ` Piotr Szymaniak
2013-03-31 11:14                                                         ` Vyacheslav Dubeyko
     [not found]                                                           ` <1D751B59-8E60-4D32-80CA-734BD25EE12C-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2013-03-31 13:26                                                             ` Piotr Szymaniak
2013-04-01  6:06                                                               ` Vyacheslav Dubeyko
2013-01-08 13:22                                       ` Vyacheslav Dubeyko
2012-11-13 15:28   ` Piotr Szymaniak
2012-11-13 18:24     ` Vyacheslav Dubeyko
     [not found]       ` <6B9E40B0-90F8-4130-9798-87253FEDDA78-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2012-11-13 17:49         ` fwdnilfsml.to.11df-2pMamKoQTv4

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).