* Potential Bug in NILFS2: Disk Space Not Freed After File Deletion
@ 2024-08-01 15:43 Yifei Liu
From: Yifei Liu @ 2024-08-01 15:43 UTC
To: konishi.ryusuke, linux-nilfs; +Cc: Erez Zadok, Geoff Kuenning, Scott Smolka
Dear NILFS2 Maintainers,
I hope this message finds you well. I am writing to report a potential
bug we have encountered in NILFS2 related to disk space management
while testing it with our model checking tool, Metis. The issue arises
after performing the following operations:
Steps to Reproduce:
1. Mount the NILFS2 file system.
2. Continuously create files in the NILFS2 file system until the disk
space is completely used up (ENOSPC).
3. Delete all the files created in the previous step.
4. Sleep for 1 minute to allow the cleanerd to run.
5. Repeat steps 2-4 a few times.
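Steps 2-4 can be sketched as a small shell loop. This is only an illustration and not the script from the repository: the file names, the 4 KiB file size, and the MAX_FILES cap are assumptions added so the loop terminates even on a large file system.

```shell
#!/bin/sh
# Sketch of steps 2-4. The target directory is assumed to be an
# already-mounted NILFS2 file system; names and sizes are illustrative.

# fill_dir DIR MAX_FILES: write 4 KiB files until a write fails
# (e.g. with ENOSPC) or MAX_FILES files were written; prints the count.
fill_dir() {
    dir=$1 max=$2 n=0
    while [ "$n" -lt "$max" ]; do
        dd if=/dev/zero of="$dir/fill.$n" bs=4096 count=1 2>/dev/null || break
        n=$((n + 1))
    done
    echo "$n"
}

# one_round DIR MAX_FILES WAIT_SECONDS: fill, delete, then wait for GC.
one_round() {
    written=$(fill_dir "$1" "$2")
    rm -f "$1"/fill.*
    sync
    sleep "$3"
    echo "wrote $written files, then deleted them"
}

# Example (hypothetical values): five rounds with a 60-second wait.
# for i in 1 2 3 4 5; do one_round "$MOUNT_POINT" 100000 60; df "$MOUNT_POINT"; done
```

Printing `df` after each round makes it easy to spot the round in which usage stops dropping.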
Note: The protection_period parameter in nilfs_cleanerd.conf has been
changed from the default 3600 seconds to 10 seconds for quicker
observation of the bug.
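For reference, the change described in the note could be applied with a helper like the one below. It is a sketch that assumes the configuration file uses whitespace-separated "name value" lines and that a single space is accepted as the separator; the file path is passed in because its location (often /etc/nilfs_cleanerd.conf) may differ between systems.

```shell
# set_protection_period CONF SECONDS: rewrite the protection_period line
# of the given nilfs_cleanerd.conf and leave all other lines untouched.
set_protection_period() {
    conf=$1 secs=$2
    sed "s/^protection_period[[:space:]].*/protection_period $secs/" "$conf" > "$conf.tmp" &&
        mv "$conf.tmp" "$conf"
}

# Example: set_protection_period /etc/nilfs_cleanerd.conf 10
```

A cleaner that is already running will probably need to be restarted (for example by remounting) before it picks up the new value.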
Expected Behavior: After deleting all files, the disk usage should
decrease to zero or near zero, reflecting the freed space.
Observed Behavior: Occasionally, after deleting the files, the file
system remains stuck at a high usage (88% or 100% in our experiments)
and does not free any space. When we try to create another file, it
fails and reports "no space left on the device". We also tried
manually running the cleanerd once the system’s space usage was stuck
at high percentages; even though some of the segments appear to be not
protected and have 0% live blocks, according to the lssu output, the
space was still not cleaned. This issue occurs sporadically and is not
consistent across all tests (thus, we suspect it may be a race
condition).
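One way to detect the stuck state from a test script is to read the usage percentage from `df`. The helper names and the threshold below are hypothetical and only meant as a sketch; the parsing relies on the POSIX `df -P` layout, where the capacity is the fifth column of the second line.

```shell
# usage_percent DIR: print the Use% of the file system holding DIR as a
# bare integer.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# is_stuck DIR THRESHOLD: succeed when usage is at or above THRESHOLD.
is_stuck() {
    [ "$(usage_percent "$1")" -ge "$2" ]
}

# Example: after deleting all files and waiting for the cleaner,
# is_stuck "$MOUNT_POINT" 80 && echo "space was not reclaimed"
```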
We have created a GitHub repository containing a detailed README, the
script used to generate this problem, an example log generated in one
of our experiments, and the necessary files. Running this script and
obtaining all the outputs takes approximately 10 minutes. The script
sets up a ramdisk and mounts NILFS2 with the minimum possible size of
1028 KiB. Here is the link to the GitHub repository:
https://github.com/sbu-fsl/nilfs2-full-space.git.
I would appreciate any insights or assistance you could provide
regarding this issue. If you require any further information, logs, or
specific test cases, please let me know, and I will be happy to
provide them.
Best regards,
Yifei Liu
File systems and Storage Lab (Stony Brook University)
* Re: Potential Bug in NILFS2: Disk Space Not Freed After File Deletion
From: Ryusuke Konishi @ 2024-08-01 18:40 UTC
To: Yifei Liu; +Cc: linux-nilfs, Erez Zadok, Geoff Kuenning, Scott Smolka
Hi Yifei,
I checked what your script was doing, and one thing I noticed was that
nilfs_cleanerd seemed to be started twice.
nilfs_cleanerd is designed to be automatically started via the
mount.nilfs2 helper program when you mount a device with the mount
command, and to be shut down via the umount.nilfs2 helper program
before actually issuing the unmount system call when you try to
unmount a device with the umount command.
Basically, this program is designed to be a resident program that runs
in the background while the device is mounted.
In your script, you run nilfs_cleanerd manually after mounting and
writing, so at this point, it seems that there are two nilfs_cleanerd
processes, and both of them are requesting GC on the same device.
If that happens, fatal situations that would destroy the file system
are still prevented, but normal GC operation is not guaranteed. So,
could you please check the existing processes with the ps command?
If you start it via the mount command, it should not be started twice
for the same device.
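A duplicate cleaner can be spotted by counting nilfs_cleanerd entries for a device in the process list. The helper below is a sketch that assumes the device path appears in the daemon's argument list; the function name is made up for illustration.

```shell
# count_cleaners DEVICE: read `ps -eo pid,args` style lines on stdin and
# print how many nilfs_cleanerd processes mention DEVICE.
count_cleaners() {
    awk -v dev="$1" '
        $2 ~ /(^|\/)nilfs_cleanerd$/ {
            for (i = 3; i <= NF; i++) if ($i == dev) { n++; break }
        }
        END { print n + 0 }'
}

# Example: ps -eo pid,args | count_cleaners "$DEVICE"
```

Any result greater than 1 means more than one cleaner is requesting GC on that device.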
If you want to run GC manually, use the "nilfs-clean" command to
activate nilfs_cleanerd as follows:
# nilfs-clean -p 0 $DEVICE
If you really want to run nilfs_cleanerd manually, specify the "nogc"
mount option when mounting:
# mount -o nogc $DEVICE $MOUNT_POINT
In this case, you need to manually kill nilfs_cleanerd when unmounting.
Depending on your environment, you may need to specify the file system type explicitly:
# mount -t nilfs2 -o nogc $DEVICE $MOUNT_POINT
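Put together, a session with exactly one hand-started cleaner might look like the sketch below. The `run` wrapper and its DRY_RUN switch are illustrative additions, and the `pkill -x` step assumes no other NILFS2 file system is mounted, since it stops every nilfs_cleanerd process on the machine.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# nogc_session DEVICE MOUNT_POINT: mount without the automatic cleaner,
# start one cleaner by hand, and stop it again before unmounting.
nogc_session() {
    dev=$1 mnt=$2
    run mount -t nilfs2 -o nogc "$dev" "$mnt"
    run nilfs_cleanerd "$dev" "$mnt"
    # ... the test workload would go here ...
    run pkill -x nilfs_cleanerd   # assumption: no other NILFS2 mounts
    run umount "$mnt"
}

# Example: DRY_RUN=1 nogc_session "$DEVICE" "$MOUNT_POINT"
```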
Also, the version of nilfs-utils you used is old, so in order to rule
out known bugs, it would be helpful if you could test with the latest
version, nilfs-utils-2.2.11 (or nilfs-utils 2.3.0-dev).
You can download the latest version tarball from the site [1] or from
GitHub as described in [2].
[1] https://nilfs.sourceforge.io/en/download.html
[2] https://nilfs.sourceforge.io/en/git_repos.html
Thank you.
Ryusuke Konishi
* Re: Potential Bug in NILFS2: Disk Space Not Freed After File Deletion
From: Yifei Liu @ 2024-08-05 4:30 UTC
To: Ryusuke Konishi; +Cc: linux-nilfs, Erez Zadok, Geoff Kuenning, Scott Smolka
Hi Ryusuke,
Thank you for your prompt reply!
I investigated this issue further based on your feedback. Yes, we
manually started nilfs_cleanerd in addition to the one triggered by
the mount.nilfs2, so two nilfs_cleanerd processes are running as
indicated in the "ps" output.
I used the latest version of nilfs-utils-2.2.11 and conducted some new
experiments. However, when I relied solely on the "nilfs_cleanerd"
started by "mount.nilfs2", the NILFS2 file system remained at a high
usage percentage (88%) even after all files and directories were
deleted. I also tried mounting with the "nogc" option and manually
starting the cleaner using "nilfs_cleanerd -p 1 ${DEVICE}
${MOUNT_POINT}", which consistently reduced space usage to 50%. I
believe this is because the manually-started nilfs_cleanerd sets the
interval (-p) to 1. I would like to know if the space usage result
after cleaning is reasonable, considering it was initially 25% when
the file system was first mounted. Additionally, it seems like
running two instances of nilfs_cleanerd for a single device can
potentially cause issues that prevent the cleaner from freeing up
space.
I have updated the script accordingly. Please feel free to contact me
if you need anything from my side. Thanks again.
Best regards,
Yifei Liu
File systems and Storage Lab (Stony Brook University)
* Re: Potential Bug in NILFS2: Disk Space Not Freed After File Deletion
From: Ryusuke Konishi @ 2024-08-06 2:42 UTC
To: Yifei Liu; +Cc: linux-nilfs, Erez Zadok, Geoff Kuenning, Scott Smolka
On Mon, Aug 5, 2024 at 1:31 PM Yifei Liu wrote:
>
> Hi Ryusuke,
>
> Thank you for your prompt reply!
>
> I investigated this issue further based on your feedback. Yes, we
> manually started nilfs_cleanerd in addition to the one triggered by
> the mount.nilfs2, so two nilfs_cleanerd processes are running as
> indicated in the "ps" output.
>
> I used the latest version of nilfs-utils-2.2.11 and conducted some new
> experiments. However, when I relied solely on the "nilfs_cleanerd"
> started by "mount.nilfs2", the NILFS2 file system remained at a high
> usage percentage (88%) even after all files and directories were
> deleted.
Well, what you're saying is that the file system got stuck. And you
cannot recover it even by running the "nilfs-clean" command or
manually starting nilfs_cleanerd. Right?
It certainly seems that reserved segments can be used up and the file
system can get stuck, especially in environments with small segment
sizes and a small number of segments.
Since NILFS is a log-structured filesystem, it requires writing logs
to change the state of the filesystem, including GC, so we may need to
improve disk full management.
I was able to resolve the stuck state by expanding the partition and
then using nilfs-resize to expand the filesystem.
It may not be possible to fundamentally solve the problem (other than
mitigating it), but I've recognized the problem.
Thank you for your feedback.
> I also tried mounting with the "nogc" option and manually
> starting the cleaner using "nilfs_cleanerd -p 1 ${DEVICE}
> ${MOUNT_POINT}", which consistently reduced space usage to 50%. I
> believe this is because the manually-started nilfs_cleanerd sets the
> interval (-p) to 1. I would like to know if the space usage result
> after cleaning is reasonable, considering it was initially 25% when
> the file system was first mounted.
At present, I think the most accurate way to check whether disk usage
is reasonable is to use "lssu -l", which uses the same function as GC
to determine whether blocks are alive or dead. Note that this command
allows you to specify a protection period as an option.
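As a sketch, such an inspection could be wrapped as below. It assumes that the protection period is passed to lssu with `-p` (in seconds) and that dumpseg, another nilfs-utils tool, takes the device followed by a segment number; the function name and the DRY_RUN switch are illustrative only.

```shell
# inspect_segments DEVICE PERIOD [SEGNUM...]: list per-segment live-block
# usage as evaluated for the given protection period, then dump each of
# the listed segments. With DRY_RUN=1 the commands are only printed.
inspect_segments() {
    dev=$1 period=$2
    shift 2
    show() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }
    show lssu -l -p "$period" "$dev"
    for seg in "$@"; do
        show dumpseg "$dev" "$seg"
    done
}

# Example: inspect_segments "$DEVICE" 10 3 7
```

Using the same period as protection_period in nilfs_cleanerd.conf keeps the listing comparable to what the running cleaner considers reclaimable.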
If you want to dig deeper into what is going on, one way is to output
the block configuration of the segment you want to see with the
"dumpseg" command (just for your reference).
> Additionally, it seems like
> running two instances of nilfs_cleanerd for a single device can
> potentially cause issues that prevent the cleaner from freeing up
> space.
Multiple invocations of nilfs_cleanerd on the same device are not
supported, so frankly I would like to exclude them. However, to avoid
accidental problems, I would like to deal with that case as well if
possible.
Thanks,
Ryusuke Konishi
* Re: Potential Bug in NILFS2: Disk Space Not Freed After File Deletion
From: Yifei Liu @ 2024-08-14 1:09 UTC
To: Ryusuke Konishi; +Cc: linux-nilfs, Erez Zadok, Geoff Kuenning, Scott Smolka
> Well, what you're saying is that the file system got stuck. And you
> cannot recover it even by running the "nilfs-clean" command or
> manually starting nilfs_cleanerd. Right?
Yes, that's right.
Thank you so much for the explanation and feedback!
Best regards,
Yifei