linux-fsdevel.vger.kernel.org archive mirror
* virtio-blk/ext4 error handling for host-side ENOSPC
@ 2024-06-17  3:34 Keiichi Watanabe
  2024-06-18  8:33 ` Keiichi Watanabe
  0 siblings, 1 reply; 5+ messages in thread
From: Keiichi Watanabe @ 2024-06-17  3:34 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Junichi Uekawa, Takaya Saeki, tytso, Daniel Verkamp

Hi,

I'm using ext4 over virtio-blk for VMs, and I'd like to discuss the
situation where the host storage gets full.
Let's say you create a disk image file formatted with ext4 on the host
side as a sparse file and share it with the guest using virtio-blk.
When the host storage is full and the sparse file cannot be expanded
any further, the guest only learns of the error when it flushes its
disk caches.
In the current implementation, the VMM's virtio-blk device returns
VIRTIO_BLK_S_IOERR, and the virtio-blk driver converts it to
BLK_STS_IOERR. Then, the ext4 module calls mapping_set_error for that
area.
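For reference, the translation the driver does today looks roughly like this (a simplified sketch, loosely modeled on virtblk_result() in drivers/block/virtio_blk.c; the enum values here are stand-ins, not the kernel's actual definitions):

```c
#include <assert.h>

/* Device-side status codes from the virtio-blk spec. */
enum { VIRTIO_BLK_S_OK = 0, VIRTIO_BLK_S_IOERR = 1, VIRTIO_BLK_S_UNSUPP = 2 };

/* Simplified stand-ins for the kernel's blk_status_t values. */
enum { BLK_STS_OK = 0, BLK_STS_NOTSUPP = 1, BLK_STS_IOERR = 10 };

/* Sketch of the driver's status translation: every device-side error
 * other than "unsupported" collapses into a generic I/O error, so the
 * file system cannot tell a recoverable host ENOSPC apart from a hard
 * media failure. */
static int virtblk_result_sketch(int status)
{
	switch (status) {
	case VIRTIO_BLK_S_OK:
		return BLK_STS_OK;
	case VIRTIO_BLK_S_UNSUPP:
		return BLK_STS_NOTSUPP;
	default:
		return BLK_STS_IOERR;
	}
}
```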

However, the host's ENOSPC may be recoverable. For example, if a host
service periodically deletes cache files, it'd be nice if the guest
kernel can wait a while and then retry flushing.
So, I wonder if we could add special handling for host-side ENOSPC
in virtio-blk and ext4.

My idea is as follows:
First, (1) define a new error code, VIRTIO_BLK_S_ENOSPC, in
virtio-blk. Then, (2) when the guest file system receives this error
code, have it periodically retry flushing. We may want to make the
retry limit configurable via a mount option or something similar.
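A minimal sketch of step (2), in userspace C for illustration (retry_flush() and all names here are hypothetical, not existing kernel API; in-kernel code would sleep between attempts, e.g. via a delayed work item, rather than spin):

```c
#include <assert.h>

enum { FLUSH_OK = 0, FLUSH_ENOSPC = -1 };

/* Hypothetical bounded-retry policy for step (2): on a host-side
 * ENOSPC, keep retrying the flush up to max_retries extra times
 * before giving up and surfacing a hard error.  The limit would
 * come from a mount option in a real implementation. */
static int retry_flush(int (*flush)(void *), void *dev, int max_retries)
{
	for (int i = 0; i <= max_retries; i++) {
		if (flush(dev) == FLUSH_OK)
			return FLUSH_OK;
		/* a real implementation would wait here before retrying */
	}
	return FLUSH_ENOSPC; /* retries exhausted: report the error */
}

/* Fake device that frees space after a few attempts, for illustration. */
static int fake_flush(void *arg)
{
	int *attempts_until_ok = arg;
	return (*attempts_until_ok)-- > 0 ? FLUSH_ENOSPC : FLUSH_OK;
}
```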

What do you think of this idea? Also, has anything similar been attempted yet?
Thanks in advance.

Best,
Keiichi

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: virtio-blk/ext4 error handling for host-side ENOSPC
  2024-06-17  3:34 virtio-blk/ext4 error handling for host-side ENOSPC Keiichi Watanabe
@ 2024-06-18  8:33 ` Keiichi Watanabe
  2024-06-19 13:57   ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Keiichi Watanabe @ 2024-06-18  8:33 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Junichi Uekawa, Takaya Saeki, tytso, Daniel Verkamp

The corresponding proposal to virtio-spec is here:
https://lore.kernel.org/virtio-comment/20240618081858.2795400-1-keiichiw@chromium.org/T/#t

Best,
Keiichi

On Mon, Jun 17, 2024 at 12:34 PM Keiichi Watanabe <keiichiw@chromium.org> wrote:
>
> Hi,
>
> I'm using ext4 over virtio-blk for VMs, and I'd like to discuss the
> situation where the host storage gets full.
> Let's say you create a disk image file formatted with ext4 on the host
> side as a sparse file and share it with the guest using virtio-blk.
> When the host storage is full and the sparse file cannot be expanded
> any further, the guest only learns of the error when it flushes its
> disk caches.
> In the current implementation, the VMM's virtio-blk device returns
> VIRTIO_BLK_S_IOERR, and the virtio-blk driver converts it to
> BLK_STS_IOERR. Then, the ext4 module calls mapping_set_error for that
> area.
>
> However, the host's ENOSPC may be recoverable. For example, if a host
> service periodically deletes cache files, it'd be nice if the guest
> kernel can wait a while and then retry flushing.
> So, I wonder if we could add special handling for host-side ENOSPC
> in virtio-blk and ext4.
>
> My idea is as follows:
> First, (1) define a new error code, VIRTIO_BLK_S_ENOSPC, in
> virtio-blk. Then, (2) when the guest file system receives this error
> code, have it periodically retry flushing. We may want to make the
> retry limit configurable via a mount option or something similar.
>
> What do you think of this idea? Also, has anything similar been attempted yet?
> Thanks in advance.
>
> Best,
> Keiichi


* Re: virtio-blk/ext4 error handling for host-side ENOSPC
  2024-06-18  8:33 ` Keiichi Watanabe
@ 2024-06-19 13:57   ` Stefan Hajnoczi
  2024-06-28  3:29     ` Keiichi Watanabe
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2024-06-19 13:57 UTC (permalink / raw)
  To: Keiichi Watanabe; +Cc: dverkamp, linux-fsdevel, takayas, tytso, uekawa


> What do you think of this idea? Also, has anything similar been attempted yet?

Hi Keiichi,
Yes, there is an existing approach that is related but not identical to
what you are exploring:

QEMU has an option to pause the guest and raise a notification to the
management tool that ENOSPC has been reached. The guest is unable to
resolve ENOSPC itself and guest applications are likely to fail if the
disk becomes unavailable, hence the guest is simply paused.

In systems that expect to hit this condition, this pause behavior can be
combined with an early notification when a free space watermark is hit.
This way guests are almost never paused because free space can be added
before ENOSPC is reached. QEMU has a write watermark feature that works
well on top of qcow2 images (they grow incrementally so it's trivial to
monitor how much space is being consumed).
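For readers unfamiliar with it, the setup looks roughly like this (option and command names are from QEMU's documentation; the file names, node name, and threshold value are placeholders):

```shell
# werror=enospc (QEMU's default write-error policy) pauses the VM on a
# host-side ENOSPC instead of surfacing EIO to the guest.
qemu-system-x86_64 \
    -drive file=disk.qcow2,format=qcow2,node-name=disk0,werror=enospc,rerror=report

# QMP: request a BLOCK_WRITE_THRESHOLD event once the image has grown
# past 10 GiB, so the management tool can add space before ENOSPC hits.
# { "execute": "block-set-write-threshold",
#   "arguments": { "node-name": "disk0", "write-threshold": 10737418240 } }
```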

I wanted to share this existing approach in case you think it would work
nicely for your use case.

The other thought I had was: how does the new ENOSPC error fit into the
block device model? Hopefully this behavior is not virtio-blk-specific
behavior but rather something general that other storage protocols like
NVMe and SCSI support too. That way file systems can handle this in a
generic fashion.

The place I would check is Logical Block Provisioning in SCSI and NVMe.
Perhaps there are features in these protocols for reporting low
resources? (Sorry, I didn't have time to check.)

Stefan



* Re: virtio-blk/ext4 error handling for host-side ENOSPC
  2024-06-19 13:57   ` Stefan Hajnoczi
@ 2024-06-28  3:29     ` Keiichi Watanabe
  2024-07-11  6:02       ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Keiichi Watanabe @ 2024-06-28  3:29 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: dverkamp, linux-fsdevel, takayas, tytso, uekawa

Hi Stefan,

Thanks for sharing QEMU's approach!
We also have a similar early notification mechanism to avoid low-disk
conditions.
However, the approach I would like to propose is to avoid pausing
the guest by allowing it to retry requests after a while.

On Wed, Jun 19, 2024 at 10:57 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> > What do you think of this idea? Also, has anything similar been attempted yet?
>
> Hi Keiichi,
> Yes, there is an existing approach that is related but not identical to
> what you are exploring:
>
> QEMU has an option to pause the guest and raise a notification to the
> management tool that ENOSPC has been reached. The guest is unable to
> resolve ENOSPC itself and guest applications are likely to fail if the
> disk becomes unavailable, hence the guest is simply paused.
>
> In systems that expect to hit this condition, this pause behavior can be
> combined with an early notification when a free space watermark is hit.
> This way guests are almost never paused because free space can be added
> before ENOSPC is reached. QEMU has a write watermark feature that works
> well on top of qcow2 images (they grow incrementally so it's trivial to
> monitor how much space is being consumed).
>
> I wanted to share this existing approach in case you think it would work
> nicely for your use case.
>
> The other thought I had was: how does the new ENOSPC error fit into the
> block device model? Hopefully this behavior is not virtio-blk-specific
> behavior but rather something general that other storage protocols like
> NVMe and SCSI support too. That way file systems can handle this in a
> generic fashion.
>
> The place I would check is Logical Block Provisioning in SCSI and NVMe.
> Perhaps there are features in these protocols for reporting low
> resources? (Sorry, I didn't have time to check.)

For SCSI, THIN_PROVISIONING_SOFT_THRESHOLD_REACHED looks like the one;
for NVMe, NVME_SC_CAPACITY_EXCEEDED does.

I guess we can add a new error state in the ext4 layer; let's say it's
"HOST_NOSPACE". This state would be entered when virtio-blk returns
VIRTIO_BLK_S_ENOSPC or virtio-scsi returns
THIN_PROVISIONING_SOFT_THRESHOLD_REACHED. I'm not sure if there is a
case where NVME_SC_CAPACITY_EXCEEDED is translated to this state
because we don't have virtio-nvme.
While ext4 is in the HOST_NOSPACE state, it will periodically retry
writing to the disk (= virtio-blk or virtio-scsi) several times. If
these retries fail a certain number of times, the guest will report a
disk error.
What do you think?
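To make the mapping concrete, it could be sketched like this (HOST_NOSPACE and every name here are hypothetical, matching the proposal above, not existing kernel definitions):

```c
#include <assert.h>

/* Hypothetical transport-level conditions, one per protocol. */
enum transport_err {
	ERR_VIRTIO_BLK_ENOSPC,            /* proposed VIRTIO_BLK_S_ENOSPC */
	ERR_SCSI_THIN_PROV_SOFT_THRESHOLD, /* SCSI soft-threshold condition */
	ERR_NVME_CAPACITY_EXCEEDED,        /* NVMe capacity-exceeded status */
	ERR_GENERIC_IO,
};

/* Hypothetical filesystem-visible states. */
enum fs_err_state { FS_OK, FS_HOST_NOSPACE, FS_IO_ERROR };

/* Collapse each protocol's "host out of space" condition into one
 * generic HOST_NOSPACE state, so ext4 can run the same bounded-retry
 * policy regardless of whether the disk is virtio-blk or virtio-scsi. */
static enum fs_err_state classify(enum transport_err e)
{
	switch (e) {
	case ERR_VIRTIO_BLK_ENOSPC:
	case ERR_SCSI_THIN_PROV_SOFT_THRESHOLD:
	case ERR_NVME_CAPACITY_EXCEEDED:
		return FS_HOST_NOSPACE; /* recoverable: retry periodically */
	default:
		return FS_IO_ERROR;     /* hard error: report immediately */
	}
}
```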

Best,
Keiichi


>
> Stefan


* Re: virtio-blk/ext4 error handling for host-side ENOSPC
  2024-06-28  3:29     ` Keiichi Watanabe
@ 2024-07-11  6:02       ` Stefan Hajnoczi
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2024-07-11  6:02 UTC (permalink / raw)
  To: Keiichi Watanabe; +Cc: dverkamp, linux-fsdevel, takayas, tytso, uekawa


On Fri, Jun 28, 2024 at 12:29:05PM +0900, Keiichi Watanabe wrote:
> Hi Stefan,
> 
> Thanks for sharing QEMU's approach!
> We also have a similar early notification mechanism to avoid low-disk
> conditions.
> However, the approach I would like to propose is to avoid pausing
> the guest by allowing it to retry requests after a while.
> 
> On Wed, Jun 19, 2024 at 10:57 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > > What do you think of this idea? Also, has anything similar been attempted yet?
> >
> > Hi Keiichi,
> > Yes, there is an existing approach that is related but not identical to
> > what you are exploring:
> >
> > QEMU has an option to pause the guest and raise a notification to the
> > management tool that ENOSPC has been reached. The guest is unable to
> > resolve ENOSPC itself and guest applications are likely to fail if the
> > disk becomes unavailable, hence the guest is simply paused.
> >
> > In systems that expect to hit this condition, this pause behavior can be
> > combined with an early notification when a free space watermark is hit.
> > This way guests are almost never paused because free space can be added
> > before ENOSPC is reached. QEMU has a write watermark feature that works
> > well on top of qcow2 images (they grow incrementally so it's trivial to
> > monitor how much space is being consumed).
> >
> > I wanted to share this existing approach in case you think it would work
> > nicely for your use case.
> >
> > The other thought I had was: how does the new ENOSPC error fit into the
> > block device model? Hopefully this behavior is not virtio-blk-specific
> > behavior but rather something general that other storage protocols like
> > NVMe and SCSI support too. That way file systems can handle this in a
> > generic fashion.
> >
> > The place I would check is Logical Block Provisioning in SCSI and NVMe.
> > Perhaps there are features in these protocols for reporting low
> > resources? (Sorry, I didn't have time to check.)
> 
> For SCSI, THIN_PROVISIONING_SOFT_THRESHOLD_REACHED looks like the one;
> for NVMe, NVME_SC_CAPACITY_EXCEEDED does.
> 
> I guess we can add a new error state in the ext4 layer; let's say it's
> "HOST_NOSPACE". This state would be entered when virtio-blk returns
> VIRTIO_BLK_S_ENOSPC or virtio-scsi returns
> THIN_PROVISIONING_SOFT_THRESHOLD_REACHED. I'm not sure if there is a
> case where NVME_SC_CAPACITY_EXCEEDED is translated to this state
> because we don't have virtio-nvme.
> While ext4 is in the HOST_NOSPACE state, it will periodically retry
> writing to the disk (= virtio-blk or virtio-scsi) several times. If
> these retries fail a certain number of times, the guest will report a
> disk error.
> What do you think?

I'm sure virtio-blk can be extended if you can work with the file system
maintainers to introduce the concept of logical block exhaustion. There
might be complications for fsync and memory pressure if pages cannot be
written back to exhausted devices, but I'm not an expert.

Stefan


