[LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems

* [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
@ 2024-01-15  8:46 ` Viacheslav Dubeyko
  2024-01-15 17:54   ` Javier González
  0 siblings, 1 reply; 10+ messages in thread
From: Viacheslav Dubeyko @ 2024-01-15  8:46 UTC (permalink / raw)
  To: lsf-pc, linux-fsdevel, javier.gonz
  Cc: a.manzanares, linux-scsi, linux-nvme, linux-block, slava,
	Viacheslav Dubeyko

Hi Javier,

Samsung introduced Flexible Data Placement (FDP) technology
fairly recently. As far as I know, this technology is currently
available to user-space solutions only. I think it would be
good to discuss how kernel-space file systems could work with
SSDs that support FDP and take advantage of its benefits.

How soon will an FDP API be available for kernel-space file systems?
How can kernel-space file systems adopt FDP technology?
How can FDP technology improve the efficiency and reliability of
kernel-space file systems?
What new challenges does FDP technology introduce for kernel-space
file systems?

Could someone from the Samsung side lead such a discussion?

Thanks,
Slava

* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-15  8:46 ` [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems Viacheslav Dubeyko
@ 2024-01-15 17:54   ` Javier González
  2024-01-16  8:39     ` Viacheslav Dubeyko
  2024-01-16 17:46     ` Bart Van Assche
  0 siblings, 2 replies; 10+ messages in thread
From: Javier González @ 2024-01-15 17:54 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: lsf-pc, linux-fsdevel, a.manzanares, linux-scsi, linux-nvme,
	linux-block, slava, Kanchan Joshi, Bart Van Assche

On 15.01.2024 11:46, Viacheslav Dubeyko wrote:
>Hi Javier,
>
>Samsung introduced Flexible Data Placement (FDP) technology
>pretty recently. As far as I know, currently, this technology
>is available for user-space solutions only. I assume it will be
>good to have discussion how kernel-space file systems could
>work with SSDs that support FDP technology by employing
>FDP benefits.

Slava,

Thanks for bringing this up.

First, this is not a Samsung technology. Several vendors are building
FDP and several customers are already deploying first products.

We enabled FDP through I/O Passthru to avoid unnecessary noise in the
block layer until we had a clear idea of the use-cases. We have been
following and reviewing Bart's write hint series and it covers all the
block layer and interface changes needed to support FDP. Currently, we
have patches with small changes to wire up the NVMe driver. We plan to
submit them after Bart's patches are applied. Now is a good time, since
we have LSF and there are also 2 customers using FDP on block and file.

>
>How soon FDP API will be available for kernel-space file systems?

The work is done. We will submit once Bart's patches are applied.

Kanchan is doing this work.

>How kernel-space file systems can adopt FDP technology?

It is based on write hints. There are no FS-specific placement decisions.
All the responsibility lies with the application.
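
For reference, the application-side write hint that Linux exposes today
is the per-inode lifetime hint set with fcntl(F_SET_RW_HINT). Below is a
minimal Python sketch of that call. The numeric constants are copied from
the Linux uapi header <linux/fcntl.h>; the helper names set_write_hint
and get_write_hint are made up for illustration. Whether a given hint
ends up as an FDP placement decision depends on the driver wiring
discussed in this thread, which the sketch does not assume.

```python
import fcntl
import struct
import tempfile

# Values from the Linux uapi header <linux/fcntl.h>.
F_LINUX_SPECIFIC_BASE = 1024
F_GET_RW_HINT = F_LINUX_SPECIFIC_BASE + 11
F_SET_RW_HINT = F_LINUX_SPECIFIC_BASE + 12

RWH_WRITE_LIFE_NOT_SET = 0
RWH_WRITE_LIFE_NONE = 1
RWH_WRITE_LIFE_SHORT = 2
RWH_WRITE_LIFE_MEDIUM = 3
RWH_WRITE_LIFE_LONG = 4
RWH_WRITE_LIFE_EXTREME = 5


def set_write_hint(fd, hint):
    """Hypothetical helper: set the lifetime hint on the file's inode.

    Returns True on success and False if the kernel or file system
    rejects the request.
    """
    try:
        # The kernel expects a pointer to a 64-bit unsigned value.
        fcntl.fcntl(fd, F_SET_RW_HINT, struct.pack("=Q", hint))
        return True
    except OSError:
        return False


def get_write_hint(fd):
    """Hypothetical helper: read the hint back, or None if unsupported."""
    try:
        raw = fcntl.fcntl(fd, F_GET_RW_HINT, struct.pack("=Q", 0))
        return struct.unpack("=Q", raw)[0]
    except OSError:
        return None


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        ok = set_write_hint(tmp.fileno(), RWH_WRITE_LIFE_SHORT)
        print("hint set:", ok, "read back:", get_write_hint(tmp.fileno()))
```

The application picks the hint; nothing in this path asks the file
system to make a placement decision of its own.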

Kanchan: Can you comment a bit more on this?

>How FDP technology can improve efficiency and reliability of
>kernel-space file system?

This is an open problem. Our experience is that making data placement
decisions on the FS is tricky (beyond the obvious data / metadata). If
someone has a good use-case for this, I think it is worth exploring.
F2FS is a good candidate, but I am not sure FDP is of interest for
mobile - here ZUFS seems to be the current dominant technology.

>Which new challenges FDP technology introduces for kernel-space
>file systems?

See above. All we have done is wire up the NVMe driver. This is a good
discussion for LSF/

>Could we have such discussion leading from Samsung side?

Of course. We are happy to host a session on this if it gets selected.
We will add it to one of our submissions.

* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-15 17:54   ` Javier González
@ 2024-01-16  8:39     ` Viacheslav Dubeyko
  2024-01-17 11:58       ` Javier González
  2024-01-16 17:46     ` Bart Van Assche
  1 sibling, 1 reply; 10+ messages in thread
From: Viacheslav Dubeyko @ 2024-01-16  8:39 UTC (permalink / raw)
  To: Javier González
  Cc: lsf-pc, Linux FS Devel, Adam Manzanares, linux-scsi, linux-nvme,
	linux-block, slava, Kanchan Joshi, Bart Van Assche

> On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz@samsung.com> wrote:
> 
> On 15.01.2024 11:46, Viacheslav Dubeyko wrote:
>> Hi Javier,
>> 
>> Samsung introduced Flexible Data Placement (FDP) technology
>> pretty recently. As far as I know, currently, this technology
>> is available for user-space solutions only. I assume it will be
>> good to have discussion how kernel-space file systems could
>> work with SSDs that support FDP technology by employing
>> FDP benefits.
> 
> Slava,
> 
> Thanks for bringing this up.
> 
> First, this is not a Samsung technology. Several vendors are building
> FDP and several customers are already deploying first product.
> 
> We enabled FDP thtough I/O Passthru to avoid unnecesary noise in the
> block layer until we had a clear idea on use-cases. We have been
> following and reviewing Bart's write hint series and it covers all the
> block layer and interface needed to support FDP. Currently, we have
> patches with small changes to wire the NVMe driver. We plan to submit
> them after Bart's patches are applied. Now it is a good time since we
> have LSF and there are also 2 customers using FDP on block and file.
> 
>> 
>> How soon FDP API will be available for kernel-space file systems?
> 
> The work is done. We will submit as Bart's patches are applied.
> 
> Kanchan is doing this work.
> 
>> How kernel-space file systems can adopt FDP technology?
> 
> It is based on write hints. There is no FS-specific placement decisions.
> All the responsibility is in the application.
> 
> Kanchan: Can you comment a bit more on this?
> 
>> How FDP technology can improve efficiency and reliability of
>> kernel-space file system?
> 
> This is an open problem. Our experience is that making data placement
> decisions on the FS is tricky (beyond the obvious data / medatadata). If
> someone has a good use-case for this, I think it is worth exploring.
> F2FS is a good candidate, but I am not sure FDP is of interest for
> mobile - here ZUFS seems to be the current dominant technology.
> 

If I understand the FDP technology correctly, I can see the benefits for
file systems. :)

For example, SSDFS is built on the concept of segments and has multiple
types of segments (superblock, mapping table, segment bitmap, b-tree
nodes, user data). So, first, I can use hints to place different segment
types into different reclaim units. This first point is clear: I can place
different types of data/metadata (with different “hotness”) into different
reclaim units.

The second point may be less obvious. SSDFS provides a way to define
the size of the erase block. For a ZNS SSD, the mkfs tool uses the zone size
that the storage device exposes. For a conventional SSD, however, the erase
block size is defined by the user. Technically speaking, this size could be
smaller or bigger than the real erase block inside the SSD. Also, the FTL
could use a tricky mapping scheme that combines LBAs in a way that makes
FS activity inefficient even when the erase block or segment concept is used.
I can see how FDP can help here. First of all, a reclaim unit guarantees that
erase blocks or segments on the file system side match erase blocks (reclaim
units) on the SSD side. Also, I can use various sizes of logical erase blocks,
but the logical erase blocks of the same segment type will be placed into the
same reclaim unit. This could guarantee lower write amplification and
predictable reclaiming on the SSD side. The flexibility to use various logical
erase block sizes improves file system efficiency because different workloads
could require different logical erase block sizes.
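
The first point (one reclaim unit handle per segment type) can be
sketched as a small table with a fallback for devices that expose fewer
handles than there are segment types. This is only an illustration in
Python: the segment type names come from the list above, but the hotness
ordering, the function name assign_handles, and the folding policy are
assumptions, not SSDFS code.

```python
# Segment types named in this mail, ordered by an ASSUMED update
# frequency (hottest first). The ordering is illustrative only.
SEGMENT_TYPES = [
    "superblock",
    "mapping table",
    "segment bitmap",
    "b-tree nodes",
    "user data",
]


def assign_handles(segment_types, num_handles):
    """Map each segment type to a reclaim unit handle index.

    With enough handles every type gets its own. With fewer, the
    types at the end of the list share the last handle so the ones
    at the front stay isolated.
    """
    if num_handles < 1:
        raise ValueError("device must expose at least one handle")
    last = num_handles - 1
    return {
        seg_type: min(position, last)
        for position, seg_type in enumerate(segment_types)
    }


if __name__ == "__main__":
    for handles in (8, 3, 1):
        print(handles, assign_handles(SEGMENT_TYPES, handles))
```

With three handles, for example, the superblock and mapping table keep
their own handles and the remaining three types share the third.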

Technically speaking, any file system can place different types of metadata in
different reclaim units. However, user data is a slightly trickier case.
Potentially, file system logic can track the “hotness” or update frequency of
user data and try to direct different types of user data to different reclaim
units. But, from another point of view, we have folders in the file system
namespace. If an application places different types of data in different
folders, then, technically speaking, file system logic can place the content of
different folders into different reclaim units. But the application needs to
follow some “discipline” and store different types of user data (with different
“hotness”, for example) in different folders.

Thanks,
Slava.

* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-15 17:54   ` Javier González
  2024-01-16  8:39     ` Viacheslav Dubeyko
@ 2024-01-16 17:46     ` Bart Van Assche
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Van Assche @ 2024-01-16 17:46 UTC (permalink / raw)
  To: Javier González, Viacheslav Dubeyko
  Cc: lsf-pc, linux-fsdevel, a.manzanares, linux-scsi, linux-nvme,
	linux-block, slava, Kanchan Joshi

On 1/15/24 09:54, Javier González wrote:
> On 15.01.2024 11:46, Viacheslav Dubeyko wrote:
>> How soon FDP API will be available for kernel-space file systems?
> 
> The work is done. We will submit as Bart's patches are applied.

Since the FDP users need the data lifetime patch series, how about helping
with a review of this patch series?

Thanks,

Bart.


* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-16  8:39     ` Viacheslav Dubeyko
@ 2024-01-17 11:58       ` Javier González
  2024-01-17 21:51         ` Dave Chinner
  2024-01-19  9:58         ` Kanchan Joshi
  0 siblings, 2 replies; 10+ messages in thread
From: Javier González @ 2024-01-17 11:58 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: lsf-pc, Linux FS Devel, Adam Manzanares, linux-scsi, linux-nvme,
	linux-block, slava, Kanchan Joshi, Bart Van Assche

On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
>
>
>> On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz@samsung.com> wrote:
>>
>> On 15.01.2024 11:46, Viacheslav Dubeyko wrote:
>>> Hi Javier,
>>>
>>> Samsung introduced Flexible Data Placement (FDP) technology
>>> pretty recently. As far as I know, currently, this technology
>>> is available for user-space solutions only. I assume it will be
>>> good to have discussion how kernel-space file systems could
>>> work with SSDs that support FDP technology by employing
>>> FDP benefits.
>>
>> Slava,
>>
>> Thanks for bringing this up.
>>
>> First, this is not a Samsung technology. Several vendors are building
>> FDP and several customers are already deploying first product.
>>
>> We enabled FDP thtough I/O Passthru to avoid unnecesary noise in the
>> block layer until we had a clear idea on use-cases. We have been
>> following and reviewing Bart's write hint series and it covers all the
>> block layer and interface needed to support FDP. Currently, we have
>> patches with small changes to wire the NVMe driver. We plan to submit
>> them after Bart's patches are applied. Now it is a good time since we
>> have LSF and there are also 2 customers using FDP on block and file.
>>
>>>
>>> How soon FDP API will be available for kernel-space file systems?
>>
>> The work is done. We will submit as Bart's patches are applied.
>>
>> Kanchan is doing this work.
>>
>>> How kernel-space file systems can adopt FDP technology?
>>
>> It is based on write hints. There is no FS-specific placement decisions.
>> All the responsibility is in the application.
>>
>> Kanchan: Can you comment a bit more on this?
>>
>>> How FDP technology can improve efficiency and reliability of
>>> kernel-space file system?
>>
>> This is an open problem. Our experience is that making data placement
>> decisions on the FS is tricky (beyond the obvious data / medatadata). If
>> someone has a good use-case for this, I think it is worth exploring.
>> F2FS is a good candidate, but I am not sure FDP is of interest for
>> mobile - here ZUFS seems to be the current dominant technology.
>>
>
>If I understand the FDP technology correctly, I can see the benefits for
>file systems. :)
>
>For example, SSDFS is based on segment concept and it has multiple
>types of segments (superblock, mapping table, segment bitmap, b-tree
>nodes, user data). So, at first, I can use hints to place different segment
>types into different reclaim units.

Yes. This is what I meant by data / metadata. We have also looked into
using 1 RUH (reclaim unit handle) for metadata and making the rest
available to applications. We decided to start with a simple solution
and extend it as we see users.

For SSDFS it makes sense.

>The first point is clear, I can place different
>type of data/metadata (with different “hotness”) into different reclaim units.
>Second point could be not so clear. SSDFS provides the way to define
>the size of erase block. If it’s ZNS SSD, then mkfs tool uses the size of zone
>that storage device exposes to mkfs tool. However, for the case of conventional
>SSD, the size of erase block is defined by user. Technically speaking, this size
>could be smaller or bigger that the real erase block inside of SSD. Also, FTL could
>use a tricky mapping scheme that could combine LBAs in the way making
>FS activity inefficient even by using erase block or segment concept. I can see
>how FDP can help here. First of all, reclaim unit makes guarantee that erase
>blocks or segments on file system side will match to erase blocks (reclaim units)
>on SSD side. Also, I can use various sizes of logical erase blocks but the logical
>erase blocks of the same segment type will be placed into the same reclaim unit.
>It could guarantee the decreasing the write amplification and predictable reclaiming on
>SSD side. The flexibility to use various logical erase block sizes provides
>the better efficiency of file system because various workloads could require
>different logical erase block sizes.

Sounds good. I see you sent a proposal on SSDFS specifically. It makes
sense to cover these specific uses there.
>
>Technically speaking, any file system can place different types of metadata in
>different reclaim units. However, user data is slightly more tricky case. Potentially,
>file system logic can track “hotness” or frequency of updates of some user data
>and try to direct the different types of user data in different reclaim units.
>But, from another point of view, we have folders in file system namespace.
>If application can place different types of data in different folders, then, technically
>speaking, file system logic can place the content of different folders into different
>reclaim units. But application needs to follow some “discipline” to store different
>types of user data (different “hotness”, for example) in different folders.

Exactly. This is why I think it makes sense to look at specific FSs as
there are real deployments that we can use to argue for changes that
cover a large percentage of use-cases.

* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-17 11:58       ` Javier González
@ 2024-01-17 21:51         ` Dave Chinner
  2024-01-18  7:12           ` Javier González
                             ` (2 more replies)
  2024-01-19  9:58         ` Kanchan Joshi
  1 sibling, 3 replies; 10+ messages in thread
From: Dave Chinner @ 2024-01-17 21:51 UTC (permalink / raw)
  To: Javier González
  Cc: Viacheslav Dubeyko, lsf-pc, Linux FS Devel, Adam Manzanares,
	linux-scsi, linux-nvme, linux-block, slava, Kanchan Joshi,
	Bart Van Assche

On Wed, Jan 17, 2024 at 12:58:12PM +0100, Javier González wrote:
> On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
> > > On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz@samsung.com> wrote:
> > > > How FDP technology can improve efficiency and reliability of
> > > > kernel-space file system?
> > > 
> > > This is an open problem. Our experience is that making data placement
> > > decisions on the FS is tricky (beyond the obvious data / medatadata). If
> > > someone has a good use-case for this, I think it is worth exploring.
> > > F2FS is a good candidate, but I am not sure FDP is of interest for
> > > mobile - here ZUFS seems to be the current dominant technology.
> > > 
> > 
> > If I understand the FDP technology correctly, I can see the benefits for
> > file systems. :)
> > 
> > For example, SSDFS is based on segment concept and it has multiple
> > types of segments (superblock, mapping table, segment bitmap, b-tree
> > nodes, user data). So, at first, I can use hints to place different segment
> > types into different reclaim units.
> 
> Yes. This is what I meant with data / metadata. We have looked also into
> using 1 RUH for metadata and rest make available to applications. We
> decided to go with a simple solution to start with and complete as we
> see users.

XFS has an abstract type definition for metadata that it uses to
prioritise cache reclaim (i.e. classifies what metadata is more
important/hotter) and that could easily be extended to IO hints
to indicate placement.

We also have a separate journal IO path, and that is probably the
hottest LBA region of the filesystem (circular overwrite region)
which would stand to have its own classification as well.

We've long talked about making use of write IO hints for separating
these things out, but requiring 10+ IO hint channels just for
filesystem metadata to be robustly classified has been a show
stopper. Doing nothing is almost always better than doing placement
hinting poorly.

> > Technically speaking, any file system can place different types of metadata in
> > different reclaim units. However, user data is slightly more tricky case. Potentially,
> > file system logic can track “hotness” or frequency of updates of some user data
> > and try to direct the different types of user data in different reclaim units.

*cough*

We already do this in the LBA space via the filesystem allocators.
It's often configurable and generally called "allocation policies".

> > But, from another point of view, we have folders in file system namespace.
> > If application can place different types of data in different folders, then, technically
> > speaking, file system logic can place the content of different folders into different
> > reclaim units. But application needs to follow some “discipline” to store different
> > types of user data (different “hotness”, for example) in different folders.

Yup, XFS does this "physical locality is determined by parent
directory" separation by default (the inode64 allocation policy).
Every new directory inode is placed in a different allocation group
(LBA space) based on a rotor mechanism. All the files within that
directory are kept local to the directory (i.e. in the same AG/LBA
space) as much as possible.
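
As a toy model of that policy (not the XFS implementation; the class
and method names here are made up for illustration), a rotor hands each
new directory the next allocation group in turn, and files inherit the
group of their parent directory:

```python
class RotorPlacement:
    """Toy model: directories rotate across AGs, files follow parents."""

    def __init__(self, ag_count):
        if ag_count < 1:
            raise ValueError("need at least one allocation group")
        self.ag_count = ag_count
        self.next_ag = 0   # the rotor
        self.dir_ag = {}   # directory path -> AG index

    def mkdir(self, path):
        """Place a new directory in the AG the rotor points at."""
        ag = self.next_ag
        self.dir_ag[path] = ag
        self.next_ag = (self.next_ag + 1) % self.ag_count
        return ag

    def create_file(self, parent):
        """Place a file in its parent directory's AG.

        A real allocator falls back to other AGs when this one is
        full; the toy model ignores free space entirely.
        """
        return self.dir_ag[parent]


if __name__ == "__main__":
    placement = RotorPlacement(ag_count=4)
    for name in ("/a", "/b", "/c", "/d", "/e"):
        print(name, "-> AG", placement.mkdir(name))
    print("/b/file -> AG", placement.create_file("/b"))
```

The model only shows why locality falls out of the namespace: nothing
above the allocator has to describe the data for related files to land
in the same LBA region.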

Most filesystems have LBA locality policies like this because it is
highly efficient on physical seek latency limited storage hardware.
i.e. the storage hardware we've mostly been using since the early
1980s.

We could make allocation groups have different reclaim units,
but then we are talking about needing an arbitrary number of
different IO hints - XFS supports ~2^31 AGs if the filesystem is
large enough, and there's no way we're going to try to support that
many IO hints (software or hardware) in the foreseeable future.

If devices want to try to classify related data themselves, then
using LBA locality internally to classify relationships below the
level of IO hints would be a much closer match to how
filesystems have traditionally structured the data and metadata on
disk. Related data and metadata tends to get written to the same LBA
regions because that's the fastest way to access related data and
metadata on seek-limited hardware.

Yeah, I know that these are SSDs we are talking about and they
aren't seek limited, but when we already have filesystem
implementations that try to clump related things to nearby LBA
spaces, it might be best to try to leverage this behaviour rather
than try to rely on kernel and userspace to correctly provide hints
about their data patterns.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-17 21:51         ` Dave Chinner
@ 2024-01-18  7:12           ` Javier González
  2024-01-19 10:49           ` Kanchan Joshi
  2024-01-19 20:49           ` Keith Busch
  2 siblings, 0 replies; 10+ messages in thread
From: Javier González @ 2024-01-18  7:12 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Viacheslav Dubeyko, lsf-pc, Linux FS Devel, Adam Manzanares,
	linux-scsi, linux-nvme, linux-block, slava, Kanchan Joshi,
	Bart Van Assche

On 18.01.2024 08:51, Dave Chinner wrote:
>On Wed, Jan 17, 2024 at 12:58:12PM +0100, Javier González wrote:
>> On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
>> > > On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz@samsung.com> wrote:
>> > > > How FDP technology can improve efficiency and reliability of
>> > > > kernel-space file system?
>> > >
>> > > This is an open problem. Our experience is that making data placement
>> > > decisions on the FS is tricky (beyond the obvious data / medatadata). If
>> > > someone has a good use-case for this, I think it is worth exploring.
>> > > F2FS is a good candidate, but I am not sure FDP is of interest for
>> > > mobile - here ZUFS seems to be the current dominant technology.
>> > >
>> >
>> > If I understand the FDP technology correctly, I can see the benefits for
>> > file systems. :)
>> >
>> > For example, SSDFS is based on segment concept and it has multiple
>> > types of segments (superblock, mapping table, segment bitmap, b-tree
>> > nodes, user data). So, at first, I can use hints to place different segment
>> > types into different reclaim units.
>>
>> Yes. This is what I meant with data / metadata. We have looked also into
>> using 1 RUH for metadata and rest make available to applications. We
>> decided to go with a simple solution to start with and complete as we
>> see users.
>
>XFS has an abstract type definition for metadata that is uses to
>prioritise cache reclaim (i.e. classifies what metadata is more
>important/hotter) and that could easily be extended to IO hints
>to indicate placement.
>
>We also have a separate journal IO path, and that is probably the
>hotest LBA region of the filesystem (circular overwrite region)
>which would stand to have it's own classification as well.
>
>We've long talked about making use of write IO hints for separating
>these things out, but requiring 10+ IO hint channels just for
>filesystem metadata to be robustly classified has been a show
>stopper. Doing nothing is almost always better than doing placement
>hinting poorly.

I fully agree with the last statement.

In my experience, if doing something, it is probably better to start with 2
or 3 data streams that target what you would expect to be the largest
metric gap (be it data hotness, size, etc).

The difficult thing is identifying these small changes that can bring a
percentage of the benefit without getting into corner cases that take
most of the effort.

>
>> > Technically speaking, any file system can place different types of metadata in
>> > different reclaim units. However, user data is slightly more tricky case. Potentially,
>> > file system logic can track “hotness” or frequency of updates of some user data
>> > and try to direct the different types of user data in different reclaim units.
>
>*cough*
>
>We already do this in the LBA space via the filesytsem allocators.
>It's often configurable and generally called "allocation policies".
>
>> > But, from another point of view, we have folders in file system namespace.
>> > If application can place different types of data in different folders, then, technically
>> > speaking, file system logic can place the content of different folders into different
>> > reclaim units. But application needs to follow some “discipline” to store different
>> > types of user data (different “hotness”, for example) in different folders.
>
>Yup, XFS does this "physical locality is determined by parent
>directory" separation by default (the inode64 allocation policy).
>Every new directory inode is placed in a different allocation group
>(LBA space) based on a rotor mechanism. All the files within that
>directory are kept local to the directory (i.e. in the same AG/LBA
>space) as much as possible.
>
>Most filesystems have LBA locality policies like this because it is
>highly efficient on physical seek latency limited storage hardware.
>i.e. the storage hardware we've mostly been using since the early
>1980s.
>
>We could make allocation groups have different reclaim units,
>but then we are talking about needing an arbitrary number of
>different IO hints - XFS supports ~2^31 AGs if the filesystem is
>large enough, and there's no way we're going to try to support that
>many IO hints (software or hardware) in the foreseeable future.
>
>IF devices want to try to classify related data themselves, then
>using LBA locality internally to classify relationships below the
>level of IO hints, then that would be a much closer match to how
>filesystems have traditionally structured the data and metadata on
>disk. Related data and metadata tends to get written to the same LBA
>regions because that's the fastest way to access related and
>metadata on seek-limited hardware.
>
>Yeah, I know that these are SSDs we are talking about and they
>aren't seek limited, but when we already have filesystem
>implementations that try to clump related things to nearby LBA
>spaces, it might be best to try to leverage this behaviour rather
>than try to rely on kernel and userspace to correctly provide hints
>about their data patterns.

+1

* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-17 11:58       ` Javier González
  2024-01-17 21:51         ` Dave Chinner
@ 2024-01-19  9:58         ` Kanchan Joshi
  1 sibling, 0 replies; 10+ messages in thread
From: Kanchan Joshi @ 2024-01-19  9:58 UTC (permalink / raw)
  To: Javier González, Viacheslav Dubeyko
  Cc: lsf-pc, Linux FS Devel, Adam Manzanares, linux-scsi, linux-nvme,
	linux-block, slava, Bart Van Assche

On 1/17/2024 5:28 PM, Javier González wrote:
> On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
>>
>>
>>> On Jan 15, 2024, at 8:54 PM, Javier González 
>>> <javier.gonz@samsung.com> wrote:
>>>
>>> On 15.01.2024 11:46, Viacheslav Dubeyko wrote:
>>>> Hi Javier,
>>>>
>>>> Samsung introduced Flexible Data Placement (FDP) technology
>>>> pretty recently. As far as I know, currently, this technology
>>>> is available for user-space solutions only. I assume it will be
>>>> good to have discussion how kernel-space file systems could
>>>> work with SSDs that support FDP technology by employing
>>>> FDP benefits.
>>>
>>> Slava,
>>>
>>> Thanks for bringing this up.
>>>
>>> First, this is not a Samsung technology. Several vendors are building
>>> FDP and several customers are already deploying first product.
>>>
>>> We enabled FDP thtough I/O Passthru to avoid unnecesary noise in the
>>> block layer until we had a clear idea on use-cases. We have been
>>> following and reviewing Bart's write hint series and it covers all the
>>> block layer and interface needed to support FDP. Currently, we have
>>> patches with small changes to wire the NVMe driver. We plan to submit
>>> them after Bart's patches are applied. Now it is a good time since we
>>> have LSF and there are also 2 customers using FDP on block and file.
>>>
>>>>
>>>> How soon FDP API will be available for kernel-space file systems?
>>>
>>> The work is done. We will submit as Bart's patches are applied.
>>>
>>> Kanchan is doing this work.
>>>
>>>> How kernel-space file systems can adopt FDP technology?
>>>
>>> It is based on write hints. There is no FS-specific placement decisions.
>>> All the responsibility is in the application.
>>>
>>> Kanchan: Can you comment a bit more on this?

Application-specific placement (with write hints) is almost
FS-independent, and some applications (those that require more control or
predictable outcomes across file systems) prefer that.
It also punts the responsibility for accuracy to the application
(and the kernel prefers that at times).

FS-specific placement is the next step, but it needs to be 
discussed/done at each FS level. The effort (and likely outcome, too) 
will vary.

Also, we will need to avoid hint collisions between the user and the
kernel, either by extending/reserving a different range of hints
exclusively for in-kernel use or by taking the F2FS approach (i.e., a
mount option that keeps hints for user or FS use, but not both).
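
Both options can be sketched in a few lines. This is a Python
illustration under assumptions: the number of hints, the split point,
and all names (Owner, split_hint_range, exclusive_hints, may_use) are
hypothetical and do not correspond to an existing kernel interface.

```python
from enum import Enum


class Owner(Enum):
    USER = "user"
    FS = "fs"


def split_hint_range(total_hints, reserved_for_fs):
    """Option 1: reserve the top of the hint range for in-kernel use."""
    if not 0 <= reserved_for_fs <= total_hints:
        raise ValueError("reservation exceeds available hints")
    boundary = total_hints - reserved_for_fs
    return {
        Owner.USER: range(0, boundary),
        Owner.FS: range(boundary, total_hints),
    }


def exclusive_hints(total_hints, owner):
    """Option 2: a mount-option style switch; one side owns all hints."""
    other = Owner.FS if owner is Owner.USER else Owner.USER
    return {owner: range(0, total_hints), other: range(0)}


def may_use(ranges, who, hint):
    """True if `who` is allowed to issue `hint` under the given policy."""
    return hint in ranges[who]


if __name__ == "__main__":
    shared = split_hint_range(total_hints=8, reserved_for_fs=2)
    print("user may use hint 5:", may_use(shared, Owner.USER, 5))
    print("user may use hint 6:", may_use(shared, Owner.USER, 6))
```

Either way the two sides never issue the same hint value, which is the
property the collision concern is about.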

* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-17 21:51         ` Dave Chinner
  2024-01-18  7:12           ` Javier González
@ 2024-01-19 10:49           ` Kanchan Joshi
  2024-01-19 20:49           ` Keith Busch
  2 siblings, 0 replies; 10+ messages in thread
From: Kanchan Joshi @ 2024-01-19 10:49 UTC (permalink / raw)
  To: Dave Chinner, Javier González
  Cc: Viacheslav Dubeyko, lsf-pc, Linux FS Devel, Adam Manzanares,
	linux-scsi, linux-nvme, linux-block, slava, Bart Van Assche

On 1/18/2024 3:21 AM, Dave Chinner wrote:
> On Wed, Jan 17, 2024 at 12:58:12PM +0100, Javier González wrote:
>> On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
>>>> On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz@samsung.com> wrote:
>>>>> How FDP technology can improve efficiency and reliability of
>>>>> kernel-space file system?
>>>>
>>>> This is an open problem. Our experience is that making data placement
>>>> decisions on the FS is tricky (beyond the obvious data / medatadata). If
>>>> someone has a good use-case for this, I think it is worth exploring.
>>>> F2FS is a good candidate, but I am not sure FDP is of interest for
>>>> mobile - here ZUFS seems to be the current dominant technology.
>>>>
>>>
>>> If I understand the FDP technology correctly, I can see the benefits for
>>> file systems. :)
>>>
>>> For example, SSDFS is based on segment concept and it has multiple
>>> types of segments (superblock, mapping table, segment bitmap, b-tree
>>> nodes, user data). So, at first, I can use hints to place different segment
>>> types into different reclaim units.
>>
>> Yes. This is what I meant with data / metadata. We have looked also into
>> using 1 RUH for metadata and rest make available to applications. We
>> decided to go with a simple solution to start with and complete as we
>> see users.
> 
> XFS has an abstract type definition for metadata that is uses to
> prioritise cache reclaim (i.e. classifies what metadata is more
> important/hotter) and that could easily be extended to IO hints
> to indicate placement.

That sounds very useful.

> We also have a separate journal IO path, and that is probably the
> hotest LBA region of the filesystem (circular overwrite region)
> which would stand to have it's own classification as well.

In the past, I saw nice wins after separating the journal in XFS and
Ext4 [1]. This is a low-effort, high-gain item.

[1] https://www.usenix.org/system/files/conference/fast18/fast18-rho.pdf


* Re: [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
  2024-01-17 21:51         ` Dave Chinner
  2024-01-18  7:12           ` Javier González
  2024-01-19 10:49           ` Kanchan Joshi
@ 2024-01-19 20:49           ` Keith Busch
  2 siblings, 0 replies; 10+ messages in thread
From: Keith Busch @ 2024-01-19 20:49 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Javier González, Viacheslav Dubeyko, lsf-pc, Linux FS Devel,
	Adam Manzanares, linux-scsi, linux-nvme, linux-block, slava,
	Kanchan Joshi, Bart Van Assche

On Thu, Jan 18, 2024 at 08:51:37AM +1100, Dave Chinner wrote:
> On Wed, Jan 17, 2024 at 12:58:12PM +0100, Javier González wrote:
> > On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
> > > > On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz@samsung.com> wrote:
> > > > > How FDP technology can improve efficiency and reliability of
> > > > > kernel-space file system?
> > > > 
> > > > This is an open problem. Our experience is that making data placement
> > > > > decisions on the FS is tricky (beyond the obvious data / metadata). If
> > > > someone has a good use-case for this, I think it is worth exploring.
> > > > F2FS is a good candidate, but I am not sure FDP is of interest for
> > > > mobile - here ZUFS seems to be the current dominant technology.
> > > > 
> > > 
> > > If I understand the FDP technology correctly, I can see the benefits for
> > > file systems. :)
> > > 
> > > For example, SSDFS is based on segment concept and it has multiple
> > > types of segments (superblock, mapping table, segment bitmap, b-tree
> > > nodes, user data). So, at first, I can use hints to place different segment
> > > types into different reclaim units.
> > 
> > Yes. This is what I meant with data / metadata. We have looked also into
> > using 1 RUH for metadata and rest make available to applications. We
> > decided to go with a simple solution to start with and complete as we
> > see users.
> 
> XFS has an abstract type definition for metadata that it uses to
> prioritise cache reclaim (i.e. classifies what metadata is more
> important/hotter) and that could easily be extended to IO hints
> to indicate placement.
>
> We also have a separate journal IO path, and that is probably the
> hottest LBA region of the filesystem (circular overwrite region)
> which would stand to have its own classification as well.

Filesystem metadata is pretty small spatially in the LBA range, but
seems to have a higher overwrite frequency than other data, so this could
be a great fit for FDP. Some of my _very_ early analysis, though,
indicates REQ_META is too coarse to really get the benefits, so
finer-grained separation through a new flag or hints should help.
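
To illustrate what finer-grained separation might look like, here is a
minimal sketch and not a proposal for an actual interface. The metadata
classes, their hotness ordering, and the helper md_class_to_handle() are
all hypothetical; nothing here comes from XFS or from the block layer.
The assumption is that handle 0 stays with user data, the hottest classes
get a handle of their own first, and whatever does not fit shares the
last handle.

```c
#include <stdio.h>

/*
 * Hypothetical metadata classes, listed from (assumed) hottest to
 * coldest. Not taken from any existing filesystem.
 */
enum md_class {
	MD_JOURNAL,
	MD_ALLOC_BTREE,
	MD_INODE,
	MD_DIR,
	MD_SUPERBLOCK,
	MD_CLASS_COUNT
};

/*
 * Map a metadata class to a placement handle index in [0, nr_handles).
 * Handle 0 is reserved for user data. With too few handles, the colder
 * classes are folded together onto the last handle.
 */
static unsigned int md_class_to_handle(enum md_class c,
				       unsigned int nr_handles)
{
	unsigned int md_handles, idx;

	if (nr_handles <= 1)
		return 0;	/* no separation possible */

	md_handles = nr_handles - 1;	/* handles left for metadata */
	idx = (unsigned int)c;
	if (idx >= md_handles)
		idx = md_handles - 1;	/* fold colder classes together */

	return 1 + idx;
}
```

With 4 handles, the journal gets handle 1, the allocation b-trees get
handle 2, and everything colder shares handle 3. With only 2 handles this
degenerates to the plain data / metadata split.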
 
> We've long talked about making use of write IO hints for separating
> these things out, but requiring 10+ IO hint channels just for
> filesystem metadata to be robustly classified has been a show
> stopper. Doing nothing is almost always better than doing placement
> hinting poorly.

Yeah, a totally degenerate application could make things worse than just
not using these write hints. NVMe's FDP has a standard-defined feedback
mechanism through log pages to see how well you're doing with respect to
write amplification. If we assume applications using this optimization
are acting in good faith, we should be able to tune the use cases. The
FDP abstractions seem appropriate to provide generic solutions that
aren't tailored to any one vendor.
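
As a rough sketch of how that feedback could be consumed: sample the
host-written and media-written byte counters before and after a workload
and take the ratio of the deltas. The struct and field names below are
made up for illustration, and the counters are assumed to fit in 64 bits
(the device-reported values may be wider); fetching the log page itself
is left out.

```c
#include <stdint.h>

/*
 * One sample of the two byte counters of interest. Field names are
 * hypothetical, and the values are assumed to fit in 64 bits.
 */
struct fdp_stats_sample {
	uint64_t host_bytes_written;	/* bytes the host asked to write */
	uint64_t media_bytes_written;	/* bytes actually written to media */
};

/*
 * Write amplification factor over the interval [start, end]:
 * media bytes written divided by host bytes written.
 * Returns 0.0 if the host wrote nothing in the interval.
 */
static double fdp_waf(const struct fdp_stats_sample *start,
		      const struct fdp_stats_sample *end)
{
	uint64_t host = end->host_bytes_written - start->host_bytes_written;
	uint64_t media = end->media_bytes_written - start->media_bytes_written;

	if (host == 0)
		return 0.0;

	return (double)media / (double)host;
}
```

A value close to 1.0 over a long enough interval would suggest the
placement hints are helping; a value that rises after enabling hints
would be the "worse than doing nothing" case.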


end of thread, other threads:[~2024-01-19 20:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CGME20240115084656eucas1p219dd48243e2eaec4180e5e6ecf5e8ad9@eucas1p2.samsung.com>
2024-01-15  8:46 ` [LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems Viacheslav Dubeyko
2024-01-15 17:54   ` Javier González
2024-01-16  8:39     ` Viacheslav Dubeyko
2024-01-17 11:58       ` Javier González
2024-01-17 21:51         ` Dave Chinner
2024-01-18  7:12           ` Javier González
2024-01-19 10:49           ` Kanchan Joshi
2024-01-19 20:49           ` Keith Busch
2024-01-19  9:58         ` Kanchan Joshi
2024-01-16 17:46     ` Bart Van Assche
