[LSF/MM ATTEND] Stackable Union Filesystem Implementation

linux-fsdevel.vger.kernel.org archive mirror

* [LSF/MM ATTEND] Stackable Union Filesystem Implementation
@ 2014-01-07 10:34 Saket Sinha
  2014-01-07 12:23 ` Jan Kara
  2014-01-07 16:52 ` J. R. Okajima
  0 siblings, 2 replies; 16+ messages in thread
From: Saket Sinha @ 2014-01-07 10:34 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, lsf-pc

I would like to attend the LSF/MM summit to discuss the approach to
take to finally bring a union filesystem into the Linux kernel.

My involvement with union filesystems began when I worked on a
filesystem called Hepunion for CERN as part of GSoC 2013 (Google
Summer of Code).

CERN needs a union filesystem for LHCb to provide fast diskless
booting for its nodes. This requires a filesystem with two branches,
one read-write and one read-only, so they decided to write a
completely new union filesystem called Hepunion. The driver was
partially completed and worked, with some issues, on 2.6.18, since
they were using SLC5 (Scientific Linux CERN 5).

Since LHCb is moving to newer kernels, we ported the driver to them,
and this is where the problems started. Our design used paths to map
between the VFS and the lower filesystems. With the addition of
RCU-based path lookup in 2.6.38, a lot of locking was added to kernel
functions such as kern_path(), which made our driver unstable beyond
repair.

So now we are redesigning the entire thing from scratch.

We want to develop this filesystem so that the mainline Linux kernel
finally has a stackable union filesystem. Such an effort needs
collaborative development and community support.

For the redesign, AFAIK there are two ways to do it:

 1. VFS-based stacking solution: the work done by Valerie Aurora
    came closest.

 2. Non-VFS-based stacking solution: UnionFS, Aufs and the new Overlay FS.

Kernel patches exist for overlayfs and unionfs.
What is the community's view on which one would be a good fit to go with?

The use case I am looking at for the stackable filesystem is
"diskless node handling" (CERN needs faster diskless booting for the
Large Hadron Collider beauty nodes).

For this we need:
1. A global read-only filesystem
2. A client-specific read-write filesystem via NFS
3. A local merged (of the above two) read-write filesystem on ramdisk.
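
To make the three pieces above concrete, here is a rough user-space
sketch in Python of how a merged view might resolve names across one
read-write and one read-only branch. The function names and directory
arguments are hypothetical, whiteouts (deleted files) are ignored
entirely, and this is not how Hepunion or any in-kernel union
filesystem is implemented.

```python
import os
import tempfile


def union_lookup(rel_path, rw_root, ro_root):
    """Return the backing path for rel_path in the merged view, or None.

    The read-write branch is searched first, so it shadows the
    read-only branch whenever both contain the same name.
    """
    rel = rel_path.lstrip("/")
    for root in (rw_root, ro_root):
        candidate = os.path.join(root, rel)
        if os.path.lexists(candidate):
            return candidate
    return None


def union_listdir(rel_path, rw_root, ro_root):
    """List a directory of the merged view: the union of both branches."""
    rel = rel_path.lstrip("/")
    names = set()
    for root in (rw_root, ro_root):
        directory = os.path.join(root, rel)
        if os.path.isdir(directory):
            names.update(os.listdir(directory))
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical demo: two temporary directories stand in for the branches.
    with tempfile.TemporaryDirectory() as ro, tempfile.TemporaryDirectory() as rw:
        for root, name in ((ro, "passwd"), (ro, "hosts"), (rw, "hosts")):
            with open(os.path.join(root, name), "w") as f:
                f.write("dummy\n")
        print(union_lookup("/hosts", rw, ro))   # resolves inside the rw branch
        print(union_listdir("/", rw, ro))       # names from both branches
```

Searching the read-write branch first is what lets a client-specific
file hide the global one of the same name.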

To design such a filesystem I need community support, and hence I
want to attend the LSF/MM summit.

  Regards,
  Saket Sinha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-07 10:34 [LSF/MM ATTEND] Stackable Union Filesystem Implementation Saket Sinha
@ 2014-01-07 12:23 ` Jan Kara
  2014-01-07 20:04   ` Saket Sinha
  2014-01-07 16:52 ` J. R. Okajima
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-01-07 12:23 UTC (permalink / raw)
  To: Saket Sinha; +Cc: linux-fsdevel, linux-mm, lsf-pc

On Tue 07-01-14 16:04:03, Saket Sinha wrote:
> I would like to attend LSF/MM summit. I will like to discuss approach
> to be taken to finally bring up a Union Filesystem for Linux kernel.
> 
> My tryst with Union Filesystem began when I was involved developing a
> filesystem as a part of  GSOC2013(Google Summer of Code) for CERN
> called Hepunion Filesystem.
> 
> CERN needs a union filesystem for LHCb to provide fast diskless
> booting for its nodes. For such an implementation, they need a file
> system with two branches a Read-Write and a Read Only so they decided
> to write a completely new union file system called Hepunion. The
> driver was  partially completed and worked somewhat with some issues
> on 2.6.18, since they were using SLC5 (Scientific Linux CERN 5).
> 
> Now since LHCb is  moving to newer kernels, we ported it to newer
> kernels but this is where the problem started. The design of our
> filesystem was this that we used "path" to map the VFS and the lower
> filesystems. With the addition of RCU-lookup in 2.6.38, a lot of
> locking was added  in kernel functions like kern_path and made our
> driver unstable beyond repair.
> 
> So now we are redesigning the entire thing from scratch.
> 
> We want to develop this Filesystem to finally have a stackable union
> filesystem for the mainline Linux kernel . For such an effort,
> collaborative development and community support is a must.
> 
> 
> For the redesign, AFAIK
> I can think of two ways to do it-
> 
>  1. VFS-based stacking solution- I would like to cite the work done by
> Valerie Aurora was closest.
> 
>  2. Non-VFS-based stacking solution -  UnionFS, Aufs and the new Overlay FS
  So I'm wondering, have you tried using any of the above mentioned
solutions? I know at least Overlay FS should be pretty usable with any
recent kernel, and aufs seems to be ported to recent kernels as well. I'm
not sure how recent the patches you can get for unionfs are. Are you
missing some functionality?

> Patches for kernel exists for overlayfs & unionfs.
> What is communities view like which one would be good fit to go with?
  Currently Miklos Szeredi is working on getting his Overlay FS upstream,
also UnionFS has reasonable chance of getting there eventually. Currently
both of them are blocked by some VFS changes AFAIK and Miklos is working on
them.

> The use case that I am looking from the stackable filesystem is  that
> of "diskless node handling" (for CERN where it is required to provide
> a faster diskless
> booting to the Large Hadron Collider Beauty nodes).
> 
>  For this we need a
> 1. A global Read Only FIlesystem
> 2. A client-specific Read Write FIlesystem via NFS
> 3. A local Merged(of the above two) Read Write FIlesystem on ramdisk.
  I'm not sure I understand. So you have one read-only FS which is exported
to clients over NFS, I presume. Then you have another client-specific
filesystem, again mounted over NFS. I'm somewhat puzzled by the
'read-write' note there - do you mean that the client-specific filesystem
can be changed while it is mounted by a client? Or do you mean that the
client can change the filesystem to store its data? And if the client can
store data on NFS, what is the purpose of a filesystem on ramdisk?

> Thus to design such a fileystem I need community support and hence
> want to attend LSF/MM summit.
  So my suggestion would be to try OverlayFS / UnionFS, see what works /
doesn't work for you and work with respective developers to address your
needs. We definitely don't need yet another fs-unioning implementation.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-07 10:34 [LSF/MM ATTEND] Stackable Union Filesystem Implementation Saket Sinha
  2014-01-07 12:23 ` Jan Kara
@ 2014-01-07 16:52 ` J. R. Okajima
  2014-01-07 20:21   ` Saket Sinha
  1 sibling, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2014-01-07 16:52 UTC (permalink / raw)
  To: Saket Sinha; +Cc: linux-fsdevel, linux-mm, lsf-pc

Saket Sinha:
>  1. VFS-based stacking solution- I would like to cite the work done by
> Valerie Aurora was closest.
>
>  2. Non-VFS-based stacking solution -  UnionFS, Aufs and the new Overlay FS

Overlayfs is essentially a rewrite of UnionMount (which was implemented
in the VFS layer) as a filesystem. Both have several unresolved issues
that stem from their "name-based union" design, which I have pointed out
on LKML several times. For example, here is the URL of my last post
about it:
http://marc.info/?l=linux-kernel&m=136310958022160&w=2

> The use case that I am looking from the stackable filesystem is  that
> of "diskless node handling" (for CERN where it is required to provide
> a faster diskless
> booting to the Large Hadron Collider Beauty nodes).

Just out of curiosity: I remember that someone at CERN posted a message
to the aufs-users ML.
http://www.mail-archive.com/aufs-users@lists.sourceforge.net/msg04020.html

Are you working with him? Or has CERN stopped using aufs entirely?

J. R. Okajima

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-07 12:23 ` Jan Kara
@ 2014-01-07 20:04   ` Saket Sinha
  2014-01-08  5:10     ` J. R. Okajima
  2014-01-08 11:16     ` Jan Kara
  0 siblings, 2 replies; 16+ messages in thread
From: Saket Sinha @ 2014-01-07 20:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-mm, lsf-pc

>   So I'm wondering, have you tried using any of the above mentioned
> solutions? I know at least Overlay FS should be pretty usable with any
> recent kernel, aufs seems to be ported to recent kernels as well. I'm not
> sure how recent patches can you get for unionfs.
>

Several union filesystem implementations were evaluated.
The results of the evaluation are shown at the link below:
http://www.4shared.com/download/7IgHqn4tce/1_online.png

While evaluating these implementations, it became clear that none was
a perfect fit for net-booted nodes. All were designed with goals
totally different from ours.

One of the big problems was that too many copy-ups were made to the
read-write filesystem. So we decided to implement a union filesystem
designed for diskless systems, with the following functionality:

1. union between exactly one read-only and one read-write filesystem

2. if only the file metadata is modified, do not copy the whole
file to the read-write filesystem, only the metadata (stored
in a file named after the file itself, prefixed by '.me.')

3. check when files on the read-write filesystem can be removed
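
As an illustration of point 2, here is a rough user-space sketch in
Python of a metadata-only "copy-up" for a chmod. Only the '.me.' prefix
comes from the description above. The JSON content of the sidecar file
and the helper names `me_path`, `set_mode` and `get_mode` are
assumptions made for this example, not the actual Hepunion on-disk
format or code.

```python
import json
import os
import stat
import tempfile

ME_PREFIX = ".me."  # prefix from the description; the JSON body is an assumption


def me_path(rel, rw_root):
    """Path of the metadata sidecar for rel on the read-write branch."""
    directory, name = os.path.split(rel.lstrip("/"))
    return os.path.join(rw_root, directory, ME_PREFIX + name)


def set_mode(rel, mode, rw_root):
    """Record a chmod without copying any file data."""
    rel = rel.lstrip("/")
    rw_file = os.path.join(rw_root, rel)
    if os.path.lexists(rw_file):
        os.chmod(rw_file, mode)  # already on the rw branch: change it directly
        return
    sidecar = me_path(rel, rw_root)
    os.makedirs(os.path.dirname(sidecar), exist_ok=True)
    with open(sidecar, "w") as f:
        json.dump({"mode": mode}, f)  # metadata only, data stays read-only


def get_mode(rel, rw_root, ro_root):
    """Permission bits as seen through the merged view."""
    rel = rel.lstrip("/")
    rw_file = os.path.join(rw_root, rel)
    if os.path.lexists(rw_file):
        return stat.S_IMODE(os.lstat(rw_file).st_mode)
    sidecar = me_path(rel, rw_root)
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            return json.load(f)["mode"]
    return stat.S_IMODE(os.lstat(os.path.join(ro_root, rel)).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ro, tempfile.TemporaryDirectory() as rw:
        with open(os.path.join(ro, "data.bin"), "wb") as f:
            f.write(b"x" * 1024)
        set_mode("/data.bin", 0o600, rw)
        print(oct(get_mode("/data.bin", rw, ro)))
        print(os.listdir(rw))  # only the sidecar, no copy of data.bin
```

The cost of this scheme is that the metadata of one file can now live
in two places, which every lookup has to reconcile.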

>Are you missing some  functionality?

The use case I need a union-type filesystem for (CERN) is diskless
clients, which AFAIR cannot be done with overlayfs.
Correct me if I am wrong.

>> Patches for kernel exists for overlayfs & unionfs.
>> What is communities view like which one would be good fit to go with?
>   Currently Miklos Szeredi is working on getting his Overlay FS upstream,
> also UnionFS has reasonable chance of getting there eventually. Currently
> both of them are blocked by some VFS changes AFAIK and Miklos is working on
> them.
>
This is what I am looking forward to. I want to know exactly what the
kernel maintainers want from a stackable union filesystem before they
would finally let it into the mainline kernel. I even wrote to Al Viro
and the linux-fsdevel community but haven't got any responses. UnionFS
and Aufs have existed for many years outside the mainline kernel with
no sign of ever being included. Recently I have heard a lot about
Overlay FS too, but I doubt its fate as well.

>>  For this we need a
>> 1. A global Read Only FIlesystem
>> 2. A client-specific Read Write FIlesystem via NFS
>> 3. A local Merged(of the above two) Read Write FIlesystem on ramdisk.
>   I'm not sure I understand.

Let me answer the questions one by one.
> So you have one read-only FS which is exported to clients over NFS I
> presume. Then you have another client specific filesystem, again
> mounted over NFS.
We first tried to make the union on the nodes during diskless
initialisation but finally chose to do it on the server and
NFS-export the "unioned" filesystem. The client-side union was just
using too much memory.

> I'm somewhat puzzled by the 'read-write' note there - do you mean that
> the client-specific filesystem can be changed while it is mounted by a
> client? Or do you mean that the client can change the filesystem to
> store its data?
I mean the client has permission to change and modify the data.

> And if client can store
> data on NFS, what is the purpose of a filesystem on ramdisk?

I am sorry about that; I meant to give it as an alternative to the
above approach. Just a typo.
A local merged (of the above two) read-write filesystem on ramdisk is
what happens in the Knoppix distro, where you get the impression that
you are able to change and modify data.

Let me describe the RHEL way of setting up a diskless server, for
perhaps better understanding.
Up to RHEL5, Red Hat had a package named system-config-netboot to set
up diskless servers. It was a set of Python and bash scripts that set
up the dhcpd and tftpd servers, customised the shared root filesystem
and made the initial ramdisk for the diskless nodes.

To make some files of the root filesystem writable for the nodes,
with possibly different contents, the initialisation script from this
package performed the following actions after mounting the root
filesystem:
• Mount the 'snapshot' directory from the server in read/write mode.
This directory contains one subdirectory per node and two files
listing the files that need to be writable.
• Remount (using the bind mount option) each of these files from the
node's snapshot to the root filesystem.

There are two problems with this method:
• Only files or folders on the fixed list can be writable. To add a
file to that list, we have to reboot the nodes after modifying the
file list.
• The mount table is 'polluted' by all these remounts.
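
To show why the mount table grows with this scheme, here is a small
Python sketch that builds one `mount --bind` command per writable
path. The function name `bind_mount_commands`, the directory layout
and the example paths are hypothetical; this is not the actual Red Hat
initialisation script, and it only builds the command strings without
running them.

```python
import os
import shlex


def bind_mount_commands(snapshot_dir, node, root, writable_paths):
    """Return one 'mount --bind' command per entry of the writable list.

    snapshot_dir   -- where the server's snapshot directory is mounted
    node           -- name of this node's subdirectory in the snapshot
    root           -- mount point of the shared root filesystem
    writable_paths -- entries read from the fixed list of writable files
    """
    commands = []
    for entry in writable_paths:
        rel = entry.strip().lstrip("/")
        if not rel:
            continue  # skip blank lines in the list
        source = os.path.join(snapshot_dir, node, rel)
        target = os.path.join(root, rel)
        commands.append(
            "mount --bind %s %s" % (shlex.quote(source), shlex.quote(target))
        )
    return commands


if __name__ == "__main__":
    for cmd in bind_mount_commands(
        "/snapshot", "node01", "/", ["/etc/hostname", "var/log"]
    ):
        print(cmd)
```

Every entry of the list becomes its own line in the mount table, and a
path that is not on the list at boot time never gets a bind mount,
which are the two problems noted above.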

Our solution: to add flexibility to diskless node handling, we had
the idea of using a filesystem union.

>> Thus to design such a fileystem I need community support and hence
>> want to attend LSF/MM summit.
>   So my suggestion would be to try OverlayFS / UnionFS, see what works /
> doesn't work for you and work with respective developers to address your
> needs. We definitely don't need yet another fs-unioning implementation.
>
I am sorry, but none of the existing solutions completely meets the
use case mentioned above. I do not want to re-invent the wheel
either, but I have given above the reasons why we went for a new
solution.

Regards,
Saket Sinha

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-07 16:52 ` J. R. Okajima
@ 2014-01-07 20:21   ` Saket Sinha
  0 siblings, 0 replies; 16+ messages in thread
From: Saket Sinha @ 2014-01-07 20:21 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: linux-fsdevel, linux-mm, lsf-pc

> Just out of curious, I remember a guy in CERN had posted a message to
> aufs-users ML.
> http://www.mail-archive.com/aufs-users@lists.sourceforge.net/msg04020.html
>
> Are you co-working with him?
Yes. Jacob Bloomer was one of my mentors during this initiative.

 >> CERN totally stopped using aufs?
As you can see, we decided to write our own. Details of our effort
can be found on my GitHub page:
https://github.com/disdi/hepunion

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-07 20:04   ` Saket Sinha
@ 2014-01-08  5:10     ` J. R. Okajima
  2014-01-08 18:06       ` Saket Sinha
  2014-01-08 11:16     ` Jan Kara
  1 sibling, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2014-01-08  5:10 UTC (permalink / raw)
  To: Saket Sinha; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc

Saket Sinha:
> Several implementations of union file system fusion were evaluated.
> The results of the evaluation is shown at the below link-
> http://www.4shared.com/download/7IgHqn4tce/1_online.png

As far as I know, aufs supports NFS branches and also you can export
aufs via NFS.
For example,
http://sourceforge.net/p/aufs/mailman/message/20639513/

> 2. if only the file metadata are modified, then do not
> copy the whole file on the read-write files system but
> only the metadata (stored with a file named as the file
> itself prefixed by '.me.')

I once considered implementing such an approach in aufs.
But I don't think it is a good idea to store metadata in multiple
places, one in the original file and the other in the .me. file.
For such a purpose, a "block device level union" (instead of a
filesystem level union), such as "dm snapshot", may be an option for
you.
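
To illustrate the idea of a block-level union, here is a toy
copy-on-write sketch in Python. The class `SnapshotDevice` and its
methods are made up for this example and only model the concept of
keeping changed blocks beside a read-only origin. It is not how
dm snapshot is implemented and does not use any device-mapper
interface.

```python
class SnapshotDevice:
    """Toy copy-on-write view over a read-only origin 'device'."""

    def __init__(self, origin, block_size=512):
        if len(origin) % block_size:
            raise ValueError("origin size must be a multiple of block_size")
        self.origin = bytes(origin)   # never modified after creation
        self.block_size = block_size
        self.cow = {}                 # block index -> bytes written later

    def read_block(self, index):
        """Return the written copy if there is one, else the origin block."""
        if index in self.cow:
            return self.cow[index]
        start = index * self.block_size
        return self.origin[start:start + self.block_size]

    def write_block(self, index, data):
        """Store the new block beside the origin; the origin is untouched."""
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        self.cow[index] = bytes(data)


if __name__ == "__main__":
    dev = SnapshotDevice(b"a" * 512 + b"b" * 512)
    dev.write_block(1, b"c" * 512)
    print(dev.read_block(0)[:4], dev.read_block(1)[:4], len(dev.cow))
```

At this level only the blocks that are actually written get
duplicated, so the layer never needs to know whether a write changed
file data or file metadata.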

J. R. Okajima

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-07 20:04   ` Saket Sinha
  2014-01-08  5:10     ` J. R. Okajima
@ 2014-01-08 11:16     ` Jan Kara
  2014-01-08 18:26       ` Saket Sinha
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-01-08 11:16 UTC (permalink / raw)
  To: Saket Sinha; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc

On Wed 08-01-14 01:34:47, Saket Sinha wrote:
> >   So I'm wondering, have you tried using any of the above mentioned
> > solutions? I know at least Overlay FS should be pretty usable with any
> > recent kernel, aufs seems to be ported to recent kernels as well. I'm not
> > sure how recent patches can you get for unionfs.
> >
> 
> Several implementations of union file system fusion were evaluated.
> The results of the evaluation is shown at the below link-
> http://www.4shared.com/download/7IgHqn4tce/1_online.png
> 
> While evaluating union file systems implementations, it became clear
> that none was perfect for net booted nodes.
> All were designed with totally different goals than ours.
> 
> One of the big problems was that too many copyups were made on the
> read-write file system. So we decided to implement an union file
> system designed for diskless systems, with the following
> functionalities:
> 
> 1. union between only one read-only and one read-write file systems
> 
> 2. if only the file metadata are modified, then do not
> copy the whole file on the read-write files system but
> only the metadata (stored with a file named as the file
> itself prefixed by '.me.')
  So do you do anything special at CERN so that metadata is often modified
without data being changed? Because there are only two operations where I
can imagine this to be useful:
1) atime update - but you better turn atime off for unioned filesystem
   anyway.
2) xattr update

> 3. check when files on the read-write file system can be removed
  How can that happen?

> >Are you missing some  functionality?
> 
> The use case of a union type filesystem that I am looking for (CERN)
> is diskless clients which AFAIR this can not be done in overlayfs.
> Correct me if I am wrong.
  Well, I believe all unioning solutions want to support a read-only
filesystem overlaid by a read-write filesystem. Your points 2. and 3. are
what make your requirements non-standard.

> >>  For this we need a
> >> 1. A global Read Only FIlesystem
> >> 2. A client-specific Read Write FIlesystem via NFS
> >> 3. A local Merged(of the above two) Read Write FIlesystem on ramdisk.
> >   I'm not sure I understand.
> 
> Let me answer question one by one to explain
> >So you have one read-only FS which is exported  to cliens over NFS I presume. Then you have another client specific
> > filesystem, again mounted over NFS.
> We first tried to make the union on the nodes during diskless
> initialisation but finally choose to do it on the
> server, and NFS exports the “unioned” file system. Client side union
> was just using too much memory.
> 
> >I'm somewhat puzzled by the  'read-write' note there - do you mean that the client-specific filesystem
> > can be changed while it is mounted by a client? Or do you mean that the
> > client can change the filesystem to store its data?
> I mean the client has the permission to change the data and modify it.
> 
> 
>  >And if client can store
> > data on NFS, what is the purpose of a filesystem on ramdisk?
> 
> I am sorry for that I wanted to give that as an alternative to the
> above approach. Just a typo.
> A local Merged(of the above two) Read Write FIlesystem on ramdisk is
> something what happens in Knoppix distro where you get an impression
> that you are able to change and modify data.
  OK, that makes sense now. Thanks for the explanation.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-08  5:10     ` J. R. Okajima
@ 2014-01-08 18:06       ` Saket Sinha
  2014-01-09  7:32         ` J. R. Okajima
  0 siblings, 1 reply; 16+ messages in thread
From: Saket Sinha @ 2014-01-08 18:06 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc

>> Several implementations of union file system fusion were evaluated.
>> The results of the evaluation is shown at the below link-
>> http://www.4shared.com/download/7IgHqn4tce/1_online.png
>
> As far as I know, aufs supports NFS branches and also you can export
> aufs via NFS.
> For example,
> http://sourceforge.net/p/aufs/mailman/message/20639513/
>
>
I am not sure about this. These results were given to me by CERN, and
I really have to check this out to make sure it works.


>> 2. if only the file metadata are modified, then do not
>> copy the whole file on the read-write files system but
>> only the metadata (stored with a file named as the file
>> itself prefixed by '.me.')
>
> Once I have considered such approach to implement it in aufs.
> But I don't think it a good idea to store metadata in multiple places,
> one in the original file and the other is in .me. file.
> For such purpose, a "block device level union" (instead of filesystem
> level union) may be an option for you, such as "dm snapshot".
>
I imagine this would make things more complicated, as ideally this
should be done in a filesystem driver. Also, a "block device level
union" would have an even lower chance of getting this filesystem
driver included in the mainline kernel, as kernel maintainers prefer
drivers to be as simple as possible.

Before taking any approach, I really want to discuss with the kernel
maintainers what solution they expect. The truth is that the
architecture of the Linux kernel is such that a stackable filesystem
implementation would surely involve some vicious hacks.

Regards,
Saket Sinha

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-08 11:16     ` Jan Kara
@ 2014-01-08 18:26       ` Saket Sinha
  2014-01-08 21:26         ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Saket Sinha @ 2014-01-08 18:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-mm, lsf-pc

>> One of the big problems was that too many copyups were made on the
>> read-write file system. So we decided to implement an union file
>> system designed for diskless systems, with the following
>> functionalities:
>>
>> 1. union between only one read-only and one read-write file systems
>>
>> 2. if only the file metadata are modified, then do not
>> copy the whole file on the read-write files system but
>> only the metadata (stored with a file named as the file
>> itself prefixed by '.me.')
>   So do you do anything special at CERN so that metadata is often modified
> without data being changed? Because there are only two operations where I
> can imagine this to be useful:
> 1) atime update - but you better turn atime off for unioned filesystem
>    anyway.
> 2) xattr update
>
As already mentioned, the issue we were facing was that "too many
copy-ups were made on the read-write file system".
Writes in a unioning filesystem will also produce many duplicated
blocks in memory, since it uses a stackable filesystem approach, so
the response time of a particular operation is a concern as well.
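
For readers unfamiliar with the term, a copy-up can be sketched in
user space as below. The helper name `open_for_write` and the two
branch directories are hypothetical, and this is only an illustration
of the general technique, not code from Hepunion, overlayfs or aufs.

```python
import os
import shutil
import tempfile


def open_for_write(rel, rw_root, ro_root):
    """Open rel for writing in the merged view, copying it up if needed."""
    rel = rel.lstrip("/")
    upper = os.path.join(rw_root, rel)
    lower = os.path.join(ro_root, rel)
    if not os.path.lexists(upper):
        os.makedirs(os.path.dirname(upper), exist_ok=True)
        if os.path.isfile(lower):
            # Copy-up: the whole file (data and metadata) is duplicated on
            # the read-write branch, even if only one byte will change.
            shutil.copy2(lower, upper)
        else:
            open(upper, "wb").close()  # new file, nothing to copy
    return open(upper, "r+b")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ro, tempfile.TemporaryDirectory() as rw:
        os.makedirs(os.path.join(ro, "etc"))
        with open(os.path.join(ro, "etc", "motd"), "wb") as f:
            f.write(b"hello")
        with open_for_write("/etc/motd", rw, ro) as f:
            f.write(b"J")  # one-byte change triggers a full copy
        with open(os.path.join(rw, "etc", "motd"), "rb") as f:
            print(f.read())
```

The cost of the copy is proportional to the size of the file, not to
the size of the change.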

Regards,
Saket Sinha

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-08 18:26       ` Saket Sinha
@ 2014-01-08 21:26         ` Jan Kara
  2014-01-09 10:06           ` Saket Sinha
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-01-08 21:26 UTC (permalink / raw)
  To: Saket Sinha; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc

On Wed 08-01-14 23:56:57, Saket Sinha wrote:
> >> One of the big problems was that too many copyups were made on the
> >> read-write file system. So we decided to implement an union file
> >> system designed for diskless systems, with the following
> >> functionalities:
> >>
> >> 1. union between only one read-only and one read-write file systems
> >>
> >> 2. if only the file metadata are modified, then do not
> >> copy the whole file on the read-write files system but
> >> only the metadata (stored with a file named as the file
> >> itself prefixed by '.me.')
> >   So do you do anything special at CERN so that metadata is often modified
> > without data being changed? Because there are only two operations where I
> > can imagine this to be useful:
> > 1) atime update - but you better turn atime off for unioned filesystem
> >    anyway.
> > 2) xattr update
> >
> As already mentioned that the issue that we were facing was that "too
> many copyups were made on the  read-write file system".
  But my question is: In which cases specifically do you want to avoid
copyups as compared to e.g. Overlayfs?

> Writes to a file system in a  unioning file system will produce many
> duplicated blocks in memory since it uses a stackable filesystem
> approach so response time for a particular operation is also a
> concern.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-08 18:06       ` Saket Sinha
@ 2014-01-09  7:32         ` J. R. Okajima
  2014-01-09  9:19           ` Saket Sinha
  0 siblings, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2014-01-09  7:32 UTC (permalink / raw)
  To: Saket Sinha; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc


Saket Sinha:
> > For such purpose, a "block device level union" (instead of filesystem
> > level union) may be an option for you, such as "dm snapshot".
> >
> I imagine that this would make things more complicated as ideally this
> should be done in a filesystem driver. Again a "block device level
> union" would all the more have lesser chances of getting this
> filesystem driver included in the mainline kernel as kernel
> maintainers prefer the drivers to be as simple as possible.

??
I am afraid that I cannot fully understand what you wrote.
If you think "dm snapshot" does not exist currently, and that you or
someone else is going to develop it as a new feature, that is wrong. You
already have the "dm snapshot" feature and can "stack" block devices with
it. (cf. http://aufs.sourceforge.net/aufs2/report/sq/sq.pdf which is a bit
old)


J. R. Okajima

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-09  7:32         ` J. R. Okajima
@ 2014-01-09  9:19           ` Saket Sinha
  2014-01-09 14:17             ` J. R. Okajima
  0 siblings, 1 reply; 16+ messages in thread
From: Saket Sinha @ 2014-01-09  9:19 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc

On Thu, Jan 9, 2014 at 1:02 PM, J. R. Okajima <hooanon05g@gmail.com> wrote:
>
> Saket Sinha:
>> > For such purpose, a "block device level union" (instead of filesystem
>> > level union) may be an option for you, such as "dm snapshot".
>> >
>> I imagine that this would make things more complicated, as ideally this
>> should be done in a filesystem driver. Also, a "block device level
>> union" would further reduce the chances of getting this
>> filesystem driver included in the mainline kernel, as kernel
>> maintainers prefer drivers to be as simple as possible.
>
> ??
> I am afraid that I cannot fully understand what you wrote.

I am sorry for not explaining it properly. I was abrupt and hence was
misunderstood. My fault!

> If you think "dm snapshot" does not exist yet, and that you or someone
> else would have to develop it as a new feature, that is wrong. The
> "dm snapshot" feature already exists, and you can "stack" block devices
> by using it.
> (cf. http://aufs.sourceforge.net/aufs2/report/sq/sq.pdf which is a bit
> old)

No, I know it very much exists. Device mapper forms the foundation of
LVM2, software RAID, and dm-crypt disk encryption, and offers additional
features such as snapshots. I do not doubt either its functionality or
its usage.

What I am referring to here is the topic <storing metadata in multiple
places vs "block device level union">. DM operates at the block
device/sector level, but a stackable filesystem operates at the
filesystem/file level. My question is which approach the kernel
maintainers consider better, so that the concept of unioning gets
universally accepted and we have a union filesystem in the mainline
kernel.

Regards,
Saket Sinha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-08 21:26         ` Jan Kara
@ 2014-01-09 10:06           ` Saket Sinha
  0 siblings, 0 replies; 16+ messages in thread
From: Saket Sinha @ 2014-01-09 10:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-mm, lsf-pc

>> As already mentioned, the issue we were facing was that "too
>> many copyups were made on the read-write file system".
>   But my question is: In which cases specifically do you want to avoid
> copyups as compared to e.g. Overlayfs?
>
To be honest, I do not know the answer. Senior kernel developers from
CERN guided me while I worked on this driver, and I need to consult
them in order to answer you correctly. I will try to bring them into
this thread to get you the right answer.

Regards,
Saket Sinha


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-09  9:19           ` Saket Sinha
@ 2014-01-09 14:17             ` J. R. Okajima
  2014-01-11 17:21               ` Saket Sinha
  0 siblings, 1 reply; 16+ messages in thread
From: J. R. Okajima @ 2014-01-09 14:17 UTC (permalink / raw)
  To: Saket Sinha; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc

Saket,
Thanks for the explanation.

Saket Sinha:
> What I am referring to here is the topic <storing metadata in multiple
> places vs "block device level union">. DM operates at the block
> device/sector level, but a stackable filesystem operates at the
> filesystem/file level. My question is which approach the kernel
> maintainers consider better, so that the concept of unioning gets
> universally accepted and we have a union filesystem in the mainline
> kernel.

While I don't know who prefers which approach, generally speaking, if
an existing technology gives you what you want, it is better to use it.
Your ".me." approach will surely reduce the number of blocks consumed in
the upper layer, but it of course adds the overhead of maintaining the
information stored in ".me.".
Additionally, as a result of the ".me." approach, the upper layer will
hold information that is not an ordinary file. I mean, fileA exists on
the lower layer, but its metadata exists on the upper layer. So if a
user (whether inside or outside the union) wants a complete fileA, he
has to get the information from two places and merge them. That
situation looks similar to a "block device level union".
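
The merge of lower-layer attributes with upper-layer metadata can be
sketched roughly as follows. This is only an illustration in Python, not
hepunion code: the Meta and MetaOverride types, their fields, and
merged_meta() are hypothetical names, and the real ".me." on-disk format
is not described in this thread.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Meta:
    """Subset of file attributes; the fields are illustrative only."""
    mode: int
    uid: int
    gid: int


@dataclass(frozen=True)
class MetaOverride:
    """Hypothetical stand-in for a ".me." record kept on the upper layer.

    A field left as None means "not overridden, use the lower-layer value".
    """
    mode: Optional[int] = None
    uid: Optional[int] = None
    gid: Optional[int] = None


def merged_meta(lower: Meta, override: Optional[MetaOverride]) -> Meta:
    """Combine lower-layer attributes with any upper-layer overrides."""
    if override is None:
        # No metadata record on the upper layer: the lower layer wins.
        return lower
    changes = {k: v for k, v in vars(override).items() if v is not None}
    return replace(lower, **changes)
```

With this model, a chmod on a read-only lower file only needs a small
override record on the upper layer, but every stat has to consult both
places, which is the overhead mentioned above.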

Currently it is unclear which direction hepunion will evolve in, but if
you want
- a filesystem-type union (instead of a mount-type union or a block
  device level union)
- and a name-based union (instead of an inode-based union)
then the approach is similar to overlayfs's.
So it might be better to use overlayfs as the base of your development.
If supporting an NFS branch (or exporting hepunion) is important to you,
then an inode-based solution will be necessary.
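
For illustration, a name-based lookup of the kind listed above might
look like the sketch below. It is a simplified Python model, not
overlayfs or hepunion code: branches are modelled as plain dictionaries
keyed by path name, and union_lookup() and the whiteouts set (names
deleted in the union) are assumptions made for the example.

```python
from typing import Dict, Optional, Set


def union_lookup(name: str,
                 upper: Dict[str, bytes],
                 lower: Dict[str, bytes],
                 whiteouts: Set[str]) -> Optional[bytes]:
    """Resolve a path by name: upper branch first, then lower."""
    if name in upper:
        # An entry on the read-write branch always shadows the lower one.
        return upper[name]
    if name in whiteouts:
        # Deleted in the union: the lower entry must stay hidden.
        return None
    # Fall through to the read-only branch; None if it is absent there too.
    return lower.get(name)
```

The two branches are matched purely by path name here; nothing ties an
upper entry to the inode of the lower entry it shadows.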

J. R. Okajima


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [LSF/MM ATTEND] Stackable Union Filesystem Implementation
  2014-01-09 14:17             ` J. R. Okajima
@ 2014-01-11 17:21               ` Saket Sinha
  0 siblings, 0 replies; 16+ messages in thread
From: Saket Sinha @ 2014-01-11 17:21 UTC (permalink / raw)
  To: J. R. Okajima; +Cc: Jan Kara, linux-fsdevel, linux-mm, lsf-pc

> Currently it is unclear which direction hepunion will evolve in, but if
> you want
> - a filesystem-type union (instead of a mount-type union or a block
>   device level union)
> - and a name-based union (instead of an inode-based union)
> then the approach is similar to overlayfs's.
> So it might be better to use overlayfs as the base of your development.
> If supporting an NFS branch (or exporting hepunion) is important to you,
> then an inode-based solution will be necessary.
>
Thanks for the suggestion. I am looking forward to suggestions like
these from the community so that we can have a universal union
filesystem for the mainline Linux kernel that covers most use cases
(including CERN's).


Regards,
Saket Sinha


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-01-11 17:21 UTC | newest]

Thread overview: 16+ messages
2014-01-07 10:34 [LSF/MM ATTEND] Stackable Union Filesystem Implementation Saket Sinha
2014-01-07 12:23 ` Jan Kara
2014-01-07 20:04   ` Saket Sinha
2014-01-08  5:10     ` J. R. Okajima
2014-01-08 18:06       ` Saket Sinha
2014-01-09  7:32         ` J. R. Okajima
2014-01-09  9:19           ` Saket Sinha
2014-01-09 14:17             ` J. R. Okajima
2014-01-11 17:21               ` Saket Sinha
2014-01-08 11:16     ` Jan Kara
2014-01-08 18:26       ` Saket Sinha
2014-01-08 21:26         ` Jan Kara
2014-01-09 10:06           ` Saket Sinha
2014-01-07 16:52 ` J. R. Okajima
2014-01-07 20:21   ` Saket Sinha
  -- strict thread matches above, loose matches on Subject: below --
2014-01-07 10:32 Saket Sinha
