* [RFC PATCH 0/2] Union Mount: Directory listing in glibc
@ 2008-04-29 13:32 bsn.0007
2008-04-29 13:33 ` [RFC PATCH 1/2] Union Mount: glibc readdir support bsn.0007
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: bsn.0007 @ 2008-04-29 13:32 UTC (permalink / raw)
To: libc-alpha
Cc: Jan Blunck, Erez Zadok, linux-kernel, linux-fsdevel, viro,
Christoph Hellwig, Ulrich Drepper, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
Hi,
I went through Bharata's RFC post on a glibc-based Union Mount readdir solution
(http://lkml.org/lkml/2008/3/11/34) and have come up with patches
against glibc to implement the same.
The RFC discussed the information that glibc readdir needs about
union mounted directories, and I have assumed that the following information
is available from the kernel for this implementation.
- Kernel would return all the dirents (including duplicates and whiteouts)
starting from the topmost directory of the union.
- Indication that this directory is a union mounted directory
I have assumed that kernel would return a "." whiteout as the first
directory entry of the union. This would tell glibc readdir(3) that it is
working with a union mounted directory and it needs to do duplicate
elimination and whiteout suppression. It starts building a dirent cache
for this purpose.
Ulrich had suggested that we could use the fstat call to recognize union
mounts. But looking at the stat structure from stat(2), it was not obvious
which field in there could be used for this purpose. Hence for this
prototype implementation, I decided to go with what Al Viro suggested,
namely using a "." whiteout.
- Indication that kernel is done with returning entries from the topmost
directory.
I have assumed that kernel would return a "." whiteout at the beginning
of each directory of the union. So when glibc gets a 2nd "." whiteout, it
will start performing duplicate elimination.
- Whiteout indication
glibc will depend on dirent->d_type to be set to DT_WHT on a whiteout
file.
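The four assumptions above can be sketched as a small filter over a simulated entry stream. This is only an illustration of the assumed protocol, not the patch itself: `struct entry`, `union_filter` and the fixed-size `seen` table are hypothetical names introduced here, and the whiteout flag stands in for `d_type == DT_WHT`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEEN 64            /* illustrative limit for this sketch only */

/* Hypothetical stand-in for a dirent: a name plus a whiteout flag. */
struct entry {
    const char *name;
    int is_whiteout;           /* models d_type == DT_WHT */
};

static int is_marker(const struct entry *e)
{
    return e->is_whiteout && strcmp(e->name, ".") == 0;
}

/* Filter a raw stream according to the assumed "." whiteout protocol.
   Visible names are written to out[]; the return value is their count. */
static size_t union_filter(const struct entry *in, size_t n,
                           const char **out, size_t out_cap)
{
    const char *seen[MAX_SEEN];   /* names cached so far, whiteouts included */
    size_t nseen = 0, nout = 0;
    int markers = 0;              /* "." whiteouts encountered so far */
    int is_union = n > 0 && is_marker(&in[0]);

    for (size_t i = 0; i < n; i++) {
        if (!is_union) {          /* ordinary directory: pass everything on */
            if (nout < out_cap)
                out[nout++] = in[i].name;
            continue;
        }
        if (is_marker(&in[i])) {  /* start of a new layer */
            markers++;
            continue;
        }
        int dup = 0;
        if (markers >= 2)         /* below the topmost layer: eliminate dups */
            for (size_t j = 0; j < nseen && !dup; j++)
                dup = strcmp(seen[j], in[i].name) == 0;
        if (dup)
            continue;
        assert(nseen < MAX_SEEN);
        seen[nseen++] = in[i].name;   /* whiteouts are cached too, so they
                                         hide same-named lower entries */
        if (!in[i].is_whiteout && nout < out_cap)
            out[nout++] = in[i].name;
    }
    return nout;
}
```

Feeding it a two-layer stream in which the top layer whites out `b` and both layers contain `a` yields `.`, `..`, `a` and any names that exist only in the lower layer.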
With this post, I am sending two patches:
Patch 1. readdir support for union mounted directories.
I am caching the dirent names in a list to aid duplicate elimination.
And this cache is stored in DIRP. For duplicate elimination I am using
strcmp(). I am not sure if this works universally with different types
of filesystems. Any suggestions here would be welcome.
Patch 2. seekdir support.
The seekdir works on the cache maintained by readdir. Since after a
seekdir, it might become necessary for readdir to return dirents from cache
(as opposed to getting them from readdir(2)/getdents(2)), I had to cache the
entire dirent structure in readdir. I understand that this is expensive, but
I am not sure it is avoidable if we have to support seekdir.
To support seekdir on a union mounted directory, the seek is applied
to the cache of dirents. The offsets (dirent->d_off) returned by readdir(3) have
been changed to linearly increasing values (0, 1, 2, ...) instead of the
filesystem-returned offsets. This gives us a uniform seek across all the
directories of the union.
With seekdir modified, I also had to modify telldir so that it does not return
filesystem-returned offsets for union mounted directories.
With seekdir support, it becomes necessary in readdir to check if we need
to return dirents from cache. And this adds a bit of overhead to readdir
as we have to do this check for every directory.
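Replacing filesystem offsets with linear ones is only safe because callers are supposed to treat telldir(3) values as opaque cookies that are handed back to seekdir(3) on the same stream. A minimal sketch of that contract, using only the standard directory API; the helper name `seek_roundtrip_ok` is hypothetical:

```c
#define _DEFAULT_SOURCE        /* for telldir/seekdir under strict -std modes */
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <string.h>

/* Returns 1 if seekdir(telldir()) leads back to the same entry, 0 if it
   does not, and -1 if the directory cannot be opened. */
static int seek_roundtrip_ok(const char *path)
{
    assert(path != NULL);
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    int ok = 1;
    if (readdir(d) != NULL) {          /* consume the first entry */
        long pos = telldir(d);         /* cookie for the second entry */
        struct dirent *e = readdir(d);
        if (e != NULL) {
            char name[NAME_MAX + 1];
            strncpy(name, e->d_name, NAME_MAX);
            name[NAME_MAX] = '\0';
            seekdir(d, pos);           /* go back to the saved position */
            e = readdir(d);
            ok = (e != NULL && strcmp(name, e->d_name) == 0);
        }
    }
    closedir(d);
    return ok;
}
```

A program that only uses positions in this way should not be able to tell whether the cookie is a filesystem offset or an index into a cache.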
Compatibility issues
--------------------
There are many versions of the dirent structure in glibc and I have tried my
best to take care of compatibility issues, but I have not really tested
readdir64 or old_readdir64. Also, at least one version of the dirent structure
doesn't have a d_type field. My whiteout suppression logic depends on d_type
and uses it in the generic __READDIR routine, which is shared by the various
versions of readdir, so I think it would break the readdir version that
uses a dirent structure without d_type. I will take care of such compatibility
issues more cleanly and thoroughly in subsequent posts.
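One possible way to keep a d_type-dependent check compiling against a dirent layout without that field is to guard it with the `_DIRENT_HAVE_D_TYPE` feature macro that glibc's headers define. This is a sketch of the idea only, not what the posted patch does; `dirent_is_whiteout` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE        /* exposes d_type and the DT_* constants */
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

#ifndef DT_WHT
#define DT_WHT 14              /* value used by the BSD and glibc headers */
#endif

/* Returns 1 if the entry is a whiteout. Returns 0 if it is not, or if this
   dirent layout has no d_type field and the question cannot be answered. */
static int dirent_is_whiteout(const struct dirent *dp)
{
    assert(dp != NULL);
#ifdef _DIRENT_HAVE_D_TYPE
    return dp->d_type == DT_WHT;
#else
    (void) dp;                 /* no d_type: treat every entry as visible */
    return 0;
#endif
}
```

With a guard like this, the generic routine would degrade to "no whiteout suppression" on layouts without d_type instead of failing to build.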
Testing
-------
I have done very minimal testing of these patches on an Intel machine which
uses 32-bit readdir. There might be some corner cases in readdir, seekdir
and telldir which I have not taken care of, and I would be happy to fix them
if they are pointed out. I have tested these patches together with the Union Mount
patches for 2.6.24-rc2-mm1. These patches are for glibc-2.7.
I request you to review the patches. Any comments and suggestions are
greatly welcome.
Regards
Nagabhushan
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-04-29 13:32 [RFC PATCH 0/2] Union Mount: Directory listing in glibc bsn.0007
@ 2008-04-29 13:33 ` bsn.0007
2008-05-01 6:08 ` Ulrich Drepper
2008-04-29 13:34 ` [RFC PATCH 2/2] Union Mount: glibc seekdir support bsn.0007
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: bsn.0007 @ 2008-04-29 13:33 UTC (permalink / raw)
To: libc-alpha
Cc: Jan Blunck, Erez Zadok, linux-kernel, linux-fsdevel, viro,
Christoph Hellwig, Ulrich Drepper, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
Enhance readdir to support union mounted directories.
readdir now caches the dirents obtained from readdir(2)/getdents(2)
to support duplicate elimination and whiteout suppression for union
mounted directories.
Signed-off-by: Nagabhushan B S <bsn.0007@gmail.com>
---
sysdeps/unix/closedir.c | 14 +++++++++
sysdeps/unix/dirstream.h | 27 ++++++++++++++++++
sysdeps/unix/opendir.c | 4 ++
sysdeps/unix/readdir.c | 69 +++++++++++++++++++++++++++++++++++++++++++++--
4 files changed, 112 insertions(+), 2 deletions(-)
--- a/sysdeps/unix/closedir.c
+++ b/sysdeps/unix/closedir.c
@@ -32,6 +32,7 @@ int
__closedir (DIR *dirp)
{
int fd;
+ struct union_dir_cache *temp = NULL;
if (dirp == NULL)
{
@@ -49,6 +50,19 @@ __closedir (DIR *dirp)
__libc_lock_fini (dirp->lock);
#endif
+ if (dirp->head != NULL)
+ {
+ while (dirp->head->next != NULL)
+ {
+ temp = dirp->head->next;
+ free (dirp->head);
+ dirp->head = temp;
+ }
+ free (dirp->head);
+ dirp->head = NULL;
+ dirp->current = NULL;
+ }
+
free ((void *) dirp);
return close_not_cancel (fd);
--- a/sysdeps/unix/dirstream.h
+++ b/sysdeps/unix/dirstream.h
@@ -16,6 +16,20 @@
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA. */
+#ifndef _UNION_DIR_CACHE
+#define _UNION_DIR_CACHE 1
+
+#include <limits.h>
+
+struct union_dir_cache
+{
+ char fname[NAME_MAX];
+ struct union_dir_cache *next;
+ struct union_dir_cache *prev;
+};
+
+#endif
+
#ifndef _DIRSTREAM_H
#define _DIRSTREAM_H 1
@@ -23,6 +37,14 @@
#include <bits/libc-lock.h>
+#define DIR_FIRST_DIRENT 1
+#define DIR_UNION_MOUNTED 2
+#define DIR_DUP_ELIM_START 4
+
+#define IS_FIRST_DIRENT(dirp) ((dirp)->union_dir_status & DIR_FIRST_DIRENT)
+#define IS_DIR_UNION_MOUNTED(dirp) ((dirp)->union_dir_status & DIR_UNION_MOUNTED)
+#define IS_DUP_ELIM_STARTED(dirp) ((dirp)->union_dir_status & DIR_DUP_ELIM_START)
+
/* Directory stream type.
The miscellaneous Unix `readdir' implementations read directory data
@@ -40,6 +62,11 @@ struct __dirstream
off_t filepos; /* Position of next entry to read. */
+ /* Cache of dirents to be used with union mounted directories */
+ struct union_dir_cache *head;
+ struct union_dir_cache *current;
+ unsigned char union_dir_status;
+
/* Directory block. */
char data[0] __attribute__ ((aligned (__alignof__ (void*))));
};
--- a/sysdeps/unix/opendir.c
+++ b/sysdeps/unix/opendir.c
@@ -202,6 +202,10 @@ __alloc_dir (int fd, bool close_fd, cons
dirp->size = 0;
dirp->offset = 0;
dirp->filepos = 0;
+ dirp->head = NULL;
+ dirp->current = NULL;
+ dirp->union_dir_status = DIR_FIRST_DIRENT;
+ dirp->union_dir_pos = 0;
return dirp;
}
--- a/sysdeps/unix/readdir.c
+++ b/sysdeps/unix/readdir.c
@@ -38,6 +38,9 @@
DIRENT_TYPE *
__READDIR (DIR *dirp)
{
+ int isdup;
+ struct union_dir_cache *temp = NULL;
+
DIRENT_TYPE *dp;
int saved_errno = errno;
@@ -108,8 +111,70 @@ __READDIR (DIR *dirp)
dirp->filepos += reclen;
#endif
- /* Skip deleted files. */
- } while (dp->d_ino == 0);
+ if (IS_FIRST_DIRENT(dirp))
+ {
+ if (dp->d_type == DT_WHT && !strcmp (dp->d_name, "."))
+ dirp->union_dir_status |= DIR_UNION_MOUNTED;
+ }
+
+ if (IS_DIR_UNION_MOUNTED(dirp) && !IS_FIRST_DIRENT(dirp))
+ {
+ /* Skip a file with 0 inode number, make sure that
+ it is not a whiteout type of file. We can get a "." whiteout
+ with inode number 0 at the beginning of each directory. */
+ if (dp->d_ino == 0 && dp->d_type != DT_WHT)
+ continue;
+
+ if (dp->d_type == DT_WHT && !strcmp (dp->d_name, "."))
+ dirp->union_dir_status |= DIR_DUP_ELIM_START;
+
+ isdup = 0;
+ if (IS_DUP_ELIM_STARTED(dirp))
+ {
+ /* Check if the dirent is already present in our cache */
+ temp = dirp->head;
+ while (temp)
+ {
+ if (!strcmp (temp->fname, dp->d_name))
+ {
+ isdup = 1;
+ break;
+ }
+ temp = temp->next;
+ }
+ }
+
+ if (isdup)
+ {
+ /* Found a duplicate, don't include it in dirent cache */
+ dp->d_ino = 0;
+ continue;
+ }
+
+ temp = (struct union_dir_cache *) malloc (sizeof (struct union_dir_cache));
+ temp->next = NULL;
+ temp->prev = NULL;
+
+
+ if (!dirp->head) /* Reading this dir first time. */
+ {
+ dirp->head = temp;
+ dirp->current = temp;
+ }
+ else
+ {
+ dirp->current->next = temp;
+ temp->prev = dirp->current;
+ dirp->current = dirp->current->next;
+ }
+
+ strcpy(dirp->current->fname, dp->d_name);
+
+ }
+ dirp->union_dir_status &= ~DIR_FIRST_DIRENT;
+
+ /* Skip deleted files. */
+ } while (dp->d_ino == 0 || dp->d_type == DT_WHT);
#ifndef NOT_IN_libc
__libc_lock_unlock (dirp->lock);
* [RFC PATCH 2/2] Union Mount: glibc seekdir support
2008-04-29 13:32 [RFC PATCH 0/2] Union Mount: Directory listing in glibc bsn.0007
2008-04-29 13:33 ` [RFC PATCH 1/2] Union Mount: glibc readdir support bsn.0007
@ 2008-04-29 13:34 ` bsn.0007
2008-04-29 15:21 ` [RFC PATCH 0/2] Union Mount: Directory listing in glibc hooanon05
` (2 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: bsn.0007 @ 2008-04-29 13:34 UTC (permalink / raw)
To: libc-alpha
Cc: Jan Blunck, Erez Zadok, linux-kernel, linux-fsdevel, viro,
Christoph Hellwig, Ulrich Drepper, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
seekdir support for union mounted directory.
seekdir, telldir and rewinddir now work on the dirent cache maintained
by readdir.
Signed-off-by: Nagabhushan B S <bsn.0007@gmail.com>
---
sysdeps/unix/dirstream.h | 9 +++++++--
sysdeps/unix/readdir.c | 21 ++++++++++++++++++---
sysdeps/unix/rewinddir.c | 16 ++++++++++++----
sysdeps/unix/seekdir.c | 26 ++++++++++++++++++++++----
sysdeps/unix/sysv/linux/i386/readdir64.c | 2 ++
sysdeps/unix/telldir.c | 5 ++++-
6 files changed, 65 insertions(+), 14 deletions(-)
--- a/sysdeps/unix/dirstream.h
+++ b/sysdeps/unix/dirstream.h
@@ -19,11 +19,15 @@
#ifndef _UNION_DIR_CACHE
#define _UNION_DIR_CACHE 1
-#include <limits.h>
+#include <dirent.h>
+
+#ifndef CACHE_DIRENT_TYPE
+#define CACHE_DIRENT_TYPE struct dirent
+#endif
struct union_dir_cache
{
- char fname[NAME_MAX];
+ CACHE_DIRENT_TYPE dp;
struct union_dir_cache *next;
struct union_dir_cache *prev;
};
@@ -66,6 +70,7 @@ struct __dirstream
struct union_dir_cache *head;
struct union_dir_cache *current;
unsigned char union_dir_status;
+ off_t union_dir_pos;
/* Directory block. */
char data[0] __attribute__ ((aligned (__alignof__ (void*))));
--- a/sysdeps/unix/readdir.c
+++ b/sysdeps/unix/readdir.c
@@ -48,6 +48,21 @@ __READDIR (DIR *dirp)
__libc_lock_lock (dirp->lock);
#endif
+ /* If union mounted directory, check if we can return dirent from cache */
+ if (dirp->head && dirp->current->next)
+ {
+ dirp->current = dirp->current->next;
+ while (dirp->current && dirp->current->dp.d_type == DT_WHT)
+ dirp->current = dirp->current->next;
+
+ if (dirp->current) {
+ dirp->union_dir_pos = dirp->current->dp.d_off;
+#ifndef NOT_IN_libc
+ __libc_lock_unlock (dirp->lock);
+#endif
+ return &dirp->current->dp;
+ }
+ }
do
{
size_t reclen;
@@ -135,7 +150,7 @@ __READDIR (DIR *dirp)
temp = dirp->head;
while (temp)
{
- if (!strcmp (temp->fname, dp->d_name))
+ if (!strcmp (temp->dp.d_name, dp->d_name))
{
isdup = 1;
break;
@@ -168,8 +183,8 @@ __READDIR (DIR *dirp)
dirp->current = dirp->current->next;
}
- strcpy(dirp->current->fname, dp->d_name);
-
+ memcpy(&dirp->current->dp, dp, sizeof(DIRENT_TYPE));
+ dirp->current->dp.d_off = ++dirp->union_dir_pos;
}
dirp->union_dir_status &= ~DIR_FIRST_DIRENT;
--- a/sysdeps/unix/rewinddir.c
+++ b/sysdeps/unix/rewinddir.c
@@ -29,9 +29,17 @@ rewinddir (dirp)
DIR *dirp;
{
__libc_lock_lock (dirp->lock);
- (void) __lseek (dirp->fd, (off_t) 0, SEEK_SET);
- dirp->filepos = 0;
- dirp->offset = 0;
- dirp->size = 0;
+ if (dirp->head)
+ {
+ dirp->current = dirp->head;
+ dirp->union_dir_pos = dirp->current->dp.d_off;
+ }
+ else
+ {
+ (void) __lseek (dirp->fd, (off_t) 0, SEEK_SET);
+ dirp->filepos = 0;
+ dirp->offset = 0;
+ dirp->size = 0;
+ }
__libc_lock_unlock (dirp->lock);
}
--- a/sysdeps/unix/seekdir.c
+++ b/sysdeps/unix/seekdir.c
@@ -30,9 +30,27 @@ seekdir (dirp, pos)
long int pos;
{
__libc_lock_lock (dirp->lock);
- (void) __lseek (dirp->fd, pos, SEEK_SET);
- dirp->size = 0;
- dirp->offset = 0;
- dirp->filepos = pos;
+ if (dirp->head) /* union mounted directory */
+ {
+ if (pos == 0)
+ dirp->current = dirp->head;
+ else
+ {
+ dirp->current = dirp->head;
+ while (dirp->current->next && pos != 1)
+ {
+ dirp->current = dirp->current->next;
+ pos--;
+ }
+ }
+ dirp->union_dir_pos = dirp->current->dp.d_off;
+ }
+ else
+ {
+ (void) __lseek (dirp->fd, pos, SEEK_SET);
+ dirp->size = 0;
+ dirp->offset = 0;
+ dirp->filepos = pos;
+ }
__libc_lock_unlock (dirp->lock);
}
--- a/sysdeps/unix/sysv/linux/i386/readdir64.c
+++ b/sysdeps/unix/sysv/linux/i386/readdir64.c
@@ -16,6 +16,7 @@
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA. */
+#define CACHE_DIRENT_TYPE struct dirent64
#define __READDIR __readdir64
#define __GETDENTS __getdents64
#define DIRENT_TYPE struct dirent64
@@ -34,6 +35,7 @@ versioned_symbol (libc, __readdir64, rea
#include <sysdeps/unix/sysv/linux/i386/olddirent.h>
+#define CACHE_DIRENT_TYPE struct __old_dirent64
#define __READDIR attribute_compat_text_section __old_readdir64
#define __GETDENTS __old_getdents64
#define DIRENT_TYPE struct __old_dirent64
--- a/sysdeps/unix/telldir.c
+++ b/sysdeps/unix/telldir.c
@@ -24,5 +24,8 @@
long int
telldir (DIR *dirp)
{
- return dirp->filepos;
+ if (dirp->head) /* union mounted directory */
+ return dirp->union_dir_pos;
+ else
+ return dirp->filepos;
}
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-29 13:32 [RFC PATCH 0/2] Union Mount: Directory listing in glibc bsn.0007
2008-04-29 13:33 ` [RFC PATCH 1/2] Union Mount: glibc readdir support bsn.0007
2008-04-29 13:34 ` [RFC PATCH 2/2] Union Mount: glibc seekdir support bsn.0007
@ 2008-04-29 15:21 ` hooanon05
2008-04-29 16:12 ` Jan Blunck
2008-04-29 15:49 ` Erez Zadok
2008-04-29 16:04 ` Jan Blunck
4 siblings, 1 reply; 19+ messages in thread
From: hooanon05 @ 2008-04-29 15:21 UTC (permalink / raw)
To: bsn.0007
Cc: libc-alpha, Jan Blunck, Erez Zadok, linux-kernel, linux-fsdevel,
viro, Christoph Hellwig, Ulrich Drepper, Mingming Cao,
Dave Hansen, Trond Myklebust, bharata, David Woodhouse
Hello Nagabhushan,
bsn.0007@gmail.com:
> I went through Bharata's RFC post on glibc based Union Mount readdir solution
> (http://lkml.org/lkml/2008/3/11/34) and have come up with patches
> against glibc to implement the same.
:::
While I have no objection to the userspace implementation,
how will UnionMount handle rmdir or renaming a directory?
Those system calls need to test in kernel space whether the dir is *logically*
empty, don't they?
I am afraid UnionMount will have to implement something similar in the kernel,
but that does not mean modifying glibc is a bad idea.
Junjiro Okajima
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-29 13:32 [RFC PATCH 0/2] Union Mount: Directory listing in glibc bsn.0007
` (2 preceding siblings ...)
2008-04-29 15:21 ` [RFC PATCH 0/2] Union Mount: Directory listing in glibc hooanon05
@ 2008-04-29 15:49 ` Erez Zadok
2008-04-29 16:16 ` Jan Blunck
2008-04-30 6:07 ` NAGABHUSHAN BS
2008-04-29 16:04 ` Jan Blunck
4 siblings, 2 replies; 19+ messages in thread
From: Erez Zadok @ 2008-04-29 15:49 UTC (permalink / raw)
To: bsn.0007
Cc: libc-alpha, Jan Blunck, Erez Zadok, linux-kernel, linux-fsdevel,
viro, Christoph Hellwig, Ulrich Drepper, Mingming Cao,
Dave Hansen, Trond Myklebust, bharata, David Woodhouse
In message <20080429133201.GA9938@localhost.localdomain>, bsn.0007@gmail.com writes:
> Hi,
>
> I went through Bharata's RFC post on glibc based Union Mount readdir solution
> (http://lkml.org/lkml/2008/3/11/34) and have come up with patches
> against glibc to implement the same.
[...]
The last set of discussions on glibc support ended, as I understood it, with
the glibc people objecting to such "special-purpose" code in glibc. See
<http://lkml.org/lkml/2008/3/11/66>. So has anything changed behind the
scenes, or is this idea unlikely to be merged into glibc any time soon, if
ever? (Personally I'd love to rip out the readdir-related code from unionfs
if glibc supported the same.)
> Patch 1. readdir support for union mounted directories.
> I am caching the dirent names in a list to aid duplicate elimination.
> And this cache is stored in DIRP. For duplicate elimination I am using
> strcmp(). I am not sure if this works universally with different types
> of filesystems. Any suggestions here would be welcome.
[...]
You might consider using a hash table instead of a list; it'll be faster in
cases where there are a lot of whiteouts/duplicates to process.
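As an illustration of that suggestion, a chained hash set of names answers the "have we seen this name before?" question without scanning the whole list. This is a minimal sketch under assumptions: `name_set`, `name_set_add` and the fixed bucket count are hypothetical, and the hash is the well-known djb2 string hash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256           /* illustrative; a real cache might resize */

struct name_node {
    struct name_node *next;
    char name[];               /* flexible array member: copy of the name */
};

struct name_set {
    struct name_node *bucket[NBUCKETS];
};

/* djb2 string hash. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Returns 1 if name was already present, 0 if it was newly inserted,
   and -1 if memory could not be allocated. */
static int name_set_add(struct name_set *set, const char *name)
{
    assert(set != NULL && name != NULL);
    struct name_node **slot = &set->bucket[hash_name(name) % NBUCKETS];

    for (struct name_node *n = *slot; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return 1;

    size_t len = strlen(name) + 1;
    struct name_node *n = malloc(sizeof *n + len);
    if (n == NULL)
        return -1;
    memcpy(n->name, name, len);
    n->next = *slot;           /* push onto the front of the chain */
    *slot = n;
    return 0;
}

static void name_set_free(struct name_set *set)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        struct name_node *n = set->bucket[i];
        while (n != NULL) {
            struct name_node *next = n->next;
            free(n);
            n = next;
        }
        set->bucket[i] = NULL;
    }
}
```

A readdir loop could call `name_set_add` for every entry and skip the entry whenever the call returns 1.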
Also, I'll reiterate my previous concern that I think you may need to also
handle "opaque directories". See the discussion in section 5.1 "Creation
and deletion of whiteouts", in the original union mounts paper:
<http://www.usenix.org/publications/library/proceedings/neworl/full_papers/mckusick.a>
Cheers,
Erez.
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-29 13:32 [RFC PATCH 0/2] Union Mount: Directory listing in glibc bsn.0007
` (3 preceding siblings ...)
2008-04-29 15:49 ` Erez Zadok
@ 2008-04-29 16:04 ` Jan Blunck
2008-04-30 6:18 ` NAGABHUSHAN BS
4 siblings, 1 reply; 19+ messages in thread
From: Jan Blunck @ 2008-04-29 16:04 UTC (permalink / raw)
To: bsn.0007
Cc: libc-alpha, Erez Zadok, linux-kernel, linux-fsdevel, viro,
Christoph Hellwig, Ulrich Drepper, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
On Tue, Apr 29, bsn.0007@gmail.com wrote:
> The RFC discussed about the information glibc readdir needs to get about
> union mounted directories and I have assumed the following information
> to be available from the kernel for this implementation.
>
> - Kernel would return all the dirents (including duplicates and whiteouts)
> starting from the topmost directory of the union.
>
> - Indication that this directory is a union mounted directory
> I have assumed that kernel would return a "." whiteout as the first
> directory entry of the union. This would tell glibc readdir(3) that it is
> working with a union mounted directory and it needs to do duplicate
> elimination and whiteout suppression. It starts building a dirent cache
> for this purpose.
IIRC the intention was to emit a "." whiteout when "changing" from one
directory to the next. That means when the first directory is completely read
the whiteout is emitted. After that glibc knows to start duplicate
removal.
> Ulrich had suggested that we could use the fstat call to recognize union
> mounts. But looking at the stat structure from stat(2), it was not obvious
> as to which field in there could be used for this purpose. Hence for this
> prototype implementation, I decided to go with what Al Viro suggested, which
> is about using a "." whiteout.
>
> - Indication that kernel is done with returning entries from the topmost
> directory.
> I have assumed that kernel would return a "." whiteout at the beginning
> of each directory of the union. So when glibc gets a 2nd "." whiteout, it
> will start performing duplicate elimination.
See above
> - Whiteout indication
> glibc will depend on dirent->d_type to be set to DT_WHT on a whiteout
> file.
Which makes the new filetype very much visible to userspace, but maybe that
is the price we have to pay.
> Compatibility issues
> --------------------
> There are many versions of dirent structure in glibc and I have tried my
> best to take care of compatibility issues. But I have not really tested
> readdir64 or old_readdir64. Also atleast one version of dirent structure
> doesn't have d_type field and my whiteout suppression logic depends on it
> and uses it in the generic __READDIR routine which gets used by various
> version of readdir and I think this would break that readdir version which
> uses dirent structure w/o d_type. I will be taking care of such compatibility
> issues more cleanly/thoroughly in subsequent posts.
We don't support union mounts on older kernels. Newer kernels return
d_type. So I think we don't have a problem.
Regards,
Jan
--
Jan Blunck <jblunck@suse.de>
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-29 15:21 ` [RFC PATCH 0/2] Union Mount: Directory listing in glibc hooanon05
@ 2008-04-29 16:12 ` Jan Blunck
0 siblings, 0 replies; 19+ messages in thread
From: Jan Blunck @ 2008-04-29 16:12 UTC (permalink / raw)
To: hooanon05
Cc: bsn.0007, libc-alpha, Erez Zadok, linux-kernel, linux-fsdevel,
viro, Christoph Hellwig, Ulrich Drepper, Mingming Cao,
Dave Hansen, Trond Myklebust, bharata, David Woodhouse
On Wed, Apr 30, hooanon05@yahoo.co.jp wrote:
>
> Hello Nagabhushan,
>
> bsn.0007@gmail.com:
> > I went through Bharata's RFC post on glibc based Union Mount readdir solution
> > (http://lkml.org/lkml/2008/3/11/34) and have come up with patches
> > against glibc to implement the same.
> :::
>
> While I don't have objection against the implementation in userspace,
> what will UnionMount handle about rmdir or rename dir?
> Those systemcalls need to test whether the dir is *logically* empty or
> not in kernel space, don't they?
> And I am afraid that UnionMount has to implement the similar thing, but
> it never mean to modify glibc is a bad idea.
For rmdir it is simple: the filesystem that supports whiteouts must know how
to get rid of them again. Since it knows how the whiteouts are implemented it
can do that in an optimized fashion.
The rename story is somewhat different. A union directory consists of multiple
directories on different filesystems. Since the rename syscall only works within
one filesystem, the rename is crossing devices. Therefore I return -EXDEV. Not
very efficient but really simple. At least this is how my patches implement it.
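From the caller's side, an EXDEV result from rename(2) is conventionally handled by falling back to copy and delete. The sketch below shows that pattern for a single regular file only; a directory would need a recursive walk, which is omitted. `copy_file` and `move_file` are hypothetical helper names, not part of any patch in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Copy src to dst byte for byte. Returns 0 on success, -1 on failure. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)      /* flush errors surface here */
        rc = -1;
    return rc;
}

/* Try rename(2) first; if it reports EXDEV, copy and then remove the
   source. Regular files only. Returns 0 on success, -1 on failure. */
static int move_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;             /* some other failure: do not mask it */
    if (copy_file(src, dst) != 0)
        return -1;
    return remove(src);
}
```

The copy path is not atomic, which is the efficiency and semantics cost that returning -EXDEV pushes onto userspace.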
Regards,
Jan
--
Jan Blunck <jblunck@suse.de>
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-29 15:49 ` Erez Zadok
@ 2008-04-29 16:16 ` Jan Blunck
2008-04-30 6:07 ` NAGABHUSHAN BS
1 sibling, 0 replies; 19+ messages in thread
From: Jan Blunck @ 2008-04-29 16:16 UTC (permalink / raw)
To: Erez Zadok
Cc: bsn.0007, libc-alpha, linux-kernel, linux-fsdevel, viro,
Christoph Hellwig, Ulrich Drepper, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
On Tue, Apr 29, Erez Zadok wrote:
> Also, I'll reiterate my previous concern that I think you may need to also
> handle "opaque directories". See the discussion in section 5.1 "Creation
> and deletion of whiteouts", in the original union mounts paper:
Hmm, if the kernel is traversing to the next layer in a union and it finds it to
be opaque, the kernel knows that this is the "end" of the union. So after
reading that directory it is done.
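The stopping rule described here can be modelled in a few lines. This is a toy userspace model, not kernel code: `struct layer` and `layers_to_read` are hypothetical, and layers are ordered from topmost to bottommost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one layer of a union; not a kernel structure. */
struct layer {
    const char *label;         /* only for readability in the example */
    int opaque;                /* nonzero: nothing below this is visible */
};

/* Number of layers whose entries would be returned, counting from the top.
   The opaque layer itself is still read; traversal stops right after it. */
static size_t layers_to_read(const struct layer *layers, size_t nlayers)
{
    assert(layers != NULL || nlayers == 0);
    for (size_t i = 0; i < nlayers; i++)
        if (layers[i].opaque)
            return i + 1;
    return nlayers;
}
```

Under this model userspace never sees entries from below an opaque directory, so glibc would need no opaque-specific handling.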
Regards,
Jan
--
Jan Blunck <jblunck@suse.de>
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-29 15:49 ` Erez Zadok
2008-04-29 16:16 ` Jan Blunck
@ 2008-04-30 6:07 ` NAGABHUSHAN BS
2008-04-30 16:35 ` Erez Zadok
1 sibling, 1 reply; 19+ messages in thread
From: NAGABHUSHAN BS @ 2008-04-30 6:07 UTC (permalink / raw)
To: Erez Zadok
Cc: libc-alpha, Jan Blunck, linux-kernel, linux-fsdevel, viro,
Christoph Hellwig, Ulrich Drepper, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
On Tue, Apr 29, 2008 at 9:19 PM, Erez Zadok <ezk@cs.sunysb.edu> wrote:
> In message <20080429133201.GA9938@localhost.localdomain>, bsn.0007@gmail.com writes:
> > Hi,
> >
> > I went through Bharata's RFC post on glibc based Union Mount readdir solution
> > (http://lkml.org/lkml/2008/3/11/34) and have come up with patches
> > against glibc to implement the same.
> [...]
>
> The last set of discussions on glibc support ended, as I understood it, with
> the glibc people objecting to such "special-purpose" code in glibc. See
> <http://lkml.org/lkml/2008/3/11/66>. So has anything changed behind the
> scenes, or is this idea unlikely to be merged into glibc any time soon, if
> ever. (Personally I'd love to rip out the readdir-related code from unionfs
> if glibc supported the same.)
>
>
> > Patch 1. readdir support for union mounted directories.
> > I am caching the dirent names in a list to aid duplicate elimination.
> > And this cache is stored in DIRP. For duplicate elimination I am using
> > strcmp(). I am not sure if this works universally with different types
> > of filesystems. Any suggestions here would be welcome.
> [...]
>
> You might consider using a hash table instead of a list; it'll be faster in
> case where there are a lot of whiteouts/duplicates to process.
>
Yes, a hash table was one of the things I had considered using as a
cache. But I am not completely sure how this would support seekdir.
Any suggestions are greatly welcome.
Regards Nagabhushan
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-29 16:04 ` Jan Blunck
@ 2008-04-30 6:18 ` NAGABHUSHAN BS
0 siblings, 0 replies; 19+ messages in thread
From: NAGABHUSHAN BS @ 2008-04-30 6:18 UTC (permalink / raw)
To: Jan Blunck
Cc: libc-alpha, Erez Zadok, linux-kernel, linux-fsdevel, viro,
Christoph Hellwig, Ulrich Drepper, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
On Tue, Apr 29, 2008 at 9:34 PM, Jan Blunck <jblunck@suse.de> wrote:
> On Tue, Apr 29, bsn.0007@gmail.com wrote:
>
> > The RFC discussed about the information glibc readdir needs to get about
> > union mounted directories and I have assumed the following information
> > to be available from the kernel for this implementation.
> >
> > - Kernel would return all the dirents (including duplicates and whiteouts)
> > starting from the topmost directory of the union.
> >
> > - Indication that this directory is a union mounted directory
> > I have assumed that kernel would return a "." whiteout as the first
> > directory entry of the union. This would tell glibc readdir(3) that it is
> > working with a union mounted directory and it needs to do duplicate
> > elimination and whiteout suppression. It starts building a dirent cache
> > for this purpose.
>
> IIRC the intention was to emit a "." whiteout when "changing" from one
> directory to the next. That means when the first directory is completely read
> the whiteout is emitted. After that glibc knows to start duplicate
> removal.
>
Yes, the intention was to get a "." whiteout when changing from one
directory to the next, and that's essentially what I have assumed for
starting duplicate elimination.
In addition, I have assumed that a "." whiteout is returned as the first
directory entry of the union, to indicate to glibc that the
directory is a union mounted directory.
Regards
Nagabhushan
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-30 6:07 ` NAGABHUSHAN BS
@ 2008-04-30 16:35 ` Erez Zadok
2008-05-01 1:39 ` hooanon05
0 siblings, 1 reply; 19+ messages in thread
From: Erez Zadok @ 2008-04-30 16:35 UTC (permalink / raw)
To: NAGABHUSHAN BS
Cc: Erez Zadok, libc-alpha, Jan Blunck, linux-kernel, linux-fsdevel,
viro, Christoph Hellwig, Ulrich Drepper, Mingming Cao,
Dave Hansen, Trond Myklebust, bharata, David Woodhouse
In message <c810d5090804292307y2884b169t5c27b50fb7c94c30@mail.gmail.com>, "NAGABHUSHAN BS" writes:
> On Tue, Apr 29, 2008 at 9:19 PM, Erez Zadok <ezk@cs.sunysb.edu> wrote:
> > In message <20080429133201.GA9938@localhost.localdomain>, bsn.0007@gmail.com writes:
[...]
> > You might consider using a hash table instead of a list; it'll be
> > faster in case where there are a lot of whiteouts/duplicates to
> > process.
> >
>
> Yes, hash table was one of the things i had considered to use as a
> cache. But i am not completely sure how this would support seekdir.
> Any suggestions are greatly welcome.
>
> Regards Nagabhushan
Here's one idea:
1. Your main data structure is a linear array of struct dirents. You
perform seekdir/readdir on this linear structure, which should be
trivial. Each time you find a dirent which definitely needs to go into
this structure, you append to it. If you're about to run out of space in
it, you can realloc it 2x its previous size (a common technique). This
structure remains cached in memory for as long as the directory is open.
Optimization 1: you can keep it cached past close(fd), b/c recursive
programs like /bin/find may close and reopen a directory several times.
But you'll need to determine if the directory's mtime had changed and if
so, purge the cached content and reconstruct it.
Optimization 2: if you can do a stat(2) on each directory in the union,
and sum their total size, that'll provide you with an upper bound on the
size of your dirent cache. In that case, you can forego the realloc I
mentioned, and just malloc one large array at once. This could be more
efficient than realloc'ing for two reasons: (a) realloc may have to
memcpy data, which you can avoid; and (b) the rest of the malloc'ed
array, near its end, may be unused, but as long as you don't touch it,
it'll be one large-ish extent of virtual memory for which physical page
frames won't be actually needed. Even better, you can realloc() that
large malloc'ed area to cut back on its size to exactly what you need.
2. During the construction of the dirent array above, you keep two hash
tables:
2a. HT1 is used for duplicate elimination. It contains POINTERS ONLY (or
offsets) into the dirent cache buffer. You add entries into it each
time you append a name to the dirent cache array. This HT1 allows you
to quickly find out if a string name had been seen before. When you're
done constructing the dirent cache, free(HT1).
2b. HT2 is used for whiteouts. It contains the actual strings of whited-out
entries you've seen, which you add each time you find a whiteout. You
look up this HT2 each time you need to determine if an entry you're about
to add to the dirent cache had been whited-out or not. When you're done
constructing the dirent cache, free(HT2).
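Point 1 of the outline above, the linear array that grows by doubling and is seeked by index, can be sketched as follows. The names `dir_cache`, `cache_append`, `cache_read`, `cache_tell` and `cache_seek` are hypothetical, only names are stored rather than full dirents, and the two hash tables are left out to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cached_entry {
    char name[256];            /* NAME_MAX + 1 on Linux */
};

struct dir_cache {
    struct cached_entry *ents; /* linear array, grown by doubling */
    size_t len, cap;
    size_t pos;                /* index the next read would return */
};

/* Append a name, doubling the array when it is full. 0 on success, -1 on
   allocation failure (the existing contents stay valid in that case). */
static int cache_append(struct dir_cache *c, const char *name)
{
    if (c->len == c->cap) {
        size_t ncap = c->cap ? c->cap * 2 : 16;
        struct cached_entry *p = realloc(c->ents, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        c->ents = p;
        c->cap = ncap;
    }
    struct cached_entry *e = &c->ents[c->len++];
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    return 0;
}

/* readdir analogue: next cached name, or NULL at the end. */
static const char *cache_read(struct dir_cache *c)
{
    return c->pos < c->len ? c->ents[c->pos++].name : NULL;
}

/* telldir analogue: the position is simply the array index. */
static long cache_tell(const struct dir_cache *c)
{
    return (long) c->pos;
}

/* seekdir analogue: clamp to the end rather than run off the array. */
static void cache_seek(struct dir_cache *c, long pos)
{
    assert(pos >= 0);
    c->pos = (size_t) pos > c->len ? c->len : (size_t) pos;
}

static void cache_free(struct dir_cache *c)
{
    free(c->ents);
    memset(c, 0, sizeof *c);
}
```

One design consequence worth noting: because realloc may move the array, a table such as HT1 is safer holding indices into it than raw pointers while the array is still growing.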
Hope you can use this as a starting point.
Cheers,
Erez.
* Re: [RFC PATCH 0/2] Union Mount: Directory listing in glibc
2008-04-30 16:35 ` Erez Zadok
@ 2008-05-01 1:39 ` hooanon05
0 siblings, 0 replies; 19+ messages in thread
From: hooanon05 @ 2008-05-01 1:39 UTC (permalink / raw)
To: Erez Zadok
Cc: NAGABHUSHAN BS, libc-alpha, Jan Blunck, linux-kernel,
linux-fsdevel, viro, Christoph Hellwig, Ulrich Drepper,
Mingming Cao, Dave Hansen, Trond Myklebust, bharata,
David Woodhouse
Erez Zadok:
> Here's one idea:
>
> 1. Your main data structure is a linear array of struct dirents. You
:::
All these points are very similar to the actual implementation in AUFS,
except 'Optimization 2: decide memory size by stat().'
- the size may not be enough for storing every dirent; that depends upon
the filesystem's implementation.
- the size may be a block size, which can be larger than the page size even
for an empty dir; this also depends upon the filesystem.
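One way to live with both caveats is to treat st_size purely as a starting guess: allocate that much, fall back to realloc if it turns out too small, and trim the buffer once the directory has been read. The sketch below assumes this approach; `struct name_buf`, the helper names, and the two capacity constants are hypothetical and not taken from aufs or the glibc patches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define FALLBACK_CAP 4096           /* used when st_size is 0 or stat fails */
#define MAX_INITIAL_CAP (1u << 20)  /* do not trust a huge st_size blindly */

struct name_buf { char *data; size_t used; size_t cap; };

/* Size the buffer from the directory's st_size, used only as a hint. */
int name_buf_init(struct name_buf *b, const char *dirpath)
{
    struct stat st;
    size_t guess = FALLBACK_CAP;
    assert(b != NULL && dirpath != NULL);
    if (stat(dirpath, &st) == 0 && st.st_size > 0)
        guess = st.st_size > (off_t)MAX_INITIAL_CAP
                    ? MAX_INITIAL_CAP : (size_t)st.st_size;
    b->data = malloc(guess);
    b->used = 0;
    b->cap = b->data ? guess : 0;
    return b->data ? 0 : -1;
}

/* Append a NUL-terminated name; grow when the hint was too small. */
int name_buf_append(struct name_buf *b, const char *name)
{
    size_t len = strlen(name) + 1;
    if (b->used + len > b->cap) {
        size_t ncap = b->cap * 2;
        while (ncap < b->used + len)
            ncap *= 2;
        char *p = realloc(b->data, ncap);
        if (!p)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->used, name, len);
    b->used += len;
    return 0;
}

/* Give back the unused tail once the whole directory has been read. */
void name_buf_trim(struct name_buf *b)
{
    if (b->used == 0)
        return;                     /* avoid realloc(ptr, 0) */
    char *p = realloc(b->data, b->used);
    if (p) {
        b->data = p;
        b->cap = b->used;
    }
}
```

With this shape, an st_size that is too small only costs a few reallocs, and one that is a whole block for an empty directory only costs untouched virtual memory until the trim.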
Junjiro Okajima
* Re: [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-04-29 13:33 ` [RFC PATCH 1/2] Union Mount: glibc readdir support bsn.0007
@ 2008-05-01 6:08 ` Ulrich Drepper
2008-05-06 4:21 ` Bharata B Rao
0 siblings, 1 reply; 19+ messages in thread
From: Ulrich Drepper @ 2008-05-01 6:08 UTC (permalink / raw)
To: bsn.0007
Cc: libc-alpha, Jan Blunck, Erez Zadok, linux-kernel, linux-fsdevel,
viro, Christoph Hellwig, Mingming Cao, Dave Hansen,
Trond Myklebust, bharata, David Woodhouse
Before anything further is discussed I need to know how you expect to
handle NFS? I don't see how, with a userlevel readdir implementation,
you can support NFS.
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-05-01 6:08 ` Ulrich Drepper
@ 2008-05-06 4:21 ` Bharata B Rao
2008-05-06 5:46 ` hooanon05
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Bharata B Rao @ 2008-05-06 4:21 UTC (permalink / raw)
To: Ulrich Drepper
Cc: bsn.0007, libc-alpha, Jan Blunck, Erez Zadok, linux-kernel,
linux-fsdevel, viro, Christoph Hellwig, Mingming Cao, Dave Hansen,
Trond Myklebust, David Woodhouse
On Wed, Apr 30, 2008 at 11:08:41PM -0700, Ulrich Drepper wrote:
>
> Before anything further is discussed I need to know how you expect to
> handle NFS? I don't see how, with a userlevel readdir implementation,
> you can support NFS.
At the client side, I don't see a problem with unioning NFS with other
FS. Kernel would still return all dirents of the union including those
from NFS and glibc readdir would be able to handle them appropriately.
Or am I missing something ?
At the server side, I can't see how NFS server could export a union.
I don't see how this could be sanely done with Union Mount. Erez, how
does Unionfs handle this ?
Is dis-allowing NFS-exporting of unions an option ?
Regards,
Bharata.
* Re: [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-05-06 4:21 ` Bharata B Rao
@ 2008-05-06 5:46 ` hooanon05
2008-05-06 13:10 ` David Newall
2008-05-09 20:05 ` Erez Zadok
2 siblings, 0 replies; 19+ messages in thread
From: hooanon05 @ 2008-05-06 5:46 UTC (permalink / raw)
To: bharata
Cc: Ulrich Drepper, bsn.0007, libc-alpha, Jan Blunck, Erez Zadok,
linux-kernel, linux-fsdevel, viro, Christoph Hellwig,
Mingming Cao, Dave Hansen, Trond Myklebust, David Woodhouse
Bharata B Rao:
> At the server side, I can't see how NFS server could export a union.
> I don't see how this could be sanely done with Union Mount. Erez, how
> does Unionfs handle this ?
>
> Is dis-allowing NFS-exporting of unions an option ?
As a note, there were many requests for aufs to support NFS-exporting,
and it has already been implemented. The encode/decode_fh functions in
aufs behave like a sub-nfsd.
Junjiro Okajima
* Re: [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-05-06 4:21 ` Bharata B Rao
2008-05-06 5:46 ` hooanon05
@ 2008-05-06 13:10 ` David Newall
2008-05-12 3:43 ` Bharata B Rao
2008-05-09 20:05 ` Erez Zadok
2 siblings, 1 reply; 19+ messages in thread
From: David Newall @ 2008-05-06 13:10 UTC (permalink / raw)
To: bharata
Cc: Ulrich Drepper, bsn.0007, libc-alpha, Jan Blunck, Erez Zadok,
linux-kernel, linux-fsdevel, viro, Christoph Hellwig,
Mingming Cao, Dave Hansen, Trond Myklebust, David Woodhouse
Bharata B Rao wrote:
> Is dis-allowing NFS-exporting of unions an option ?
Is it that simple? What about a union further in an NFS export? What
about union-mounting after the tree has been NFS-exported? If the
kernel won't fully merge unions then FS servers have to do whatever it
is that's being proposed for glibc.
* Re: [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-05-06 4:21 ` Bharata B Rao
2008-05-06 5:46 ` hooanon05
2008-05-06 13:10 ` David Newall
@ 2008-05-09 20:05 ` Erez Zadok
2 siblings, 0 replies; 19+ messages in thread
From: Erez Zadok @ 2008-05-09 20:05 UTC (permalink / raw)
To: bharata
Cc: Ulrich Drepper, bsn.0007, libc-alpha, Jan Blunck, Erez Zadok,
linux-kernel, linux-fsdevel, viro, Christoph Hellwig,
Mingming Cao, Dave Hansen, Trond Myklebust, David Woodhouse
In message <20080506042117.GA29298@in.ibm.com>, Bharata B Rao writes:
> On Wed, Apr 30, 2008 at 11:08:41PM -0700, Ulrich Drepper wrote:
> >
> > Before anything further is discussed I need to know how you expect to
> > handle NFS? I don't see how, with a userlevel readdir implementation,
> > you can support NFS.
>
> At the client side, I don't see a problem with unioning NFS with other
> FS. Kernel would still return all dirents of the union including those
> from NFS and glibc readdir would be able to handle them appropriately.
> Or am I missing something ?
>
> At the server side, I can't see how NFS server could export a union.
> I don't see how this could be sanely done with Union Mount. Erez, how
> does Unionfs handle this ?
[...]
The ODF version of unionfs stores, among other things, persistent inode
numbers in a small/special /odf partition. If the inums aren't persistent,
they could get flushed out of memory, making it very difficult to
reconstruct the inode w/ the same inum later on.
Erez.
* Re: [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-05-06 13:10 ` David Newall
@ 2008-05-12 3:43 ` Bharata B Rao
2008-05-12 3:49 ` Erez Zadok
0 siblings, 1 reply; 19+ messages in thread
From: Bharata B Rao @ 2008-05-12 3:43 UTC (permalink / raw)
To: David Newall
Cc: Ulrich Drepper, bsn.0007, libc-alpha, Jan Blunck, Erez Zadok,
linux-kernel, linux-fsdevel, viro, Christoph Hellwig,
Mingming Cao, Dave Hansen, Trond Myklebust, David Woodhouse
On Tue, May 06, 2008 at 10:40:28PM +0930, David Newall wrote:
> Bharata B Rao wrote:
> > Is dis-allowing NFS-exporting of unions an option ?
>
> Is it that simple? What about a union further in an NFS export? What
> about union-mounting after the tree has been NFS-exported?
OK, I am realizing that it is not simple. I am actually reading the NFS
code to understand how it supports crossing of mountpoints. I see that
one can mount a filesystem within an NFS export and the NFS server would be
able to cross over to that filesystem and provide those contents to the
client. Actually, Union Mount maintains a union stack very similar to the
mount stack, but here we walk the union stack down. So I am checking if
something similar to mountpoint crossing would be able to solve the
problem of fetching all the contents of the NFS-exported union.
Answering my question to Erez that I raised in another thread, I see
that Unionfs doesn't set the export_operations in its superblock and
hence I believe unions created by Unionfs aren't NFS-exportable as of
now. Erez, correct me if I got it wrong.
Regards,
Bharata.
* Re: [RFC PATCH 1/2] Union Mount: glibc readdir support
2008-05-12 3:43 ` Bharata B Rao
@ 2008-05-12 3:49 ` Erez Zadok
0 siblings, 0 replies; 19+ messages in thread
From: Erez Zadok @ 2008-05-12 3:49 UTC (permalink / raw)
To: bharata
Cc: David Newall, Ulrich Drepper, bsn.0007, libc-alpha, Jan Blunck,
Erez Zadok, linux-kernel, linux-fsdevel, viro, Christoph Hellwig,
Mingming Cao, Dave Hansen, Trond Myklebust, David Woodhouse
In message <20080512034341.GA10541@in.ibm.com>, Bharata B Rao writes:
> Answering my question to Erez that I raised in another thread, I see
> that Unionfs doesn't set the export_operations in its superblock and
> hence I believe unions created by Unionfs aren't NFS-exportable as of
> now. Erez, correct me if I got it wrong.
Yes, the unionfs in -mm isn't nfs exportable safely, b/c it doesn't have
persistent inode numbers. Our unionfs-odf version has persistent inode
numbers and is exportable (see http://unionfs.filesystems.org/).
> Regards,
> Bharata.
Erez.