public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* UBIFS seekdir()/telldir() issue
@ 2008-05-26 13:53 Artem Bityutskiy
  2008-05-28 15:45 ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Artem Bityutskiy @ 2008-05-26 13:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hunter Adrian, David Woodhouse, Linux Kernel Mailing List

The problem
~~~~~~~~~~~

telldir() returns an opaque 32-bit value which should be enough
to seek to the position later and continue readdir()-ing.

UBIFS stores direntries in a B-tree with 64-bit keys like:
| Parent Ino (32-bit) | key type (3 bits) | name hash (29 bits) |

The name hash may have collisions. Obviously, this does not
quite fit to the telldir()/seekdir() model.

Note, this is very similar to reiserfs and reiser4 approach.

Current UBIFS implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the moment UBIFS stores the hash value in file->f_pos, so
telldir() basically returns direntry name hash values. And this
does not quite work in case of hash collisions - seekdir() may
not always seek correctly, which is rare though.

However, consecutive readdir() calls work fine because we store
full name in file->private_data which helps to deal with
collisions.

Naturally, readdir() returns direntries in hash order.

It seems there are not much applications which depend on telldir()
and seekdir(), so this does not look too bad, at least in embedded
world. But this means NFS will not really work because it needs
"absolute" seekdir() implementation.

Possible approach
~~~~~~~~~~~~~~~~~

Short investigation showed that other file-systems have separate
trees which are needed just for the sake of seekdir(). Well, it is
sad that FSes have to do this just because of not very good
interface, but such is live.

In UBIFS we are very reluctant to maintain any additional on-flash
tree which would emulate traditional Unix "directory is an array
of direntries" view. It is just too much overhead for embedded
systems. And it would add more complexity to an already complex
file-system.

Instead, we would like to do the following.

1. Assign a 32-bit unique id for each direntry. Each directory
inode has its own space of the ids.
2. readdir() stores the id of the next direntry to return in
file->f_pos, which means telldir() will return the unique ids,
not hashes as it is currently done.
3. However, readdir() still walks direntries in hash, and
it stores the key and full name of the next entry in
file->private_data. This makes consecutive readdir() calls
work correctly and efficiently.
4. When seeking, UBIFS havs to scan the main B-tree and
look up direntry with the required ID. This is not efficient.

This approach gives full seekdir() support, except of one case
described below.

Problems with this approach would be:
1. Slow seeks. But we think that actually rare applications do
seekdir(), at least in embedded world. Comments?

2. If the direntry we are trying to seek to was _deleted_, we'll be
unable to gracefully start from the closest next one. This is simply
because readdir() returns entries in hash order, not in id order.
The best UBIFS would be able to do in this case is to seek to the
beginning of the directory.

IOW, sequence like this would be problematic:

dir = opendir();
pos = telldir(dir);
dent = readdri(dir);
unlink(dent->d_name);
closedir(dir);
dir = opendir();
seekdir(pos); // Here we do not know where to seek

But we think this is also quite rare use-case and it should not
hurt to seek to the beginning in these cases. Comments?

Thanks
 
-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: UBIFS seekdir()/telldir() issue
  2008-05-26 13:53 UBIFS seekdir()/telldir() issue Artem Bityutskiy
@ 2008-05-28 15:45 ` Jan Kara
  2008-05-30 10:57   ` Adrian Hunter
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2008-05-28 15:45 UTC (permalink / raw)
  To: Artem Bityutskiy
  Cc: Christoph Hellwig, Hunter Adrian, David Woodhouse,
	Linux Kernel Mailing List

  Hi,

> The problem
> ~~~~~~~~~~~
> 
> telldir() returns an opaque 32-bit value which should be enough
> to seek to the position later and continue readdir()-ing.
> 
> UBIFS stores direntries in a B-tree with 64-bit keys like:
> | Parent Ino (32-bit) | key type (3 bits) | name hash (29 bits) |
> 
> The name hash may have collisions. Obviously, this does not
> quite fit to the telldir()/seekdir() model.
> 
> Note, this is very similar to reiserfs and reiser4 approach.
> 
> Current UBIFS implementation
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> At the moment UBIFS stores the hash value in file->f_pos, so
> telldir() basically returns direntry name hash values. And this
> does not quite work in case of hash collisions - seekdir() may
> not always seek correctly, which is rare though.
  Yes, this is a classical problem of seekdir implementations :).

> However, consecutive readdir() calls work fine because we store
> full name in file->private_data which helps to deal with
> collisions.
> 
> Naturally, readdir() returns direntries in hash order.
> 
> It seems there are not much applications which depend on telldir()
> and seekdir(), so this does not look too bad, at least in embedded
> world. But this means NFS will not really work because it needs
> "absolute" seekdir() implementation.
> 
> Possible approach
> ~~~~~~~~~~~~~~~~~
> 
> Short investigation showed that other file-systems have separate
> trees which are needed just for the sake of seekdir(). Well, it is
> sad that FSes have to do this just because of not very good
> interface, but such is live.
> 
> In UBIFS we are very reluctant to maintain any additional on-flash
> tree which would emulate traditional Unix "directory is an array
> of direntries" view. It is just too much overhead for embedded
> systems. And it would add more complexity to an already complex
> file-system.
> 
> Instead, we would like to do the following.
> 
> 1. Assign a 32-bit unique id for each direntry. Each directory
> inode has its own space of the ids.
> 2. readdir() stores the id of the next direntry to return in
> file->f_pos, which means telldir() will return the unique ids,
> not hashes as it is currently done.
> 3. However, readdir() still walks direntries in hash, and
> it stores the key and full name of the next entry in
> file->private_data. This makes consecutive readdir() calls
> work correctly and efficiently.
> 4. When seeking, UBIFS havs to scan the main B-tree and
> look up direntry with the required ID. This is not efficient.
> 
> This approach gives full seekdir() support, except of one case
> described below.
> 
> Problems with this approach would be:
> 1. Slow seeks. But we think that actually rare applications do
> seekdir(), at least in embedded world. Comments?
> 
> 2. If the direntry we are trying to seek to was _deleted_, we'll be
> unable to gracefully start from the closest next one. This is simply
> because readdir() returns entries in hash order, not in id order.
> The best UBIFS would be able to do in this case is to seek to the
> beginning of the directory.
> 
> IOW, sequence like this would be problematic:
> 
> dir = opendir();
> pos = telldir(dir);
> dent = readdri(dir);
> unlink(dent->d_name);
> closedir(dir);
> dir = opendir();
> seekdir(pos); // Here we do not know where to seek
> 
> But we think this is also quite rare use-case and it should not
> hurt to seek to the beginning in these cases. Comments?
  The sequence you write above is actually incorrect I think. Noone
guarantees that the cookie returned by telldir() is valid after
closedir(). What is a bigger (and quite common) problem is, if somebody
uses readdir/telldir/seekdir while someone else creates/deletes files in
the directory. The standard implies in this case that subsequent
readdir should return all the files which were not touched (or all files
after position set by seekdir if used...).


									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: UBIFS seekdir()/telldir() issue
  2008-05-28 15:45 ` Jan Kara
@ 2008-05-30 10:57   ` Adrian Hunter
  2008-06-02  8:14     ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Adrian Hunter @ 2008-05-30 10:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Artem Bityutskiy, Christoph Hellwig, David Woodhouse,
	Linux Kernel Mailing List

Jan Kara wrote:
>   The sequence you write above is actually incorrect I think. Noone
> guarantees that the cookie returned by telldir() is valid after
> closedir(). What is a bigger (and quite common) problem is, if somebody
> uses readdir/telldir/seekdir while someone else creates/deletes files in
> the directory. The standard implies in this case that subsequent
> readdir should return all the files which were not touched (or all files
> after position set by seekdir if used...).

Not according to this:

http://www.opengroup.org/onlinepubs/009695399/functions/readdir.html

	"If a file is removed from or added to the directory after the
	most recent call to opendir() or rewinddir(), whether a
	subsequent call to readdir() returns an entry for that file
	is unspecified."



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: UBIFS seekdir()/telldir() issue
  2008-05-30 10:57   ` Adrian Hunter
@ 2008-06-02  8:14     ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2008-06-02  8:14 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Artem Bityutskiy, Christoph Hellwig, David Woodhouse,
	Linux Kernel Mailing List

On Fri 30-05-08 13:57:45, Adrian Hunter wrote:
> Jan Kara wrote:
>>   The sequence you write above is actually incorrect I think. Noone
>> guarantees that the cookie returned by telldir() is valid after
>> closedir(). What is a bigger (and quite common) problem is, if somebody
>> uses readdir/telldir/seekdir while someone else creates/deletes files in
>> the directory. The standard implies in this case that subsequent
>> readdir should return all the files which were not touched (or all files
>> after position set by seekdir if used...).
>
> Not according to this:
>
> http://www.opengroup.org/onlinepubs/009695399/functions/readdir.html
>
> 	"If a file is removed from or added to the directory after the
> 	most recent call to opendir() or rewinddir(), whether a
> 	subsequent call to readdir() returns an entry for that file
> 	is unspecified."
  This is exactly what I meant. Maybe phrased it wrongly in my email :)

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-06-02  8:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-26 13:53 UBIFS seekdir()/telldir() issue Artem Bityutskiy
2008-05-28 15:45 ` Jan Kara
2008-05-30 10:57   ` Adrian Hunter
2008-06-02  8:14     ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox