* FIFO files
@ 2003-06-17 16:21 Cyrille Chepelov
  2003-06-17 16:24 ` Carl-Daniel Hailfinger
  2003-06-17 22:42 ` Pierre Abbat
  0 siblings, 2 replies; 21+ messages in thread
From: Cyrille Chepelov @ 2003-06-17 16:21 UTC (permalink / raw)
  To: reiserfs-list

Hi all,

I recently encountered a need for such a contraption as file-backed FIFOs.
These are files where one process can append records at one end, and one
other process can read records from the beginning of the file, chopping off
the head of the file once it's not needed anymore. As I had to implement
something right now on existing systems, I worked around the general absence
of such a facility, but still, I'm wondering how difficult it would be to
write a reiserfs4 plug-in which would allow one to do the equivalent of a
truncate(), but truncating what's between position 0 and the current
cursor (effectively shifting all subsequent file positions), rather than
truncating what's between the cursor and the end of file.
(Network considerations are explicitly out of the scope of this musing.)

	-- Cyrille

-- 


* Re: FIFO files
  2003-06-17 16:21 FIFO files Cyrille Chepelov
@ 2003-06-17 16:24 ` Carl-Daniel Hailfinger
  2003-06-17 16:32   ` Hans Reiser
  2003-06-17 17:07   ` Cyrille Chepelov
  2003-06-17 22:42 ` Pierre Abbat
  1 sibling, 2 replies; 21+ messages in thread
From: Carl-Daniel Hailfinger @ 2003-06-17 16:24 UTC (permalink / raw)
  To: Cyrille Chepelov; +Cc: reiserfs-list

Hi Cyrille,

Cyrille Chepelov wrote:
> 
> I recently encountered a need for such a contraption as file-backed FIFOs.
> These are files where one process can append records at one end, and one
> other process can read records from the beginning of the file, chopping off
> the head of the file once it's not needed anymore. As I had to implement
> something right now on existing systems, I worked around the general absence

This facility has been present in linux for years. Just google for "sparse
files".

Carl-Daniel
-- 
http://www.hailfinger.org/



* Re: FIFO files
  2003-06-17 16:24 ` Carl-Daniel Hailfinger
@ 2003-06-17 16:32   ` Hans Reiser
  2003-06-17 17:00     ` Carl-Daniel Hailfinger
  2003-06-17 17:07   ` Cyrille Chepelov
  1 sibling, 1 reply; 21+ messages in thread
From: Hans Reiser @ 2003-06-17 16:32 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger; +Cc: Cyrille Chepelov, reiserfs-list

Carl-Daniel Hailfinger wrote:

>Hi Cyrille,
>
>Cyrille Chepelov wrote:
>  
>
>>I recently encountered a need for such a contraption as file-backed FIFOs.
>>These are files where one process can append records at one end, and one
>>other process can read records from the beginning of the file, chopping off
>>the head of the file once it's not needed anymore. As I had to implement
>>something right now on existing systems, I worked around the general absence
>>    
>>
>
>This facility has been present in linux for years. Just google for "sparse
>files".
>
>Carl-Daniel
>  
>
No, he is resetting the location of the first byte of the file.

-- 
Hans




* Re: FIFO files
  2003-06-17 16:32   ` Hans Reiser
@ 2003-06-17 17:00     ` Carl-Daniel Hailfinger
  2003-06-18  6:17       ` Oleg Drokin
  0 siblings, 1 reply; 21+ messages in thread
From: Carl-Daniel Hailfinger @ 2003-06-17 17:00 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Cyrille Chepelov, reiserfs-list

Hans Reiser wrote:
> Carl-Daniel Hailfinger wrote:
>> Cyrille Chepelov wrote:
>> 
>>> I recently encountered a need for such a contraption as file-backed
>>>  FIFOs. These are files where one process can append records at one
>>> end, and one other process can read records from the beginning of
>>> the file, chopping off the head of the file once it's not needed
>>> anymore. As I had to implement something right now on existing
>>> systems, I worked around the general absence
>> 
>> This facility has been present in linux for years. Just google for 
>> "sparse files".
> 
> No, he is resetting the location of the first byte of the file.

Yes, but if I understood Cyrille correctly, her/his(?) needs should be
completely satisfied by sparse files.
One process can append records at the end, and one other process can read
records from the beginning of the file, marking the read data empty. The
disk usage is the same as with Cyrille's solution, but the races are avoided.
The only reason I know for wanting to chop off the head of a file is
trying to keep the file small.


Carl-Daniel



* Re: FIFO files
  2003-06-17 16:24 ` Carl-Daniel Hailfinger
  2003-06-17 16:32   ` Hans Reiser
@ 2003-06-17 17:07   ` Cyrille Chepelov
  2003-06-17 17:39     ` Chris Dukes
  2003-06-17 19:30     ` Hans Reiser
  1 sibling, 2 replies; 21+ messages in thread
From: Cyrille Chepelov @ 2003-06-17 17:07 UTC (permalink / raw)
  To: reiserfs-list


On Tue, Jun 17, 2003, at 06:24:19PM +0200, Carl-Daniel Hailfinger wrote:

> This facility has been present in linux for years. Just google for "sparse
> files".

Yes, sparse files would do. However, after sizeable use of the facility,
you'd end up with a giant hole at the beginning, then whatever data is
queued, then EOF.
What I want is the ability to just remove the hole in front (just as, when
there is no point in having a hole at the end of the file, you just
truncate that hole out), so that seek(SEEK_BEGIN,0) positions me at the
beginning of useful data (or EOF if my queue is empty).
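
A minimal sketch of how a reader could skip such a leading hole without
any head-truncate, assuming a Linux kernel and C library that provide
lseek() with SEEK_DATA (a facility that postdates this thread). The helper
names first_data_offset() and make_front_hole_file() are hypothetical and
exist only for this illustration. Where the filesystem rejects SEEK_DATA,
the sketch simply treats the whole file as data.

```c
#define _GNU_SOURCE          /* exposes SEEK_DATA in <unistd.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: offset of the first data byte at or after `from`.
 * Returns -1 with errno == ENXIO when nothing but a hole remains. */
off_t first_data_offset(int fd, off_t from)
{
    assert(from >= 0);
    off_t off = lseek(fd, from, SEEK_DATA);
    if (off < 0 && errno == EINVAL)
        return from;         /* SEEK_DATA rejected: treat it all as data */
    return off;
}

/* Hypothetical helper: create a file whose first `data_at` bytes are never
 * written (a hole where the filesystem supports one), followed by `len`
 * bytes of payload. Returns an open descriptor, or -1 on error. */
int make_front_hole_file(const char *path, off_t data_at,
                         const void *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (pwrite(fd, buf, len, data_at) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The offset returned may be rounded down to a block boundary, so a reader
would still need its own record framing to find the first queued record.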

	-- Cyrille

-- 


* Re: FIFO files
  2003-06-17 17:07   ` Cyrille Chepelov
@ 2003-06-17 17:39     ` Chris Dukes
  2003-06-17 17:54       ` Cyrille Chepelov
  2003-06-17 19:30     ` Hans Reiser
  1 sibling, 1 reply; 21+ messages in thread
From: Chris Dukes @ 2003-06-17 17:39 UTC (permalink / raw)
  To: Cyrille Chepelov; +Cc: reiserfs-list

On Tue, Jun 17, 2003 at 07:07:20PM +0200, Cyrille Chepelov wrote:
> 
> On Tue, Jun 17, 2003, at 06:24:19PM +0200, Carl-Daniel Hailfinger wrote:
> 
> > This facility has been present in linux for years. Just google for "sparse
> > files".
> 
> Yes sparse files would do. However, after a sizeable use of the facility,
> you'd end up with a giant hole at the beginning, whatever queued data, and
> EOF. 
> What I want to do is the ability to just remove the hole in front (just as
> when there is no point in having a hole at the end of the file, you just
> truncate that hole out), so that seek(SEEK_BEGIN,0) positions me at the
> beginning of useful data (or EOF if my queue is empty).

A few thoughts.
1) Does it really need to be a single file?  Could a directory of files
for each "transaction" suffice?
2) Can the buffer be bounded?  If so, why not use a classic ring buffer?
3) If it can't be bounded, must it be simple file operations, or could
you use something like Sleepycat's DB4 or MySQL?

Come to think of it, if sparse files are used you could have a classic ring
buffer that is effectively unbounded.

-- 
Chris Dukes
I tried being reasonable once--I didn't like it.


* Re: FIFO files
  2003-06-17 17:39     ` Chris Dukes
@ 2003-06-17 17:54       ` Cyrille Chepelov
  0 siblings, 0 replies; 21+ messages in thread
From: Cyrille Chepelov @ 2003-06-17 17:54 UTC (permalink / raw)
  To: reiserfs-list

On Tue, Jun 17, 2003, at 06:39:46PM +0100, Chris Dukes wrote:

> A few thoughts.
> 1) Does it really need to be a single file?  Could a directory of files
> for each "transaction" suffice?

In theory, yes; in practice we got bitten going down that road (especially
since, while I'm wondering whether head-truncate could be added as a plug-in
on reiserfs4, the application is really running on <dirty>NTFS</dirty>).

> 2) Can the buffer be bounded?  If so, why not use a classic ring buffer?
No.

> 3) If it can't be bounded, must it be simple file operations, or could
> you use something like Sleepycat's DB4 or MySQL?

Yes, it must be simple file operations, so a database engine is out of the
question (DB4 might be simple enough, but sounds like massive overkill).

> Come to think of it, if sparse files are used you could have a classic ring
> buffer that is effectively unbounded.

True, as long as we are 100% sure the offsets aren't hitting some
implementation limit. 

(Given the usage patterns of the FIFO-ish buffer and the underlying OS, what
I developed is effectively a ring buffer with no bound, plus a "magic
behaviour": when the queue is empty, both pointers are reset to the first
position and the file is truncated. The situation where the hole in front
is significantly large is transient enough that making the file sparse
doesn't warrant the effort; waiting for a complete drain and the eventual
truncate takes care of space concerns.)
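
The reset-on-drain scheme described above can be sketched roughly as
follows. This is an illustration only, not the code actually deployed: the
struct file_fifo type and the fifo_open(), fifo_put() and fifo_get()
functions are hypothetical names, both cursors live in one process's
memory, and a real two-process queue would additionally need shared
cursors and locking.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cursor pair for one file-backed queue. */
struct file_fifo {
    int   fd;    /* backing file, opened read/write */
    off_t head;  /* offset of the next byte to read */
    off_t tail;  /* offset of the next byte to write (== file size) */
};

/* Create (or empty) the backing file. Returns 0 on success, -1 on error. */
int fifo_open(struct file_fifo *q, const char *path)
{
    q->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    q->head = q->tail = 0;
    return q->fd < 0 ? -1 : 0;
}

/* Append len bytes at the tail. Returns 0 on success, -1 on error. */
int fifo_put(struct file_fifo *q, const void *buf, size_t len)
{
    ssize_t n = pwrite(q->fd, buf, len, q->tail);
    if (n != (ssize_t)len)
        return -1;
    q->tail += n;
    return 0;
}

/* Read up to len bytes from the head. Returns the byte count, or -1 on
 * error. Once the queue drains, both cursors are reset to position 0 and
 * the file is truncated, which reclaims the dead space in front. */
ssize_t fifo_get(struct file_fifo *q, void *buf, size_t len)
{
    assert(q->head <= q->tail);
    size_t avail = (size_t)(q->tail - q->head);
    if (len > avail)
        len = avail;
    ssize_t n = pread(q->fd, buf, len, q->head);
    if (n < 0)
        return -1;
    q->head += n;
    if (q->head == q->tail) {            /* drained: reset and truncate */
        if (ftruncate(q->fd, 0) < 0)
            return -1;
        q->head = q->tail = 0;
    }
    return n;
}
```

Until the queue drains completely, the consumed bytes in front of head
stay allocated, which is the transient cost mentioned above.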

The goal of my initial post was not to get someone to help me out of a
problem, but to learn whether the planned reiserfs4 plug-in system would
allow one to implement heterodox behaviours.

(it is true that a combination of sparse files + pointer-reset-on-drain
should bring a better solution, by removing the need for non-portable
extensions).

	-- Cyrille

-- 


* Re: FIFO files
  2003-06-17 17:07   ` Cyrille Chepelov
  2003-06-17 17:39     ` Chris Dukes
@ 2003-06-17 19:30     ` Hans Reiser
  2003-06-17 19:43       ` Cyrille Chepelov
  1 sibling, 1 reply; 21+ messages in thread
From: Hans Reiser @ 2003-06-17 19:30 UTC (permalink / raw)
  To: Cyrille Chepelov; +Cc: reiserfs-list

Cyrille Chepelov wrote:

>On Tue, Jun 17, 2003, at 06:24:19PM +0200, Carl-Daniel Hailfinger wrote:
>
>  
>
>>This facility has been present in linux for years. Just google for "sparse
>>files".
>>    
>>
>
>Yes sparse files would do. However, after a sizeable use of the facility,
>you'd end up with a giant hole at the beginning, whatever queued data, and
>EOF. 
>What I want to do is the ability to just remove the hole in front (just as
>when there is no point in having a hole at the end of the file, you just
>truncate that hole out), so that seek(SEEK_BEGIN,0) positions me at the
>beginning of useful data (or EOF if my queue is empty).
>
>	-- Cyrille
>
>  
>
So in answer, yes, this is quite feasible and reasonable to do.  I will 
be likely to accept such a patch, though you should send a detailed 
design first to reiserfs-list before writing code.

-- 
Hans




* Re: FIFO files
  2003-06-17 19:30     ` Hans Reiser
@ 2003-06-17 19:43       ` Cyrille Chepelov
  0 siblings, 0 replies; 21+ messages in thread
From: Cyrille Chepelov @ 2003-06-17 19:43 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

On Tue, Jun 17, 2003, at 11:30:57PM +0400, Hans Reiser wrote:

> So in answer, yes, this is quite feasible and reasonable to do.  I will 
> be likely to accept such a patch, though you should send a detailed 
> design first to reiserfs-list before writing code.

I don't know whether I'll have the time to tackle such a project (at least
not in the immediate future), but I'm happy to know that you judge that
feasible. It seems to me wise to wait for a released version of the fs
before proceeding anyway.

Once I'm ready to give that a spin, I will start by discussing the technical
aspects here.

	-- Cyrille

-- 


* Re: FIFO files
  2003-06-17 16:21 FIFO files Cyrille Chepelov
  2003-06-17 16:24 ` Carl-Daniel Hailfinger
@ 2003-06-17 22:42 ` Pierre Abbat
  1 sibling, 0 replies; 21+ messages in thread
From: Pierre Abbat @ 2003-06-17 22:42 UTC (permalink / raw)
  To: reiserfs-list

On Tuesday 17 June 2003 12:21, Cyrille Chepelov wrote:
> Hi all,
>
> I recently encountered a need for such a contraption as file-backed FIFOs.
> These are files where one process can append records at one end, and one
> other process can read records from the beginning of the file, chopping off
> the head of the file once it's not needed anymore. As I had to implement
> something right now on existing systems, I worked around the general
> absence of such a facility, but still, I'm wondering how difficult it
> would be to write a reiserfs4 plug-in which would allow one to do the
> equivalent of a truncate() but truncating what's between the position 0 and
> the current cursor (effectively shifting all subsequent file positions),
> rather than truncating what's between the cursor and the end of file.
> (network considerations are explicitly out of the scope of this musing)

There's a program called "cupyvei" which allows one process to append records 
to a file, and any number of other processes to read independently from the 
beginning. It isn't unlimited though; it's a circular buffer, and the size is 
specified when you create it. Is that of any interest?

phma

-- 
.i toljundi do .ibabo mi'afra tu'a do
.ibabo damba do .ibabo do jinga
.icu'u la ma'atman.


* Re: FIFO files
  2003-06-17 17:00     ` Carl-Daniel Hailfinger
@ 2003-06-18  6:17       ` Oleg Drokin
  2003-06-18 10:20         ` Ragnar Kjørstad
  0 siblings, 1 reply; 21+ messages in thread
From: Oleg Drokin @ 2003-06-18  6:17 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger; +Cc: Hans Reiser, Cyrille Chepelov, reiserfs-list

Hello!

On Tue, Jun 17, 2003 at 07:00:25PM +0200, Carl-Daniel Hailfinger wrote:
> Yes, but if I understood Cyrille correctly, her/his(?) needs should be
> completely satisfied by sparse files.
> One process can append records at the end, and one other process can read
> records from the beginning of the file, marking the read data empty. The

Hm, I am sorry, but this part is unclear.
I do not remember any API to create holes in existing files in place of already
present data.

Bye,
    Oleg


* Re: FIFO files
  2003-06-18  6:17       ` Oleg Drokin
@ 2003-06-18 10:20         ` Ragnar Kjørstad
  2003-06-18 11:40           ` Carl-Daniel Hailfinger
  0 siblings, 1 reply; 21+ messages in thread
From: Ragnar Kjørstad @ 2003-06-18 10:20 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Carl-Daniel Hailfinger, Hans Reiser, Cyrille Chepelov,
	reiserfs-list

On Wed, Jun 18, 2003 at 10:17:21AM +0400, Oleg Drokin wrote:
> Hello!
> 
> On Tue, Jun 17, 2003 at 07:00:25PM +0200, Carl-Daniel Hailfinger wrote:
> > Yes, but if I understood Cyrille correctly, her/his(?) needs should be
> > completely satisfied by sparse files.
> > One process can append records at the end, and one other process can read
> > records from the beginning of the file, marking the read data empty. The
> 
> Hm, I am sorry, but this part is unclear.
> I do not remember any API to create holes in existing files in place of already
> present data.

There isn't one.

XFS uses an ioctl to punch holes, dm_punch_hole, but its prototype is
specific to XDSM.

Once multiple filesystems implement the same functionality it makes sense
to have a common interface - maybe a new entry in struct
inode_operations, and a corresponding system call?

Something like:
int punch_hole(const char *path, off_t offset, size_t len);
int fpunch_hole(int fd, off_t offset, size_t len);


The reader of the file would have to punch the holes after reading the
data. IMHO punch_hole should not "move" the contents of the file, so the
data is still available at the same offsets. If one rather would like
the data to be available at the beginning of the file at all times
something else is needed, maybe:

int truncate_start(const char *path, off_t offset);
int ftruncate_start(int fd, off_t offset);

that truncates the beginning of the file, moving the data from pos
offset to pos 0.


But punch_hole is much more generic, and could be useful in cases like
HSM and others. Basically, I think punch_hole is just something missing
from the Linux VFS - logically it should be there. It should be
relatively easy to implement on all filesystems supporting sparse files,
as the end result is the same.
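
A rough sketch of what fpunch_hole() could look like from user space. It
assumes the fallocate() call with FALLOC_FL_PUNCH_HOLE that later Linux
kernels provide (it did not exist when this was written), and it falls
back to writing zeroes where the filesystem refuses, since the visible
result is the same. The names fpunch_hole() and write_zeroes() are
illustrative only, and len is an off_t here rather than the size_t of the
prototype above, to match fallocate().

```c
#define _GNU_SOURCE          /* exposes fallocate() and FALLOC_FL_* on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback path: overwrite [offset, offset + len) with zero bytes. */
static int write_zeroes(int fd, off_t offset, off_t len)
{
    char zeros[4096];
    memset(zeros, 0, sizeof zeros);
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, offset);
        if (n <= 0)
            return -1;
        offset += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical fpunch_hole(): make [offset, offset + len) read back as
 * zeroes without changing the file size or moving any other data.
 * Returns 0 on success, -1 on error. */
int fpunch_hole(int fd, off_t offset, off_t len)
{
    struct stat st;
    assert(offset >= 0 && len > 0);
    if (fstat(fd, &st) < 0)
        return -1;
    if (offset >= st.st_size)
        return 0;                        /* nothing stored there */
    if (len > st.st_size - offset)
        len = st.st_size - offset;       /* never extend the file */
#ifdef FALLOC_FL_PUNCH_HOLE
    /* Ask the filesystem to deallocate the blocks backing the range. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == 0)
        return 0;
#endif
    return write_zeroes(fd, offset, len);   /* same contents, no space freed */
}
```

A FIFO reader would call this on each range it has finished with; the
remaining data keeps its offsets, as argued above.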

Truncate_start is another beast. On some filesystems like ext2/3 it
would not be possible to implement it efficiently for large files, and
there are problems with offsets that are not multiples of the blocksize.
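
To make the cost concrete, here is a user-space sketch of
ftruncate_start(). It assumes the FALLOC_FL_COLLAPSE_RANGE mode of
fallocate() found in later Linux kernels, which only accepts
block-aligned ranges on filesystems that support it, and otherwise falls
back to copying the remaining data down to position 0. The function names
are hypothetical, and the fallback is neither atomic nor crash-safe.

```c
#define _GNU_SOURCE          /* exposes fallocate() and FALLOC_FL_* on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback path: copy [offset, size) down to position 0, then shrink.
 * Cost is proportional to the amount of data that remains. */
static int copy_down(int fd, off_t offset, off_t size)
{
    char buf[65536];
    off_t src = offset, dst = 0;
    while (src < size) {
        ssize_t n = pread(fd, buf, sizeof buf, src);
        if (n <= 0)
            return -1;
        if (pwrite(fd, buf, (size_t)n, dst) != n)
            return -1;
        src += n;
        dst += n;
    }
    return ftruncate(fd, dst);
}

/* Hypothetical ftruncate_start(): drop the first `offset` bytes so that
 * the byte previously at `offset` is now at position 0.
 * Returns 0 on success, -1 on error. */
int ftruncate_start(int fd, off_t offset)
{
    struct stat st;
    assert(offset >= 0);
    if (fstat(fd, &st) < 0)
        return -1;
    if (offset == 0)
        return 0;
    if (offset >= st.st_size)
        return ftruncate(fd, 0);         /* everything is discarded */
#ifdef FALLOC_FL_COLLAPSE_RANGE
    /* Cheap path: fails (e.g. EINVAL, EOPNOTSUPP) unless the filesystem
     * supports it and `offset` is a multiple of its block size. */
    if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, offset) == 0)
        return 0;
#endif
    return copy_down(fd, offset, st.st_size);
}
```

The copy in the fallback is what makes a generic implementation expensive
for large files; only the block-aligned case can be handled by remapping.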


-- 
Ragnar Kjørstad
Zet.no


* Re: FIFO files
  2003-06-18 10:20         ` Ragnar Kjørstad
@ 2003-06-18 11:40           ` Carl-Daniel Hailfinger
  2003-06-18 13:21             ` Hans Reiser
                               ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Carl-Daniel Hailfinger @ 2003-06-18 11:40 UTC (permalink / raw)
  To: Ragnar Kjørstad
  Cc: Oleg Drokin, Hans Reiser, Cyrille Chepelov, reiserfs-list,
	linux-fsdevel

It might be more appropriate to continue discussing this at linux-fsdevel.

Ragnar Kjørstad wrote:
> On Wed, Jun 18, 2003 at 10:17:21AM +0400, Oleg Drokin wrote:
> 
>>Hello!
>>
>>On Tue, Jun 17, 2003 at 07:00:25PM +0200, Carl-Daniel Hailfinger wrote:
>>
>>>Yes, but if I understood Cyrille correctly, her/his(?) needs should be
>>>completely satisfied by sparse files.
>>>One process can append records at the end, and one other process can read
>>>records from the beginning of the file, marking the read data empty. The
>>
>>Hm, I am sorry, but this part is unclear.
>>I do not remember any API to create holes in existing files in place of already
>>present data.
> 
> 
> There isn't one.
> 
> XFS uses an ioctl to punch holes, dm_punch_hole, but its prototype is
> specific to XDSM.
> 
> Once multiple filesystems implement the same functionality it makes sense
> to have a common interface - maybe a new entry in struct
> inode_operations, and a corresponding system call?
> 
> Something like:
> int punch_hole(const char *path, off_t offset, size_t len);
> int fpunch_hole(int fd, off_t offset, size_t len);

make_hole or zap_range would be alternative names.


> The reader of the file would have to punch the holes after reading the
> data. IMHO punch_hole should not "move" the contents of the file, so the
> data is still available at the same offsets. If one rather would like
> the data to be available at the beginning of the file at all times
> something else is needed, maybe:
> 
> int truncate_start(const char *path, off_t offset);
> int ftruncate_start(int fd, off_t offset);
> 
> that truncates the beginning of the file, moving the data from pos
> offset to pos 0.
> 
> 
> But punch_hole is much more generic, and could be useful in cases like
> HSM and others. Basically, I think punch_hole is just something missing
> from the linux VFS - logically it should be there. It should be
> relatively easy to implement on all filesystems supporting sparse files,
> as the end result is the same.
> 
> Truncate_start is another beast. On some filesystems like ext2/3 it
> would not be possible to implement it efficiently for large files, and
> there are problems with offsets that are not multiples of the blocksize.


Regards,
Carl-Daniel



* Re: FIFO files
  2003-06-18 11:40           ` Carl-Daniel Hailfinger
@ 2003-06-18 13:21             ` Hans Reiser
  2003-06-18 13:53             ` David Woodhouse
  2003-06-18 15:34             ` Bryan Henderson
  2 siblings, 0 replies; 21+ messages in thread
From: Hans Reiser @ 2003-06-18 13:21 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: Ragnar Kjørstad, Oleg Drokin, Cyrille Chepelov,
	reiserfs-list, linux-fsdevel

Carl-Daniel Hailfinger wrote:

>It might be more appropriate to continue discussing this at linux-fsdevel.
>
>Ragnar Kjørstad wrote:
>  
>
>>On Wed, Jun 18, 2003 at 10:17:21AM +0400, Oleg Drokin wrote:
>>
>>    
>>
>>>Hello!
>>>
>>>On Tue, Jun 17, 2003 at 07:00:25PM +0200, Carl-Daniel Hailfinger wrote:
>>>
>>>      
>>>
>>>>Yes, but if I understood Cyrille correctly, her/his(?) needs should be
>>>>completely satisfied by sparse files.
>>>>One process can append records at the end, and one other process can read
>>>>records from the beginning of the file, marking the read data empty. The
>>>>        
>>>>
>>>Hm, I am sorry, but this part is unclear.
>>>I do not remember any API to create holes in existing files in place of already
>>>present data.
>>>      
>>>
>>There isn't one.
>>
>>XFS uses an ioctl to punch holes, dm_punch_hole, but its prototype is
>>specific to XDSM.
>>
>>Once multiple filesystems implement the same functionality it makes sense
>>to have a common interface - maybe a new entry in struct
>>inode_operations, and a corresponding system call?
>>
>>Something like:
>>int punch_hole(const char *path, off_t offset, size_t len);
>>int fpunch_hole(int fd, off_t offset, size_t len);
>>    
>>
>
>make_hole or zap_range would be alternative names.
>
>
>  
>
>>The reader of the file would have to punch the holes after reading the
>>data. IMHO punch_hole should not "move" the contents of the file, so the
>>data is still available at the same offsets. If one rather would like
>>the data to be available at the beginning of the file at all times
>>something else is needed, maybe:
>>
>>int truncate_start(const char *path, off_t offset);
>>int ftruncate_start(int fd, off_t offset);
>>
>>that truncates the beginning of the file, moving the data from pos
>>offset to pos 0.
>>
>>
>>But punch_hole is much more generic, and could be useful in cases like
>>HSM and others. Basically, I think punch_hole is just something missing
>>from the linux VFS - logically it should be there. It should be
>>relatively easy to implement on all filesystems supporting sparse files,
>>as the end result is the same.
>>
>>Truncate_start is another beast. On some filesystems like ext2/3 it
>>would not be possible to implement it efficiently for large files, and
>>there are problems with offsets that are not multiples of the blocksize.
>>    
>>
>
>
>Regards,
>Carl-Daniel
>
>
>
>  
>
I am generally friendly to patches to punch holes in reiser4 files if 
anyone finds the time to write such a thing (perhaps 
filename/..punch/start/length).

-- 
Hans




* Re: FIFO files
  2003-06-18 11:40           ` Carl-Daniel Hailfinger
  2003-06-18 13:21             ` Hans Reiser
@ 2003-06-18 13:53             ` David Woodhouse
  2003-06-18 14:28               ` Matthew Wilcox
  2003-06-18 15:34             ` Bryan Henderson
  2 siblings, 1 reply; 21+ messages in thread
From: David Woodhouse @ 2003-06-18 13:53 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: Ragnar Kjørstad, Oleg Drokin, Hans Reiser, Cyrille Chepelov,
	reiserfs-list, linux-fsdevel

On Wed, 2003-06-18 at 12:40, Carl-Daniel Hailfinger wrote:
> >>I do not remember any API to create holes in existing files in place of already
> >>present data.
> > 
> > There isn't one.
> > 
> > XFS uses an ioctl to punch holes, dm_punch_hole, but its prototype is
> > specific to XDSM.
> > 
> > Once multiple filesystems implement the same functionality it makes sense
> > to have a common interface - maybe a new entry in struct
> > inode_operations, and a corresponding system call?

I want to implement this for JFFS2 but refused to do it with an ioctl;
it can be done through something non-fs-specific or not at all. As soon
as there's a non-bletcherous way to expose it to the user, I'll be
implementing it.

-- 
dwmw2



* Re: FIFO files
  2003-06-18 13:53             ` David Woodhouse
@ 2003-06-18 14:28               ` Matthew Wilcox
  2003-06-18 15:23                 ` Russell Coker
  2003-06-18 15:45                 ` Bryan Henderson
  0 siblings, 2 replies; 21+ messages in thread
From: Matthew Wilcox @ 2003-06-18 14:28 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Carl-Daniel Hailfinger, Ragnar Kjørstad, Oleg Drokin,
	Hans Reiser, Cyrille Chepelov, reiserfs-list, linux-fsdevel

On Wed, Jun 18, 2003 at 02:53:07PM +0100, David Woodhouse wrote:
> I want to implement this for JFFS2 but refused to do it with an ioctl;
> it can be done through something non-fs-specific or not at all. As soon
> as there's a non-bletcherous way to expose it to the user, I'll be
> implementing it.

irix uses an fcntl to do it, but i see no reason to copy bad interface
design.  a new syscall would seem appropriate.

has anyone analysed this for races?  are they the same as the ones for
truncate?  (i suspect they aren't).  what are the semantics of another
thread or process reading from a block that's been punched out?  what are
the semantics of another thread or process writing to a block that gets
punched out?  what are the semantics of another thread or process that
has this region of the file mmaped?  what are the semantics if this
thread has it mmaped?  what if two threads try to punch out overlapping
regions simultaneously?  what if a region is being truncated and punched
out at the same time?

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk


* Re: FIFO files
  2003-06-18 14:28               ` Matthew Wilcox
@ 2003-06-18 15:23                 ` Russell Coker
  2003-06-18 15:40                   ` David Woodhouse
  2003-06-18 15:51                   ` Bryan Henderson
  2003-06-18 15:45                 ` Bryan Henderson
  1 sibling, 2 replies; 21+ messages in thread
From: Russell Coker @ 2003-06-18 15:23 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: reiserfs-list, linux-fsdevel

On Thu, 19 Jun 2003 00:28, Matthew Wilcox wrote:
> has anyone analysed this for races?  are they the same as the ones for
> truncate?  (i suspect they aren't).  what are the semantics of another
> thread or process reading from a block that's been punched out?  what are
> the semantics of another thread or process writing to a block that gets
> punched out?  what are the semantics of another thread or process that
> has this region of the file mmaped?  what are the semantics if this
> thread has it mmaped?  what if two threads try to punch out overlapping
> regions simultaneously?  what if a region is being truncated and punched
> out at the same time?

In what way are they different from that of truncate?

It seems to me that punching a hole in a file is actually easier than 
truncating a file as we don't have to change the length.  The issues of 
mmap'd files and writing to a block that's being punched should be the same 
as if the file was truncated.

The only thing that punching a hole has to handle that truncating doesn't is 
the corner case of punching a hole that does not align on block boundaries, 
in which case zeros have to be written to the parts that aren't block 
aligned, and for consistency they probably have to be written atomically.

As for truncating and punching at the same time, surely that would just be 
equivalent to a truncate to whichever gives the shortest file and both system 
calls would return success.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page



* Re: FIFO files
  2003-06-18 11:40           ` Carl-Daniel Hailfinger
  2003-06-18 13:21             ` Hans Reiser
  2003-06-18 13:53             ` David Woodhouse
@ 2003-06-18 15:34             ` Bryan Henderson
  2 siblings, 0 replies; 21+ messages in thread
From: Bryan Henderson @ 2003-06-18 15:34 UTC (permalink / raw)
  To: Carl-Daniel Hailfinger
  Cc: Cyrille Chepelov, Oleg Drokin, linux-fsdevel, Hans Reiser,
	Ragnar Kjørstad, reiserfs-list





>> XFS uses an ioctl to punch holes, dm_punch_hole, but its prototype is
>> specific to XDSM.
>
>make_hole or zap_range would be alternative names.

The concept is a good one, and a separate system call, defined in a
filesystem-type-independent way, is better than an ioctl or fcntl, but the
interface should not be defined in terms of making holes.  At the highest
level, a file is just a stream of bytes.  A continuous stream.  So what
this interface should be seen as, and named as, is something that simply
clears a region of the file to zeroes.  In a particular implementation, if
it makes sense, this can be implemented by deallocating blocks that back
that region of the file.

Note that in some filesystem types, there may not be any concept of
allocated blocks or sparse vs dense files.

AIX has a system call for this and it is called fclear().  I like that
better than any alternative I've seen.

Bryan Henderson                         IBM Almaden Research Center
San Jose CA                             Filesystems



* Re: FIFO files
  2003-06-18 15:23                 ` Russell Coker
@ 2003-06-18 15:40                   ` David Woodhouse
  2003-06-18 15:51                   ` Bryan Henderson
  1 sibling, 0 replies; 21+ messages in thread
From: David Woodhouse @ 2003-06-18 15:40 UTC (permalink / raw)
  To: Russell Coker; +Cc: Matthew Wilcox, reiserfs-list, linux-fsdevel

On Wed, 2003-06-18 at 16:23, Russell Coker wrote:
> In what way are they different from that of truncate?

truncate? Surely the semantics are equivalent to writing all zeroes, not
truncate?

> It seems to me that punching a hole in a file is actually easier than 
> truncating a file as we don't have to change the length.  The issues of 
> mmap'd files and writing to a block that's being punched should be the same 
> as if the file was truncated.

Except with truncate it's hard, and for 'punch' (or fclear()) I'd
suggest that you're allowed to just refrain from freeing the backing
store for the page in question if it's mapped shared writable. 

It should be considered identical to mmapping /dev/zero and writing from
it, the _only_ exception being that it hints to the underlying file
system that it _may_ want to free the backing store for the range in
question.

> The only thing that punching a hole has to handle that truncating doesn't is 
> the corner case of punching a hole that does not align on block boundaries, 
> in which case zeros have to be written to the parts that aren't block 
> aligned, and for consistency they probably have to be written atomically.

We don't even bother with atomicity in write() do we? We do a page at a
time and if reads happen while a writer is between pages, they see that
the write() isn't atomic. 

> As for truncating and punching at the same time, surely that would just be 
> equivalent to a truncate to whichever gives the shortest file and both system 
> calls would return success.

Either the truncate happens first and the punch then extends i_size to
the last byte which was punched, or the punch happens first and the file
is subsequently truncated.


-- 
dwmw2



* Re: FIFO files
  2003-06-18 14:28               ` Matthew Wilcox
  2003-06-18 15:23                 ` Russell Coker
@ 2003-06-18 15:45                 ` Bryan Henderson
  1 sibling, 0 replies; 21+ messages in thread
From: Bryan Henderson @ 2003-06-18 15:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Carl-Daniel Hailfinger, Cyrille Chepelov, David Woodhouse,
	Oleg Drokin, linux-fsdevel, linux-fsdevel-owner, Hans Reiser,
	Ragnar Kjørstad, reiserfs-list





The semantics are really not controversial.  They are the same as a write()
of zeroes.

>what are the semantics of another
>thread or process reading from a block that's been punched out?

It should read zeroes.

>what are
>the semantics of another thread or process writing to a block that gets
>punched out?

If it writes after the punching, the data it writes ends up in the file.
If it writes before the punching, a hole ends up in the file.  It can't
write at the same time as the punching because of the atomic write
guarantee.

>what are the semantics of another thread or process that
>has this region of the file mmaped?

It should see zeroes.

>what are the semantics if this thread has it mmaped?

Same.

>what if two threads try to punch out overlapping
>regions simultaneously?

The union of the two regions gets cleared (but there might be fewer blocks
actually removed from the file than if the entire region were punched at
once, owing to implementation realities).

>what if a region is being truncated and punched
>out at the same time?

Same.



* Re: FIFO files
  2003-06-18 15:23                 ` Russell Coker
  2003-06-18 15:40                   ` David Woodhouse
@ 2003-06-18 15:51                   ` Bryan Henderson
  1 sibling, 0 replies; 21+ messages in thread
From: Bryan Henderson @ 2003-06-18 15:51 UTC (permalink / raw)
  To: Russell Coker
  Cc: linux-fsdevel, linux-fsdevel-owner, reiserfs-list, Matthew Wilcox





>It seems to me that punching a hole in a file is actually easier than
>truncating a file as we don't have to change the length.  The issues of
>mmap'd files and writing to a block that's being punched should be the same
>as if the file was truncated.

Except that in most Linux kernels, a VM interface specifically for
truncating is exported for use by loadable modules, whereas there isn't an
interface to remove pages from the middle of a file cache.

There ought to be, though.  I consider this a gaping hole in the Linux VM
interface.  Proposals to fill it in have been floated recently on
linux-kernel.

Bryan Henderson                            IBM Almaden Research Center
San Jose CA                                Filesystems

