To: linux-mtd@lists.infradead.org
From: Sergei Sharonov
Date: Wed, 23 Mar 2005 20:50:52 +0000 (UTC)
References: <4241396B.D689EB32@st.com>
Subject: Re: atomic file operations
List-Id: Linux MTD discussion mailing list

Estelle,
thanks, appreciate your help.

> > Sergei Sharonov wrote:
> > Is a write of 1024 bytes atomic?
> > Does it relate to the page size in any way? BTW I am using NAND and the
> > page may vary between 512 and 2048 bytes depending on a device.
>
> No write operation is guaranteed to be atomic. Have a look
> at jffs2_write_inode_range in write.c : if there is not enough
> space in the current block for the whole data, it may be split
> into several chunks. Additionally write ops that overlap a
> cache page boundary (not a flash page) are always split at
> the page limit.

That means that one write may have several CRCs corresponding to
splinter chunks?

> If you want to have atomic writes, you could:
> 1) Mandatorily: ensure that your application will not
>    issue write ops which overlap a page boundary.
>    You should not tweak the JFFS2 code to write such
>    overlapping nodes, otherwise you must also tweak
>    the GC and it gets difficult.
> 2) Either tweak jffs2_write_inode_range to forbid
>    splitting data which does not overlap a page boundary
>    or adjust JFFS2_MIN_DATA_LEN to reserve enough
>    space (difficult to estimate maybe if you have
>    compression...).
>
> The above tweaking should ensure that an input buffer
> is written to JFFS2 FS as a single CRC-protected
> data node.

Ok, got that. It does not seem like a promising idea, considering how fast
jffs2 evolves and therefore how bad forking would be. Thanks for the
suggestion anyway.

> You should be aware that on NAND flash JFFS2 uses
> a (nand flash) page buffer (wbuf.c), which is flushed
> only on fsync/sync/umount. So even though your write
> ops will be atomic (with above code tweaks),
> there is no guarantee that a buffer is effectively
> committed to flash when write() returns, because the
> end of the data node may remain in the buffer.
> If you want that also, you can tweak JFFS2 again
> by requiring a wbuf flush after each "atomic write",
> or you can have your application call fsync after
> each write.

Beg pardon if it is a FAQ, but if I open the file with the O_SYNC flag,
wouldn't that guarantee a synchronous write that does not return until all
the data is in flash?

> > Is file rename atomic?
> See jffs2_rename in dir.c. There are two steps:
> make the new hard link, remove the old hard link.
> You may end up with two names for the same inode if
> there is a powerdown, so no it is not atomic.

Could not see that coming. Usually people assume the rename operation is
atomic.

> > Second issue is: How badly these small chunks will affect my mount time?
> There have been previous threads about this.
> Some people proposed some (application-side) workaround,
> you can find it in the archive or maybe someone will point
> it to you.
I believe I saw a proposal to save small chunks as separate files, then
append them as a temp file and rename the temp file to the real log file.
The problems are (1) the log file is huge and (2) rename is not atomic,
per your reply.

Sergei Sharonov
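
A minimal sketch of the two application-side suggestions quoted above: never
issue a write that overlaps a page boundary, and call fsync after each write.
It is not code from JFFS2 or from this thread. The names PAGE_SIZE,
split_at_page_boundaries and write_synced are hypothetical, and the 4096-byte
page size is an assumption that should be checked on the target. As the quoted
reply notes, an unmodified JFFS2 may still split a write when the current
block runs out of space, so this alone does not make writes atomic.

```python
import os

# Assumption: size of a page-cache page, not of a NAND flash page.
# On a real target, query it with os.sysconf("SC_PAGE_SIZE").
PAGE_SIZE = 4096


def split_at_page_boundaries(offset, length, page_size=PAGE_SIZE):
    """Return (offset, length) pieces, none of which crosses a page boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        # Bytes remaining before the next page boundary.
        room = page_size - (offset % page_size)
        step = min(room, end - offset)
        pieces.append((offset, step))
        offset += step
    return pieces


def write_synced(fd, offset, data, page_size=PAGE_SIZE):
    """Write data at offset in page-bounded pieces, calling fsync after each."""
    start = offset
    for piece_offset, piece_len in split_at_page_boundaries(
        offset, len(data), page_size
    ):
        begin = piece_offset - start
        chunk = data[begin:begin + piece_len]
        written = os.pwrite(fd, chunk, piece_offset)
        if written != piece_len:
            raise OSError("short write at offset %d" % piece_offset)
        # Ask the kernel to push buffered data for this file to the device.
        os.fsync(fd)
```

For example, a 1024-byte write at offset 3584 is issued as two 512-byte
writes, one on each side of the 4096 boundary. Calling fsync after every
piece costs throughput, so this suits small, infrequent log records better
than bulk data.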