From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: Shouldn't backend devices for VMX domain disks be opened with O_DIRECT?
Date: Thu, 02 Feb 2006 20:50:28 -0600
Message-ID: <43E2C4F4.1060509@us.ibm.com>
References: <43E27DA3.80405@us.ibm.com> <20060202224106.GC17266@vienna.egenera.com> <43E29F27.10009@us.ibm.com> <1138934528.4374.13.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <1138934528.4374.13.camel@localhost.localdomain>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Stephen Tweedie
Cc: Steve Dobbelstein, "Philip R. Auld", "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

Stephen Tweedie wrote:

>Hi,
>
>On Thu, 2006-02-02 at 18:09 -0600, Anthony Liguori wrote:
>
>>Referring to the original question, which has been quoted away,
>>journaling doesn't require that data be written to disk per se but that
>>writes occur in a particular order. A journal is always recoverable
>>given that writes occur in the expected order.
>
>Sure... it's *internally* consistent, maybe. But you need more than
>that. You need guarantees that things are on disk, else external
>consistency guarantees will be broken.

Ok, this is certainly correct (but not the original point).

>Consider things like sendmail fsync()ing a spool file before telling the
>sender that the email has been accepted. After that acknowledgement,
>the sender can delete the mail from its queues knowing that the
>recipient MTA definitely has the data, and even if it crashes, the mail
>won't be lost. Databases frequently have similar consistency
>requirements.
>If a power failure loses writes that you have told the
>domU have completed --- even if you maintain write ordering --- then you
>*are* putting application correctness at risk, there's no doubt about
>it.

Ok, this is a good argument for using O_SYNC.

>Fortunately, that's just what blkback is doing --- it's using submit_bio
>to submit the write IOs without waiting for completion, and is using the
>bio's bi_end_io callback to process the IO completion once it is hard on
>disk.

Yup, the question here is with the device model which doesn't use the
block frontend/backend. Would O_DIRECT be helpful over O_SYNC?

Regards,

Anthony Liguori

>--Stephen
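
A minimal sketch of the fsync-before-acknowledge pattern described above for sendmail, next to the O_SYNC alternative raised in the reply. The helper names spool_durably and write_all and the file mode are illustrative assumptions; nothing here is taken from sendmail, blkback, or the device model. O_DIRECT is left out on purpose: as I read the Linux open(2) man page, it bypasses the page cache and imposes alignment constraints, but on its own it does not give the guarantees of O_SYNC, so the two would have to be combined if synchronous I/O is required.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes.
 * Returns 0 on success, -1 on error (including a zero-length write). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: store a message and report whether it is safe to
 * acknowledge it to the sender.
 *
 * use_osync != 0: the file is opened with O_SYNC, so each write() returns
 *                 only after the kernel reports the data as written out.
 * use_osync == 0: the data is written normally and fsync() is called once
 *                 before returning.
 *
 * Returns 0 when the caller may acknowledge, -1 otherwise.
 */
int spool_durably(const char *path, const char *buf, size_t len, int use_osync)
{
    assert(path != NULL && buf != NULL);

    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (use_osync)
        flags |= O_SYNC;

    int fd = open(path, flags, 0600);
    if (fd < 0)
        return -1;

    int rc = write_all(fd, buf, len);
    if (rc == 0 && !use_osync)
        rc = fsync(fd);   /* do not acknowledge before this succeeds */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A caller would send its acknowledgement only when spool_durably() returns 0. Either variant only covers what the kernel reports as complete; whether a disk's own write cache honours that is a separate question that this sketch does not address.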