From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladislav Bolkhovitin
Subject: Re: RDMA power failure write atomicity
Date: Fri, 11 Mar 2016 17:54:02 -0800
Message-ID: <56E376BA.6090603@vlnb.net>
References: <56E20734.4030208@vlnb.net> <0E25BAE6-9091-4B28-A2A9-2F41BD97145A@gmail.com> <56E36247.7060605@vlnb.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To:
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Anuj Kalia
Cc: Asgeir Eiriksson, "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
List-Id: linux-rdma@vger.kernel.org

Anuj Kalia wrote on 03/11/2016 05:14 PM:
> There are several factors that make this problem hard. For many modern
> servers, DMA data is written to last level cache via DDIO, i.e., it
> will not go to the NVDIMM unless the remote CPU flushes the cache /
> cache lines. On servers where data is written to DRAM (or to an NVDIMM
> attached to memory bus), the data can (probably) still be buffered by
> the CPU's memory controller.

It sounds to me that it should then have the regular CPU write atomicity
properties, i.e. on 64-bit Intel: 8 bytes with 8-byte alignment.

> I am not sure how much control RDMA NICs have over these factors.
> AFAIK, there is no PCIe command to flush either cache lines or memory
> controller buffers, so flushing to DRAM is beyond what RDMA NICs
> can currently accomplish.

Yes, but flushing data is beyond my question, which is only about what
pattern of data you can eventually see after a power failure, with or
without flushing.

If there are no minimal power failure atomicity guarantees at all, it would
effectively rule out any write in place into NVRAM/PMEM, because you can end
up with mixed old and new, hence corrupted, data. You would never be able to
switch a pointer atomically, so even the classical "write to a new location,
flush, then switch the pointer to the new data" approach would no longer
work.
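A minimal sketch of the pointer-switch argument above, written as a Python simulation rather than real NVRAM or RDMA code. Everything in it is an assumption made for illustration: the persistent region is modelled as a `bytearray`, the offsets `OLD_OFF` and `NEW_OFF` are arbitrary, and `atomic_unit` is a hypothetical parameter standing for the power-failure atomicity granularity of the target. A power failure is modelled as the pointer write stopping after a whole number of atomic units.

```python
import struct

PTR_OFF = 0        # 8-byte aligned slot holding the offset of the live record
OLD_OFF = 0x0108   # hypothetical location of the old record
NEW_OFF = 0x0210   # hypothetical location of the new record

def make_region():
    """Model persistent memory: a pointer slot that refers to the old record."""
    mem = bytearray(1024)
    struct.pack_into("<Q", mem, OLD_OFF, 0x1111111111111111)
    struct.pack_into("<Q", mem, PTR_OFF, OLD_OFF)
    return mem

def torn_write(mem, off, data, atomic_unit, units_done):
    """Write `data` at `off`, losing power after `units_done` atomic units."""
    n = min(len(data), units_done * atomic_unit)
    mem[off:off + n] = data[:n]

def reachable_pointers(atomic_unit):
    """Collect every pointer value observable after a crash during the switch."""
    seen = set()
    new_ptr = struct.pack("<Q", NEW_OFF)
    for units_done in range(8 // atomic_unit + 1):
        mem = make_region()
        # Steps 1 and 2: write the new record elsewhere; assume it was flushed.
        struct.pack_into("<Q", mem, NEW_OFF, 0x2222222222222222)
        # Step 3: switch the pointer; power may fail part-way through.
        torn_write(mem, PTR_OFF, new_ptr, atomic_unit, units_done)
        seen.add(struct.unpack_from("<Q", mem, PTR_OFF)[0])
    return seen
```

Under these assumptions, `reachable_pointers(8)` contains only the old and the new offset, whereas `reachable_pointers(1)` also contains a value that is neither, which is the mixed old/new state the message is worried about.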
As a result, the value of what is proposed in draft-talpey-rdma-commit-00.txt
(which is a very good proposal) would be significantly lower because, unless
I'm missing something, the only use case for RDMA writes bypassing the remote
CPU that would survive is log replication with each entry protected by a
checksum, so that on recovery after a power failure you can identify the
last, corrupted record. However, record compaction would still have to be
done via the remote CPU (no bypassing), because only the CPU can switch
pointers in NVRAM/PMEM atomically with respect to power failure.

So it seems to me that some minimum, like 8 bytes, must be defined. I wonder
whether it has already been defined. It looks like it has not.

Thanks,
Vlad

> --Anuj (rdma_guy)
>
> On Fri, Mar 11, 2016 at 7:26 PM, Vladislav Bolkhovitin wrote:
>> I'm aware of this proposal. Unfortunately, it is quite orthogonal to my
>> question, because it is about how to ensure persistence of RDMA writes.
>> The atomicity it mentions, like general RDMA atomicity, is atomicity
>> with regard to parallel commands acting on the same locations. However,
>> I'm asking about power failure atomicity, which is something different.
>>
>> For instance, say you are doing an RDMA WRITE of 10 bytes of data. If a
>> power failure happens while this operation is in progress, what data
>> will end up at the target location? All 10 bytes new? All 10 bytes old?
>> Or a mix of 5 bytes new and 5 bytes old? By power failure atomicity I
>> mean a guarantee that the data is either old or new, never a mix of old
>> and new data.
>>
>> Thanks,
>> Vlad
>>
>> Asgeir Eiriksson wrote on 03/10/2016 05:33 PM:
>>> Vladislav,
>>>
>>> This is an area of active R&D
>>>
>>> You might be interested in the following (at ietf.org):
>>>
>>> Title    : RDMA Durable Write Commit
>>> Authors  : Tom Talpey
>>>            Jim Pinkerton
>>> Filename : draft-talpey-rdma-commit-00.txt
>>> Pages    : 24
>>> Date     : 2016-02-19
>>>
>>> Regards,
>>>
>>> ‘Asgeir
>>>
>>>
>>>> On Mar 10, 2016, at 3:45 PM, Vladislav Bolkhovitin wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm currently considering using NVDIMM behind RDMA and wonder what
>>>> the RDMA power failure write atomicity is. I mean, what is the
>>>> minimal size and alignment guaranteed to be written atomically in
>>>> the face of a power failure (or some other similar failure), i.e.
>>>> either written in full or not written at all?
>>>>
>>>> For memory writes on Intel it is 8 bytes with 8-byte alignment. Is
>>>> there anything like this for RDMA? Or do different
>>>> vendors/implementations have such different expectations and
>>>> promises that you cannot assume anything >1 byte?
>>>>
>>>> I can't find such info anywhere.
>>>>
>>>> Thanks,
>>>> Vlad
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
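The checksum-protected log replication mentioned in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything taken from the thread or from draft-talpey-rdma-commit-00.txt: the entry layout (a little-endian length and CRC32 header followed by the payload), the helper names `append_entry` and `recover`, and the choice of CRC32 via `zlib.crc32` are all hypothetical. The log itself is modelled as a `bytearray` standing in for the NVDIMM region written by RDMA.

```python
import struct
import zlib

# Hypothetical entry header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")

def append_entry(log, payload):
    """Append one entry: header (length, checksum) followed by the payload."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)

def recover(log):
    """Return the payloads of all valid entries, stopping at the first bad one."""
    entries, off = [], 0
    while off + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, off)
        if length == 0:
            break  # assumption: empty entries are never written, so this is the end
        start = off + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupted entry: treat it as the end of the log
        entries.append(payload)
        off = start + length
    return entries
```

On recovery, an entry whose payload is truncated or is a mix of old and new bytes fails the length or checksum test, so the scan stops at the last intact entry. Note that this scheme only detects a torn tail; it does not help with in-place pointer updates, which is the limitation the message points out.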