* RDMA power failure write atomicity
@ 2016-03-10 23:45 Vladislav Bolkhovitin
[not found] ` <56E20734.4030208-d+Crzxg7Rs0@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Vladislav Bolkhovitin @ 2016-03-10 23:45 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hello,
I'm currently considering using NVDIMM behind RDMA and wonder what the RDMA power
failure write atomicity is. I mean, what is the minimal size and alignment guaranteed to be
written atomically in the face of a power failure (or some other similar failure), i.e.
either written in full or not written at all?
For memory writes on Intel it is 8 bytes with 8-byte alignment. Is there anything like
this for RDMA? Or do different vendors/implementations have such different expectations and
promises that you cannot assume anything >1 byte?
I can't find such info anywhere.
Thanks,
Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <56E20734.4030208-d+Crzxg7Rs0@public.gmane.org>]
* Re: RDMA power failure write atomicity
[not found] ` <56E20734.4030208-d+Crzxg7Rs0@public.gmane.org>
@ 2016-03-11 1:33 ` Asgeir Eiriksson
[not found] ` <0E25BAE6-9091-4B28-A2A9-2F41BD97145A-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Asgeir Eiriksson @ 2016-03-11 1:33 UTC (permalink / raw)
To: Vladislav Bolkhovitin; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Vladislav,
This is an area of active R&D.
You might be interested in the following (at ietf.org):
Title : RDMA Durable Write Commit
Authors : Tom Talpey
Jim Pinkerton
Filename : draft-talpey-rdma-commit-00.txt
Pages : 24
Date : 2016-02-19
Regards,
‘Asgeir
[parent not found: <0E25BAE6-9091-4B28-A2A9-2F41BD97145A-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: RDMA power failure write atomicity
[not found] ` <0E25BAE6-9091-4B28-A2A9-2F41BD97145A-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-03-12 0:26 ` Vladislav Bolkhovitin
[not found] ` <56E36247.7060605-d+Crzxg7Rs0@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Vladislav Bolkhovitin @ 2016-03-12 0:26 UTC (permalink / raw)
To: Asgeir Eiriksson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
I'm aware of this proposal. Unfortunately, it is quite orthogonal to my question,
because it is about how to ensure persistence of RDMA writes. The atomicity it
mentions, like general RDMA atomicity, is atomicity with regard to parallel
commands acting on the same locations. However, I'm asking about power failure
atomicity, which is something different.
For instance, you are doing an RDMA WRITE of 10 bytes of data. If a power failure happens
while this operation is in progress, what data will end up at the target location? All
10 bytes new? All 10 bytes old? Or a mix of 5 bytes new and 5 bytes old? By power failure
atomicity I mean a guarantee that the data is either old or new, never a mix of old and
new data.
Thanks,
Vlad
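To make the 10-byte example concrete, here is a minimal sketch of a toy model, not a
statement about any real NIC, CPU, or NVDIMM. It assumes that memory is persisted in
aligned units of `unit` bytes, that each unit touched by the write lands either in full or
not at all, and that units land independently and in unknown order. The names
`split_units` and `possible_outcomes` are hypothetical and exist only for this illustration.

```python
from itertools import product


def split_units(offset, length, unit):
    """Split [offset, offset + length) into pieces that each stay inside one aligned unit."""
    pieces = []
    pos = offset
    end = offset + length
    while pos < end:
        # Next aligned boundary strictly after pos.
        boundary = (pos // unit + 1) * unit
        nxt = min(end, boundary)
        pieces.append((pos, nxt))
        pos = nxt
    return pieces


def possible_outcomes(old, new, offset, unit):
    """Return every memory image the toy model allows after a crash during the write.

    Assumption (hypothetical model): each aligned unit is all-or-nothing, and any
    subset of the touched units may have landed when power is lost.
    """
    if offset + len(new) > len(old):
        raise ValueError("write does not fit in the target region")
    pieces = split_units(offset, len(new), unit)
    outcomes = set()
    for landed in product([False, True], repeat=len(pieces)):
        image = bytearray(old)
        for (start, end), ok in zip(pieces, landed):
            if ok:
                image[start:end] = new[start - offset:end - offset]
        outcomes.add(bytes(image))
    return outcomes


if __name__ == "__main__":
    old = b"o" * 16
    new = b"N" * 10
    for unit in (1, 8, 16):
        print(unit, len(possible_outcomes(old, new, 0, unit)))
```

Under this model a 10-byte write at offset 0 has only the two "all old" and "all new"
images when the unit is 16 bytes, four images when the unit is 8 bytes, and 1024 images
when nothing larger than 1 byte can be assumed. The "5 bytes new, 5 bytes old" mix is
possible only in the 1-byte case.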
[parent not found: <56E36247.7060605-d+Crzxg7Rs0@public.gmane.org>]
* Re: RDMA power failure write atomicity
[not found] ` <56E36247.7060605-d+Crzxg7Rs0@public.gmane.org>
@ 2016-03-12 1:14 ` Anuj Kalia
[not found] ` <CADPSxAh-Fn8cHekiXeBa6+1rNDM=N0y5wQHDFtM4BxxH0wjzBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Anuj Kalia @ 2016-03-12 1:14 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Asgeir Eiriksson, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
There are several factors that make this problem hard. For many modern
servers, DMA data is written to the last-level cache via DDIO, i.e., it
will not go to the NVDIMM unless the remote CPU flushes the cache /
cache lines. On servers where data is written to DRAM (or to an NVDIMM
attached to the memory bus), the data can (probably) still be buffered by
the CPU's memory controller.
I am not sure how much control RDMA NICs have over these factors.
AFAIK, there is no PCIe command to flush either cache lines or memory
controller buffers, so flushing to DRAM is beyond what RDMA NICs
can currently accomplish.
--Anuj (rdma_guy)
[parent not found: <CADPSxAh-Fn8cHekiXeBa6+1rNDM=N0y5wQHDFtM4BxxH0wjzBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: RDMA power failure write atomicity
[not found] ` <CADPSxAh-Fn8cHekiXeBa6+1rNDM=N0y5wQHDFtM4BxxH0wjzBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-12 1:54 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 5+ messages in thread
From: Vladislav Bolkhovitin @ 2016-03-12 1:54 UTC (permalink / raw)
To: Anuj Kalia
Cc: Asgeir Eiriksson, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Anuj Kalia wrote on 03/11/2016 05:14 PM:
> There are several factors that make this problem hard. For many modern
> servers, DMA data is written to last level cache via DDIO, i.e., it
> will not go to the NVDIMM unless the remote CPU flushes the cache /
> cache lines. On servers where data is written to DRAM (or to an NVDIMM
> attached to memory bus), the data can (probably) still be buffered by
> the CPU's memory controller.
It sounds to me that it should then have the regular CPU write atomicity properties, i.e.
on 64-bit Intel: 8 bytes with 8-byte alignment.
> I am not sure how much control RDMA NICs have over these factors.
> AFAIK, there is no PCIe command to flush either cache lines or memory
> controller buffers, so flushing to DRAM is beyond what RDMA NICs
> can currently accomplish.
Yes, but flushing data is beyond my question, which is only about what pattern of
eventual data you can see on power failure, with or without flushing.
If there are no minimal power failure atomicity guarantees at all, it would effectively
rule out any write-in-place into NVRAM/PMEM, because you can end up with mixed old and
new, hence corrupted, data. You would never be able to atomically switch a pointer, so
even the classical "write to a new location, flush, then switch the pointer to the new
data" approach would not work anymore.
As a result, the value of what is proposed in draft-talpey-rdma-commit-00.txt (which is a
very good proposal) would be significantly lower, because, unless I'm missing something,
the only available use case for RDMA writes bypassing the remote CPU that would withstand
this is log replication with each entry protected by a checksum, so on recovery after
power failure you can figure out the last corrupted record. However, record compaction
would still be done via the remote CPU (no bypassing), because only the CPU can switch
pointers in NVRAM/PMEM atomically with respect to power failure.
So it seems to me that something minimal, like 8 bytes, must be defined. I wonder
whether it has already been defined. It looks like it has not.
Thanks,
Vlad
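The checksum-protected log mentioned in the last message can be sketched as follows.
This is only an illustration under stated assumptions: the record layout (4-byte length,
4-byte CRC32, payload), the use of CRC32 as the checksum, the convention that unwritten
log space is zero-filled, and the names `encode_record` and `recover` are all hypothetical
and are not taken from the thread or from draft-talpey-rdma-commit-00.txt.

```python
import struct
import zlib

# Hypothetical on-media layout: [length: u32 LE][crc32: u32 LE][payload bytes]
U32 = struct.Struct("<I")
HEADER_SIZE = 2 * U32.size


def _checksum(payload):
    # Cover the length field as well, so a torn length is also likely to be detected.
    return zlib.crc32(U32.pack(len(payload)) + payload)


def encode_record(payload):
    """Build one log record; the writer would place this with a single RDMA WRITE."""
    return U32.pack(len(payload)) + U32.pack(_checksum(payload)) + payload


def recover(log):
    """Scan the log from the start and return the payloads of all intact records.

    The scan stops at the first record that is empty, truncated, or fails its
    checksum, which is where a write interrupted by power failure would show up.
    """
    records = []
    pos = 0
    while pos + HEADER_SIZE <= len(log):
        (length,) = U32.unpack_from(log, pos)
        (crc,) = U32.unpack_from(log, pos + U32.size)
        start = pos + HEADER_SIZE
        end = start + length
        # Zero length is treated as "never written" (assumes a zero-filled log).
        if length == 0 or end > len(log):
            break
        payload = bytes(log[start:end])
        if zlib.crc32(U32.pack(length) + payload) != crc:
            break
        records.append(payload)
        pos = end
    return records


if __name__ == "__main__":
    good = encode_record(b"first")
    second = encode_record(b"second-entry")
    torn = second[:-4] + bytes(4)  # last 4 payload bytes never reached the media
    print(recover(good + second + bytes(16)))
    print(recover(good + torn + bytes(16)))
```

The point of the design is that recovery needs no atomicity guarantee larger than 1 byte:
a partially written record is rejected as a whole. A checksum only makes an undetected
torn record unlikely, not impossible, and the sketch does nothing about switching
pointers in place, which is the case the message argues still needs a defined minimum.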
end of thread, other threads:[~2016-03-12 1:54 UTC | newest]
Thread overview: 5+ messages
2016-03-10 23:45 RDMA power failure write atomicity Vladislav Bolkhovitin
[not found] ` <56E20734.4030208-d+Crzxg7Rs0@public.gmane.org>
2016-03-11 1:33 ` Asgeir Eiriksson
[not found] ` <0E25BAE6-9091-4B28-A2A9-2F41BD97145A-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-03-12 0:26 ` Vladislav Bolkhovitin
[not found] ` <56E36247.7060605-d+Crzxg7Rs0@public.gmane.org>
2016-03-12 1:14 ` Anuj Kalia
[not found] ` <CADPSxAh-Fn8cHekiXeBa6+1rNDM=N0y5wQHDFtM4BxxH0wjzBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-12 1:54 ` Vladislav Bolkhovitin