* RDMA and memory ordering
@ 2013-11-10 10:46 Anuj Kalia
From: Anuj Kalia @ 2013-11-10 10:46 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi.
I am running a server which essentially does the following operations in a loop:
A[i].value = counter;  // it's actually something else
asm volatile ("" : : : "memory");
asm volatile ("mfence" ::: "memory");
A[i].counter = counter;
printf("%d %d\n", A[i].value, A[i].counter);
counter++;
Basically, I want a fresh value of A[i].counter to indicate a fresh A[i].value.
I have a remote client which reads the struct A[i] from the server
(via RDMA) in a loop. Sometimes, in the copy that the client reads,
A[i].counter is larger than A[i].value; that is, the client sees the
newer value of A[i].counter while A[i].value still corresponds to a
previous iteration of the server's loop.
How can this happen in the presence of memory barriers? With the
barriers in place, A[i].counter is written after A[i].value, so it
should never be larger than A[i].value.
Thanks for your help!
Anuj Kalia,
Carnegie Mellon University
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: RDMA and memory ordering
From: Gabriele Svelto @ 2013-11-12 10:16 UTC (permalink / raw)
To: Anuj Kalia, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Anuj,

On 10/11/2013 11:46, Anuj Kalia wrote:
> How can this happen in the presence of memory barriers? With barriers,
> A[i].counter should be updated later and therefore should always be
> smaller than A[i].value.

Memory barriers such as mfence synchronize memory operations from the
point of view of the CPUs only. In practice this means that the stores
you wrote may go out to memory in a different order than the one the
processor sees, and an external device such as a PCIe HCA may thus
observe a different ordering even in the presence of memory barriers.

To ensure that an external device sees your stores in the order you
meant, you would need some form of external barrier, though I do not
know whether that is possible at all from userspace, and it would be a
fragile solution in any case.

Instead I would suggest using the verbs atomic operations, such as
IBV_WR_ATOMIC_CMP_AND_SWP and IBV_WR_ATOMIC_FETCH_AND_ADD, to implement
what you have in mind.

Gabriele
* Re: RDMA and memory ordering
From: Anuj Kalia @ 2013-11-12 10:31 UTC (permalink / raw)
To: Gabriele Svelto; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Gabriele,

Thanks for your reply.

That makes sense. This way we have no consistency between the CPU's
view and the HCA's view; it all depends on when the cache gets flushed
to RAM.

However, if the HCA performs reads from the L3 cache, then everything
should be consistent, right? As for ordering the writes, I think we can
assume that they are ordered up to the cache hierarchy (with no
guarantees about when they appear in RAM). Ido Shamai (@Mellanox) told
me that RDMA writes go to the L3 cache. This, plus the on-chip memory
controllers, makes me think that reads should come from the L3 cache
too.

I believe the atomic operations would be a lot more expensive than
reads/writes. I'm targeting maximum performance, so I don't want to
look that way yet.

--Anuj

On Tue, Nov 12, 2013 at 6:16 AM, Gabriele Svelto
<gabriele.svelto-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Memory barriers such as mfence synchronize memory operations from the
> point of view of the CPUs only. In practice this means that the stores
> you wrote may go out to memory in a different order than the one the
> processor sees, and an external device such as a PCIe HCA may thus
> observe a different ordering even in the presence of memory barriers.
>
> Instead I would suggest using the verbs atomic operations, such as
> IBV_WR_ATOMIC_CMP_AND_SWP and IBV_WR_ATOMIC_FETCH_AND_ADD, to implement
> what you have in mind.
>
> Gabriele
* Re: RDMA and memory ordering
From: Jason Gunthorpe @ 2013-11-12 18:31 UTC (permalink / raw)
To: Anuj Kalia
Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, Nov 12, 2013 at 06:31:04AM -0400, Anuj Kalia wrote:

> That makes sense. This way, we have no consistency between the CPU's
> view and the HCA's view - it all depends when the cache gets flushed
> to RAM.

What you are talking about is firmly in undefined territory. You might
be able to get something to work today, but tomorrow's CPUs and HCAs
might break it.

You will never reliably get the guarantee you want with the scheme you
have. Even with two CPUs it is not going to happen.

> I have a remote client which reads the struct A[i] from the server
> (via RDMA) in a loop. Sometimes in the value that the client reads,
> A[i].counter is larger than A[i].value. i.e., I see the newer value of
> A[i].counter but A[i].value corresponds to a previous iteration of the
> server's loop.

This is a fundamental misunderstanding of what FENCE does: it only
makes the writes happen in order, it doesn't constrain the reader side.

    CPU1                     CPU2
                             read a.value
    a.value = counter
    FENCE
    a.counter = counter
                             read a.counter

    result: value < counter

    CPU1                     CPU2
    a.value = counter
                             read a.value
    FENCE
    a.counter = counter
                             read a.counter

    result: value < counter

    CPU1                     CPU2
    a.value = counter
    FENCE
                             read a.value
    <SCHEDULE>
    a.counter = counter
                             read a.counter

    result: value < counter

etc.

This stuff is hard; if you want a clever scheme to be reliable you need
a really detailed understanding of what is actually being guaranteed.

> However, if the HCA performs reads from L3 cache, then everything
> should be consistent, right? While ordering the writes, I think we
> can

No. The cache makes no difference. Fundamentally you aren't atomically
writing cache lines; you are writing single values.

99% of the time it might look like atomic cache line writes, but there
is a 1% where that assumption will break.

Probably the best you can do is a collision-detect scheme:

    uint64_t counter;
    void data[];

    writer:
        counter++
        FENCE
        data = [.....]
        FENCE
        counter++

    reader:
        read counter
        if counter % 2 == 1: retry
        read data
        read counter
        if counter != last_counter: retry

But even something as simple as that probably has scary races - I only
thought about it for a few moments. :)

Jason
* Re: RDMA and memory ordering
From: Anuj Kalia @ 2013-11-12 20:59 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[Included missed conversation with Jason at end.]

Jason,

Thanks again. So we conclude there is nothing like an atomic cacheline
read. Then my current design is a dud. But there should be 8-byte
atomicity, right? I think I can leverage that to get what I want.

This part is interesting (from Jason's reply):

"If you burst read from the HCA value and counter then the result is
undefined, you don't know if counter was read before value, or the
other way around."

Is there a way of knowing the order in which they are read? For
example, I heard in a talk that there is a left-to-right ordering when
an HCA reads a contiguous buffer. This could be totally architecture
specific, so I just want the answer for Mellanox ConnectX-3 cards. I
think I can check this experimentally, but a definitive answer would be
great.

--Anuj

[Conversation with Jason follows]

Jason,

Thanks a lot for your reply.

I think I understand that the RDMA reader will not see the ordering in
the updates to A[i].value and A[i].counter if they are in different L3
cache lines. But what are the guarantees when they are in the same
cache line? For example, 32-bit processors have atomic 32-bit loads and
stores, i.e. memory operations to the same 32-bit (aligned) word are
linearizable.

On 12 Nov 2013 13:31, "Jason Gunthorpe"
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> This is a fundamental misunderstanding of what FENCE does: it only
> makes the writes happen in order, it doesn't constrain the reader side
>
>     CPU1                     CPU2
>                              read a.value
>     a.value = counter
>     FENCE
>     a.counter = counter
>                              read a.counter
>
>     result: value < counter

That's right - thanks for the detailed explanation! However, I'm
assuming that the HCA performs atomic cacheline reads (I don't have a
lot of basis for this assumption and it would be great if someone could
tell me more about it). If that is true, 'read a.value' and 'read
a.counter' are not 2 separate operations. Instead, there is one 'read
cacheline(a)' - that should provide a snapshot of a's state at CPU1.

> No. The cache makes no difference. Fundamentally you aren't atomically
> writing cache lines; you are writing single values.

I was not assuming atomic writes to the entire cacheline - I was only
assuming that the ordering imposed by mfence is preserved in cache: the
write to 'a.value' appears in the cache hierarchy before the write to
'a.counter'.

So I guess my primary question is this now: does the HCA perform atomic
cacheline reads (wrt other CPU operations to the same cacheline)?

On Tue, Nov 12, 2013 at 03:18:35PM -0400, Anuj Kalia wrote:
> I think I understand that the RDMA reader will not see the ordering

This isn't just RDMA; CPU-to-CPU coherency is the same.

To be honest, your test doesn't really show anything: the reads and
writes can be interleaved in any way, and value >, ==, or < counter are
all valid outcomes. What the fence gives you is this: read counter,
then value; FENCE ensures that value >= counter.

If you burst read value and counter from the HCA then the result is
undefined: you don't know if counter was read before value, or the
other way around.

> in the updates to A[i].value and A[i].counter if they are in
> different L3 cache lines. But what are the guarantees when they are
> in the same cache line?

Cache lines make no difference. They are not really modeled as part of
the coherency API the processor presents. Two nearby writes in the
instruction stream might be merged into an atomic cache line update, or
they might not. You have no control over this.

> That's right - thanks for the detailed explanation! However, I'm
> assuming that the HCA performs atomic cacheline reads (I don't have
> a lot of basis for this assumption and it would be great if someone
> could tell me more about it).

That is an implementation detail; there is no architectural guarantee.
I don't think any current implementation provides atomic cacheline
reads.

> I was not assuming atomic writes to the entire cacheline - I was
> only assuming that the ordering imposed by mfence is preserved in
> cache - the write to 'a.value' appears in the cache hierarchy before
> the write to 'a.counter'.

mfence preserves the ordering, but there is no such thing as an atomic
cache line read or write. So the only way to see the ordering created
by mfence is with two non-burst reads, strongly ordered in time.

(Note: transactional memory extensions create something that looks an
awful lot like an atomic cache line write. However, that stuff is still
really new, so there is not a lot of info on how it co-exists with
DMA, etc.)
* Re: RDMA and memory ordering
From: Jason Gunthorpe @ 2013-11-12 21:11 UTC (permalink / raw)
To: Anuj Kalia
Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, Nov 12, 2013 at 04:59:19PM -0400, Anuj Kalia wrote:

> Thanks again. So we conclude there is nothing like an atomic cacheline
> read. Then my current design is a dud. But there should be 8-byte
> atomicity, right? I think I can leverage that to get what I want.

64-bit CPUs do have 64-bit atomic stores, so you can rely on DMAs
seeing only values you've written, and not some combination of old and
new bits.

> This part is interesting (from Jason's reply):
> "If you burst read from the HCA value and counter then the result is
> undefined, you don't know if counter was read before value, or the
> other way around."
> Is there a way of knowing the order in which they are read - for
> example, I heard in a talk that there is a left-to-right ordering
> when

So, this I don't know. I don't think anyone has ever had a need to look
into it, and it is certainly not defined. What you are asking is how
memory write ordering interacts with a burst read.

> a HCA reads a contiguous buffer. This could be totally architecture
> specific, for example, I just want the answer for Mellanox ConnectX-3
> cards. I think I can check this experimentally, but a definitive
> answer would be great.

The talk you heard about left-to-right ordering was probably in the
context of DMA burst writes and MPI polling.

In this case the DMA would write DDDDDP, and the MPI side would poll on
P. Once P is written, it assumes that D is visible.

This is undefined in general, but ensured in some cases on Intel and
Mellanox. I'm not sure if D and P have to be in the same cache line,
but you probably need a fence after reading P.

Jason
* Re: RDMA and memory ordering
From: Anuj Kalia @ 2013-11-13 6:55 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, Nov 12, 2013 at 5:11 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> 64-bit CPUs do have 64-bit atomic stores, so you can rely on DMAs
> seeing only values you've written, and not some combination of old and
> new bits.

That's a relief :).

> So, this I don't know. I don't think anyone has ever had a need to look
> into it, and it is certainly not defined. What you are asking is how
> memory write ordering interacts with a burst read.

OK. I'll do some experiments to figure out the order in which cacheline
words are read by the HCA. I'll post my findings if they're
interesting.

> The talk you heard about left-to-right ordering was probably in the
> context of DMA burst writes and MPI polling.
>
> In this case the DMA would write DDDDDP, and the MPI side would poll on
> P. Once P is written, it assumes that D is visible.

The talk wasn't about MPI, but you're right: it was about RDMA writes
and CPU polls. Thanks for making that clear.

I don't know what you meant by burst writes: do you mean several RDMA
writes or one large write? I'm concerned with the order in which data
is written out in one large RDMA write (and with RDMA reads too). For
example, if I read/write 64 bytes addressed from "buf" to "buf+64",
does [buf, buf+7] get read/written first, or does [buf+56, buf+63]?

I guess now is the time to run lots of micro experiments. Thanks a lot
for the help, everyone.

> This is undefined in general, but ensured in some cases on Intel and
> Mellanox. I'm not sure if D and P have to be in the same cache line,
> but you probably need a fence after reading P.
>
> Jason
* Re: RDMA and memory ordering
From: Jason Gunthorpe @ 2013-11-13 18:09 UTC (permalink / raw)
To: Anuj Kalia
Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Nov 13, 2013 at 02:55:53AM -0400, Anuj Kalia wrote:

> I don't know what you meant by burst writes: do you mean several RDMA
> writes or one large write? I'm concerned with the order in which data

An RDMA write will be split up by the HCA into a burst of PCI MemoryWr
operations.

> I guess now is the time to run lots of micro experiments. Thanks a lot
> for the help, everyone.

Careful: experiments can't prove that ordering is guaranteed to be
present; they can only show when it certainly isn't.

Intel hardware is very good at hiding ordering issues 99% of the time,
but in many cases there can be a stressed condition that will show a
different result.

Jason
* Re: RDMA and memory ordering
From: Anuj Kalia @ 2013-11-14 5:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Jason,

Thanks again :). I found another similar thread:
http://www.spinics.net/lists/linux-rdma/msg02709.html. The conclusion
there was that although the InfiniBand specs don't specify any ordering
of writes, many people assume left-to-right ordering anyway. There is
no mention of reads, though.

So I ran the micro experiments, and I found that although writes follow
the left-to-right ordering, reads do not. More details follow:

1. Write ordering experiment:

1.a. In the nth iteration, a client writes a buffer containing C ~ 1024
integers (each equal to 'n') to the server. The client sleeps for 2000
us between iterations.

1.b. The server busily polls for a change to the Cth integer. When the
Cth integer changes from i to i+1, it checks whether the entire buffer
is equal to i+1. The check always passes (I've tried over 15 million
checks). The test would fail if the polled integer were not the
rightmost one.

2. Read ordering experiment:

2.a. In the nth iteration, the server writes 'n' to C ~ 1024 integers
in a local buffer. The server does the writes in reverse order
(starting from index C-1). It then sleeps for 2000 us.

2.b. The client continuously reads the buffer. When the Cth integer in
the read sink changes from i to i+1, it checks whether all the integers
in the buffer are i+1. This check fails (although rarely), which shows
that reads are NOT ordered left to right. The read pattern I'd expect
is HHHH...HHHH (where H corresponds to i+1). However, I can see
patterns like HH..LLLLL...HH (where L corresponds to i). This is wrong
because no i's should be lingering around after the first integer has
become i+1 (under the false assumption that reads happen
left-to-right). Curiously, whenever there are stale i's, they always
form a contiguous chunk that would fit inside a cacheline; I usually
see runs of 16 or 48 i's.

2.c. The check always succeeds if C is 16 (the buffer fits inside a
cacheline). I've done 15 million checks, and will do many more tonight.

So, another question: why are the reads unordered while the writes are
ordered? I think by now we can assume write ordering (my experiments +
MVAPICH uses it). Can PCI reorder the reads issued by the HCA?

On Wed, Nov 13, 2013 at 2:09 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> Careful: experiments can't prove that ordering is guaranteed to be
> present; they can only show when it certainly isn't.

Aah, unfortunately that's true. However, I ran experiments anyway. If
people have been assuming an ordering on writes, I guess I can check
whether reads are ordered too.

> Intel hardware is very good at hiding ordering issues 99% of the time,
> but in many cases there can be a stressed condition that will show a
> different result.

Hmm.. I'm willing to run billions of iterations of the test. That
should give some confidence.

> Jason
* Re: RDMA and memory ordering [not found] ` <CADPSxAiepGuzWYXjyDxnSzER5MqL57fZ9mh83SLwV461PwZO3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-11-14 19:05 ` Jason Gunthorpe [not found] ` <20131114190514.GB21549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Jason Gunthorpe @ 2013-11-14 19:05 UTC (permalink / raw) To: Anuj Kalia Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu, Nov 14, 2013 at 01:12:55AM -0400, Anuj Kalia wrote: > So, another question: why are the reads unordered while the writes are > ordered? I think by now we can assume write ordering (my experiments + > MVAPICH uses it). Can the PCI reorder the reads issued by the HCA? Without fencing there is no gurantee in what order things are made visible, and the CPU will flush its write buffers however it likes. The PCI subsystem can also re-order reads however it likes, that is part of the PCI spec. In a 2 socket system don't be surprised if cache lines on different sockets complete out of order. Think of this as a classic multi-threaded race condition, and not related to PCI. If you do the same test using 2 threads you probably get the same results. > > Intel hardware is very good at hiding ordering issues 99% of the time, > > but in many cases there can be a stress'd condition that will show a > > different result. > Hmm.. I'm willing to run billions of iterations of the test. That > should give some confidence. Not really, repeating the same test billions of times is not comprehensive. You need to stress the system in all sorts of different ways to see different behavior. For instance, in a 2 socket system there are likely all sorts of crazy sensitivities that depend on which socket the memory lives, which socket holds the newest cacheline, which socket has an old line, which socket is connected directly to the HCA, etc. 
Jason
[parent not found: <20131114190514.GB21549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: RDMA and memory ordering
  [not found] ` <20131114190514.GB21549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-11-14 19:33 ` Anuj Kalia
  [not found] ` <CADPSxAg0k5SuxCX=3CMNV8-xME55p3iL4BMqnq0ji---kN6ZEg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Anuj Kalia @ 2013-11-14 19:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Jason,

I just got an email saying that Mellanox does in fact order the reads
and writes. So I think we can blame the CPU or the PCI subsystem for
the unordered reads.

On Thu, Nov 14, 2013 at 3:05 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Thu, Nov 14, 2013 at 01:12:55AM -0400, Anuj Kalia wrote:
>
>> So, another question: why are the reads unordered while the writes are
>> ordered? I think by now we can assume write ordering (my experiments +
>> MVAPICH uses it). Can the PCI reorder the reads issued by the HCA?
>
> Without fencing there is no guarantee in what order things are made
> visible, and the CPU will flush its write buffers however it likes.

I'm using fencing in the read experiment. The code at the server looks
like this:

    while(1) {
        for(i = 0; i < EXTENT_CAPACITY; i++) {
            ptr[EXTENT_CAPACITY - i - 1] = iter;
            asm volatile ("" : : : "memory");
            asm volatile("mfence" ::: "memory");
        }
        iter++;
        usleep(2000 + (rand() % 200));
    }

> The PCI subsystem can also reorder reads however it likes; that is
> part of the PCI spec. In a 2-socket system don't be surprised if cache
> lines on different sockets complete out of order.
>
> Think of this as a classic multi-threaded race condition, and not
> related to PCI. If you do the same test using 2 threads you probably
> get the same results.

The PCI explanation sounds good. However, with a fence after every
update, I don't think multiple sockets will be a problem.

>>> Intel hardware is very good at hiding ordering issues 99% of the time,
>>> but in many cases there can be a stressed condition that will show a
>>> different result.
>
>> Hmm.. I'm willing to run billions of iterations of the test. That
>> should give some confidence.
>
> Not really, repeating the same test billions of times is not
> comprehensive. You need to stress the system in all sorts of
> different ways to see different behavior.

Hmm.. It's not really the same test. My server sleeps for a randomly
chosen large duration between updates. If the test passes for many
iterations, we can assume that we've tested a lot of interleavings.
But yes, that doesn't give 100% confidence.

> For instance, in a 2-socket system there are likely all sorts of crazy
> sensitivities that depend on which socket the memory lives, which
> socket holds the newest cacheline, which socket has an old line, which
> socket is connected directly to the HCA, etc.

Again, does that matter with fences? With a fence after every update,
there is a real-time ordering for when the updates appear in the cache
hierarchy, regardless of the socket.

> Jason

Regards,
Anuj
[parent not found: <CADPSxAg0k5SuxCX=3CMNV8-xME55p3iL4BMqnq0ji---kN6ZEg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: RDMA and memory ordering
  [not found] ` <CADPSxAg0k5SuxCX=3CMNV8-xME55p3iL4BMqnq0ji---kN6ZEg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-11-14 19:47 ` Anuj Kalia
  0 siblings, 0 replies; 16+ messages in thread
From: Anuj Kalia @ 2013-11-14 19:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gabriele Svelto, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I should do the experiment with 2 processes, however.

On Thu, Nov 14, 2013 at 3:33 PM, Anuj Kalia
<anujkaliaiitd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Jason,
>
> I just got an email saying that Mellanox does in fact order the reads
> and writes. So I think we can blame the CPU or the PCI subsystem for
> the unordered reads.
> [...]
* Re: RDMA and memory ordering
  2013-11-12 10:31 ` Anuj Kalia
  2013-11-12 18:31 ` Jason Gunthorpe
@ 2013-11-13 18:23 ` Gabriele Svelto
  [not found] ` <5283C3B2.6010106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Gabriele Svelto @ 2013-11-13 18:23 UTC (permalink / raw)
  To: Anuj Kalia; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 12/11/2013 11:31, Anuj Kalia wrote:
> I believe the atomic operations would be a lot more expensive than
> reads/writes. I'm targeting maximum performance so I don't want to
> look that way yet.

This sounds like premature optimization to me which, as you know, is
the root of all evil :)

Try using the atomic primitives: they have been designed specifically
for this kind of scenario. Then measure their performance in the real
world before spending time on optimizing something that might be fast
enough for your purposes (and far more robust). If you're already
polling your CQs, those operations will be *very* fast.

Gabriele
[parent not found: <5283C3B2.6010106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: RDMA and memory ordering
  [not found] ` <5283C3B2.6010106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-11-14  2:11 ` Anuj Kalia
  0 siblings, 0 replies; 16+ messages in thread
From: Anuj Kalia @ 2013-11-14 2:11 UTC (permalink / raw)
  To: Gabriele Svelto; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Nov 13, 2013 at 2:23 PM, Gabriele Svelto
<gabriele.svelto-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On 12/11/2013 11:31, Anuj Kalia wrote:
>> I believe the atomic operations would be a lot more expensive than
>> reads/writes. I'm targeting maximum performance so I don't want to
>> look that way yet.
>
> This sounds like premature optimization to me which, as you know, is
> the root of all evil :)
>
> Try using the atomic primitives: they have been designed specifically
> for this kind of scenario. Then measure their performance in the real
> world before spending time on optimizing something that might be fast
> enough for your purposes (and far more robust). If you're already
> polling your CQs, those operations will be *very* fast.

I'm working on a project where I'm trying to extract the maximum IOPS
from a server for an application. If atomic operations are even 2x
slower than RDMA writes (which I'd expect, because they involve both a
read and a write), I can't use them. However, it would be interesting
to measure their performance. I'll try that. Thanks!

> Gabriele
* RE: RDMA and memory ordering
@ 2013-11-11 23:13 Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388CF721E-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Hefty, Sean @ 2013-11-11 23:13 UTC (permalink / raw)
To: Anuj Kalia, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> I am running a server which essentially does the following operations in a
> loop:
>
> A[i].value = counter; //It's actually something else
> asm volatile ("" : : : "memory");
> asm volatile("mfence" ::: "memory");
> A[i].counter = counter;
> printf("%d %d\n", A[i].value, A[i].counter);
> counter ++;
>
> Basically, I want a fresh value of A[i].counter to indicate a fresh
> A[i].value.
>
> I have a remote client which reads the struct A[i] from the server
> (via RDMA) in a loop. Sometimes in the value that the client reads,
> A[i].counter is larger than A[i].value. i.e., I see the newer value of
> A[i].counter but A[i].value corresponds to a previous iteration of the
> server's loop.
>
> How can this happen in the presence of memory barriers? With barriers,
> A[i].counter should be updated later and therefore should always be
> smaller than A[i].value.
>
> Thanks for your help!
It seems possible for a remote read to start retrieving memory before an update, such that A[i].value is read and placed on the wire, the server modifies the memory, and then A[i].counter is read and placed on the wire. It may depend on how large the data is that's being read and the RDMA read implementation.
- Sean
[parent not found: <1828884A29C6694DAF28B7E6B8A8237388CF721E-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: RDMA and memory ordering
  [not found] ` <1828884A29C6694DAF28B7E6B8A8237388CF721E-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2013-11-12  7:28 ` Anuj Kalia
  0 siblings, 0 replies; 16+ messages in thread
From: Anuj Kalia @ 2013-11-12 7:28 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Sean,

Thanks for your reply. Sorry for the duplicate email!

Your argument is correct if the structs are large. In my case, the
array A[] contains 32-byte structs that don't span multiple L3
cachelines (ensured via memalign). I've heard that RDMA reads happen
at L3 cacheline granularity - in that case, A[i] will be read once and
placed on the wire. Then we could see partial writes to A[i].value or
A[i].counter, but we'll never see a completed update to A[i].counter
before the corresponding update to A[i].value.

I had some parallel questions that could help me understand the issue
better:

1. Am I right in assuming that RDMA reads happen from the remote
host's L3 cache? My processors are from the AMD Opteron 6200 series.
The argument I heard in favor of this is that 'modern' processors have
on-chip memory controllers, so DMA reads always come from the L3
cache.

2. Are reads from the L3 cache always consistent with the L1 and L2
caches? i.e., can some update be cached inside the L1 cache so that an
L3 read sees an old value? I think this doesn't happen, or I would be
seeing lots of stale reads.

3. When we do an RDMA write, is there an order in which the bytes get
written? For example, I heard during a talk that there is a
left-to-right ordering, i.e., the lower-addressed bytes get written
before higher-addressed bytes. Is this correct?

In general, can I read more about the hardware aspects of RDMA
somewhere?

--Anuj

On Mon, Nov 11, 2013 at 7:13 PM, Hefty, Sean
<sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> [...]
end of thread, other threads: [~2013-11-14 19:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-10 10:46 RDMA and memory ordering Anuj Kalia
[not found] ` <CADPSxAhAGYZude8CM65-UDvfiPscStgcNsAfs=2XBbntg-wL0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-12 10:16 ` Gabriele Svelto
[not found] ` <5281FFF9.5070705-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-11-12 10:31 ` Anuj Kalia
2013-11-12 18:31 ` Jason Gunthorpe
[not found] ` <20131112183142.GB6639-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-11-12 20:59 ` Anuj Kalia
[not found] ` <CADPSxAgF1CAiYoYbxbCON4NCD-tH8cAsJFRtECkTGJJQC4MXCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-12 21:11 ` Jason Gunthorpe
[not found] ` <20131112211123.GA29132-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-11-13 6:55 ` Anuj Kalia
[not found] ` <CADPSxAhzmaut9s9L1fv5urhzX8xKU9GbL6z1TkOX3FuM4NUsww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-13 18:09 ` Jason Gunthorpe
[not found] ` <20131113180915.GA6597-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-11-14 5:12 ` Anuj Kalia
[not found] ` <CADPSxAiepGuzWYXjyDxnSzER5MqL57fZ9mh83SLwV461PwZO3Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-14 19:05 ` Jason Gunthorpe
[not found] ` <20131114190514.GB21549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-11-14 19:33 ` Anuj Kalia
[not found] ` <CADPSxAg0k5SuxCX=3CMNV8-xME55p3iL4BMqnq0ji---kN6ZEg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-14 19:47 ` Anuj Kalia
2013-11-13 18:23 ` Gabriele Svelto
[not found] ` <5283C3B2.6010106-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-11-14 2:11 ` Anuj Kalia
-- strict thread matches above, loose matches on Subject: below --
2013-11-11 23:13 Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237388CF721E-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2013-11-12 7:28 ` Anuj Kalia
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox