From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luke Gorrie
Subject: Re: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
Date: Tue, 27 Jan 2015 14:57:44 +0100
Message-ID:
References: <1421632414-10027-1-git-send-email-zhihong.wang@intel.com>
Reply-To: snabb-devel@googlegroups.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7ba96ed89b8703050da2a32c
Cc: "dev@dpdk.org"
To: "snabb-devel@googlegroups.com"
Return-path:
Sender: snabb-devel@googlegroups.com
In-Reply-To:
List-Post: ,
List-Help: ,
List-Archive: ,
List-Unsubscribe: ,
List-Id: dev.dpdk.org

--047d7ba96ed89b8703050da2a32c
Content-Type: text/plain; charset=UTF-8

Hi again John,

Thank you for the patient answers :-)

Thank you for pointing this out: I was mistakenly testing your Sandy Bridge
code on Haswell (lacking -DRTE_MACHINE_CPUFLAG_AVX2).

Correcting that, your code is both the fastest and the smallest in my
humble micro-benchmarking tests.

Looks like you have done great work! You probably knew that already :-) but
thank you for walking me through it.

The code compiles to 745 bytes of object code (smaller than glibc 2.20
memcpy) and cachebenches like this:

                Memory Copy Library Cache Test

C Size          Nanosec         MB/sec          % Chnge
-------         -------         -------         -------
256             0.01            97587.60        1.00
384             0.01            97628.83        1.00
512             0.01            97613.95        1.00
768             0.01            147811.44       0.66
1024            0.01            158938.68       0.93
1536            0.01            168487.49       0.94
2048            0.01            174278.83       0.97
3072            0.01            156922.58       1.11
4096            0.01            145811.59       1.08
6144            0.01            157388.27       0.93
8192            0.01            149616.95       1.05
12288           0.01            149064.26       1.00
16384           0.01            107895.06       1.38

The key difference from my perspective is that glibc 2.20 memcpy
performance goes way down for >= 2048 bytes when they switch from vector
moves to string moves, while your code stays consistent.

I will take it for a spin in a real application.

Cheers,
-Luke

-- 
You received this message because you are subscribed to the Google Groups "Snabb Switch development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snabb-devel+unsubscribe@googlegroups.com.
To post to this group, send an email to snabb-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/snabb-devel.

--047d7ba96ed89b8703050da2a32c--