From mboxrd@z Thu Jan 1 00:00:00 1970 From: William Allen Simpson Subject: Re: [net-next-2.6 PATCH RFC] TCPCT part 1d: generate Responder Cookie Date: Mon, 02 Nov 2009 05:39:14 -0500 Message-ID: <4AEEB6D2.7090505@gmail.com> References: <4AEAC763.4070200@gmail.com> <4AED86AD.6010906@gmail.com> <4AEDCD7C.2010403@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Linux Kernel Developers , Linux Kernel Network Developers To: Eric Dumazet Return-path: In-Reply-To: <4AEDCD7C.2010403@gmail.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: netdev.vger.kernel.org Eric Dumazet wrote: > William Allen Simpson a =E9crit : >> Since October 4th, I've repeatedly asked publicly for assistance wit= h these >> Linux-specific memory locking constructs and cryptography. I've als= o sent >> private messages. No help has been forthcoming. None. Nada. >> >> At this point, I've spent weeks re-spinning code that I'd understood= was >> approved a year ago. The whole project should have been finished by= now! >=20 > Your messages on netdev are two weeks old, not one year, and came dur= ing > LKS. Many developpers were busy in Japan. >=20 Thank you for your helpful comments. I'm not familiar with LKS. Is th= at a month long project? The message in question was "query: per cpu hash pool spinlock", sent: "Sun, 04 Oct 2009 06:37:39 -0400". My initial query for this series was "query: adding a sysctl", sent: "Fri, 02 Oct 2009 00:00:05 -0400". The earlier RFC approval was circa 1 year, 11 weeks, 2 days, 23 hours a= go: http://article.gmane.org/gmane.linux.network/102779 >> So, I'll try a larger audience. Could somebody take a look at my us= age of >> read and write locking? >> >> NB, I'm trying to port some 15-year-old fairly simple and straightfo= rward >> (single cpu) code from the KA9Q cooperative multitasking platform. >> >> I've examined existing code used for syncookies and TCP MD5 authenti= cators. >> Neither meets my needs, as this secret is updated every few minutes.= Both >> have very different approaches. They are rarely used. My code will= be >> used on the order of tens of thousands of connections per second. >> >> Moreover, it seems to my naive eye that the syncookie per cpu code s= imply >> doesn't work properly. The workspace is allocated per cpu, but the = cpu >> could change during the extensive SHA1 computations. Bug? >> The (buggy?) syncookie code that I'm avoiding/replacing is: static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS], ipv4_cookie_scratch); static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16= dport, u32 count, int c) { __u32 *tmp =3D __get_cpu_var(ipv4_cookie_scratch); memcpy(tmp + 4, syncookie_secret[c], sizeof(syncookie_secret[c])); tmp[0] =3D (__force u32)saddr; tmp[1] =3D (__force u32)daddr; tmp[2] =3D ((__force u32)sport << 16) + (__force u32)dport; tmp[3] =3D count; sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5); return tmp[17]; } It appears to me that the operations could be interrupted at any time, = and the fairly expensive sha transform (or probably any of the assignments) could be corrupted? That is, cpu0 grabs a scratch area, copies some data into it, interrupt= s, cpu0 comes along again with another packet, points into the same worksp= ace, mashes the data, while cpu1 completes the previous operation with the old tmp pointer on the stack. Worse, this is called twice, each time getting tmp, and mashing the dat= a, and at the same time others are calling it twice more for verification. Since syncookies only occur under stress, the code isn't well tested, a= nd the only symptom would be the returned packet would be dropped after failing to verify. Since this only happens when lots of packets are arriving, dropped packets probably aren't noticed. However, that would be unacceptable for my code. > This patch looks fine, but I dont see how this new function is used. >=20 It is used at the places in the previous numbered patch (part 1c) that currently say, "secret recipe not yet implemented". And perhaps to replace the syncookie code mentioned above. > Some points : >=20 > 1) We are working hard to remove rwlocks from network stack, so pleas= e dont > add a new one.=20 Botheration! I wish that was *documented* somewhere, perhaps in: Documentation/spinlocks.txt > add a new one. You probably can use a seqlock or RCU, or a server han= dling=20 > 10.000 connections request per second on many NIC will hit this rwloc= k. >=20 Yep. Nobody in the tcp code is using them, but RCU looks interesting, = and may be well suited. I'll read up on it. > 2)=20 >=20 > } else if (unlikely(time_after(jiffy, tcp_secret_primary->expires)))= { > get_random_bytes(secrets, sizeof(secrets)); >=20 > write_lock(&tcp_secret_locker); >=20 > It would be better to first get the lock, then get random_bytes, in o= rder > not wasting entropy. >=20 I'd put that outside the locked section as it's considerably more expen= sive (hashing) than a memcpy. Entropy is never "wasted", as good cryptograp= hic random functions never reveal underlying entropy. I know there were so= me papers a few years ago that found flaws in Linux random functions, but surely that's been fixed. I assume you folks are using Yarrow or its successors by now. In any case, perhaps this RCU thingy obviates that issue. >=20 > 3) If you change secret ever 600 seconds, it might be better to use a= timer > so that you dont have to check expiration and this logic at each SYN = packet. > (Disociate the lookup (read-only, done many time per second) from the= updates > (trigerred by a timer every 600 secs)) >=20 Part of the logic is not to update the generation secret until a new pa= cket arrives, allowing maximal entropy to be gathered beforehand. This is especially worrisome for the first packets after booting. It takes int= o account inter-packet arrival timing, and expiring after MSL. So, a tim= er isn't feasible in this instance. > (Not counting you'll probably need to use a similar lookup algo for t= he ACK > packet coming from client) >=20 Yes. A timer could be used to expire and clear old verification secret= s, and I'll look into that in the future ACK patch. Thanks again.