From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Chinner
Subject: Re: [PATCH 04/16] lib/utf8norm.c: reduce the size of utf8data[]
Date: Mon, 6 Oct 2014 08:52:44 +1100
Message-ID: <20141005215244.GE12693@dastard>
References: <20141003214758.GY1865@sgi.com> <20141003215455.GC1865@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-fsdevel@vger.kernel.org, olaf@sgi.com, xfs@oss.sgi.com
To: Ben Myers
Return-path:
Received: from ipmail06.adl2.internode.on.net ([150.101.137.129]:34165 "EHLO ipmail06.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751734AbaJEVws (ORCPT ); Sun, 5 Oct 2014 17:52:48 -0400
Content-Disposition: inline
In-Reply-To: <20141003215455.GC1865@sgi.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Fri, Oct 03, 2014 at 04:54:55PM -0500, Ben Myers wrote:
> From: Olaf Weber
>
> Remove the Hangul decompositions from the utf8data trie, and do
> algorithmic decomposition to calculate them on the fly. To store
> the decomposition the caller of utf8lookup()/utf8nlookup() must
> provide a 12-byte buffer, which is used to synthesize a leaf with
> the decomposition. Trie size is reduced from 245kB to 90kB.
>
> This change also contains a number of robustness fixes to the
> trie generator mkutf8data.c.

Please separate out the robustness fixes or merge them back into the
original patch. e.g. Bulk renaming of code like this:

>  static int
> -utf8key(unsigned int key, char keyval[])
> -{
> -	int keylen;
> -
> -	if (key < 0x80) {
> -		keyval[0] = key;
> -		keylen = 1;
> -	} else if (key < 0x800) {
> -		keyval[1] = key & UTF8_V_MASK;
> -		keyval[1] |= UTF8_N_BITS;
> -		key >>= UTF8_V_SHIFT;
....
> +utf8encode(char *str, unsigned int val)
> +{
> +	int len;
> +
> +	if (val < 0x80) {
> +		str[0] = val;
> +		len = 1;
> +	} else if (val < 0x800) {
> +		str[1] = val & UTF8_V_MASK;
> +		str[1] |= UTF8_N_BITS;
> +		val >>= UTF8_V_SHIFT;

Doesn't belong in a patch that introduces special hangul character
handling....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
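
For reference, the "algorithmic decomposition" mentioned in the quoted
commit message can be sketched roughly as below. This is not the code
from the patch: the helper name hangul_decompose() and the HANGUL_*
macro names are hypothetical, and the sketch works on code points
rather than on the UTF-8 leaf the patch synthesizes. The constants and
arithmetic follow the Hangul syllable decomposition described in the
Unicode Standard (section 3.12, Conjoining Jamo Behavior).

```c
/* Constants from the Unicode Standard, section 3.12. Macro names are illustrative. */
#define HANGUL_SBASE	0xAC00	/* first precomposed syllable */
#define HANGUL_LBASE	0x1100	/* first leading consonant jamo */
#define HANGUL_VBASE	0x1161	/* first vowel jamo */
#define HANGUL_TBASE	0x11A7	/* one before the first trailing consonant jamo */
#define HANGUL_LCOUNT	19
#define HANGUL_VCOUNT	21
#define HANGUL_TCOUNT	28
#define HANGUL_NCOUNT	(HANGUL_VCOUNT * HANGUL_TCOUNT)	/* 588 */
#define HANGUL_SCOUNT	(HANGUL_LCOUNT * HANGUL_NCOUNT)	/* 11172 */

/*
 * Hypothetical helper (not from the patch): decompose one precomposed
 * Hangul syllable into its 2 or 3 conjoining jamo code points.
 *
 * Returns the number of code points written to out[], or 0 if cp is
 * not a precomposed Hangul syllable.
 */
static int
hangul_decompose(unsigned int cp, unsigned int out[3])
{
	unsigned int si, li, vi, ti;

	if (cp < HANGUL_SBASE || cp >= HANGUL_SBASE + HANGUL_SCOUNT)
		return 0;

	si = cp - HANGUL_SBASE;				/* syllable index */
	li = si / HANGUL_NCOUNT;			/* leading consonant index */
	vi = (si % HANGUL_NCOUNT) / HANGUL_TCOUNT;	/* vowel index */
	ti = si % HANGUL_TCOUNT;			/* trailing consonant index */

	out[0] = HANGUL_LBASE + li;
	out[1] = HANGUL_VBASE + vi;
	if (ti == 0)
		return 2;	/* LV syllable, no trailing consonant */
	out[2] = HANGUL_TBASE + ti;
	return 3;		/* LVT syllable */
}
```

Because all 11172 syllables are covered by this arithmetic, none of
them needs an entry of its own in a lookup table, which is presumably
where the size reduction quoted above comes from.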