From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Haggerty Subject: Re: [PATCH 00/14] numparse module: systematically tighten up integer parsing Date: Tue, 24 Mar 2015 17:06:12 +0100 Message-ID: <55118B74.1030201@alum.mit.edu> References: <1426608016-2978-1-git-send-email-mhagger@alum.mit.edu> <20150319052620.GA30645@peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Cc: Jeff King , git@vger.kernel.org To: Junio C Hamano , Jeff King X-From: git-owner@vger.kernel.org Tue Mar 24 17:06:47 2015 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YaRLp-00019K-A5 for gcvg-git-2@plane.gmane.org; Tue, 24 Mar 2015 17:06:45 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754498AbbCXQGk (ORCPT ); Tue, 24 Mar 2015 12:06:40 -0400 Received: from alum-mailsec-scanner-1.mit.edu ([18.7.68.12]:62704 "EHLO alum-mailsec-scanner-1.mit.edu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753359AbbCXQGS (ORCPT ); Tue, 24 Mar 2015 12:06:18 -0400 X-AuditID: 1207440c-f79376d00000680a-8b-55118b762d37 Received: from outgoing-alum.mit.edu (OUTGOING-ALUM.MIT.EDU [18.7.68.33]) by alum-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP id BC.51.26634.67B81155; Tue, 24 Mar 2015 12:06:14 -0400 (EDT) Received: from [192.168.69.130] (p4FC9745A.dip0.t-ipconnect.de [79.201.116.90]) (authenticated bits=0) (User authenticated as mhagger@ALUM.MIT.EDU) by outgoing-alum.mit.edu (8.13.8/8.12.4) with ESMTP id t2OG6CfX011426 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Tue, 24 Mar 2015 12:06:13 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.5.0 In-Reply-To: X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprFKsWRmVeSWpSXmKPExsUixO6iqFvWLRhq8P+VokXXlW4mi4beK8wW P1p6mC0e35rA5MDi8ax3D6PH3PuWHhcvKXt83iQXwBLFbZOUWFIWnJmep2+XwJ2xfMlupoI5 4hV/T3WyNzBuE+xi5OSQEDCRuN+1kgnCFpO4cG89WxcjF4eQwGVGif1X9jJDOOeZJJb8PMkO UsUroC1xftIXxi5GDg4WAVWJTyeFQMJsAroSi3qawQaJCgRJvLz1lxGiXFDi5MwnLCC2iICj xIkH11lBbGYBI4lZN86A2cIC4RJL781mgtg1hVHi14cTYAlOAWuJ61samSEa9CR2XP8F1Swv 0bx1NvMERoFZSHbMQlI2C0nZAkbmVYxyiTmlubq5iZk5xanJusXJiXl5qUW6hnq5mSV6qSml mxgh4cyzg/HbOplDjAIcjEo8vAFLBEKFWBPLiitzDzFKcjApifJOdxIMFeJLyk+pzEgszogv Ks1JLT7EKMHBrCTC+7wdKMebklhZlVqUD5OS5mBREudVXaLuJySQnliSmp2aWpBaBJOV4eBQ kuDV7wJqFCxKTU+tSMvMKUFIM3FwggznkhIpTs1LSS1KLC3JiAfFanwxMFpBUjxAe81B2nmL CxJzgaIQracYFaXEeR92AiUEQBIZpXlwY2FJ6hWjONCXwrzhIO08wAQH1/0KaDAT0OBz+Xwg g0sSEVJSDYxZG1r7Ew69bnknxWD5+Tw3k8yiPdJXZGeX2kYzCCzIUP2v+zhw77HrAs2zRN/8 mmO0yuXDxD6jIq8ay5a/z3v21+2+ku6Xd43D24KFcen5jKrppsX5l3ZFFj/+xWn0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On 03/19/2015 08:32 AM, Junio C Hamano wrote: > Jeff King writes: > >> I wonder how much of the boilerplate in the parse_* functions could be >> factored out to use a uintmax_t, with the caller just providing the >> range. That would make it easier to add new types like off_t, and >> possibly even constrained types (e.g., an integer from 0 to 100). On the >> other hand, you mentioned to me elsewhere that there may be some bugs in >> the range-checks of config.c's integer parsing. I suspect they are >> related to exactly this kind of refactoring, so perhaps writing things >> out is best. > > I like this idea very well. I wonder if we can implement the family > of > > parse_{type}(const char *, unsigned int flags, > const char **endptr, {type} *result) > > functions by calls a helper that internally deals with the numbers > in uintmax_t, and then checks if the value fits within the possible > range of the *result before returning. > > int parse_i(const char *str, unsigned flags, > const char **endptr, int *result) { > uintmax_t val; > int sign = 1, status; > if (*str == '-') { > sign = -1; > str++; > } > status = parse_num(str, flags, endptr, &val, INT_MAX); > if (status < 0) > return status; > *result = sign < 0 ? -val : val; > return 0; > } > > (assuming the last parameter to parse_num is used to check if the > result fits within that range). Or that may be easier and cleaner > to be done in the caller with or something like that: > > status = parse_num(str, flags, endptr, &val); > if (status < 0) > return status; > if (INT_MAX <= val * sign || val * sign <= INT_MIN) { > errno = ERANGE; > return -1; > } > > If we wanted to, we may even be able to avoid duplication of > boilerplate by wrapping the above pattern within a macro, > parameterizing the TYPE_{MIN,MAX} and using token pasting, to > expand to four necessary result types. > > There is no reason for the implementation of the parse_num() helper > to be using strtoul(3) or strtoull(3); its behaviour will be under > our total control. It can become fancier by enriching the flags > bits (e.g. allow scaling factor, etc.) only once and all variants > for various result types will get the same feature. Parsing numbers is not rocket science, but there are a lot of pitfalls, especially around overflow. It's even harder to write such code via macros and the result is less readable. This patch series is mostly about finding a reasonable API and whipping the callers into shape. That seems ambitious enough for me. I'd rather stick with boring wrappers for now and lean on strtol()/strtoul() to do the dirty work. It will be easy for a motivated person to change the implementation later. Michael -- Michael Haggerty mhagger@alum.mit.edu