From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Sunshine Subject: Re: [PATCH v5 1/1] http: Add Accept-Language header if possible Date: Wed, 3 Dec 2014 14:31:40 -0500 Message-ID: References: <1405792730-13539-1-git-send-email-eungjun.yi@navercorp.com> <1417522356-24212-1-git-send-email-eungjun.yi@navercorp.com> <1417522356-24212-2-git-send-email-eungjun.yi@navercorp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: Git List , Yi EungJun , Jeff King , Peter Krefting , Junio C Hamano To: Yi EungJun X-From: git-owner@vger.kernel.org Wed Dec 03 20:31:47 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1XwFeM-00025u-0k for gcvg-git-2@plane.gmane.org; Wed, 03 Dec 2014 20:31:46 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751949AbaLCTbm (ORCPT ); Wed, 3 Dec 2014 14:31:42 -0500 Received: from mail-yk0-f176.google.com ([209.85.160.176]:63718 "EHLO mail-yk0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751390AbaLCTbl (ORCPT ); Wed, 3 Dec 2014 14:31:41 -0500 Received: by mail-yk0-f176.google.com with SMTP id q200so7196849ykb.21 for ; Wed, 03 Dec 2014 11:31:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=3aklQSG36nbtCGRY7fF6pFuu4shd7HTySbCWfSB6YBk=; b=OtbYn3o3KO5Pg034lzlVWJs7vnS+MK40+JDO0hNjVWDoJ6f121h9y1etgjLApKJOAU P0x9mQO5aBcr0dEYRwA8awlbDoWwly3TrcgnMoMUR4KCDW//KPIza0uPOyzWWeVEIgtM 8OLVbeRtlAncVsZRMOyu9FnpIsEEtGE3+RCagOgrL+9TmiVZEGETeuXmz4RHnpQU3K5e aHT8bzsRwIYDt61K9vNhFnmIDq4YTRcwd16jhLDdorwFbNJtXAUMY8kQxzT8gWqhzDNo UJQWjypaxQ1UXpcHgpPBDchKqvgaVLmCrnPW54zAOYXwS/jsvMywVIFAB5AaynvjjWdR ZXzw== X-Received: by 10.236.206.74 with SMTP id k50mr8312906yho.180.1417635100719; Wed, 03 Dec 2014 11:31:40 -0800 (PST) Received: by 10.170.68.68 with HTTP; Wed, 3 Dec 2014 11:31:40 -0800 (PST) In-Reply-To: <1417522356-24212-2-git-send-email-eungjun.yi@navercorp.com> X-Google-Sender-Auth: 8tVhjpebZUhMmuH69SBsDOpH0Qk Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Tue, Dec 2, 2014 at 7:12 AM, Yi EungJun wrote: > From: Yi EungJun > > Add an Accept-Language header which indicates the user's preferred > languages defined by $LANGUAGE, $LC_ALL, $LC_MESSAGES and $LANG. > > Examples: > LANGUAGE= -> "" > LANGUAGE=ko:en -> "Accept-Language: ko, en;q=0.9, *;q=0.1" > LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *;q=0.1" > LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *;q=0.1" > > This gives git servers a chance to display remote error messages in > the user's preferred language. > > Limit the number of languages to 1,000 because q-value must not be > smaller than 0.001, and limit the length of Accept-Language header to > 4,000 bytes for some HTTP servers which cannot accept such long header. > > Signed-off-by: Yi EungJun > --- > diff --git a/http.c b/http.c > index 040f362..69624af 100644 > --- a/http.c > +++ b/http.c > @@ -68,6 +68,8 @@ static struct curl_slist *no_pragma_header; > > static struct active_request_slot *active_queue_head; > > +static struct strbuf *cached_accept_language; Is there a reason this needs to be a pointer to a strbuf rather than just a strbuf? That is wouldn't this work? static struct strbuf cached_accept_language = STRBUF_INIT; > size_t fread_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_) > { > size_t size = eltsize * nmemb; > @@ -515,6 +517,9 @@ void http_cleanup(void) > cert_auth.password = NULL; > } > ssl_cert_password_required = 0; > + > + if (cached_accept_language) > + strbuf_release(cached_accept_language); Junio already mentioned that this is leaking the memory of the strbuf struct itself which was xmalloc()'d by get_accept_language(). > } > > struct active_request_slot *get_active_slot(void) > @@ -986,6 +991,149 @@ static void extract_content_type(struct strbuf *raw, struct strbuf *type, > strbuf_addstr(charset, "ISO-8859-1"); > } > > +/* > + * Guess the user's preferred languages from the value in LANGUAGE environment > + * variable and LC_MESSAGES locale category. > + * > + * The result can be a colon-separated list like "ko:ja:en". > + */ > +static const char *get_preferred_languages(void) > +{ > + const char *retval; > + > + retval = getenv("LANGUAGE"); > + if (retval && *retval) > + return retval; > + > + retval = setlocale(LC_MESSAGES, NULL); > + if (retval && *retval && > + strcmp(retval, "C") && > + strcmp(retval, "POSIX")) > + return retval; > + > + return NULL; Mental note: This function will never return an empty string "", even if LANGUAGE or LC_MESSAGES is empty. > +} > + > +/* > + * Get an Accept-Language header which indicates user's preferred languages. > + * > + * Examples: > + * LANGUAGE= -> "" > + * LANGUAGE=ko:en -> "Accept-Language: ko, en; q=0.9, *; q=0.1" > + * LANGUAGE=ko_KR.UTF-8:sr@latin -> "Accept-Language: ko-KR, sr; q=0.9, *; q=0.1" > + * LANGUAGE=ko LANG=en_US.UTF-8 -> "Accept-Language: ko, *; q=0.1" > + * LANGUAGE= LANG=en_US.UTF-8 -> "Accept-Language: en-US, *; q=0.1" > + * LANGUAGE= LANG=C -> "" > + */ > +static struct strbuf *get_accept_language(void) I find this API a bit strange. Use of strbuf to construct the returned string is an implementation detail of this function. From the caller's point of view, it should just be receiving a constant string: one which it needs neither to modify nor free. Also, if the caller were to modify the returned strbuf for some reason, then that modification would impact all future calls to get_accept_language() since the strbuf is 'static' and not recomputed. Instead, I would expect the declaration to be: static const char *get_accept_language(void) > +{ > + const char *lang_begin, *pos; > + int q, max_q; > + int num_langs; > + int decimal_places; > + int is_codeset_or_modifier = 0; > + char q_format[32]; > + const int MAX_LANGS = 1000; > + const int MAX_SIZE_OF_HEADER = 4000; > + int last_size = 0; > + > + if (cached_accept_language) > + return cached_accept_language; > + > + cached_accept_language = xmalloc(sizeof(struct strbuf)); > + strbuf_init(cached_accept_language, 0); > + lang_begin = get_preferred_languages(); Mental note: lang_begin will never be the empty string "". > + /* Don't add Accept-Language header if no language is preferred. */ > + if (!lang_begin) > + return cached_accept_language; > + > + /* Count number of preferred lang_begin to decide precision of q-factor. */ > + for (num_langs = 1, pos = lang_begin; *pos; pos++) > + if (*pos == ':') > + num_langs++; > + > + /* Decide the precision for q-factor on number of preferred lang_begin. */ > + num_langs += 1; /* for '*' */ > + > + if (MAX_LANGS < num_langs) > + num_langs = MAX_LANGS; > + > + for (max_q = 1, decimal_places = 0; > + max_q < num_langs; > + decimal_places++, max_q *= 10); > + > + sprintf(q_format, ";q=0.%%0%dd", decimal_places); > + > + q = max_q; > + > + strbuf_addstr(cached_accept_language, "Accept-Language: "); > + > + for (pos = lang_begin; ; pos++) { > + if (*pos == ':' || !*pos) { > + /* Ignore if this character is the first one. */ > + if (pos == lang_begin) > + continue; If lang_begin were ever to point at an empty string "", then this logic would access memory beyond the end-of-string. Since get_preferred_languages() won't return an empty string, this case is safe, but it's not obvious to the casual reader and some person modifying the code in the future might not notice this restriction. I personally would feel more comfortable if the no-empty-string assumption was documented formally with an assert(*lang_begin) or die() before entering the loop; or if the code was rewritten to be less fragile. > + is_codeset_or_modifier = 0; > + > + /* Put a q-factor only if it is less than 1.0. */ > + if (q < max_q) > + strbuf_addf(cached_accept_language, q_format, q); > + > + if (q > 1) > + q--; > + > + last_size = cached_accept_language->len; > + > + /* NULL pos means this is the last language. */ Nit: This feels somewhat backward. It sounds like the comment is explaining the 'then' part of the 'if' rather than the 'else' part. Perhaps rephrase like this: /* non-NULL pos means more languages */ or, better yet, just drop the comment since the code is self-explanatory. > + if (*pos) > + strbuf_addstr(cached_accept_language, ", "); > + else > + break; > + > + } else if (is_codeset_or_modifier) > + continue; > + else if (*pos == '.' || *pos == '@') /* Remove .codeset and @modifier. */ > + is_codeset_or_modifier = 1; > + else > + strbuf_addch(cached_accept_language, *pos == '_' ? '-' : *pos); > + > + if (cached_accept_language->len > MAX_SIZE_OF_HEADER) { Mental note: Here you respect MAX_SIZE_OF_HEADER. > + strbuf_remove(cached_accept_language, last_size, > + cached_accept_language->len - last_size); > + break; > + } > + } > + > + /* Don't add Accept-Language header if no language is preferred. */ Is this comment correct? The "Accept-Language:" header was already added to cached_accept_language much earlier in the function. > + if (q >= max_q) { > + return cached_accept_language; > + } Style: Unnecessary braces. > + /* Add '*' with minimum q-factor greater than 0.0. */ > + strbuf_addstr(cached_accept_language, ", *"); > + strbuf_addf(cached_accept_language, q_format, 1); Here you don't respect MAX_SIZE_OF_HEADER. > + > + return cached_accept_language; > +} > + > /* http_request() targets */ > #define HTTP_REQUEST_STRBUF 0 > #define HTTP_REQUEST_FILE 1 > @@ -998,6 +1146,7 @@ static int http_request(const char *url, > struct slot_results results; > struct curl_slist *headers = NULL; > struct strbuf buf = STRBUF_INIT; > + struct strbuf *accept_language; > int ret; > > slot = get_active_slot(); > @@ -1023,6 +1172,11 @@ static int http_request(const char *url, > fwrite_buffer); > } > > + accept_language = get_accept_language(); > + > + if (accept_language && accept_language->len > 0) > + headers = curl_slist_append(headers, accept_language->buf); > + > strbuf_addstr(&buf, "Pragma:"); > if (options && options->no_cache) > strbuf_addstr(&buf, " no-cache");