* [PATCH] fetch2/wget: fallback to GET if HEAD is rejected in checkstatus()
From: Ross Burton @ 2016-01-20 13:14 UTC
  To: bitbake-devel

The core change here is to fall back to a GET request if HEAD is rejected in the
checkstatus() method, as you can't do a HEAD on Amazon S3 (used by GitHub
archives).  This meant removing the monkey patch that made HEAD the default
method and adding a fixed redirect handler that doesn't reset the method to GET.
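
As a rough illustration of the fallback described above, here is a minimal
sketch ported to Python 3's urllib.request (the patch itself targets urllib2).
The HEAD-only guard, the FakeServer stand-in and the example.invalid URL are
assumptions made for this sketch and are not part of the patch.

```python
import email.message
import io
import urllib.request
import urllib.response


class HTTPMethodFallback(urllib.request.BaseHandler):
    """Retry a rejected HEAD request as GET (sketch of the patch's idea)."""

    def http_error_405(self, req, fp, code, msg, headers):
        # Guard added for this sketch: only HEAD is retried, so a rejected
        # GET falls through to urllib's default error handling.
        if req.get_method() != "HEAD":
            return None
        fp.read()
        fp.close()
        # Drop body-related headers; the retry carries no payload.
        newheaders = {k: v for k, v in req.headers.items()
                      if k.lower() not in ("content-length", "content-type")}
        # A Request built without data or an explicit method defaults to GET.
        return self.parent.open(urllib.request.Request(
            req.full_url,
            headers=newheaders,
            origin_req_host=req.origin_req_host,
            unverifiable=True))

    # Some servers answer 403 Forbidden where 405 is meant.
    http_error_403 = http_error_405


class FakeServer(urllib.request.HTTPHandler):
    """Hypothetical stand-in for a server that rejects HEAD; no network I/O."""

    def __init__(self):
        super().__init__()
        self.methods = []

    def http_open(self, req):
        self.methods.append(req.get_method())
        if req.get_method() == "HEAD":
            # Route the simulated 405 through the opener's error handlers.
            return self.parent.error("http", req, io.BytesIO(b""), 405,
                                     "Method Not Allowed",
                                     email.message.Message())
        response = urllib.response.addinfourl(
            io.BytesIO(b"payload"), email.message.Message(),
            req.full_url, 200)
        response.msg = "OK"  # the opener's error processor reads .msg
        return response


server = FakeServer()
opener = urllib.request.build_opener(server, HTTPMethodFallback)
reply = opener.open(urllib.request.Request(
    "http://example.invalid/archive.tar.gz", method="HEAD"))
print(server.methods, reply.status)
```

The fake handler subclasses HTTPHandler so that build_opener() uses it in place
of the real one, which keeps the sketch runnable without network access.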

Also, change the way the opener is constructed from an if/elif cluster to a
conditionally constructed list.
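
The list-based construction can be sketched as follows, again for Python 3's
urllib.request rather than urllib2. The function names and the two boolean
parameters are hypothetical, and the sketch uses the public ssl API instead of
ssl._create_unverified_context(), so treat it as an illustration of the
technique rather than the code in the patch.

```python
import ssl
import urllib.request


def choose_handlers(use_proxies, verify_tls):
    """Collect only the handlers that the given settings call for."""
    handlers = []
    if not use_proxies:
        # build_opener() installs an environment-driven ProxyHandler by
        # default; passing one with an empty mapping replaces it.
        handlers.append(urllib.request.ProxyHandler({}))
    if not verify_tls:
        # Unverified TLS context built with public ssl attributes.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        handlers.append(urllib.request.HTTPSHandler(context=context))
    return handlers


def make_opener(use_proxies, verify_tls):
    """Build the opener once, from whatever handlers were selected."""
    return urllib.request.build_opener(
        *choose_handlers(use_proxies, verify_tls))
```

Each setting contributes at most one handler, so adding a further option means
one more "if" rather than doubling the number of build_opener() branches.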

Signed-off-by: Ross Burton <ross.burton@intel.com>
---
 bitbake/lib/bb/fetch2/wget.py | 65 ++++++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/wget.py b/bitbake/lib/bb/fetch2/wget.py
index c8c6d5c..dafa068 100644
--- a/bitbake/lib/bb/fetch2/wget.py
+++ b/bitbake/lib/bb/fetch2/wget.py
@@ -235,38 +235,57 @@ class Wget(FetchMethod):
 
             return exported
 
-        def head_method(self):
-            return "HEAD"
-
+        class HTTPMethodFallback(urllib2.BaseHandler):
+            """
+            Fallback to GET if HEAD is not allowed (405 HTTP error)
+            """
+            def http_error_405(self, req, fp, code, msg, headers):
+                fp.read()
+                fp.close()
+
+                newheaders = dict((k,v) for k,v in req.headers.items()
+                                  if k.lower() not in ("content-length", "content-type"))
+                return self.parent.open(urllib2.Request(req.get_full_url(),
+                                                        headers=newheaders,
+                                                        origin_req_host=req.get_origin_req_host(),
+                                                        unverifiable=True))
+
+            """
+            Some servers (e.g. GitHub archives, hosted on Amazon S3) return 403
+            Forbidden when they actually mean 405 Method Not Allowed.
+            """
+            http_error_403 = http_error_405
+
+        class FixedHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
+            """
+            urllib2.HTTPRedirectHandler resets the method to GET on redirect,
+            when we want to follow redirects using the original method.
+            """
+            def redirect_request(self, req, fp, code, msg, headers, newurl):
+                newreq = urllib2.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, newurl)
+                newreq.get_method = lambda: req.get_method()
+                return newreq
         exported_proxies = export_proxies(d)
 
+        handlers = [FixedHTTPRedirectHandler, HTTPMethodFallback]
+        if exported_proxies:
+            handlers.append(urllib2.ProxyHandler())
+        handlers.append(CacheHTTPHandler())
         # XXX: Since Python 2.7.9 ssl cert validation is enabled by default
         # see PEP-0476, this causes verification errors on some https servers
         # so disable by default.
         import ssl
-        ssl_context = None
         if hasattr(ssl, '_create_unverified_context'):
-            ssl_context = ssl._create_unverified_context()
-
-        if exported_proxies == True and ssl_context is not None:
-            opener = urllib2.build_opener(urllib2.ProxyHandler, CacheHTTPHandler,
-                    urllib2.HTTPSHandler(context=ssl_context))
-        elif exported_proxies == False and ssl_context is not None:
-            opener = urllib2.build_opener(CacheHTTPHandler,
-                    urllib2.HTTPSHandler(context=ssl_context))
-        elif exported_proxies == True and ssl_context is None:
-            opener = urllib2.build_opener(urllib2.ProxyHandler, CacheHTTPHandler)
-        else:
-            opener = urllib2.build_opener(CacheHTTPHandler)
-
-        urllib2.Request.get_method = head_method
-        urllib2.install_opener(opener)
-
-        uri = ud.url.split(";")[0]
+            handlers.append(urllib2.HTTPSHandler(context=ssl._create_unverified_context()))
+        opener = urllib2.build_opener(*handlers)
 
         try:
-            urllib2.urlopen(uri)
-        except:
+            uri = ud.url.split(";")[0]
+            r = urllib2.Request(uri)
+            r.get_method = lambda: "HEAD"
+            opener.open(r)
+        except urllib2.URLError as e:
+            bb.warn("checkstatus() urlopen failed: %s" % e)
             return False
         return True
 
-- 
2.6.4




* Re: [PATCH] fetch2/wget: fallback to GET if HEAD is rejected in checkstatus()
From: Aníbal Limón @ 2016-01-20 15:48 UTC
  To: Ross Burton, bitbake-devel

Hi Ross,

The code looks good; I have only two comments:

	- It would be good to add a unit test for the case where HEAD isn't
available. [1]
	
	- I don't think it is a good idea to log inside the fetcher (since the
fetcher is a helper module). For example, when shared state is used, the
logic calls checkstatus() for every possible sstate object, so it will show
a lot of warnings for the ones that aren't available. I have a ticket to
change this behavior [2].
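
One possible shape for the second point, sketched for Python 3's
urllib.request: the check reports failure through its return value and logs
only at debug level, so callers that probe many URLs stay quiet. The names
check_status, AlwaysFail and the logger name are hypothetical and do not come
from bitbake.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("fetch.checkstatus")  # hypothetical logger name


def check_status(opener, url):
    """Return True if a HEAD request for url succeeds, False otherwise."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request):
            return True
    except urllib.error.URLError as exc:
        # Debug level only: the caller decides whether a miss is worth
        # reporting to the user.
        logger.debug("checkstatus(%s) failed: %s", url, exc)
        return False


class AlwaysFail(urllib.request.HTTPHandler):
    """Hypothetical handler that simulates an unreachable server."""

    def http_open(self, req):
        raise urllib.error.URLError("simulated failure")


failing_opener = urllib.request.build_opener(AlwaysFail())
print(check_status(failing_opener, "http://example.invalid/sstate.tgz"))
```

A handler of the same kind, returning an error only for HEAD, could drive the
unit test suggested in the first point without touching the network.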

Cheers,
	alimon

[1] http://git.openembedded.org/bitbake/tree/lib/bb/tests/fetch.py#n727
[2] https://bugzilla.yoctoproject.org/show_bug.cgi?id=8727


