Linux maintainer tooling and workflows
* Fetching an mbox from lore
@ 2023-07-22 17:12 Maxime Ripard
  2023-07-22 18:47 ` Willy Tarreau
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Maxime Ripard @ 2023-07-22 17:12 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: users, tools

Hi,

I've been trying to fetch an mbox from lore with an arbitrary search request.

I could fetch it fine using curl with the following example:

curl -XPOST -H "Content-Length:0" -OJ "http://lore.kernel.org/linux-clk/?q=d:1.week.ago..&x=m"

This returns a gzip'd mbox, everything's fine.

However, for some reason I can't reproduce this with Python's requests
API: it looks like I get redirected back and forth between HTTPS and
HTTP when I try to connect with the following script:

#!/usr/bin/env python3

from urllib.parse import urlparse

from requests import Request, Session

LORE_URL = "https://lore.kernel.org/linux-clk"

def try_url_redirect(url):
    headers={"Content-Length": "0"}
    params={"q": "d:1.week.ago..", "x": "m"}

    s = Session()

    req = Request('POST', url, headers=headers, params=params)
    p = req.prepare()

    print("Trying to connect to %s" % p.url)

    resp = s.send(p, allow_redirects=False)

    if resp.status_code == 301:
        print("Redirecting to %s" % resp.headers['location'])

        url = urlparse(resp.headers['location'])
        url = url._replace(query="")

        return (resp.status_code, url.geturl())

if __name__ == '__main__':
    code, url = try_url_redirect(LORE_URL)
    try_url_redirect(url)

The output is:

Trying to connect to https://lore.kernel.org/linux-clk?q=d%3A1.week.ago..&x=m
Redirecting to http://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m
Trying to connect to http://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m
Redirecting to https://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m

If I do allow redirects, then requests will issue a GET on the new
location and I'll end up with the HTML webpage of that request.

Am I trying to do something not supported here, or is it supposed to
work and my script is wrong for some reason?

Thanks!
Maxime


* Re: Fetching an mbox from lore
  2023-07-22 17:12 Fetching an mbox from lore Maxime Ripard
@ 2023-07-22 18:47 ` Willy Tarreau
  2023-07-22 22:34 ` Rob Herring
  2023-07-23  1:36 ` Eric Wong
  2 siblings, 0 replies; 6+ messages in thread
From: Willy Tarreau @ 2023-07-22 18:47 UTC (permalink / raw)
  To: Maxime Ripard; +Cc: Konstantin Ryabitsev, users, tools

Hi Maxime,

On Sat, Jul 22, 2023 at 07:12:33PM +0200, Maxime Ripard wrote:
> Hi,
> 
> I've been trying to fetch an mbox from lore with an arbitrary search request.
> 
> I could fetch it fine using curl with the following example:
> 
> curl -XPOST -H "Content-Length:0" -OJ "http://lore.kernel.org/linux-clk/?q=d:1.week.ago..&x=m"
> 
> This returns a gzip'd mbox, everything's fine.

Note that when I do this I get redirected to the https URL and when I
use it, then it works.

> def try_url_redirect(url):
>     headers={"Content-Length": "0"}
>     params={"q": "d:1.week.ago..", "x": "m"}
> 
>     s = Session()
> 
>     req = Request('POST', url, headers=headers, params=params)
>     p = req.prepare()

I don't know this part of Python, but are you certain that it's not
trying to pass the params in the request body? That would seem natural
to me since you've asked for a POST. In your curl request, you're not
sending the arguments in the body but as a query string, with an
empty body.

It would be useful to strace the script (use http:// to make it
easier) to verify, because I really don't trust the debugging
output: it possibly just reassembles the URL as if it were a GET,
which may not be what is actually sent. With curl (over HTTP), this
is at least what the request looks like on the wire:

   20:43:22.246844 sendto(5, "POST /linux-clk/?q=d:1.week.ago..&x=m HTTP/1.1\r\nHost: lore.kernel.org\r\nUser-Agent: curl/7.81.0\r\nAccept: */*\r\nContent-Length:0\r\n\r\n", 129, MSG_NOSIGNAL, NULL, 0) = 129

You should try to concatenate your arguments just at the end of
the URL and really send nothing with the POST.
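A minimal sketch of that suggestion, using only the Python standard
library rather than requests. The helper name build_search_request is
hypothetical, and the snippet only builds the request object; it does
not contact lore, so it shows what would be sent rather than how the
server responds.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(inbox_url: str, query: str) -> Request:
    """Return a POST request with the search in the URL and an empty body."""
    # safe=":" keeps the colon literal, as in the curl request line;
    # without it urlencode() emits %3A, as seen in the script's output.
    query_string = urlencode({"q": query, "x": "m"}, safe=":")
    # data=b"" gives the POST an empty body: nothing is sent besides the URL.
    return Request(f"{inbox_url}?{query_string}", data=b"", method="POST")


req = build_search_request("http://lore.kernel.org/linux-clk/", "d:1.week.ago..")
print(req.get_method(), req.full_url)
```

Passing the object to urllib.request.urlopen() would perform the actual
request; that step is left out here on purpose.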

Hoping this helps a little bit,
Willy


* Re: Fetching an mbox from lore
  2023-07-22 17:12 Fetching an mbox from lore Maxime Ripard
  2023-07-22 18:47 ` Willy Tarreau
@ 2023-07-22 22:34 ` Rob Herring
  2023-07-24  8:07   ` Maxime Ripard
  2023-07-23  1:36 ` Eric Wong
  2 siblings, 1 reply; 6+ messages in thread
From: Rob Herring @ 2023-07-22 22:34 UTC (permalink / raw)
  To: Maxime Ripard; +Cc: Konstantin Ryabitsev, users, tools

On Sat, Jul 22, 2023 at 11:13 AM Maxime Ripard <mripard@redhat.com> wrote:
>
> Hi,
>
> I've been trying to fetch an mbox from lore with an arbitrary search request.

Any reason you aren't just using lei? This is my script to get such an
mbox and open it in mutt.

8<-----------------------------------------------------
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-only

usage()
{
        echo "syntax: `basename $0` [-t] <query string>"
        echo ""
        echo "For query syntax, see https://lore.kernel.org/all/_/text/help/"
        exit 1
}

while getopts "ht" opt
do
        case "$opt" in
        t)      threads="-t";;
        [h?])    usage;;
        esac
done
shift $((OPTIND-1))
query_str="$*"
[ -z "$query_str" ] && usage


tmp_mbox=$(mktemp)

echo "$query_str" NOT tc:stable@vger.kernel.org | \
  lei q --no-save --dedupe=mid $threads -f mboxrd \
    -O https://lore.kernel.org/all/ -o $tmp_mbox --stdin

if [ -s "$tmp_mbox" ]; then
        mutt -f $tmp_mbox
fi

rm $tmp_mbox


* Re: Fetching an mbox from lore
  2023-07-22 17:12 Fetching an mbox from lore Maxime Ripard
  2023-07-22 18:47 ` Willy Tarreau
  2023-07-22 22:34 ` Rob Herring
@ 2023-07-23  1:36 ` Eric Wong
  2023-07-24  8:06   ` Maxime Ripard
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2023-07-23  1:36 UTC (permalink / raw)
  To: Maxime Ripard; +Cc: Konstantin Ryabitsev, users, tools

Maxime Ripard <mripard@redhat.com> wrote:
> LORE_URL = "https://lore.kernel.org/linux-clk"

That should be:

  LORE_URL = "https://lore.kernel.org/linux-clk/"

The trailing slash is critical.  The URL format of public-inbox
enforces it to ensure mirroring via `wget --recursive' works.
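As an illustration, a small sketch of normalizing an inbox URL before
POSTing to it, so that the request is not redirected. The helper name
ensure_trailing_slash is hypothetical, and it is only meant for inbox
root URLs such as the one above, not for arbitrary paths.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url: str) -> str:
    """Append a slash to the path component if it does not end with one."""
    parts = urlsplit(url)
    if parts.path.endswith("/"):
        return url
    # Only the path changes; any query string or fragment is kept as-is.
    return urlunsplit(parts._replace(path=parts.path + "/"))


print(ensure_trailing_slash("https://lore.kernel.org/linux-clk"))
```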

> Trying to connect to https://lore.kernel.org/linux-clk?q=d%3A1.week.ago..&x=m
> Redirecting to http://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m
> Trying to connect to http://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m
> Redirecting to https://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m

I'm not sure why lore goes HTTPS => HTTP, though...

lore probably needs to have Plack::Middleware::ReverseProxy
installed, with the reverse proxy (likely nginx) configured to set
the X-Forwarded-HTTPS:1 or X-Forwarded-Proto:https header.

> If I do allow redirects, then requests will issue a GET on the new
> location and I'll end up with the HTML webpage of that request.
> 
> Am I trying to do something not supported here, or is it supposed to
> work and my script is wrong for some reason?

I can't remember off the top of my head how redirect rules ought
to work for non-GET requests, but avoiding redirects is usually
prudent.
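For reference, a rough model of what happens to the request method when
a client follows a redirect. It reflects my understanding of how the
requests library behaves and of what the HTTP specification permits
(301 and 302 may turn a POST into a GET, 303 asks for a GET, 307 and
308 keep the method); treat it as an assumption to check against the
requests source and RFC 9110, not as a statement about lore. The
function name method_after_redirect is hypothetical.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Return the method a requests-like client would use after a redirect."""
    # 303 See Other: everything except HEAD becomes GET.
    if status == 303 and method != "HEAD":
        return "GET"
    # 302 Found: clients historically switch to GET as well.
    if status == 302 and method != "HEAD":
        return "GET"
    # 301 Moved Permanently: a POST is rewritten to GET.
    if status == 301 and method == "POST":
        return "GET"
    # 307 and 308 (and anything else) keep the original method.
    return method


print(method_after_redirect(301, "POST"))
```

Under this model, the 301 in the output above explains why following
the redirect ends with a GET and the HTML page instead of the mbox.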


* Re: Fetching an mbox from lore
  2023-07-23  1:36 ` Eric Wong
@ 2023-07-24  8:06   ` Maxime Ripard
  0 siblings, 0 replies; 6+ messages in thread
From: Maxime Ripard @ 2023-07-24  8:06 UTC (permalink / raw)
  To: Eric Wong; +Cc: Konstantin Ryabitsev, users, tools

Hi

Thanks to everyone that replied :)

On Sun, Jul 23, 2023 at 01:36:24AM +0000, Eric Wong wrote:
> Maxime Ripard <mripard@redhat.com> wrote:
> > LORE_URL = "https://lore.kernel.org/linux-clk"
> 
> That should be:
> 
>   LORE_URL = "https://lore.kernel.org/linux-clk/"
> 
> The trailing slash is critical.  The URL format of public-inbox
> enforces it to ensure mirroring via `wget --recursive' works.

That was it, thanks so much.

I was scratching my head trying to figure it out, but it turned out to
be much simpler than what I was looking for :)

Maxime


* Re: Fetching an mbox from lore
  2023-07-22 22:34 ` Rob Herring
@ 2023-07-24  8:07   ` Maxime Ripard
  0 siblings, 0 replies; 6+ messages in thread
From: Maxime Ripard @ 2023-07-24  8:07 UTC (permalink / raw)
  To: Rob Herring; +Cc: Konstantin Ryabitsev, users, tools

Hi Rob,

On Sat, Jul 22, 2023 at 04:34:01PM -0600, Rob Herring wrote:
> On Sat, Jul 22, 2023 at 11:13 AM Maxime Ripard <mripard@redhat.com> wrote:
> >
> > Hi,
> >
> > I've been trying to fetch an mbox from lore with an arbitrary search request.
> 
> Any reason you aren't just using lei? This is my script to get such an
> mbox and open it in mutt.

Yeah, I'm trying to add support for lore in did:

https://github.com/psss/did

So I'd rather avoid adding an external dependency and prefer to do it in
pure Python.
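For the pure-Python route, a sketch of reading a gzip'd mbox with the
standard library alone, assuming the response body is already in hand
as bytes. The helper name subjects_from_mbox_gz and the sample messages
are made up for illustration; how the mailbox module copes with
mboxrd-style ">From " quoting is not addressed here.

```python
import gzip
import mailbox
import tempfile
from pathlib import Path


def subjects_from_mbox_gz(payload: bytes) -> list[str]:
    """Decompress a gzip'd mbox and return the Subject of each message."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # mailbox.mbox works on a file path, so write the data out first.
        path = Path(tmpdir) / "results.mbox"
        path.write_bytes(gzip.decompress(payload))
        box = mailbox.mbox(str(path), create=False)
        try:
            return [msg.get("Subject", "") for msg in box]
        finally:
            box.close()


# Two made-up messages standing in for a real search result.
sample = (
    b"From a@example.org Sat Jul 22 17:12:00 2023\n"
    b"Subject: one\n\nbody\n\n"
    b"From b@example.org Sat Jul 22 18:47:00 2023\n"
    b"Subject: two\n\nbody\n"
)
print(subjects_from_mbox_gz(gzip.compress(sample)))
```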

Maxime
