All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Florian Westphal <fw@strlen.de>,
	Mike Christie <mchristi@redhat.com>,
	target-devel <target-devel@vger.kernel.org>,
	Linux-netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>
Subject: Re: iscsi target regression due to "tcp: remove prequeue support" patch
Date: Thu, 18 Jan 2018 15:10:43 +0000	[thread overview]
Message-ID: <20180118151043.GA21673@breakpoint.cc> (raw)
In-Reply-To: <1516264698.24576.240.camel@haakon3.daterainc.com>

Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> On Mon, 2018-01-15 at 11:41 +0100, Florian Westphal wrote:
> > Mike Christie <mchristi@redhat.com> wrote:
> >  
> > > Dec 13 17:55:01 rhel73n1 kernel: Got Login Command, Flags 0x81, ITT:
> > > 0x00000000, CmdSN: 0x00000000, ExpStatSN: 0xf86dc69b, CID: 0, Length: 65
> > > 
> > > we have got a login command and we seem to then go into
> > > iscsit_do_rx_data -> sock_recvmsg
> > > 
> > > We seem to get stuck in there though, because we stay blocked until:
> > > 
> > > Dec 13 17:55:01 rhel73n1 kernel: Entering iscsi_target_sk_data_ready:
> > > conn: ffff88b35cbb3000
> > > Dec 13 17:55:01 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1, conn:
> > > ffff88b35cbb3000 >>>>
> > > 
> > > where initiator side timeout fires 15 seconds later and it disconnects
> > > the tcp connection, and we eventually break out of the recvmsg call:

[..]

> > > Dec 13 17:55:16 rhel73n1 kernel: rx_loop: 68, total_rx: 68, data: 68
> > > Dec 13 17:55:16 rhel73n1 kernel: iscsi_target_do_login_rx after
> > > rx_login_io, ffff88b35cbb3000, kworker/2:2:1829
> > > 
> > > Is the iscsi target doing something incorrect in its use of
> > > sk_data_ready and sock_recvmsg or is the tcp patch at fault?
> > 
> > I have not received any bug reports except this one.
> > 
> > I also have a hard time following iscsi code flow.
> > 
> > > Dec 13 17:55:01 rhel73n1 kernel: Starting login_timer for kworker/2:2/1829
> > > Dec 13 17:55:01 rhel73n1 kernel: rx_loop: 48, total_rx: 48, data: 48
> > > Dec 13 17:55:01 rhel73n1 kernel: Got Login Command, Flags 0x81, ITT: 0x00000000, CmdSN: 0x00000000, ExpStatSN: 0xf86dc69b, CID: 0, Length: 65
> > > Dec 13 17:55:01 rhel73n1 kernel: Entering iscsi_target_sk_data_ready: conn: ffff88b35cbb3000
> > 
> > Looks like things are fine up to this point.
> > 
> > > Dec 13 17:55:01 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1, conn: ffff88b35cbb3000 >>>>
> > 
> > This makes things return early from sk_data_ready callback.
> 
> Correct.
>
> This is existing behavior for individual iscsi_conn login delayed_work
> contexts (conn->login_work) which have not yet returned from a previous
> sock_recvmsg(..., MSG_WAITALL) blocking call.
>
> This causes the next iscsi_target_sk_data_ready() callback to hit
> LOGIN_FLAGS_READ_ACTIVE=1, and return immediately without kicking
> conn->login_work to process iscsi_target_do_login_rx() ->
> sock_recvmsg(..., MSG_WAITALL).

Who is responsible to remove the worker/sk from the wait queue?

> > > Dec 13 17:55:16 rhel73n1 kernel: Entering iscsi_target_sk_state_change
> > > Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,returning FALSE
> > > Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_close_change: state: 1
> > > Dec 13 17:55:16 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1 sk_state_change conn: ffff88b35cbb3000
> > > Dec 13 17:55:16 rhel73n1 kernel: rx_loop: 68, total_rx: 68, data: 68
> > 
> > So it looks like all data is there, and probably has been there all the
> > past 15 seconds, but nothing noticed.
> > 
> > Why is LOGIN_FLAGS_READ_ACTIVE set?  Who sets this?  Who is supposed to clear that?
> > Why does it exist in first place?
> 
> The bit is set in iscsi_target_sk_data_ready() when conn->login_work is
> not already blocked by sock_recvmsg(..., MSG_WAITALL).  Once it's set,
> conn->login_work is kicked to run iscsi_target_do_login_rx() ->
> sock_recvmsg(..., MSG_WAITALL) which blocks waiting for the next 48 byte
> login request PDU + payload.
> 
> Once the active conn->login_work context in iscsi_target_do_login_rx()
> returns from sock_recvmsg(..., MSG_WAITALL) with full login request PDU
> + payload bytes, the bit is cleared.
> 
> AFAICT, there was a wake_up removed by commit e7942d063 that results in
> multi iscsi login PDU authentication exchanges blocking on a incoming
> login request payload.

With you so far, BUT -- Mike has lowlatency=1 set -- so all the
tcp_prequeue code paths should never be hit in first place.

I just tried a 4.13 kernel and no tcp prequeue is path is hit when
lowlatency sysctl is set afaics.

> It would indicate users providing their own ->sk_data_ready() callback
> must be responsible for waking up a kthread context blocked on
> sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is
> received before the first sock_recvmsg(..., MSG_WAITALL) completes.

I agree, it looks like we need something like this?
(not even build tested):

diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
index b686e2ce9c0e..3723f8f419aa 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk)
        if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
                write_unlock_bh(&sk->sk_callback_lock);
                pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
+               if (WARN_ON(iscsi_target_sk_data_ready = conn->orig_data_ready))
+                       return;
+               conn->orig_data_ready(sk);
                return;
        }


WARNING: multiple messages have this Message-ID (diff)
From: Florian Westphal <fw@strlen.de>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Florian Westphal <fw@strlen.de>,
	Mike Christie <mchristi@redhat.com>,
	target-devel <target-devel@vger.kernel.org>,
	Linux-netdev <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>
Subject: Re: iscsi target regression due to "tcp: remove prequeue support" patch
Date: Thu, 18 Jan 2018 16:10:43 +0100	[thread overview]
Message-ID: <20180118151043.GA21673@breakpoint.cc> (raw)
In-Reply-To: <1516264698.24576.240.camel@haakon3.daterainc.com>

Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> On Mon, 2018-01-15 at 11:41 +0100, Florian Westphal wrote:
> > Mike Christie <mchristi@redhat.com> wrote:
> >  
> > > Dec 13 17:55:01 rhel73n1 kernel: Got Login Command, Flags 0x81, ITT:
> > > 0x00000000, CmdSN: 0x00000000, ExpStatSN: 0xf86dc69b, CID: 0, Length: 65
> > > 
> > > we have got a login command and we seem to then go into
> > > iscsit_do_rx_data -> sock_recvmsg
> > > 
> > > We seem to get stuck in there though, because we stay blocked until:
> > > 
> > > Dec 13 17:55:01 rhel73n1 kernel: Entering iscsi_target_sk_data_ready:
> > > conn: ffff88b35cbb3000
> > > Dec 13 17:55:01 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1, conn:
> > > ffff88b35cbb3000 >>>>
> > > 
> > > where initiator side timeout fires 15 seconds later and it disconnects
> > > the tcp connection, and we eventually break out of the recvmsg call:

[..]

> > > Dec 13 17:55:16 rhel73n1 kernel: rx_loop: 68, total_rx: 68, data: 68
> > > Dec 13 17:55:16 rhel73n1 kernel: iscsi_target_do_login_rx after
> > > rx_login_io, ffff88b35cbb3000, kworker/2:2:1829
> > > 
> > > Is the iscsi target doing something incorrect in its use of
> > > sk_data_ready and sock_recvmsg or is the tcp patch at fault?
> > 
> > I have not received any bug reports except this one.
> > 
> > I also have a hard time following iscsi code flow.
> > 
> > > Dec 13 17:55:01 rhel73n1 kernel: Starting login_timer for kworker/2:2/1829
> > > Dec 13 17:55:01 rhel73n1 kernel: rx_loop: 48, total_rx: 48, data: 48
> > > Dec 13 17:55:01 rhel73n1 kernel: Got Login Command, Flags 0x81, ITT: 0x00000000, CmdSN: 0x00000000, ExpStatSN: 0xf86dc69b, CID: 0, Length: 65
> > > Dec 13 17:55:01 rhel73n1 kernel: Entering iscsi_target_sk_data_ready: conn: ffff88b35cbb3000
> > 
> > Looks like things are fine up to this point.
> > 
> > > Dec 13 17:55:01 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1, conn: ffff88b35cbb3000 >>>>
> > 
> > This makes things return early from sk_data_ready callback.
> 
> Correct.
>
> This is existing behavior for individual iscsi_conn login delayed_work
> contexts (conn->login_work) which have not yet returned from a previous
> sock_recvmsg(..., MSG_WAITALL) blocking call.
>
> This causes the next iscsi_target_sk_data_ready() callback to hit
> LOGIN_FLAGS_READ_ACTIVE=1, and return immediately without kicking
> conn->login_work to process iscsi_target_do_login_rx() ->
> sock_recvmsg(..., MSG_WAITALL).

Who is responsible to remove the worker/sk from the wait queue?

> > > Dec 13 17:55:16 rhel73n1 kernel: Entering iscsi_target_sk_state_change
> > > Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,returning FALSE
> > > Dec 13 17:55:16 rhel73n1 kernel: __iscsi_target_sk_close_change: state: 1
> > > Dec 13 17:55:16 rhel73n1 kernel: Got LOGIN_FLAGS_READ_ACTIVE=1 sk_state_change conn: ffff88b35cbb3000
> > > Dec 13 17:55:16 rhel73n1 kernel: rx_loop: 68, total_rx: 68, data: 68
> > 
> > So it looks like all data is there, and probably has been there all the
> > past 15 seconds, but nothing noticed.
> > 
> > Why is LOGIN_FLAGS_READ_ACTIVE set?  Who sets this?  Who is supposed to clear that?
> > Why does it exist in first place?
> 
> The bit is set in iscsi_target_sk_data_ready() when conn->login_work is
> not already blocked by sock_recvmsg(..., MSG_WAITALL).  Once it's set,
> conn->login_work is kicked to run iscsi_target_do_login_rx() ->
> sock_recvmsg(..., MSG_WAITALL) which blocks waiting for the next 48 byte
> login request PDU + payload.
> 
> Once the active conn->login_work context in iscsi_target_do_login_rx()
> returns from sock_recvmsg(..., MSG_WAITALL) with full login request PDU
> + payload bytes, the bit is cleared.
> 
> AFAICT, there was a wake_up removed by commit e7942d063 that results in
> multi iscsi login PDU authentication exchanges blocking on a incoming
> login request payload.

With you so far, BUT -- Mike has lowlatency=1 set -- so all the
tcp_prequeue code paths should never be hit in first place.

I just tried a 4.13 kernel and no tcp prequeue is path is hit when
lowlatency sysctl is set afaics.

> It would indicate users providing their own ->sk_data_ready() callback
> must be responsible for waking up a kthread context blocked on
> sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is
> received before the first sock_recvmsg(..., MSG_WAITALL) completes.

I agree, it looks like we need something like this?
(not even build tested):

diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
index b686e2ce9c0e..3723f8f419aa 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk)
        if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
                write_unlock_bh(&sk->sk_callback_lock);
                pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
+               if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready))
+                       return;
+               conn->orig_data_ready(sk);
                return;
        }

  reply	other threads:[~2018-01-18 15:10 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-14  5:56 iscsi target regression due to "tcp: remove prequeue support" patch Mike Christie
2018-01-09  6:32 ` Nicholas A. Bellinger
2018-01-09  6:32   ` Nicholas A. Bellinger
2018-01-15  6:41   ` Nicholas A. Bellinger
2018-01-15  6:41     ` Nicholas A. Bellinger
2018-01-15 20:33     ` Mike Christie
2018-01-15 20:33       ` Mike Christie
2018-01-18  8:41       ` Nicholas A. Bellinger
2018-01-18  8:41         ` Nicholas A. Bellinger
2018-01-15 10:41 ` Florian Westphal
2018-01-18  8:38   ` Nicholas A. Bellinger
2018-01-18  8:38     ` Nicholas A. Bellinger
2018-01-18 15:10     ` Florian Westphal [this message]
2018-01-18 15:10       ` Florian Westphal
2018-01-19  4:28       ` Mike Christie
2018-01-19  4:28         ` Mike Christie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180118151043.GA21673@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=davem@davemloft.net \
    --cc=mchristi@redhat.com \
    --cc=nab@linux-iscsi.org \
    --cc=netdev@vger.kernel.org \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.