public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] libceph: init the cursor when preparing the sparse read
@ 2024-02-29  4:19 xiubli
  2024-02-29  9:48 ` Luis Henriques
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: xiubli @ 2024-02-29  4:19 UTC (permalink / raw)
  To: ceph-devel
  Cc: idryomov, jlayton, vshankar, mchangir, Xiubo Li, stable,
	Luis Henriques

From: Xiubo Li <xiubli@redhat.com>

The osd code has remove cursor initilizing code and this will make
the sparse read state into a infinite loop. We should initialize
the cursor just before each sparse-read in messnger v2.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/64607
Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 net/ceph/messenger_v2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..7ae0f80100f4 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
 static int prepare_sparse_read_data(struct ceph_connection *con)
 {
 	struct ceph_msg *msg = con->in_msg;
+	u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
 
 	dout("%s: starting sparse read\n", __func__);
 
@@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
 	if (!con_secure(con))
 		con->in_data_crc = -1;
 
+	ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
+
 	reset_in_kvecs(con);
 	con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
 	con->v2.data_len_remain = data_len(msg);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-02-29  4:19 [PATCH] libceph: init the cursor when preparing the sparse read xiubli
@ 2024-02-29  9:48 ` Luis Henriques
  2024-02-29 10:48 ` Ilya Dryomov
  2024-03-01 16:16 ` Ilya Dryomov
  2 siblings, 0 replies; 12+ messages in thread
From: Luis Henriques @ 2024-02-29  9:48 UTC (permalink / raw)
  To: xiubli; +Cc: ceph-devel, idryomov, jlayton, vshankar, mchangir, stable

xiubli@redhat.com writes:

> From: Xiubo Li <xiubli@redhat.com>
>
> The osd code has remove cursor initilizing code and this will make
> the sparse read state into a infinite loop. We should initialize
> the cursor just before each sparse-read in messnger v2.
>
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/64607
> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> Reported-by: Luis Henriques <lhenriques@suse.de>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>

Thanks a lot Xiubo.  Feel free to add my

Tested-by: Luis Henriques <lhenriques@suse.de>

Cheers,
-- 
Luís

> ---
>  net/ceph/messenger_v2.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index a0ca5414b333..7ae0f80100f4 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>  static int prepare_sparse_read_data(struct ceph_connection *con)
>  {
>  	struct ceph_msg *msg = con->in_msg;
> +	u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
>  
>  	dout("%s: starting sparse read\n", __func__);
>  
> @@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
>  	if (!con_secure(con))
>  		con->in_data_crc = -1;
>  
> +	ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
> +
>  	reset_in_kvecs(con);
>  	con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
>  	con->v2.data_len_remain = data_len(msg);
> -- 
>
> 2.43.0
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-02-29  4:19 [PATCH] libceph: init the cursor when preparing the sparse read xiubli
  2024-02-29  9:48 ` Luis Henriques
@ 2024-02-29 10:48 ` Ilya Dryomov
  2024-03-01  1:53   ` Xiubo Li
  2024-03-01 16:16 ` Ilya Dryomov
  2 siblings, 1 reply; 12+ messages in thread
From: Ilya Dryomov @ 2024-02-29 10:48 UTC (permalink / raw)
  To: xiubli; +Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques

On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> The osd code has remove cursor initilizing code and this will make
> the sparse read state into a infinite loop. We should initialize
> the cursor just before each sparse-read in messnger v2.
>
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/64607
> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> Reported-by: Luis Henriques <lhenriques@suse.de>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  net/ceph/messenger_v2.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index a0ca5414b333..7ae0f80100f4 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>  static int prepare_sparse_read_data(struct ceph_connection *con)
>  {
>         struct ceph_msg *msg = con->in_msg;
> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
>
>         dout("%s: starting sparse read\n", __func__);
>
> @@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
>         if (!con_secure(con))
>                 con->in_data_crc = -1;
>
> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
> +
>         reset_in_kvecs(con);
>         con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
>         con->v2.data_len_remain = data_len(msg);
> --
> 2.43.0
>

Hi Xiubo,

How did this get missed?  Was generic/580 not paired with msgr2 in crc
mode or are we not running generic/580 at all?

Multiple runs have happened since the patch was staged so if the matrix
is set up correctly ms_mode=crc should have been in effect for xfstests
at least a couple of times.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-02-29 10:48 ` Ilya Dryomov
@ 2024-03-01  1:53   ` Xiubo Li
  2024-03-01 17:15     ` Ilya Dryomov
  0 siblings, 1 reply; 12+ messages in thread
From: Xiubo Li @ 2024-03-01  1:53 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques


On 2/29/24 18:48, Ilya Dryomov wrote:
> On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The osd code has remove cursor initilizing code and this will make
>> the sparse read state into a infinite loop. We should initialize
>> the cursor just before each sparse-read in messnger v2.
>>
>> Cc: stable@vger.kernel.org
>> URL: https://tracker.ceph.com/issues/64607
>> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
>> Reported-by: Luis Henriques <lhenriques@suse.de>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   net/ceph/messenger_v2.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>> index a0ca5414b333..7ae0f80100f4 100644
>> --- a/net/ceph/messenger_v2.c
>> +++ b/net/ceph/messenger_v2.c
>> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>>   static int prepare_sparse_read_data(struct ceph_connection *con)
>>   {
>>          struct ceph_msg *msg = con->in_msg;
>> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
>>
>>          dout("%s: starting sparse read\n", __func__);
>>
>> @@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
>>          if (!con_secure(con))
>>                  con->in_data_crc = -1;
>>
>> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
>> +
>>          reset_in_kvecs(con);
>>          con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
>>          con->v2.data_len_remain = data_len(msg);
>> --
>> 2.43.0
>>
> Hi Xiubo,
>
> How did this get missed?  Was generic/580 not paired with msgr2 in crc
> mode or are we not running generic/580 at all?
>
> Multiple runs have happened since the patch was staged so if the matrix
> is set up correctly ms_mode=crc should have been in effect for xfstests
> at least a couple of times.

I just found that my test script is incorrect and missed this case.

The test locally is covered the msgr1 mostly and I think the qa test 
suite also doesn't cover it too. I will try to improve the qa tests later.

Thanks

- Xiubo


> Thanks,
>
>                  Ilya
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-02-29  4:19 [PATCH] libceph: init the cursor when preparing the sparse read xiubli
  2024-02-29  9:48 ` Luis Henriques
  2024-02-29 10:48 ` Ilya Dryomov
@ 2024-03-01 16:16 ` Ilya Dryomov
  2024-03-04  1:02   ` Xiubo Li
  2 siblings, 1 reply; 12+ messages in thread
From: Ilya Dryomov @ 2024-03-01 16:16 UTC (permalink / raw)
  To: xiubli; +Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques

On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> The osd code has remove cursor initilizing code and this will make
> the sparse read state into a infinite loop. We should initialize
> the cursor just before each sparse-read in messnger v2.
>
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/64607
> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> Reported-by: Luis Henriques <lhenriques@suse.de>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  net/ceph/messenger_v2.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index a0ca5414b333..7ae0f80100f4 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>  static int prepare_sparse_read_data(struct ceph_connection *con)
>  {
>         struct ceph_msg *msg = con->in_msg;
> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);

Hi Xiubo,

Why is sparse_read_total being tested here?  Isn't this function
supposed to be called only for sparse reads, after the state is set to
IN_S_PREPARE_SPARSE_DATA based on a similar test:

    if (msg->sparse_read_total)
            con->v2.in_state = IN_S_PREPARE_SPARSE_DATA;
    else
            con->v2.in_state = IN_S_PREPARE_READ_DATA;

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-03-01  1:53   ` Xiubo Li
@ 2024-03-01 17:15     ` Ilya Dryomov
  2024-03-04  0:43       ` Xiubo Li
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Dryomov @ 2024-03-01 17:15 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques

On Fri, Mar 1, 2024 at 2:53 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 2/29/24 18:48, Ilya Dryomov wrote:
> > On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
> >> From: Xiubo Li <xiubli@redhat.com>
> >>
> >> The osd code has remove cursor initilizing code and this will make
> >> the sparse read state into a infinite loop. We should initialize
> >> the cursor just before each sparse-read in messnger v2.
> >>
> >> Cc: stable@vger.kernel.org
> >> URL: https://tracker.ceph.com/issues/64607
> >> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> >> Reported-by: Luis Henriques <lhenriques@suse.de>
> >> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >> ---
> >>   net/ceph/messenger_v2.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> >> index a0ca5414b333..7ae0f80100f4 100644
> >> --- a/net/ceph/messenger_v2.c
> >> +++ b/net/ceph/messenger_v2.c
> >> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
> >>   static int prepare_sparse_read_data(struct ceph_connection *con)
> >>   {
> >>          struct ceph_msg *msg = con->in_msg;
> >> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
> >>
> >>          dout("%s: starting sparse read\n", __func__);
> >>
> >> @@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
> >>          if (!con_secure(con))
> >>                  con->in_data_crc = -1;
> >>
> >> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
> >> +
> >>          reset_in_kvecs(con);
> >>          con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
> >>          con->v2.data_len_remain = data_len(msg);
> >> --
> >> 2.43.0
> >>
> > Hi Xiubo,
> >
> > How did this get missed?  Was generic/580 not paired with msgr2 in crc
> > mode or are we not running generic/580 at all?
> >
> > Multiple runs have happened since the patch was staged so if the matrix
> > is set up correctly ms_mode=crc should have been in effect for xfstests
> > at least a couple of times.
>
> I just found that my test script is incorrect and missed this case.
>
> The test locally is covered the msgr1 mostly and I think the qa test
> suite also doesn't cover it too. I will try to improve the qa tests later.

Could you please provide some details on the fixes needed to address
the coverage gap in the fs suite?  I'm lost because you marked [1] for
backporting to reef as (part of?) the solution, however Venky's job [2]
that is linked there in the tracker is based on main and therefore has
everything.

Additionally, [2] seems to be have failed when installing packages, so
the relationship to [1] isn't obvious to me.

[1] https://tracker.ceph.com/issues/59195
[2] https://pulpito.ceph.com/vshankar-2024-02-27_04:05:06-fs-wip-vshankar-testing-20240226.124304-testing-default-smithi/7574417/

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-03-01 17:15     ` Ilya Dryomov
@ 2024-03-04  0:43       ` Xiubo Li
  2024-03-04 11:12         ` Ilya Dryomov
  0 siblings, 1 reply; 12+ messages in thread
From: Xiubo Li @ 2024-03-04  0:43 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques


On 3/2/24 01:15, Ilya Dryomov wrote:
> On Fri, Mar 1, 2024 at 2:53 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 2/29/24 18:48, Ilya Dryomov wrote:
>>> On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> The osd code has remove cursor initilizing code and this will make
>>>> the sparse read state into a infinite loop. We should initialize
>>>> the cursor just before each sparse-read in messnger v2.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> URL: https://tracker.ceph.com/issues/64607
>>>> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
>>>> Reported-by: Luis Henriques <lhenriques@suse.de>
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    net/ceph/messenger_v2.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>>> index a0ca5414b333..7ae0f80100f4 100644
>>>> --- a/net/ceph/messenger_v2.c
>>>> +++ b/net/ceph/messenger_v2.c
>>>> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>>>>    static int prepare_sparse_read_data(struct ceph_connection *con)
>>>>    {
>>>>           struct ceph_msg *msg = con->in_msg;
>>>> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
>>>>
>>>>           dout("%s: starting sparse read\n", __func__);
>>>>
>>>> @@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
>>>>           if (!con_secure(con))
>>>>                   con->in_data_crc = -1;
>>>>
>>>> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
>>>> +
>>>>           reset_in_kvecs(con);
>>>>           con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
>>>>           con->v2.data_len_remain = data_len(msg);
>>>> --
>>>> 2.43.0
>>>>
>>> Hi Xiubo,
>>>
>>> How did this get missed?  Was generic/580 not paired with msgr2 in crc
>>> mode or are we not running generic/580 at all?
>>>
>>> Multiple runs have happened since the patch was staged so if the matrix
>>> is set up correctly ms_mode=crc should have been in effect for xfstests
>>> at least a couple of times.
>> I just found that my test script is incorrect and missed this case.
>>
>> The test locally is covered the msgr1 mostly and I think the qa test
>> suite also doesn't cover it too. I will try to improve the qa tests later.
> Could you please provide some details on the fixes needed to address
> the coverage gap in the fs suite?

Mainly to support the msgr v2 for fscrypt, before we only tested the 
fscrypt based on the msgr v1 for kclient. In ceph upstream we have 
support this while not backport it to reef yet.

>    I'm lost because you marked [1] for
> backporting to reef as (part of?) the solution, however Venky's job [2]
> that is linked there in the tracker is based on main and therefore has
> everything.
>
> Additionally, [2] seems to be have failed when installing packages, so
> the relationship to [1] isn't obvious to me.

It should be a wrong link, let me check and correct it later.

Thanks

- Xiubo

>
> [1] https://tracker.ceph.com/issues/59195
> [2] https://pulpito.ceph.com/vshankar-2024-02-27_04:05:06-fs-wip-vshankar-testing-20240226.124304-testing-default-smithi/7574417/
>
> Thanks,
>
>                  Ilya
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-03-01 16:16 ` Ilya Dryomov
@ 2024-03-04  1:02   ` Xiubo Li
  2024-03-04 11:02     ` Ilya Dryomov
  0 siblings, 1 reply; 12+ messages in thread
From: Xiubo Li @ 2024-03-04  1:02 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques


On 3/2/24 00:16, Ilya Dryomov wrote:
> On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The osd code has remove cursor initilizing code and this will make
>> the sparse read state into a infinite loop. We should initialize
>> the cursor just before each sparse-read in messnger v2.
>>
>> Cc: stable@vger.kernel.org
>> URL: https://tracker.ceph.com/issues/64607
>> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
>> Reported-by: Luis Henriques <lhenriques@suse.de>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   net/ceph/messenger_v2.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>> index a0ca5414b333..7ae0f80100f4 100644
>> --- a/net/ceph/messenger_v2.c
>> +++ b/net/ceph/messenger_v2.c
>> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>>   static int prepare_sparse_read_data(struct ceph_connection *con)
>>   {
>>          struct ceph_msg *msg = con->in_msg;
>> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
> Hi Xiubo,
>
> Why is sparse_read_total being tested here?  Isn't this function
> supposed to be called only for sparse reads, after the state is set to
> IN_S_PREPARE_SPARSE_DATA based on a similar test:
>
>      if (msg->sparse_read_total)
>              con->v2.in_state = IN_S_PREPARE_SPARSE_DATA;
>      else
>              con->v2.in_state = IN_S_PREPARE_READ_DATA;

Then the patch could be simplified and just be:

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..ab3ab130a911 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -2034,6 +2034,9 @@ static int prepare_sparse_read_data(struct 
ceph_connection *con)
         if (!con_secure(con))
                 con->in_data_crc = -1;

+       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
+ con->in_msg->sparse_read_total);
+
         reset_in_kvecs(con);
         con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
         con->v2.data_len_remain = data_len(msg);

Else where should we do the test like this ?

Thanks

- Xiubo

> Thanks,
>
>                  Ilya
>


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-03-04  1:02   ` Xiubo Li
@ 2024-03-04 11:02     ` Ilya Dryomov
       [not found]       ` <f54b8fb3-4343-4802-a495-51c5feed52b4@redhat.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Dryomov @ 2024-03-04 11:02 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques

On Mon, Mar 4, 2024 at 2:02 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 3/2/24 00:16, Ilya Dryomov wrote:
> > On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
> >> From: Xiubo Li <xiubli@redhat.com>
> >>
> >> The osd code has remove cursor initilizing code and this will make
> >> the sparse read state into a infinite loop. We should initialize
> >> the cursor just before each sparse-read in messnger v2.
> >>
> >> Cc: stable@vger.kernel.org
> >> URL: https://tracker.ceph.com/issues/64607
> >> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> >> Reported-by: Luis Henriques <lhenriques@suse.de>
> >> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >> ---
> >>   net/ceph/messenger_v2.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> >> index a0ca5414b333..7ae0f80100f4 100644
> >> --- a/net/ceph/messenger_v2.c
> >> +++ b/net/ceph/messenger_v2.c
> >> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
> >>   static int prepare_sparse_read_data(struct ceph_connection *con)
> >>   {
> >>          struct ceph_msg *msg = con->in_msg;
> >> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
> > Hi Xiubo,
> >
> > Why is sparse_read_total being tested here?  Isn't this function
> > supposed to be called only for sparse reads, after the state is set to
> > IN_S_PREPARE_SPARSE_DATA based on a similar test:
> >
> >      if (msg->sparse_read_total)
> >              con->v2.in_state = IN_S_PREPARE_SPARSE_DATA;
> >      else
> >              con->v2.in_state = IN_S_PREPARE_READ_DATA;
>
> Then the patch could be simplified and just be:
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index a0ca5414b333..ab3ab130a911 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -2034,6 +2034,9 @@ static int prepare_sparse_read_data(struct
> ceph_connection *con)
>          if (!con_secure(con))
>                  con->in_data_crc = -1;
>
> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
> + con->in_msg->sparse_read_total);
> +
>          reset_in_kvecs(con);
>          con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
>          con->v2.data_len_remain = data_len(msg);
>
> Else where should we do the test like this ?

Hi Xiubo,

I suspect you copied this test from prepare_message_data() in msgr1,
where that function is called unconditionally for all reads.  In msgr2,
prepare_sparse_read_data() is called conditionally, so the test just
seems bogus.

That said, CephFS is the only user of sparse read code, so you should
know better at this point ;)

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-03-04  0:43       ` Xiubo Li
@ 2024-03-04 11:12         ` Ilya Dryomov
  2024-03-04 13:23           ` Xiubo Li
  0 siblings, 1 reply; 12+ messages in thread
From: Ilya Dryomov @ 2024-03-04 11:12 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques

On Mon, Mar 4, 2024 at 1:43 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 3/2/24 01:15, Ilya Dryomov wrote:
> > On Fri, Mar 1, 2024 at 2:53 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 2/29/24 18:48, Ilya Dryomov wrote:
> >>> On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
> >>>> From: Xiubo Li <xiubli@redhat.com>
> >>>>
> >>>> The osd code has remove cursor initilizing code and this will make
> >>>> the sparse read state into a infinite loop. We should initialize
> >>>> the cursor just before each sparse-read in messnger v2.
> >>>>
> >>>> Cc: stable@vger.kernel.org
> >>>> URL: https://tracker.ceph.com/issues/64607
> >>>> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> >>>> Reported-by: Luis Henriques <lhenriques@suse.de>
> >>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> >>>> ---
> >>>>    net/ceph/messenger_v2.c | 3 +++
> >>>>    1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> >>>> index a0ca5414b333..7ae0f80100f4 100644
> >>>> --- a/net/ceph/messenger_v2.c
> >>>> +++ b/net/ceph/messenger_v2.c
> >>>> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
> >>>>    static int prepare_sparse_read_data(struct ceph_connection *con)
> >>>>    {
> >>>>           struct ceph_msg *msg = con->in_msg;
> >>>> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
> >>>>
> >>>>           dout("%s: starting sparse read\n", __func__);
> >>>>
> >>>> @@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
> >>>>           if (!con_secure(con))
> >>>>                   con->in_data_crc = -1;
> >>>>
> >>>> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
> >>>> +
> >>>>           reset_in_kvecs(con);
> >>>>           con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
> >>>>           con->v2.data_len_remain = data_len(msg);
> >>>> --
> >>>> 2.43.0
> >>>>
> >>> Hi Xiubo,
> >>>
> >>> How did this get missed?  Was generic/580 not paired with msgr2 in crc
> >>> mode or are we not running generic/580 at all?
> >>>
> >>> Multiple runs have happened since the patch was staged so if the matrix
> >>> is set up correctly ms_mode=crc should have been in effect for xfstests
> >>> at least a couple of times.
> >> I just found that my test script is incorrect and missed this case.
> >>
> >> The test locally is covered the msgr1 mostly and I think the qa test
> >> suite also doesn't cover it too. I will try to improve the qa tests later.
> > Could you please provide some details on the fixes needed to address
> > the coverage gap in the fs suite?
>
> Mainly to support the msgr v2 for fscrypt, before we only tested the
> fscrypt based on the msgr v1 for kclient. In ceph upstream we have
> support this while not backport it to reef yet.

I'm even more confused now...  If the fs suite in main covers msgr2 +
fscrypt (I'm taking your "in ceph upstream we have support" to mean
that), how did this bug get missed by runs on main?  At least a dozen
of them must have gone through in the form of Venky's integration
branches.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
  2024-03-04 11:12         ` Ilya Dryomov
@ 2024-03-04 13:23           ` Xiubo Li
  0 siblings, 0 replies; 12+ messages in thread
From: Xiubo Li @ 2024-03-04 13:23 UTC (permalink / raw)
  To: Ilya Dryomov
  Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques


On 3/4/24 19:12, Ilya Dryomov wrote:
> On Mon, Mar 4, 2024 at 1:43 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 3/2/24 01:15, Ilya Dryomov wrote:
>>> On Fri, Mar 1, 2024 at 2:53 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>> On 2/29/24 18:48, Ilya Dryomov wrote:
>>>>> On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
>>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>>
>>>>>> The osd code has remove cursor initilizing code and this will make
>>>>>> the sparse read state into a infinite loop. We should initialize
>>>>>> the cursor just before each sparse-read in messnger v2.
>>>>>>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> URL: https://tracker.ceph.com/issues/64607
>>>>>> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
>>>>>> Reported-by: Luis Henriques <lhenriques@suse.de>
>>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>>> ---
>>>>>>     net/ceph/messenger_v2.c | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
>>>>>> index a0ca5414b333..7ae0f80100f4 100644
>>>>>> --- a/net/ceph/messenger_v2.c
>>>>>> +++ b/net/ceph/messenger_v2.c
>>>>>> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>>>>>>     static int prepare_sparse_read_data(struct ceph_connection *con)
>>>>>>     {
>>>>>>            struct ceph_msg *msg = con->in_msg;
>>>>>> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
>>>>>>
>>>>>>            dout("%s: starting sparse read\n", __func__);
>>>>>>
>>>>>> @@ -2034,6 +2035,8 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
>>>>>>            if (!con_secure(con))
>>>>>>                    con->in_data_crc = -1;
>>>>>>
>>>>>> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg, len);
>>>>>> +
>>>>>>            reset_in_kvecs(con);
>>>>>>            con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
>>>>>>            con->v2.data_len_remain = data_len(msg);
>>>>>> --
>>>>>> 2.43.0
>>>>>>
>>>>> Hi Xiubo,
>>>>>
>>>>> How did this get missed?  Was generic/580 not paired with msgr2 in crc
>>>>> mode or are we not running generic/580 at all?
>>>>>
>>>>> Multiple runs have happened since the patch was staged so if the matrix
>>>>> is set up correctly ms_mode=crc should have been in effect for xfstests
>>>>> at least a couple of times.
>>>> I just found that my test script is incorrect and missed this case.
>>>>
>>>> The test locally is covered the msgr1 mostly and I think the qa test
>>>> suite also doesn't cover it too. I will try to improve the qa tests later.
>>> Could you please provide some details on the fixes needed to address
>>> the coverage gap in the fs suite?
>> Mainly to support the msgr v2 for fscrypt, before we only tested the
>> fscrypt based on the msgr v1 for kclient. In ceph upstream we have
>> support this while not backport it to reef yet.
> I'm even more confused now...  If the fs suite in main covers msgr2 +
> fscrypt (I'm taking your "in ceph upstream we have support" to mean
> that), how did this bug get missed by runs on main?  At least a dozen
> of them must have gone through in the form of Venky's integration
> branches.

Maybe some other known issues have masked this bug, before the qa tests 
didn't behave well for a long time for some reasons.

And many test will run base on reef branch, which hasn't backport it yet.

Thanks

- Xiubo

> Thanks,
>
>                  Ilya
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] libceph: init the cursor when preparing the sparse read
       [not found]       ` <f54b8fb3-4343-4802-a495-51c5feed52b4@redhat.com>
@ 2024-03-04 15:45         ` Ilya Dryomov
  0 siblings, 0 replies; 12+ messages in thread
From: Ilya Dryomov @ 2024-03-04 15:45 UTC (permalink / raw)
  To: Xiubo Li; +Cc: ceph-devel, jlayton, vshankar, mchangir, stable, Luis Henriques

On Mon, Mar 4, 2024 at 3:00 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 3/4/24 19:02, Ilya Dryomov wrote:
>
> On Mon, Mar 4, 2024 at 2:02 AM Xiubo Li <xiubli@redhat.com> wrote:
>
> On 3/2/24 00:16, Ilya Dryomov wrote:
>
> On Thu, Feb 29, 2024 at 5:22 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> The osd code has remove cursor initilizing code and this will make
> the sparse read state into a infinite loop. We should initialize
> the cursor just before each sparse-read in messnger v2.
>
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/64607
> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> Reported-by: Luis Henriques <lhenriques@suse.de>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>   net/ceph/messenger_v2.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index a0ca5414b333..7ae0f80100f4 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -2025,6 +2025,7 @@ static int prepare_sparse_read_cont(struct ceph_connection *con)
>   static int prepare_sparse_read_data(struct ceph_connection *con)
>   {
>          struct ceph_msg *msg = con->in_msg;
> +       u64 len = con->in_msg->sparse_read_total ? : data_len(con->in_msg);
>
> Hi Xiubo,
>
> Why is sparse_read_total being tested here?  Isn't this function
> supposed to be called only for sparse reads, after the state is set to
> IN_S_PREPARE_SPARSE_DATA based on a similar test:
>
>      if (msg->sparse_read_total)
>              con->v2.in_state = IN_S_PREPARE_SPARSE_DATA;
>      else
>              con->v2.in_state = IN_S_PREPARE_READ_DATA;
>
> Then the patch could be simplified and just be:
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index a0ca5414b333..ab3ab130a911 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -2034,6 +2034,9 @@ static int prepare_sparse_read_data(struct
> ceph_connection *con)
>          if (!con_secure(con))
>                  con->in_data_crc = -1;
>
> +       ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
> + con->in_msg->sparse_read_total);
> +
>          reset_in_kvecs(con);
>          con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
>          con->v2.data_len_remain = data_len(msg);
>
> Else where should we do the test like this ?
>
> Hi Xiubo,
>
> I suspect you copied this test from prepare_message_data() in msgr1,
> where that function is called unconditionally for all reads.  In msgr2,
> prepare_sparse_read_data() is called conditionally, so the test just
> seems bogus.
>
> That said, CephFS is the only user of sparse read code, so you should
> know better at this point ;)
>
> As we know the 'sparse_read_total' it's a dedicated member and will be set only in sparse-read case.
>
> In msgr1 for all the read cases they will call "prepare_message_data()", so I just did this check in "prepare_message_data()".

Right.

>
> While for msgr2 it's a little different from msg1 and it won't call 'prepare_read_data()' for all the reads, and for sparse-read it has its own dedicated helper, which is "prepare_sparse_read_data()".

Right.

> So I just did this check in 'prepare_sparse_read_data()' instead.

This where I'm getting lost.  Why do a "is this a sparse read" check in
a helper that is dedicated to sparse reads and isn't called for anything
else?

> For "prepare_read_data()" it doesn't make any sense to check "sparse_read_total".

Right, so why doesn't the same go for prepare_sparse_read_data()?

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-03-04 15:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-29  4:19 [PATCH] libceph: init the cursor when preparing the sparse read xiubli
2024-02-29  9:48 ` Luis Henriques
2024-02-29 10:48 ` Ilya Dryomov
2024-03-01  1:53   ` Xiubo Li
2024-03-01 17:15     ` Ilya Dryomov
2024-03-04  0:43       ` Xiubo Li
2024-03-04 11:12         ` Ilya Dryomov
2024-03-04 13:23           ` Xiubo Li
2024-03-01 16:16 ` Ilya Dryomov
2024-03-04  1:02   ` Xiubo Li
2024-03-04 11:02     ` Ilya Dryomov
     [not found]       ` <f54b8fb3-4343-4802-a495-51c5feed52b4@redhat.com>
2024-03-04 15:45         ` Ilya Dryomov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox