From: Bob Pearson <rpearsonhpe@gmail.com>
To: Bart Van Assche <bvanassche@acm.org>, Leon Romanovsky <leon@kernel.org>
Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, linux-rdma@vger.kernel.org,
matsuda-daisuke@fujitsu.com, shinichiro.kawasaki@wdc.com,
linux-scsi@vger.kernel.org, Zhu Yanjun <yanjun.zhu@intel.com>,
Zhu Yanjun <yanjun.zhu@linux.dev>
Subject: Re: [PATCH 1/1] Revert "RDMA/rxe: Add workqueue support for rxe tasks"
Date: Wed, 4 Oct 2023 16:16:38 -0500
Message-ID: <c7cb2866-932d-42da-9971-ff8eba7a13c6@gmail.com>
In-Reply-To: <eb16cea2-d727-4799-b857-e872d7855909@acm.org>
[-- Attachment #1: Type: text/plain, Size: 2013 bytes --]
On 10/4/23 12:44, Bart Van Assche wrote:
> On 9/30/23 23:30, Leon Romanovsky wrote:
>> On Wed, Sep 27, 2023 at 11:51:12AM -0500, Bob Pearson wrote:
>>> On 9/26/23 15:24, Bart Van Assche wrote:
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> index 1501120d4f52..6cd5d5a7a316 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> @@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;
>>>>
>>>>  int rxe_alloc_wq(void)
>>>>  {
>>>> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>>>> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, 1);
>>>>  	if (!rxe_wq)
>>>>  		return -ENOMEM;
>>>>
>>>> Thanks,
>>>>
>>>> Bart.
>>
>> <...>
>>
>>> Nevertheless this is a good hint since it seems to imply that there is a race between the requester and
>>> completer which is certainly possible.
>>
>> Bob, Bart
>>
>> Can you please send this change as a formal patch?
>> As we prefer workqueue with bad performance implementation over tasklets.
>
> Hi Bob,
>
> Do you perhaps have a preference for who posts the formal patch?
>
> Thanks,
>
> Bart.
>
Bart,
Not really.
I have spent the past two weeks chasing this bug and don't have much to report. I have never been able to
reproduce your KASAN bug. Like Zhu, I have found that the hang is always there, but its frequency varies a
lot with code changes. For example, various printk's can increase or decrease the frequency.
I spent this morning looking at flame graphs captured during the hang, which lasts about 60 seconds before
it times out and check tears down the test. One is attached to this note. There seems to be a lot of recursion
in what I assume is some attempt at error recovery. The recursion is probably in user space because the
symbols are not available to perf.
I am worried that there may be a stack overflow, which could cause bad behavior.
Bob
[-- Attachment #2: perf-kernel.svg --]
[-- Type: image/svg+xml, Size: 517740 bytes --]