* Re: [PATCH v9 bpf-next 5/9] bpf: udp: Implement batching for sockets iterator
From: Martin KaFai Lau @ 2023-09-20 0:38 UTC (permalink / raw)
To: Aditi Ghag; +Cc: sdf, Martin KaFai Lau, bpf, Network Development
On 5/19/23 3:51 PM, Aditi Ghag wrote:
> +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
> +{
> + struct bpf_udp_iter_state *iter = seq->private;
> + struct udp_iter_state *state = &iter->state;
> + struct net *net = seq_file_net(seq);
> + struct udp_table *udptable;
> + unsigned int batch_sks = 0;
> + bool resized = false;
> + struct sock *sk;
> +
> + /* The current batch is done, so advance the bucket. */
> + if (iter->st_bucket_done) {
> + state->bucket++;
> + iter->offset = 0;
> + }
> +
> + udptable = udp_get_table_seq(seq, net);
> +
> +again:
> + /* New batch for the next bucket.
> + * Iterate over the hash table to find a bucket with sockets matching
> + * the iterator attributes, and return the first matching socket from
> + * the bucket. The remaining matched sockets from the bucket are batched
> + * before releasing the bucket lock. This allows BPF programs that are
> + * called in seq_show to acquire the bucket lock if needed.
> + */
> + iter->cur_sk = 0;
> + iter->end_sk = 0;
> + iter->st_bucket_done = false;
> + batch_sks = 0;
> +
> + for (; state->bucket <= udptable->mask; state->bucket++) {
> + struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
> +
> + if (hlist_empty(&hslot2->head)) {
> + iter->offset = 0;
> + continue;
> + }
> +
> + spin_lock_bh(&hslot2->lock);
> + udp_portaddr_for_each_entry(sk, &hslot2->head) {
> + if (seq_sk_match(seq, sk)) {
> + /* Resume from the last iterated socket at the
> + * offset in the bucket before iterator was stopped.
> + */
> + if (iter->offset) {
> + --iter->offset;
Hi Aditi, I think this part has a bug.
When I run './test_progs -t bpf_iter/udp6' on a machine with some udp
so_reuseport sockets, this test never finishes.
A broken case I am seeing is when the bucket has more than one socket and
bpf_seq_read() can only get one sk at a time before it calls
bpf_iter_udp_seq_stop().
I have not tried a change yet. However, from looking at the code where
iter->offset is changed, the --iter->offset here is the most likely culprit:
it makes backward progress within the same bucket (state->bucket). The other
places touching iter->offset look fine.
It needs a local "int offset" variable for the zero test. Could you help take
a look, add (or modify) a test, and fix it?
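Something along these lines, perhaps (a rough sketch only, reusing the
identifiers from the quoted patch; the final fix may well look different):

	/* Sketch: resume within the bucket using a local copy of the saved
	 * offset, so that re-batching the same bucket (e.g. after
	 * bpf_iter_udp_seq_stop() or the "again:" retry) does not keep
	 * decrementing iter->offset and walk backward.
	 */
	int offset = iter->offset;	/* local snapshot for the zero test */

	spin_lock_bh(&hslot2->lock);
	udp_portaddr_for_each_entry(sk, &hslot2->head) {
		if (seq_sk_match(seq, sk)) {
			/* Skip the sockets already shown from this bucket. */
			if (offset) {
				--offset;
				continue;
			}
			/* ... batch this sk as in the patch ... */
		}
	}
	spin_unlock_bh(&hslot2->lock);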
The progs/bpf_iter_udp[46].c test can be used to reproduce the issue. The
test_udp[46] functions in prog_tests/bpf_iter.c need to be changed, though, to
ensure there are multiple sk in the same bucket. A few so_reuseport sk should
probably do.
Thanks.
> + continue;
> + }
> + if (iter->end_
* Re: [PATCH v9 bpf-next 5/9] bpf: udp: Implement batching for sockets iterator
From: Aditi Ghag @ 2023-09-20 17:16 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Stanislav Fomichev, Martin KaFai Lau, bpf, Network Development
> On Sep 19, 2023, at 5:38 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 5/19/23 3:51 PM, Aditi Ghag wrote:
>> +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>> +{
>> + struct bpf_udp_iter_state *iter = seq->private;
>> + struct udp_iter_state *state = &iter->state;
>> + struct net *net = seq_file_net(seq);
>> + struct udp_table *udptable;
>> + unsigned int batch_sks = 0;
>> + bool resized = false;
>> + struct sock *sk;
>> +
>> + /* The current batch is done, so advance the bucket. */
>> + if (iter->st_bucket_done) {
>> + state->bucket++;
>> + iter->offset = 0;
>> + }
>> +
>> + udptable = udp_get_table_seq(seq, net);
>> +
>> +again:
>> + /* New batch for the next bucket.
>> + * Iterate over the hash table to find a bucket with sockets matching
>> + * the iterator attributes, and return the first matching socket from
>> + * the bucket. The remaining matched sockets from the bucket are batched
>> + * before releasing the bucket lock. This allows BPF programs that are
>> + * called in seq_show to acquire the bucket lock if needed.
>> + */
>> + iter->cur_sk = 0;
>> + iter->end_sk = 0;
>> + iter->st_bucket_done = false;
>> + batch_sks = 0;
>> +
>> + for (; state->bucket <= udptable->mask; state->bucket++) {
>> + struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>> +
>> + if (hlist_empty(&hslot2->head)) {
>> + iter->offset = 0;
>> + continue;
>> + }
>> +
>> + spin_lock_bh(&hslot2->lock);
>> + udp_portaddr_for_each_entry(sk, &hslot2->head) {
>> + if (seq_sk_match(seq, sk)) {
>> + /* Resume from the last iterated socket at the
>> + * offset in the bucket before iterator was stopped.
>> + */
>> + if (iter->offset) {
>> + --iter->offset;
>
> Hi Aditi, I think this part has a bug.
>
> When I run './test_progs -t bpf_iter/udp6' in a machine with some udp so_reuseport sockets, this test is never finished.
>
> A broken case I am seeing is when the bucket has >1 sockets and bpf_seq_read() can only get one sk at a time before it calls bpf_iter_udp_seq_stop().
>
> I did not try the change yet. However, from looking at the code where iter->offset is changed, --iter->offset here is the most likely culprit and it will make backward progress for the same bucket (state->bucket). Other places touching iter->offset look fine.
>
> It needs a local "int offset" variable for the zero test. Could you help to take a look, add (or modify) a test and fix it?
>
> The progs/bpf_iter_udp[46].c test can be used to reproduce. The test_udp[46] in prog_tests/bpf_iter.c needs to be changed though to ensure there is multiple sk in the same bucket. Probably a few so_reuseport sk should do.
Hi Martin,
Thanks for the report. I'll take a look.
>
> Thanks.
>
>> + continue;
>> + }
>> + if (iter->end_
>
* Re: [PATCH v9 bpf-next 5/9] bpf: udp: Implement batching for sockets iterator
From: Aditi Ghag @ 2023-09-25 23:34 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: sdf, Martin KaFai Lau, bpf, Network Development
> On Sep 19, 2023, at 5:38 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 5/19/23 3:51 PM, Aditi Ghag wrote:
>> +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>> +{
>> + struct bpf_udp_iter_state *iter = seq->private;
>> + struct udp_iter_state *state = &iter->state;
>> + struct net *net = seq_file_net(seq);
>> + struct udp_table *udptable;
>> + unsigned int batch_sks = 0;
>> + bool resized = false;
>> + struct sock *sk;
>> +
>> + /* The current batch is done, so advance the bucket. */
>> + if (iter->st_bucket_done) {
>> + state->bucket++;
>> + iter->offset = 0;
>> + }
>> +
>> + udptable = udp_get_table_seq(seq, net);
>> +
>> +again:
>> + /* New batch for the next bucket.
>> + * Iterate over the hash table to find a bucket with sockets matching
>> + * the iterator attributes, and return the first matching socket from
>> + * the bucket. The remaining matched sockets from the bucket are batched
>> + * before releasing the bucket lock. This allows BPF programs that are
>> + * called in seq_show to acquire the bucket lock if needed.
>> + */
>> + iter->cur_sk = 0;
>> + iter->end_sk = 0;
>> + iter->st_bucket_done = false;
>> + batch_sks = 0;
>> +
>> + for (; state->bucket <= udptable->mask; state->bucket++) {
>> + struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>> +
>> + if (hlist_empty(&hslot2->head)) {
>> + iter->offset = 0;
>> + continue;
>> + }
>> +
>> + spin_lock_bh(&hslot2->lock);
>> + udp_portaddr_for_each_entry(sk, &hslot2->head) {
>> + if (seq_sk_match(seq, sk)) {
>> + /* Resume from the last iterated socket at the
>> + * offset in the bucket before iterator was stopped.
>> + */
>> + if (iter->offset) {
>> + --iter->offset;
>
> Hi Aditi, I think this part has a bug.
>
> When I run './test_progs -t bpf_iter/udp6' in a machine with some udp so_reuseport sockets, this test is never finished.
>
> A broken case I am seeing is when the bucket has >1 sockets and bpf_seq_read() can only get one sk at a time before it calls bpf_iter_udp_seq_stop().
Just so that I understand the broken case better, are you doing something in your BPF iterator program so that "bpf_seq_read() can only get one sk at a time"?
>
> I did not try the change yet. However, from looking at the code where iter->offset is changed, --iter->offset here is the most likely culprit and it will make backward progress for the same bucket (state->bucket). Other places touching iter->offset look fine.
>
> It needs a local "int offset" variable for the zero test. Could you help to take a look, add (or modify) a test and fix it?
>
> The progs/bpf_iter_udp[46].c test can be used to reproduce. The test_udp[46] in prog_tests/bpf_iter.c needs to be changed though to ensure there is multiple sk in the same bucket. Probably a few so_reuseport sk should do.
The sock_destroy patch set added a test with multiple so_reuseport sks in a
bucket in order to exercise batching [1]. I was wondering whether extending
the test with an additional bucket would do it, or whether more cases (asked
for clarification above) are required to reproduce the issue.
[1] https://elixir.bootlin.com/linux/v6.5/source/tools/testing/selftests/bpf/prog_tests/sock_destroy.c#L146
>
> Thanks.
>
>> + continue;
>> + }
>> + if (iter->end_
>
* Re: [PATCH v9 bpf-next 5/9] bpf: udp: Implement batching for sockets iterator
From: Martin KaFai Lau @ 2023-09-26 5:02 UTC (permalink / raw)
To: Aditi Ghag; +Cc: sdf, Martin KaFai Lau, bpf, Network Development
On 9/25/23 4:34 PM, Aditi Ghag wrote:
>
>
>> On Sep 19, 2023, at 5:38 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 5/19/23 3:51 PM, Aditi Ghag wrote:
>>> +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>>> +{
>>> + struct bpf_udp_iter_state *iter = seq->private;
>>> + struct udp_iter_state *state = &iter->state;
>>> + struct net *net = seq_file_net(seq);
>>> + struct udp_table *udptable;
>>> + unsigned int batch_sks = 0;
>>> + bool resized = false;
>>> + struct sock *sk;
>>> +
>>> + /* The current batch is done, so advance the bucket. */
>>> + if (iter->st_bucket_done) {
>>> + state->bucket++;
>>> + iter->offset = 0;
>>> + }
>>> +
>>> + udptable = udp_get_table_seq(seq, net);
>>> +
>>> +again:
>>> + /* New batch for the next bucket.
>>> + * Iterate over the hash table to find a bucket with sockets matching
>>> + * the iterator attributes, and return the first matching socket from
>>> + * the bucket. The remaining matched sockets from the bucket are batched
>>> + * before releasing the bucket lock. This allows BPF programs that are
>>> + * called in seq_show to acquire the bucket lock if needed.
>>> + */
>>> + iter->cur_sk = 0;
>>> + iter->end_sk = 0;
>>> + iter->st_bucket_done = false;
>>> + batch_sks = 0;
>>> +
>>> + for (; state->bucket <= udptable->mask; state->bucket++) {
>>> + struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>>> +
>>> + if (hlist_empty(&hslot2->head)) {
>>> + iter->offset = 0;
>>> + continue;
>>> + }
>>> +
>>> + spin_lock_bh(&hslot2->lock);
>>> + udp_portaddr_for_each_entry(sk, &hslot2->head) {
>>> + if (seq_sk_match(seq, sk)) {
>>> + /* Resume from the last iterated socket at the
>>> + * offset in the bucket before iterator was stopped.
>>> + */
>>> + if (iter->offset) {
>>> + --iter->offset;
>>
>> Hi Aditi, I think this part has a bug.
>>
>> When I run './test_progs -t bpf_iter/udp6' in a machine with some udp so_reuseport sockets, this test is never finished.
>>
>> A broken case I am seeing is when the bucket has >1 sockets and bpf_seq_read() can only get one sk at a time before it calls bpf_iter_udp_seq_stop().
>
> Just so that I understand the broken case better, are you doing something in your BPF iterator program so that "bpf_seq_read() can only get one sk at a time"?
>
>>
>> I did not try the change yet. However, from looking at the code where iter->offset is changed, --iter->offset here is the most likely culprit and it will make backward progress for the same bucket (state->bucket). Other places touching iter->offset look fine.
>>
>> It needs a local "int offset" variable for the zero test. Could you help to take a look, add (or modify) a test and fix it?
>>
>> The progs/bpf_iter_udp[46].c test can be used to reproduce. The test_udp[46] in prog_tests/bpf_iter.c needs to be changed though to ensure there is multiple sk in the same bucket. Probably a few so_reuseport sk should do.
>
>
> The sock_destroy patch set had added a test with multiple so_reuseport sks in a bucket in order to exercise batching [1]. I was wondering if extending the test with an additional bucket should do it, or some more cases are required (asked for clarification above) to reproduce the issue.
The number of buckets should not matter. It only needs one bucket with
multiple sockets.
I did notice that test_udp_server() has 5 so_reuseport udp sk in the same
bucket while trying to understand how this issue was missed. That is enough on
the hashtable side; it is the easier part, and one start_reuseport_server()
call will do. Having multiple sk in a bucket is not enough to reproduce the
issue, though.
The bpf prog 'iter_udp6_server' in the sock_destroy test does not do
bpf_seq_printf(), and bpf_seq_printf() is necessary to reproduce the issue. The
read() buf on the userspace side also needs to be small: it needs to hit the
"if (seq->count >= size) break;" condition in the "while (1)" loop in
kernel/bpf/bpf_iter.c.
You can try adding both to the sock_destroy test. I was suggesting the
bpf_iter/udp[46] tests instead (i.e. the test_udp[46] functions) because their
bpf_seq_printf() and buf[] size are already aligned to reproduce the problem.
Try adding a start_reuseport_server(..., 5) call to the beginning of
test_udp6() in prog_tests/bpf_iter.c to ensure there are multiple udp sk in a
bucket, as in the sketch below. That should be enough to reproduce it.
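Something like this, as a sketch (helper names such as
start_reuseport_server(), free_fds(), and ASSERT_OK_PTR(), as well as the
existing do_dummy_read() flow, are assumed from the selftest tree; the exact
arguments are illustrative):

	static void test_udp6(void)
	{
		int *reuseport_fds;

		/* Put five so_reuseport udp6 sockets into the same hash
		 * bucket so the iterator has to stop and resume within it.
		 */
		reuseport_fds = start_reuseport_server(AF_INET6, SOCK_DGRAM,
						       "::1", 0, 0, 5);
		if (!ASSERT_OK_PTR(reuseport_fds, "start_reuseport_server"))
			return;

		/* ... existing test_udp6() body (open the bpf_iter_udp6
		 * skeleton and do_dummy_read() its dump_udp6 prog) ...
		 */

		free_fds(reuseport_fds, 5);
	}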
For the final fix, I don't have a strong preference on where the test should
be: modify one of the two existing tests (i.e. sock_destroy or bpf_iter), or
add a completely new test.
Let me know if you have trouble reproducing it. Thanks.
>
>
> [1] https://elixir.bootlin.com/linux/v6.5/source/tools/testing/selftests/bpf/prog_tests/sock_destroy.c#L146
>
>>
>> Thanks.
>>
>>> + continue;
>>> + }
>>> + if (iter->end_
>>
>
* Re: [PATCH v9 bpf-next 5/9] bpf: udp: Implement batching for sockets iterator
From: Martin KaFai Lau @ 2023-09-26 5:07 UTC (permalink / raw)
To: Aditi Ghag; +Cc: sdf, Martin KaFai Lau, bpf, Network Development
On 9/25/23 4:34 PM, Aditi Ghag wrote:
> Just so that I understand the broken case better, are you doing something in your BPF iterator program so that "bpf_seq_read() can only get one sk at a time"?
Ah, hit send too early.
Yes, bpf_seq_printf(). That is why I was suggesting using the bpf_iter/udp[46]
tests to reproduce it by adding start_reuseport_server(). Please see my
earlier reply.
* Re: [PATCH v9 bpf-next 5/9] bpf: udp: Implement batching for sockets iterator
From: Aditi Ghag @ 2023-09-26 16:07 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: sdf, Martin KaFai Lau, bpf, Network Development
> On Sep 25, 2023, at 10:02 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 9/25/23 4:34 PM, Aditi Ghag wrote:
>>> On Sep 19, 2023, at 5:38 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 5/19/23 3:51 PM, Aditi Ghag wrote:
>>>> +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>>>> +{
>>>> + struct bpf_udp_iter_state *iter = seq->private;
>>>> + struct udp_iter_state *state = &iter->state;
>>>> + struct net *net = seq_file_net(seq);
>>>> + struct udp_table *udptable;
>>>> + unsigned int batch_sks = 0;
>>>> + bool resized = false;
>>>> + struct sock *sk;
>>>> +
>>>> + /* The current batch is done, so advance the bucket. */
>>>> + if (iter->st_bucket_done) {
>>>> + state->bucket++;
>>>> + iter->offset = 0;
>>>> + }
>>>> +
>>>> + udptable = udp_get_table_seq(seq, net);
>>>> +
>>>> +again:
>>>> + /* New batch for the next bucket.
>>>> + * Iterate over the hash table to find a bucket with sockets matching
>>>> + * the iterator attributes, and return the first matching socket from
>>>> + * the bucket. The remaining matched sockets from the bucket are batched
>>>> + * before releasing the bucket lock. This allows BPF programs that are
>>>> + * called in seq_show to acquire the bucket lock if needed.
>>>> + */
>>>> + iter->cur_sk = 0;
>>>> + iter->end_sk = 0;
>>>> + iter->st_bucket_done = false;
>>>> + batch_sks = 0;
>>>> +
>>>> + for (; state->bucket <= udptable->mask; state->bucket++) {
>>>> + struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>>>> +
>>>> + if (hlist_empty(&hslot2->head)) {
>>>> + iter->offset = 0;
>>>> + continue;
>>>> + }
>>>> +
>>>> + spin_lock_bh(&hslot2->lock);
>>>> + udp_portaddr_for_each_entry(sk, &hslot2->head) {
>>>> + if (seq_sk_match(seq, sk)) {
>>>> + /* Resume from the last iterated socket at the
>>>> + * offset in the bucket before iterator was stopped.
>>>> + */
>>>> + if (iter->offset) {
>>>> + --iter->offset;
>>>
>>> Hi Aditi, I think this part has a bug.
>>>
>>> When I run './test_progs -t bpf_iter/udp6' in a machine with some udp so_reuseport sockets, this test is never finished.
>>>
>>> A broken case I am seeing is when the bucket has >1 sockets and bpf_seq_read() can only get one sk at a time before it calls bpf_iter_udp_seq_stop().
>> Just so that I understand the broken case better, are you doing something in your BPF iterator program so that "bpf_seq_read() can only get one sk at a time"?
>>>
>>> I did not try the change yet. However, from looking at the code where iter->offset is changed, --iter->offset here is the most likely culprit and it will make backward progress for the same bucket (state->bucket). Other places touching iter->offset look fine.
>>>
>>> It needs a local "int offset" variable for the zero test. Could you help to take a look, add (or modify) a test and fix it?
>>>
>>> The progs/bpf_iter_udp[46].c test can be used to reproduce. The test_udp[46] in prog_tests/bpf_iter.c needs to be changed though to ensure there is multiple sk in the same bucket. Probably a few so_reuseport sk should do.
>> The sock_destroy patch set had added a test with multiple so_reuseport sks in a bucket in order to exercise batching [1]. I was wondering if extending the test with an additional bucket should do it, or some more cases are required (asked for clarification above) to reproduce the issue.
>
> Number of bucket should not matter. It should only need a bucket to have multiple sockets.
>
> I did notice test_udp_server() has 5 so_reuseport udp sk in the same bucket when trying to understand how this issue was missed. It is enough on the hashtable side. This is the easier part and one start_reuseport_server() call will do. Having multiple sk in a bucket is not enough to reprod though.
>
> The bpf prog 'iter_udp6_server' in the sock_destroy test is not doing bpf_seq_printf(). bpf_seq_printf() is necessary to reproduce the issue. The read() buf from the userspace program side also needs to be small. It needs to hit the "if (seq->count >= size) break;" condition in the "while (1)" loop in the kernel/bpf/bpf_iter.c.
>
> You can try to add both to the sock_destroy test. I was suggesting bpf_iter/udp[46] test instead (i.e. the test_udp[46] function) because the bpf_seq_printf and the buf[] size are all aligned to reprod the problem already. Try to add a start_reuseport_server(..., 5) to the beginning of test_udp6() in prog_tests/bpf_iter.c to ensure there is multiple udp sk in a bucket. It should be enough to reprod.
Gotcha! I think I understand the repro steps. The offset field in question was
added precisely for this scenario, where an iterator is stopped and resumed,
which the sock_destroy test cases don't fully exercise.
Thanks!
>
> In the final fix, I don't have strong preference on where the test should be.
> Modifying one of the two existing tests (i.e. sock_destroy or bpf_iter) or a completely new test.
>
> Let me know if you have problem reproducing it. Thanks.
>
>> [1] https://elixir.bootlin.com/linux/v6.5/source/tools/testing/selftests/bpf/prog_tests/sock_destroy.c#L146
>>>
>>> Thanks.
>>>
>>>> + continue;
>>>> + }
>>>> + if (iter->end_
* Re: [PATCH v9 bpf-next 5/9] bpf: udp: Implement batching for sockets iterator
From: Aditi Ghag @ 2023-10-24 22:50 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Stanislav Fomichev, Martin KaFai Lau, bpf, Network Development
> On Sep 26, 2023, at 9:07 AM, Aditi Ghag <aditi.ghag@isovalent.com> wrote:
>
>
>
>> On Sep 25, 2023, at 10:02 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 9/25/23 4:34 PM, Aditi Ghag wrote:
>>>> On Sep 19, 2023, at 5:38 PM, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>>
>>>> On 5/19/23 3:51 PM, Aditi Ghag wrote:
>>>>> +static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>>>>> +{
>>>>> + struct bpf_udp_iter_state *iter = seq->private;
>>>>> + struct udp_iter_state *state = &iter->state;
>>>>> + struct net *net = seq_file_net(seq);
>>>>> + struct udp_table *udptable;
>>>>> + unsigned int batch_sks = 0;
>>>>> + bool resized = false;
>>>>> + struct sock *sk;
>>>>> +
>>>>> + /* The current batch is done, so advance the bucket. */
>>>>> + if (iter->st_bucket_done) {
>>>>> + state->bucket++;
>>>>> + iter->offset = 0;
>>>>> + }
>>>>> +
>>>>> + udptable = udp_get_table_seq(seq, net);
>>>>> +
>>>>> +again:
>>>>> + /* New batch for the next bucket.
>>>>> + * Iterate over the hash table to find a bucket with sockets matching
>>>>> + * the iterator attributes, and return the first matching socket from
>>>>> + * the bucket. The remaining matched sockets from the bucket are batched
>>>>> + * before releasing the bucket lock. This allows BPF programs that are
>>>>> + * called in seq_show to acquire the bucket lock if needed.
>>>>> + */
>>>>> + iter->cur_sk = 0;
>>>>> + iter->end_sk = 0;
>>>>> + iter->st_bucket_done = false;
>>>>> + batch_sks = 0;
>>>>> +
>>>>> + for (; state->bucket <= udptable->mask; state->bucket++) {
>>>>> + struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>>>>> +
>>>>> + if (hlist_empty(&hslot2->head)) {
>>>>> + iter->offset = 0;
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + spin_lock_bh(&hslot2->lock);
>>>>> + udp_portaddr_for_each_entry(sk, &hslot2->head) {
>>>>> + if (seq_sk_match(seq, sk)) {
>>>>> + /* Resume from the last iterated socket at the
>>>>> + * offset in the bucket before iterator was stopped.
>>>>> + */
>>>>> + if (iter->offset) {
>>>>> + --iter->offset;
>>>>
>>>> Hi Aditi, I think this part has a bug.
>>>>
>>>> When I run './test_progs -t bpf_iter/udp6' in a machine with some udp so_reuseport sockets, this test is never finished.
>>>>
>>>> A broken case I am seeing is when the bucket has >1 sockets and bpf_seq_read() can only get one sk at a time before it calls bpf_iter_udp_seq_stop().
>>> Just so that I understand the broken case better, are you doing something in your BPF iterator program so that "bpf_seq_read() can only get one sk at a time"?
>>>>
>>>> I did not try the change yet. However, from looking at the code where iter->offset is changed, --iter->offset here is the most likely culprit and it will make backward progress for the same bucket (state->bucket). Other places touching iter->offset look fine.
>>>>
>>>> It needs a local "int offset" variable for the zero test. Could you help to take a look, add (or modify) a test and fix it?
>>>>
>>>> The progs/bpf_iter_udp[46].c test can be used to reproduce. The test_udp[46] in prog_tests/bpf_iter.c needs to be changed though to ensure there is multiple sk in the same bucket. Probably a few so_reuseport sk should do.
>>> The sock_destroy patch set had added a test with multiple so_reuseport sks in a bucket in order to exercise batching [1]. I was wondering if extending the test with an additional bucket should do it, or some more cases are required (asked for clarification above) to reproduce the issue.
>>
>> Number of bucket should not matter. It should only need a bucket to have multiple sockets.
>>
>> I did notice test_udp_server() has 5 so_reuseport udp sk in the same bucket when trying to understand how this issue was missed. It is enough on the hashtable side. This is the easier part and one start_reuseport_server() call will do. Having multiple sk in a bucket is not enough to reprod though.
>>
>> The bpf prog 'iter_udp6_server' in the sock_destroy test is not doing bpf_seq_printf(). bpf_seq_printf() is necessary to reproduce the issue. The read() buf from the userspace program side also needs to be small. It needs to hit the "if (seq->count >= size) break;" condition in the "while (1)" loop in the kernel/bpf/bpf_iter.c.
>>
>> You can try to add both to the sock_destroy test. I was suggesting bpf_iter/udp[46] test instead (i.e. the test_udp[46] function) because the bpf_seq_printf and the buf[] size are all aligned to reprod the problem already. Try to add a start_reuseport_server(..., 5) to the beginning of test_udp6() in prog_tests/bpf_iter.c to ensure there is multiple udp sk in a bucket. It should be enough to reprod.
>
>
> Gotcha! I think I understand the repro steps. The offset field in question was added for this scenario where an iterator is stopped and resumed that the sock_destroy test cases don't entirely exercise.
> Thanks!
Just a small update: I was able to reproduce the issue where the so_reuseport
test hangs by modifying the read buffer in the existing sock_destroy test
(test_udp_server). I have a fix and have verified that the test no longer
hangs with the fixed code. I will send a patch soon.
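Roughly, the repro boils down to a read loop like this (an illustration only;
iter_fd, the buffer size, and the variable names are placeholders rather than
the actual patch):

	/* A deliberately small buffer makes each read() return after only a
	 * few sockets, so bpf_seq_read() hits its "seq->count >= size" break
	 * and the iterator is stopped and resumed mid-bucket.
	 */
	char buf[16];

	while (read(iter_fd, buf, sizeof(buf)) > 0)
		;	/* drain the iterator; output is not checked here */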
>
>>
>> In the final fix, I don't have strong preference on where the test should be.
>> Modifying one of the two existing tests (i.e. sock_destroy or bpf_iter) or a completely new test.
>>
>> Let me know if you have problem reproducing it. Thanks.
>>
>>> [1] https://elixir.bootlin.com/linux/v6.5/source/tools/testing/selftests/bpf/prog_tests/sock_destroy.c#L146
>>>>
>>>> Thanks.
>>>>
>>>>> + continue;
>>>>> + }
>>>>> + if (iter->end_