From: lxlenovostar@gmail.com (lx)
To: kernelnewbies@lists.kernelnewbies.org
Subject: delete NIC from poll list in process_backlog()
Date: Wed, 24 Dec 2014 10:42:18 +0800 [thread overview]
Message-ID: <549A280A.3010706@gmail.com> (raw)
In-Reply-To: <CAPeXpcR65JANS-aULNWrnCS46gMSZ1oufG3qgOE_Vtt5cyRVOQ@mail.gmail.com>
hi all:
I think I got it. The new kernel just remove the NIC from poll list
first, If we should repoll it, kernel will add it again. The codes is:
4560 static void net_rx_action(struct softirq_action *h)
4561 {
4562 struct softnet_data *sd = this_cpu_ptr(&softnet_data);
4563 unsigned long time_limit = jiffies + 2;
4564 int budget = netdev_budget;
4565 LIST_HEAD(list);
4566 LIST_HEAD(repoll);
4567 void *have;
4568
4569 local_irq_disable();
4570 list_splice_init(&sd->poll_list, &list);
4571 local_irq_enable();
4572
4573 while (!list_empty(&list)) {
4574 struct napi_struct *n;
4575 int work, weight;
4576
4577 /* If softirq window is exhausted then punt.
4578 * Allow this to run for 2 jiffies since which will allow
4579 * an average latency of 1.5/HZ.
4580 */
4581 if (unlikely(budget <= 0 || time_after_eq(jiffies,
time_limit)))
4582 goto softnet_break;
4583
4584
4585 n = list_first_entry(&list, struct napi_struct, poll_list);
4586 list_del_init(&n->poll_list);
4587
4588 have = netpoll_poll_lock(n);
4589
4590 weight = n->weight;
4591
4592 /* This NAPI_STATE_SCHED test is for avoiding a race
4593 * with netpoll's poll_napi(). Only the entity which
4594 * obtains the lock and sees NAPI_STATE_SCHED set will
4595 * actually make the ->poll() call. Therefore we avoid
4596 * accidentally calling ->poll() when NAPI is not scheduled.
4597 */
4598 work = 0;
4599 if (test_bit(NAPI_STATE_SCHED, &n->state)) {
4600 work = n->poll(n, weight);
4601 trace_napi_poll(n);
4602 }
4603
4604 WARN_ON_ONCE(work > weight);
4605
4606 budget -= work;
4607
4608 /* Drivers must not modify the NAPI state if they
4609 * consume the entire weight. In such cases this code
4610 * still "owns" the NAPI instance and therefore can
4611 * move the instance around on the list at-will.
4612 */
4613 if (unlikely(work == weight)) {
4614 if (unlikely(napi_disable_pending(n))) {
4615 napi_complete(n);
4616 } else {
4617 if (n->gro_list) {
4618 /* flush too old packets
4619 * If HZ < 1000, flush all packets.
4620 */
4621 napi_gro_flush(n, HZ >= 1000);
4622 }
4623 list_add_tail(&n->poll_list, &repoll);
4624 }
4625 }
4626
4627 netpoll_poll_unlock(have);
4628 }
4629
4630 if (!sd_has_rps_ipi_waiting(sd) &&
4631 list_empty(&list) &&
4632 list_empty(&repoll))
4633 return;
4634 out:
4635 local_irq_disable();
4636
4637 list_splice_tail_init(&sd->poll_list, &list);
4638 list_splice_tail(&repoll, &list);
4639 list_splice(&list, &sd->poll_list);
4640 if (!list_empty(&sd->poll_list))
4641 __raise_softirq_irqoff(NET_RX_SOFTIRQ);
4642
4643 net_rps_action_and_irq_enable(sd);
4644
4645 return;
4646
4647 softnet_break:
4648 sd->time_squeeze++;
4649 goto out;
4650 }
line 4585: choose a NIC which will poll.
line 4586: remove the NIC from the poll list.
line 4623: add the NIC again in repoll list if we nedd poll it.
line 4638: splice the poll list and repoll list.
So, I think this is not a bug.
@valdis.kletnieks
I think we should remove NIC form poll list after the list of
sd->input_pkt_queue is empty, so I think code should run like this:
4350 rps_lock(sd);
4351 if (skb_queue_empty(&sd->input_pkt_queue)) {
4352 /*
4353 * Inline a custom version of __napi_complete().
4354 * only current cpu owns and manipulates this napi,
4355 * and NAPI_STATE_SCHED is the only possible flag set
4356 * on backlog.
4357 * We can use a plain write instead of clear_bit(),
4358 * and we dont need an smp_mb() memory barrier.
4359 */
list_del(&napi->poll_list); // I add it.
4360 napi->state = 0;
4361 rps_unlock(sd);
4362
4363 break;
4364 }
Thank you
? 2014/12/23 20:13, Vignesh Radhakrishnan ??:
> We do delete it. I am not very thorough with the code , but here is what
> i think is happening.
>
> process_backlog is set as the backlog polling function for softnet data
> sd->backlog.poll = process_backlog;
>
> Now net_rx_action which runs in soft irq context, calls the polling as
> per the following code and the deletion as well.
>
>
> Notice where we are deleting it :
>
> n = list_first_entry(&list, struct napi_struct, poll_list);
> list_del_init(&n->poll_list);
> --------------------------------------> Here is the deletion.
>
> have = netpoll_poll_lock(n);
>
> weight = n->weight;
>
> /* This NAPI_STATE_SCHED test is for avoiding a race
> * with netpoll's poll_napi(). Only the entity which
> * obtains the lock and sees NAPI_STATE_SCHED set will
> * actually make the ->poll() call. Therefore we avoid
> * accidentally calling ->poll() when NAPI is not
> scheduled.
> */
> work = 0;
> if (test_bit(NAPI_STATE_SCHED, &n->state)) {
> work = n->poll(n, weight);
> -------------------------------------> Process_backlog is called here
> trace_napi_poll(n);
> }
>
> Earlier we had deletion in process_backlog itself, but
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d75b1ade567ffab085e8adbbdacf0092d10cd09c
> has changed it.
>
> Thanks and regards,
> Vignesh Radhakrishnan
>
> On Tue, Dec 23, 2014 at 2:29 PM, lx <lxlenovostar@gmail.com
> <mailto:lxlenovostar@gmail.com>> wrote:
>
> hi all:
> I read the function of process_backlog, and I found it don't
> delete the NIC form poll list. the codes is:
>
> 4321 static int process_backlog(struct napi_struct *napi, int quota)
> 4322 {
> 4323 int work = 0;
> 4324 struct softnet_data *sd = container_of(napi, struct
> softnet_data, backlog);
> 4325
> 4326 /* Check if we have pending ipi, its better to send them now,
> 4327 * not waiting net_rx_action() end.
> 4328 */
> 4329 if (sd_has_rps_ipi_waiting(sd)) {
> 4330 local_irq_disable();
> 4331 net_rps_action_and_irq_enable(sd);
> 4332 }
> 4333
> 4334 napi->weight = weight_p;
> 4335 local_irq_disable();
> 4336 while (1) {
> 4337 struct sk_buff *skb;
> 4338
> 4339 while ((skb = __skb_dequeue(&sd->process_queue))) {
> 4340 local_irq_enable();
> 4341 __netif_receive_skb(skb);
> 4342 local_irq_disable();
> 4343 input_queue_head_incr(sd);
> 4344 if (++work >= quota) {
> 4345 local_irq_enable();
> 4346 return work;
> 4347 }
> 4348 }
> 4349
> 4350 rps_lock(sd);
> 4351 if (skb_queue_empty(&sd->input_pkt_queue)) {
> 4352 /*
> 4353 * Inline a custom version of __napi_complete().
> 4354 * only current cpu owns and manipulates this napi,
> 4355 * and NAPI_STATE_SCHED is the only possible flag set
> 4356 * on backlog.
> 4357 * We can use a plain write instead of clear_bit(),
> 4358 * and we dont need an smp_mb() memory barrier.
> 4359 */
> 4360 napi->state = 0;
> 4361 rps_unlock(sd);
> 4362
> 4363 break;
> 4364 }
> 4365
> 4366 skb_queue_splice_tail_init(&sd->input_pkt_queue,
> 4367 &sd->process_queue);
> 4368 rps_unlock(sd);
> 4369 }
> 4370 local_irq_enable();
> 4371
> 4372 return work;
> 4373 }
>
> I think we should delete the NIC from the poll list by:
> list_del(&napi->poll_list);
>
> and my git repository infromation is:
>
> [root at localhost linux]# git show
> commit aa39477b5692611b91ac9455ae588738852b3f60
> Merge: 48ec833 5164bec
> Author: Linus Torvalds <torvalds@linux-foundation.org
> <mailto:torvalds@linux-foundation.org>>
> Date: Mon Dec 22 14:47:17 2014 -0800
>
> Merge tag 'dm-3.19-fixes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
> <http://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm>
> Pull device mapper fixes from Mike Snitzer:
> "Thre stable fixes and one fix for a regression introduced
> during 3.19
> merge:
> - Fix inability to discard used space when the thin-pool
> target is in
> out-of-data-space mode and also transition the thin-pool
> back to
> write mode once free space is made available.
> - Fix DM core bio-based end_io bug that prevented proper
> post-processing of the error code returned from the block
> layer.
> - Fix crash in DM thin-pool due to thin device being added
> to the
> pool's active_thins list before properly initializing the thin
> device's refcount"
> * tag 'dm-3.19-fixes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
> <http://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm>:
> dm: fix missed error code if .end_io isn't implemented by
> target_type
> dm thin: fix crash by initializing thin device's refcount and
> completion earlier
> dm thin: fix missing out-of-data-space to write mode
> transition if blocks are released
> dm thin: fix inability to discard blocks when in
> out-of-data-space mode
>
>
> Don't handle the list_del is right, or this is a bug?
>
> Thank you.
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org <mailto:Kernelnewbies@kernelnewbies.org>
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
>
>
> --
> http://vigneshradhakrishnan.blogspot.com/
prev parent reply other threads:[~2014-12-24 2:42 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-23 8:59 delete NIC from poll list in process_backlog() lx
2014-12-23 14:03 ` Valdis.Kletnieks at vt.edu
[not found] ` <CAPeXpcR65JANS-aULNWrnCS46gMSZ1oufG3qgOE_Vtt5cyRVOQ@mail.gmail.com>
2014-12-24 2:42 ` lx [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=549A280A.3010706@gmail.com \
--to=lxlenovostar@gmail.com \
--cc=kernelnewbies@lists.kernelnewbies.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.