Re: [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jarek Poplawski <jarkao2@gmail.com>
To: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: davem@davemloft.net, jeff@garzik.org, netdev@vger.kernel.org,
	akpm@linux-foundation.org,
	Alexander Duyck <alexander.h.duyck@intel.com>
Subject: Re: [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support
Date: Sat, 13 Sep 2008 00:42:12 +0200	[thread overview]
Message-ID: <20080912224212.GA16662@ami.dom.local> (raw)
In-Reply-To: <20080912030750.6982.26892.stgit@jtkirshe-mobile.jf.intel.com>

Jeff Kirsher wrote, On 09/12/2008 05:08 AM:

> From: Alexander Duyck <alexander.h.duyck@intel.com>
...
> diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
> index d391ea6..5787ee6 100644
> --- a/Documentation/networking/multiqueue.txt
> +++ b/Documentation/networking/multiqueue.txt
> @@ -24,4 +24,49 @@ netif_{start|stop|wake}_subqueue() functions to manage each queue while the
>  device is still operational.  netdev->queue_lock is still used when the device
>  comes online or when it's completely shut down (unregister_netdev(), etc.).
>  
> -Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
> +
> +Section 2: Qdisc support for multiqueue devices
> +
> +-----------------------------------------------
> +
> +Currently two qdiscs support multiqueue devices.

This looks like a bit misleading: one could think it's necessary to use
one of these two with mq devices.

>  The first is the default
> +pfifo_fast qdisc.  This qdisc supports one qdisc per hardware queue.  A new
> +round-robin qdisc, sch_multiq also supports multiple hardware queues. The
> +qdisc is responsible for classifying the skb's and then directing the skb's to
> +bands and queues based on the value in skb->queue_mapping.  Use this field in
> +the base driver to determine which queue to send the skb to.
> +
> +sch_multiq has been added for hardware that wishes to avoid unnecessary
> +requeuing.

This could suggest that other qdiscs don't wish the same or even
generate excessive requeuing, which IMHO could be true, but it's not
proven, and might be only temporary situation anyway...

Anyway, I wonder why you don't mention here anything from your first
nice explantion to my doubts around the first attempt to restore
multiqueue to prio, like EEDC/DCB/etc?

>   It will cycle though the bands and verify that the hardware queue
> +associated with the band is not stopped prior to dequeuing a packet.
> +
> +On qdisc load, the number of bands is based on the number of queues on the
> +hardware.  Once the association is made, any skb with skb->queue_mapping set,
> +will be queued to the band associated with the hardware queue.
> +
> +
> +Section 3: Brief howto using MULTIQ for multiqueue devices
> +---------------------------------------------------------------
> +
> +The userspace command 'tc,' part of the iproute2 package, is used to configure
> +qdiscs.  To add the MULTIQ qdisc to your network device, assuming the device
> +is called eth0, run the following command:
> +
> +# tc qdisc add dev eth0 root handle 1: multiq
> +
> +The qdisc will allocate the number of bands to equal the number of queues that
> +the device reports, and bring the qdisc online.  Assuming eth0 has 4 Tx
> +queues, the band mapping would look like:
> +
> +band 0 => queue 0
> +band 1 => queue 1
> +band 2 => queue 2
> +band 3 => queue 3
> +
> +Traffic will begin flowing through each queue if your base device has either
> +the default simple_tx_hash or a custom netdev->select_queue() defined.

I think a device doesn't need to have simple_tx_hash defined?

...
> +static int multiq_tune(struct Qdisc *sch, struct nlattr *opt)
> +{
> +	struct multiq_sched_data *q = qdisc_priv(sch);
> +	struct tc_multiq_qopt *qopt;
> +	int i;
> +
> +	if (sch->parent != TC_H_ROOT)
> +		return -EINVAL;
> +	if (!netif_is_multiqueue(qdisc_dev(sch)))
> +		return -EINVAL;
> +	if (nla_len(opt) < sizeof(*qopt))
> +		return -EINVAL;
> +
> +	qopt = nla_data(opt);
> +
> +	qopt->bands = qdisc_dev(sch)->real_num_tx_queues;
> +
> +	sch_tree_lock(sch);
> +	q->bands = qopt->bands;
> +	for (i = q->bands; i < q->max_bands; i++) {
> +		struct Qdisc *child = xchg(&q->queues[i], &noop_qdisc);
> +		if (child != &noop_qdisc) {

Or maybe?:
		if (q->queues[i] != &noop_qdisc) {
			struct Qdisc *child = xchg(&q->queues[i], &noop_qdisc);

> +			qdisc_tree_decrease_qlen(child, child->q.qlen);
> +			qdisc_destroy(child);
> +		}
> +	}
...
> +static int multiq_init(struct Qdisc *sch, struct nlattr *opt)
> +{
> +	struct multiq_sched_data *q = qdisc_priv(sch);
> +	int i;
> +
> +	q->queues = NULL;
> +
> +	if (opt == NULL)
> +		return -EINVAL;
> +
> +	q->max_bands = qdisc_dev(sch)->num_tx_queues;
> +
> +	q->queues = kcalloc(q->max_bands, sizeof(struct Qdisc *), GFP_KERNEL);
> +	if (!q->queues)
> +		return -ENOBUFS;
> +	for (i = 0; i < q->max_bands; i++)
> +		q->queues[i] = &noop_qdisc;
> +
> +	return multiq_tune(sch, opt);

But multiq_tune() can EINVAL, so kfree is needed here.

Jarek P.

PS: to save some "paper" a typo, I guess, from PATCH 3/3:

> +The behavior of tc filters remains the same.  However a new tc action,
> +skbedit, has been added.  Assuming you wanted to route all traffic to a
> +specific host, for example 192.168.0.3, though a specific queue you could use

 +specific host, for example 192.168.0.3, through a specific queue you could use

next prev parent reply	other threads:[~2008-09-12 22:40 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-12  3:07 [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent Jeff Kirsher
2008-09-12  3:08 ` [NET-NEXT PATCH 2/3][UPDATED] pkt_sched: Add multiqueue scheduler support Jeff Kirsher
2008-09-12  3:19   ` David Miller
2008-09-12  4:42     ` Alexander Duyck
2008-09-12 22:42   ` Jarek Poplawski [this message]
2008-09-12 23:33     ` Duyck, Alexander H
2008-09-12  3:08 ` [NET-NEXT PATCH 3/3][UPDATED] pkt_action: add new action skbedit Jeff Kirsher
2008-09-12  3:17 ` [NET-NEXT PATCH 1/3] vlan: vlan device not reading gso max size of parent David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080912224212.GA16662@ami.dom.local \
    --to=jarkao2@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@intel.com \
    --cc=davem@davemloft.net \
    --cc=jeff@garzik.org \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).