public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: William Lee Irwin III <wli@holomorphy.com>
To: "Martin J. Bligh" <mbligh@aracnet.com>
Cc: Dave Jones <davej@codemonkey.org.uk>,
	Roman Zippel <zippel@linux-m68k.org>,
	John Stoffel <stoffel@lucent.com>,
	DevilKin-LKML <devilkin-lkml@blindguardian.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Linux 2.5.70 compile error
Date: Wed, 28 May 2003 00:43:17 -0700	[thread overview]
Message-ID: <20030528074317.GE19818@holomorphy.com> (raw)
In-Reply-To: <13590000.1054102279@[10.10.2.4]>

At some point in the past, my attribution was stripped from:
>> Or better yet, remove all the #ifdefs, finish generalizing the APIC
>> code, and have nothing to configure at all. For 2.7 ...

On Tue, May 27, 2003 at 11:11:20PM -0700, Martin J. Bligh wrote:
> For the "commerical" options like Summit and bigsmp, I think this is an
> option for 2.5 even, given some more testing. And then the wierdo arches
> can be better hidden ;-)

The delimiter between weirdo and non-weirdo arches is wrong IMHO; so
long as the only differences are APIC handling (and that in fact holds
for NUMA-Q and a couple of others that were left out of the generic
bits) a "proper" APIC abstraction should handle it just fine.

One might argue that NUMA-Q is a corner case not worth handling by a
generic APIC layer; it is a corner case, however it far more clearly
shows where generality is needed, in particular because it mixes
clustered hierarchical DFR and logical IPI's with physical DESTMOD in
IO-APIC RTE's. At the moment the general confusions, mis-declarations,
and what appear to be either latent or actual bugs going around
include/asm-i386/mach-*/mach_apic.h point to a need for a consolidated
library of APIC macros and functions accurately conforming to the Intel
specifications. AFAICT there are four and only four or five parameters
that really make a difference:

(1) APIC vs. xAPIC
(2) clustered hierarchical DFR vs. flat DFR
(3) physical DESTMOD vs. logical DESTMOD in IO-APIC RTE's
(4) wakeup via INIT or via NMI
(5) physical IPI's or logical IPI's

So one could easily form destinations by:

physical_destination_t cpumask_to_physical_destination(cpumask_t cpumask)
{
	int cpu, ncpus = cpus_weight(cpumask);

	if (!ncpus) {
		WARN_ON(1);
		if (xapic)
			return XAPIC_PHYSICAL_BROADCAST;
		else
			return SAPIC_PHYSICAL_BROADCAST;
	}

	WARN_ON(ncpus > 1);
	cpu = first_cpu(cpumask);
	if (!xapic && cpu >= 16) {
		WARN_ON(1);
		return SAPIC_PHYSICAL_BROADCAST;
	}

	return (physical_destination_t) { 1UL << cpu };
}

logical_destination_t cpumask_to_logical_destination(cpumask_t cpumask)
{
	unsigned long k, raw_dest = 0UL;

	if (apic_dfr_flat) {
		cpumask_t invalid_mask = cpus_promote(0xFFUL);
		cpus_complement(invalid_mask);
		cpus_and(invalid_mask, invalid_mask, cpumask);
		WARN_ON(!cpus_empty(invalid_mask));
		return (logical_destination_t){ cpus_coerce(cpumask) & 0xFFUL };
	}

	for (k = 0; k < NR_CPUS; ++k) {
		if (!cpu_isset(k, cpumask))
			continue;
		if (raw_dest) {
			unsigned long old_cluster, new_cluster;
			old_cluster = raw_dest >> 4;
			new_cluster = k >> 4;
			if (old_cluster != new_cluster) {
				WARN_ON(1);
				continue;
			}
		}
		raw_dest |= (k & ~0xFUL) | (1UL << (k & 0xf));
	}
	if (!raw_dest) {
		WARN_ON(1);
		return APIC_LOGICAL_BROADCAST;
	}
	return (logical_destination_t) { raw_dest };
}

Destination formation is done in two contexts:
(1) IPI's
(2) IO-APIC RTE's

The higher level of abstraction could easily insulate subarches with:

ipi_destination_t cpumask_to_ipi_destination(cpumask_t cpumask)
{
	switch (APIC_IPI_MODE) {
		case APIC_IPI_MODE_PHYSICAL:
			return cpumask_to_physical_destination(cpumask);
			break;
		case APIC_IPI_MODE_LOGICAL:
			return cpumask_to_logical_destination(cpumask);
			break;
		default:
			BUG();
			return (ipi_destination_t) { 0 };
	}
}

and

rte_destination_t cpumask_to_ipi_destination(cpumask_t cpumask)
{
	switch (IOAPIC_RTE_DESTMOD) {
		case IOAPIC_RTE_DESTMOD_PHYSICAL:
			return cpumask_to_physical_destination(cpumask);
			break;
		case IOAPIC_RTE_DESTMOD_LOGICAL:
			return cpumask_to_logical_destination(cpumask);
			break;
		default:
			BUG();
			return (rte_destination_t) { 0 };
	}
}

with ipi_destination_t typedef'd to physical_destination_t or
logical_destination_t according to the subarch.

CPU wakeup would proceed similarly; just check an operational variable
and either NMI with a logical destination or INIT with a physical
destination, with tabulated BIOS IDT addresses to edit.

And the infamous setup_ioapic_ids_from_mpc() needs a notion of APIC
buses to which IO-APIC's and cpus are connected. It should be look like:

static void __init setup_ioapic_ids_from_mpc(void)
{
	int apic_bus;

	for (apic_bus = 0; apic_bus < MAX_APIC_BUSES; ++apic_bus) {
		unsigned long local_physid_map[BITS_TO_LONGS(XAPIC_PHYSICAL_BROADCAST+1)] = { [0...XAPIC_PHYSICAL_BROADCAST] = 0UL };
		int ioapic, cpu;

		if (!apic_bus_present(apic_bus))
			continue;

		for (cpu = 0; cpu < NR_CPUS; ++cpu) {
			if (!cpu_present(cpu))
				continue;
			if (apic_bus_of_cpu(cpu) != apic_bus)
				continue;

			set_bit(cpu, local_physid_map);
		}

		for (ioapic = 0; ioapic < nr_ioapics; ++ioapic) {
			int ioapic_physid, entry;

			if (apic_bus_of_ioapic(ioapic) != apic_bus)
				continue;

			if (MP_APICID_PHYSICAL)
				ioapic_physid = mp_ioapics[ioapic].mpc_apicid;
			else {
				for (entry = 0; entry < MAX_MPC_ENTRY; ++entry)
					if (!translation_table[entry])
						continue;
					if (translation_table[entry]->mpc_apicid != mp_ioapics[ioapic].mpc_apicid)
						continue;
					if (translation_table[entry]->trans_type == MP_IOAPIC)
						break;

				if (entry >= MAX_MPC_ENTRY)
					panic("IO-APIC %d not listed in"
						" translation table\n", ioapic);
				ioapic_physid = translation_table[entry]->trans_local;
			}

			if (test_bit(ioapic_physid, local_physid_map)) {
				unsigned long free_physids[BITS_TO_LONGS(XAPIC_PHYSICAL_BROADCAST+1)];
				int j, new_physid;
				unsigned long flags;
				struct IO_APIC_reg_00 reg_00;

				for (j = 0; j < ARRAY_SIZE(free_physids); ++j)
					free_physids[j] = ~local_physid_map[j];

				new_physid = find_first_bit(free_physids,
						XAPIC_PHYSICAL_BROADCAST + 1);
				if (!xapic && new_physid >= SAPIC_PHYSICAL_BROADCAST)
					panic("physical APIC ID clash not "
						"resolvable due to physical "
						"APIC ID space exhaustion on "
						"APIC bus %d\n", apic_bus);
				printk("renumbering IO-APIC %d to physical"
					"APIC ID %d from %d\n",
					ioapic, new_physid, ioapic_physid);
				spin_lock_irqsave(&ioapic_lock, flags);
				*(int *)&reg_00 = io_apic_read(ioapic, 0);
				reg_00.ID = new_physid;
				io_apic_write(ioapic, 0, *(int *)&reg_00);
				spin_unlock_irqsave(&ioapic_lock, flags);
				set_bit(new_physid, local_physid_map);
				if (MP_APICID_PHYSICAL)
					mp_ioapics[ioapic].mpc_apicid =
								new_physid;
				else
					translation_table[entry]->trans_local =
								new_physid;
			}
			set_bit(ioapic_physid, local_physid_map);
		}
	}
}

Afterward, APIC-based subarches would just declare their preferences
and the APIC library would dispatch on the operational variables they
declare in mach_apic.h. Shoving the entire thing in a header with
inlines kills off so-called runtime overhead, and the generic subarch
bits would be largely unaltered, and provide a machine vector largely
identical to what it does today. Then, since APIC support is complete,
the detection phases would retarget the APIC driver to the proper
machine vector for the detected machine configuration and work across
a broader variety of systems than it does today.

This is not a pointless exercise: the bogosities I pointed out in the
userspace irqbalancing thread are very real. In particular the APIC vs.
xAPIC oddities for mach-default, though they're not exercised in any
obvious way, are very troubling, and it's unclear whether bigsmp is
peroperly functional as it stands in 2.5.70, since target_cpus()
disallows RTE's targeting clusters > 0 and NO_BALANCE_IRQ is 1. It
vaguely looks like bigsmp cut and paste NUMA-Q APIC code and did the
wrong thing for xAPIC as a result.

Following the spec with respect to destination formation is just not
that far out, people. The above code is just "example code" or whatever
and the BUG() checks are probably overly strict, but the general
notions are basically the rest of what's needed for true APIC genericity.


-- wli

  reply	other threads:[~2003-05-28  7:30 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-05-27  2:08 Linux 2.5.70 Linus Torvalds
2003-05-27  2:45 ` Linux 2.5.70 (Compiler warnings) Udo A. Steinberg
2003-05-27  3:09   ` YOSHIFUJI Hideaki / 吉藤英明
2003-05-27  7:10     ` David S. Miller
2003-05-27  8:48 ` Linux 2.5.70 compile error DevilKin
2003-05-27 13:05   ` William Lee Irwin III
2003-05-27 15:29     ` DevilKin-LKML
2003-05-27 15:36       ` William Lee Irwin III
2003-05-27 15:48         ` John Stoffel
2003-05-27 15:52           ` William Lee Irwin III
2003-05-27 16:35             ` John Stoffel
2003-05-27 16:43               ` William Lee Irwin III
2003-05-27 17:50                 ` John Stoffel
2003-05-27 17:56                   ` William Lee Irwin III
2003-05-27 21:56                   ` Martin J. Bligh
2003-05-27 18:19           ` Roman Zippel
2003-05-27 18:40             ` Dave Jones
2003-05-27 18:50               ` William Lee Irwin III
2003-05-27 21:59               ` Martin J. Bligh
2003-05-27 23:40                 ` Dave Jones
2003-05-28  3:20                   ` Martin J. Bligh
2003-05-28  3:34                 ` William Lee Irwin III
2003-05-28  6:11                   ` Martin J. Bligh
2003-05-28  7:43                     ` William Lee Irwin III [this message]
2003-05-28 15:28                       ` Martin J. Bligh
2003-05-29  1:14                         ` William Lee Irwin III
2003-05-28  6:52                   ` Dave Jones
2003-05-28 22:08                 ` Bill Davidsen
2003-05-29  0:36                   ` Martin J. Bligh
2003-05-27 19:46             ` John Stoffel
2003-05-27 15:38       ` Sean Neakums
2003-05-27 15:52         ` Martin J. Bligh
2003-05-27 16:58 ` Linux 2.5.70 Ricky Beam
2003-05-27 17:26   ` Linus Torvalds
2003-05-27 16:48     ` Alan Cox
2003-05-27 17:53       ` Linus Torvalds
2003-05-27 21:15         ` Hans Reiser
2003-05-27 18:09     ` Ricky Beam
2003-05-27 18:22       ` Jeff Garzik
2003-05-27 18:31       ` Robert Love
2003-05-27 19:18       ` Adrian Bunk
2003-05-28 16:08       ` Bill Davidsen
2003-05-28 15:58     ` Bill Davidsen
2003-05-28 16:14       ` Linus Torvalds
2003-05-28 19:22         ` ismail (cartman) donmez
2003-05-29 11:14           ` Dave Jones
2003-05-29 11:14             ` ismail (cartman) donmez
2003-05-28 22:40         ` Bill Davidsen
2003-05-27 21:06   ` Andreas Boman
2003-05-28  2:30   ` Daniel Phillips
2003-05-27 21:42 ` John Cherry
2003-05-28 17:55   ` John Cherry
2003-05-28  5:45 ` [PATCH] register_ioctl32_conversion symbol exports fix Andres Salomon
     [not found]   ` <20030528074701.GA17449@fs.tum.de>
2003-05-28  8:05     ` Andres Salomon
2003-05-28 14:59 ` Linux 2.5.70 Paweł Gołaszewski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030528074317.GE19818@holomorphy.com \
    --to=wli@holomorphy.com \
    --cc=davej@codemonkey.org.uk \
    --cc=devilkin-lkml@blindguardian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbligh@aracnet.com \
    --cc=stoffel@lucent.com \
    --cc=zippel@linux-m68k.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox