Message-ID: <3B855C19.7A4233BF@SteelEye.com>
Date: Thu, 23 Aug 2001 15:40:09 -0400
From: Paul Clements
To: linux-kernel@vger.kernel.org
Subject: gcc bug causing problem in kernel builds

We have been experiencing some problems trying to use kernel modules with kernels that are compiled with different versions of gcc. On our kernel build machine (where we compile our kernel modules) we have gcc 2.91.66 (I believe the preferred kernel compiler, according to Documentation/Changes); RedHat 7.1 ships with gcc 2.96. Now, the problem is that RedHat also apparently compiles (at least its newer) kernels with the 2.96 gcc.

Unfortunately, there appears to be a structure misalignment problem in gcc 2.96. One particular instance of this problem that we are running into is in the raid1.o module in the 2.4.3 kernel. The structure alignment problem is causing our gcc 2.91.66-compiled raid1 module to malfunction. (raid1.o compiled from the same source on gcc 2.96 works fine.)
We've traced the problem down to the following assembly code generated by gcc 2.96 and gcc 2.91.66 respectively (parameter setup and call to __alloc_pages, within raid1_grow_buffers):

2.96:

    movl $contig_page_data_Rsmp_cef82582+3800, %eax
    call __alloc_pages_Rsmp_decacc2f

2.91.66:

    movl $contig_page_data_Rsmp_cef82582+3884,%eax
    call __alloc_pages_Rsmp_decacc2f

gcc 2.91.66 is padding out the zone_t structure by 28 bytes. With an array of 3 zone_t structures ahead of the field in question, that adds 3 * 28 = 84 bytes, which is exactly the difference between the two offsets in the assembler code above (3884 - 3800). The 28 bytes of padding are there because gcc 2.91.66 is trying to 32-byte align this structure. The reason is that the first member of zone_t (a per_cpu_t) is explicitly defined as 32-byte aligned. So, gcc 2.91.66 is (properly) aligning the per_cpu_t structure on a 32-byte boundary, as specified by the __attribute__((aligned(32))) directive in that structure's definition:

    (gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[0]
    $22 = (per_cpu_t *) 0x4e0
    (gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[1]
    $23 = (per_cpu_t *) 0x500
    (gdb) p 0x500 % 32
    $24 = 0
    (gdb) p 0x4e0 % 32
    $25 = 0

gcc 2.96 is not properly aligning this structure:

    (gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[0]
    $32 = (per_cpu_t *) 0x4c4
    (gdb) p &((pg_data_t *)0)->node_zones[1].cpu_pages[1]
    $33 = (per_cpu_t *) 0x4e4
    (gdb) p 0x4c4 % 32
    $34 = 4
    (gdb) p 0x4e4 % 32
    $35 = 4

So, in order for our raid1 module to work properly with a kernel compiled by gcc 2.96, we must also use (the broken) 2.96 to compile our module.

So I guess some of the questions that arise from all this are:

Does anyone know if the structure misalignment problem in gcc 2.96 is a known issue? (Could this bug be induced by a RedHat-applied gcc patch, if there are any?)

How widespread is this problem? That is, do other distributions have similar issues? Do other versions of gcc have similar issues? Are there other places in the kernel where this type of problem might occur?
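For anyone who wants to compare compilers without building a kernel, the layout difference described above can be probed with a small standalone program. What follows is only a minimal sketch under stated assumptions: the demo_* types are hypothetical stand-ins, not the kernel's per_cpu_t, zone_t or pg_data_t, and the 36-byte filler is chosen purely so that the toy struct ends up needing 28 bytes of tail padding, like the case in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for per_cpu_t, zone_t and pg_data_t.
 * Member names and sizes are made up for illustration only. */
typedef struct demo_per_cpu {
    int nr_pages;
    int max_nr_pages;
} __attribute__((aligned(32))) demo_per_cpu_t;

typedef struct demo_zone {
    demo_per_cpu_t cpu_pages[2];  /* first member, 32-byte aligned */
    char other_fields[36];        /* filler: 64 + 36 = 100 bytes before padding */
} demo_zone_t;

typedef struct demo_pgdat {
    demo_zone_t node_zones[3];
    int field_after_zones;        /* stands in for a field that follows the zone array */
} demo_pgdat_t;

/* Bytes occupied by the members of demo_zone_t, without tail padding. */
size_t unpadded_zone_size(void)
{
    return offsetof(demo_zone_t, other_fields)
         + sizeof(((demo_zone_t *)0)->other_fields);
}

/* Tail padding the compiler added to demo_zone_t. */
size_t zone_tail_padding(void)
{
    return sizeof(demo_zone_t) - unpadded_zone_size();
}

/* Same checks as the gdb session: are the per-cpu slots 32-byte aligned? */
int layout_is_aligned(void)
{
    return sizeof(demo_zone_t) % 32 == 0
        && offsetof(demo_pgdat_t, node_zones[1].cpu_pages[0]) % 32 == 0
        && offsetof(demo_pgdat_t, node_zones[1].cpu_pages[1]) % 32 == 0;
}

/* Print the numbers worth comparing between two compilers. */
void print_layout_report(void)
{
    printf("sizeof(demo_zone_t)         = %zu\n", sizeof(demo_zone_t));
    printf("tail padding per zone       = %zu\n", zone_tail_padding());
    printf("offset of field after zones = %zu\n",
           offsetof(demo_pgdat_t, field_after_zones));
    assert(layout_is_aligned());  /* aborts if the attribute was not honoured */
}
```

Calling print_layout_report() from a main() built with each compiler and comparing the three numbers would show whether the two agree on the layout. With a compiler that honours the aligned(32) attribute, the toy zone should get 28 bytes of tail padding, so three zones move the following field by 84 bytes, mirroring the +3800 versus +3884 difference in the assembly.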
I'll also post to the gcc bug list.

----
Paul
Paul.Clements@SteelEye.com