* verbose argument to sd_synchronize_cache
From: Christoph Hellwig @ 2002-10-27 16:28 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

sd_synchronize_cache() is always called with verbose=1, so this argument
should go away.  But do we really want to be that verbose here?  IMHO
we should remove the argument and the printks..


* Re: kernel BUG at drivers/serial/core.c:1067 with 2.5.44
From: Russell King @ 2002-10-27 16:33 UTC (permalink / raw)
  To: Alex Romosan; +Cc: linux-kernel
In-Reply-To: <87elabdf1q.fsf@sycorax.lbl.gov>

On Sun, Oct 27, 2002 at 08:25:53AM -0800, Alex Romosan wrote:
> Oct 27 07:39:54 trinculo kernel: kernel BUG at drivers/serial/core.c:1067!

Someone called uart_set_termios without the BKL held, violating the locking
requirements.

Unfortunately:

1. You appear to be running a klogd that'll translate the addresses.
2. Your ksymoops doesn't seem to know what modules are loaded.

This means we've lost the information telling us who called
uart_set_termios illegally.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: [PATCH] fix sector_div use in scsicam.c
From: Christoph Hellwig @ 2002-10-27 16:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi
In-Reply-To: <200210271621.g9RGLik11263@localhost.localdomain>

On Sun, Oct 27, 2002 at 10:21:44AM -0600, James Bottomley wrote:
> hch@lst.de said:
> > sector_div has the same slightly strange calling convention do_div
> > has: its return value is the modulo of the two operands, the
> > division result is in the first parameter.  Also optimize one of the
> > expensive 64bit divisions away (okay, okay - it's not exactly a
> > fast path :))
> 
> Oops, I thought the semantics were the other way around...
> 
> > -		ip[2] = sector_div(capacity, ip[0] * ip[1]);
> > +		ip[2] = capacity; 
> 
> This doesn't look right.  We updated the divisors (ip[0] and ip[1]) in the if 
> statement, so surely we have to recalculate the division?

Umm, right.


--- 1.11/drivers/scsi/scsicam.c	Fri Oct 25 13:31:53 2002
+++ edited/drivers/scsi/scsicam.c	Sun Oct 27 15:18:13 2002
@@ -80,11 +80,13 @@
 	if (ret || ip[0] > 255 || ip[1] > 63) {
 		ip[0] = 64;
 		ip[1] = 32;
-		if (sector_div(capacity, ip[0] * ip[1]) > 65534) {
+		sector_div(capacity, ip[0] * ip[1]);
+		if (capacity > 65534) {
 			ip[0] = 255;
 			ip[1] = 63;
 		}
-		ip[2] = sector_div(capacity, ip[0] * ip[1]);
+		sector_div(capacity, ip[0] * ip[1]);
+		ip[2] = capacity;
 	}
 
 	return 0;
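
For readers following the thread, here is a rough userspace sketch of the
geometry fallback being discussed.  The names chs_guess and
fallback_geometry are made up for illustration.  Since sector_div overwrites
its first argument, the sketch divides the untouched capacity each time
rather than reusing an already-divided value; that is an assumption about
the intended result, not something the thread states.

```c
#include <stdint.h>

/* Hypothetical container for the three values the driver stores in ip[]. */
struct chs_guess {
	uint32_t heads;      /* ip[0] */
	uint32_t sectors;    /* ip[1] */
	uint32_t cylinders;  /* ip[2] */
};

/*
 * Try 64 heads / 32 sectors first; if that needs more than 65534
 * cylinders, switch to 255 / 63 and redo the division with the new
 * divisors.  The original capacity is never modified.
 */
static struct chs_guess fallback_geometry(uint64_t capacity)
{
	struct chs_guess g = { 64, 32, 0 };
	uint64_t cyls = capacity / (g.heads * g.sectors);

	if (cyls > 65534) {
		g.heads = 255;
		g.sectors = 63;
		cyls = capacity / (g.heads * g.sectors);
	}
	/* Truncates for very large disks; a real driver would clamp. */
	g.cylinders = (uint32_t)cyls;
	return g;
}
```

Calling fallback_geometry(2048000) stays with the 64/32 layout, while a
capacity large enough to exceed 65534 cylinders moves to 255/63.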


* Re: [PATCH] fix sector_div use in scsicam.c
From: James Bottomley @ 2002-10-27 16:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi
In-Reply-To: <20021027170258.A15779@lst.de>

hch@lst.de said:
> sector_div has the same slightly strange calling convention do_div
> has: its return value is the modulo of the two operands, the
> division result is in the first parameter.  Also optimize one of the
> expensive 64bit divisions away (okay, okay - it's not exactly a
> fast path :))

Oops, I thought the semantics were the other way around...

> -		ip[2] = sector_div(capacity, ip[0] * ip[1]);
> +		ip[2] = capacity; 

This doesn't look right.  We updated the divisors (ip[0] and ip[1]) in the if 
statement, so surely we have to recalculate the division?

James




* kernel BUG at drivers/serial/core.c:1067 with 2.5.44
From: Alex Romosan @ 2002-10-27 16:25 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3459 bytes --]

i get the following on a pentium iii 650 MHz sony vaio at boot time:

ksymoops 2.4.6 on i686 2.5.44.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.5.44/ (default)
     -m /boot/System.map-2.5.44 (default)
     -x

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 27 07:39:54 trinculo kernel: kernel BUG at drivers/serial/core.c:1067!
Oct 27 07:39:54 trinculo kernel: invalid operand: 0000
Oct 27 07:39:54 trinculo kernel: 3c574_cs irtty irda autofs4 microcode ppp_async uhci-hcd ohci-hcd usbcore nls_cp437 vfat snd-pcm-oss snd-mixer-oss snd-ymfpci snd-pcm snd-mpu401-uart snd-rawmidi snd-ac97-codec snd-opl3-lib snd-timer snd-hwdep snd-seq-device snd soundcore
Oct 27 07:39:54 trinculo kernel: CPU:    0
Oct 27 07:39:54 trinculo kernel: EIP:    0060:[uart_set_termios+41/356]    Not tainted
Oct 27 07:39:54 trinculo kernel: EFLAGS: 00010286
Oct 27 07:39:54 trinculo kernel: eax: cfd21900   ebx: cab74000   ecx: 000008bd   edx: c13b1ecc
Oct 27 07:39:54 trinculo kernel: esi: c13b1ef0   edi: 000008bd   ebp: cb71cd3c   esp: cab75e58
Oct 27 07:39:54 trinculo kernel: ds: 0068   es: 0068   ss: 0068
Oct 27 07:39:54 trinculo kernel: Stack: cb7cde00 c13b1ef0 cab75ea8 00000000 00000001 d11b05f8 ca932000 cab75e84
Oct 27 07:39:54 trinculo kernel:        ca932000 cb7cde00 ca814000 00000005 00000000 000008bd 00000000 7f1c030b
Oct 27 07:39:54 trinculo kernel:        01000415 1a131100 170f1200 2f000016 d11b1321 cb7cde00 00000000 ca814000
Oct 27 07:39:54 trinculo kernel: Call Trace: [<d11b05f8>]  [<d11b1321>]  [vsnprintf+987/1052]  [dev_open+76/164]  [dev_change_flags+81/260]  [dev_ifsioc+117/868]  [dev_ioctl+783/1064]  [<d11a1f49>]  [sock_ioctl+203/240]  [sys_ioctl+637/724]  [syscall_call+7/11]
Oct 27 07:39:54 trinculo kernel: Code: 0f 0b 2b 04 9e 56 27 c0 8d b4 26 00 00 00 00 8b 4c 24 1c 3b
Using defaults from ksymoops -t elf32-i386 -a i386


>>eax; cfd21900 <_end+261855052/282709676>
>>ebx; cab74000 <_end+176209484/282709676>
>>edx; c13b1ecc <_end+17079576/282709676>
>>esi; c13b1ef0 <_end+17079612/282709676>
>>ebp; cb71cd3c <_end+188435336/282709676>
>>esp; cab75e58 <_end+176217252/282709676>

Trace; d11b05f8 <.data.end+31521/????>
Trace; d11b1321 <END_OF_CODE+34890/????>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a   
Code;  00000002 Before first symbol
   2:   2b 04 9e                  sub    (%esi,%ebx,4),%eax
Code;  00000005 Before first symbol
   5:   56                        push   %esi
Code;  00000006 Before first symbol
   6:   27                        daa    
Code;  00000007 Before first symbol
   7:   c0 8d b4 26 00 00 00      rorb   $0x0,0x26b4(%ebp)
Code;  0000000e Before first symbol
   e:   00 8b 4c 24 1c 3b         add    %cl,0x3b1c244c(%ebx)


1 warning issued.  Results may not be reliable.

the funny thing is i've been running this kernel since it came out and
the oops didn't happen until today. i disabled irda in the bios and
now i can boot again. i am attaching my .config file.

--alex--


[-- Attachment #2: config-2.5.44 --]
[-- Type: text/plain, Size: 5918 bytes --]

CONFIG_X86=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_EXPERIMENTAL=y
CONFIG_NET=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y
CONFIG_MPENTIUMIII=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HUGETLB_PAGE=y
CONFIG_PREEMPT=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_MICROCODE=m
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_PM=y
CONFIG_APM=y
CONFIG_APM_CPU_IDLE=y
CONFIG_APM_ALLOW_INTS=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
CONFIG_HOTPLUG=y
CONFIG_PCMCIA=y
CONFIG_CARDBUS=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
CONFIG_PNP=y
CONFIG_PNP_NAMES=y
CONFIG_ISAPNP=y
CONFIG_PNPBIOS=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_LBD=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_PIIX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETFILTER=y
CONFIG_FILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_SYN_COOKIES=y
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IRC=m
CONFIG_IP_NF_QUEUE=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_LIMIT=m
CONFIG_IP_NF_MATCH_MAC=m
CONFIG_IP_NF_MATCH_PKTTYPE=m
CONFIG_IP_NF_MATCH_MARK=m
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_DSCP=m
CONFIG_IP_NF_MATCH_AH_ESP=m
CONFIG_IP_NF_MATCH_LENGTH=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_MATCH_TCPMSS=m
CONFIG_IP_NF_MATCH_HELPER=m
CONFIG_IP_NF_MATCH_STATE=m
CONFIG_IP_NF_MATCH_CONNTRACK=m
CONFIG_IP_NF_MATCH_UNCLEAN=m
CONFIG_IP_NF_MATCH_OWNER=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_MIRROR=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_NAT_SNMP_BASIC=m
CONFIG_IP_NF_NAT_IRC=m
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_TOS=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_DSCP=m
CONFIG_IP_NF_TARGET_MARK=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_TARGET_TCPMSS=m
CONFIG_IPV6_SCTP__=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_PPP=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
CONFIG_NET_PCMCIA=y
CONFIG_PCMCIA_3C574=m
CONFIG_IRDA=m
CONFIG_IRLAN=m
CONFIG_IRNET=m
CONFIG_IRCOMM=m
CONFIG_IRDA_ULTRA=y
CONFIG_IRDA_CACHE_LAST_LSAP=y
CONFIG_IRDA_FAST_RR=y
CONFIG_IRDA_DEBUG=y
CONFIG_IRTTY_SIR=m
CONFIG_IRPORT_SIR=m
CONFIG_NSC_FIR=m
CONFIG_WINBOND_FIR=m
CONFIG_TOSHIBA_FIR=m
CONFIG_SMC_IRCC_FIR=m
CONFIG_ALI_FIR=m
CONFIG_VLSI_FIR=m
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=m
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_INPUT_JOYSTICK=y
CONFIG_JOYSTICK_WARRIOR=m
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
CONFIG_INPUT_UINPUT=m
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_INTEL_RNG=m
CONFIG_RTC=y
CONFIG_SONYPI=y
CONFIG_AUTOFS4_FS=m
CONFIG_REISERFS_FS=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_CRAMFS=m
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_ROMFS_FS=m
CONFIG_EXT2_FS=y
CONFIG_UDF_FS=m
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_NFSD_TCP=y
CONFIG_SUNRPC=m
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_SMB_FS=m
CONFIG_SMB_NLS_DEFAULT=y
CONFIG_SMB_NLS_REMOTE="cp437"
CONFIG_ZISOFS_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_SMB_NLS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="cp437"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_UTF8=m
CONFIG_VGA_CONSOLE=y
CONFIG_VIDEO_SELECT=y
CONFIG_FB=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_FB_VESA=y
CONFIG_VIDEO_SELECT=y
CONFIG_FB_NEOMAGIC=y
CONFIG_FBCON_CFB24=y
CONFIG_FBCON_ACCEL=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_RTCTIMER=m
CONFIG_SND_DUMMY=m
CONFIG_SND_VIRMIDI=m
CONFIG_SND_MTPAV=m
CONFIG_SND_SERIAL_U16550=m
CONFIG_SND_MPU401=m
CONFIG_SND_YMFPCI=m
CONFIG_USB=m
CONFIG_USB_DEVICEFS=y
CONFIG_USB_OHCI_HCD=m
CONFIG_USB_UHCI_HCD_ALT=m
CONFIG_USB_BLUETOOTH_TTY=m
CONFIG_USB_PRINTER=m
CONFIG_USB_HID=m
CONFIG_USB_HIDINPUT=y
CONFIG_HID_FF=y
CONFIG_HID_PID=y
CONFIG_LOGITECH_FF=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_KBD=m
CONFIG_USB_MOUSE=m
CONFIG_USB_AIPTEK=m
CONFIG_USB_MDC800=m
CONFIG_USB_SCANNER=m
CONFIG_USB_RIO500=m
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_X86_EXTRA_IRQS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_SECURITY_CAPABILITIES=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_X86_BIOS_REBOOT=y

[-- Attachment #3: Type: text/plain, Size: 277 bytes --]


-- 
| I believe the moment is at hand when, by a paranoiac and active |
|  advance of the mind, it will be possible (simultaneously with  |
|  automatism and other passive states) to systematize confusion  |
|  and thus to help to discredit completely the world of reality. |


* RE: [Linux-ia64] Is there any good linux/ia64 kernel debugger?
From: Van Maren, Kevin @ 2002-10-27 16:14 UTC (permalink / raw)
  To: linux-ia64
In-Reply-To: <marc-linux-ia64-105590709805259@msgid-missing>

Have you tried kgdb?  http://oss.sgi.com/projects/kgdb
I've used kdb, which works well with a serial console
(but not with a USB keyboard).

Kevin


-----Original Message-----
From: Jassie Tsai
To: linux-ia64@linuxia64.org
Sent: 10/27/02 10:21 AM
Subject: [Linux-ia64] Is there any good linux/ia64 kernel debugger?

I am tracing and modifying linux kernel-2.4.14-ia64.
Is there any good kernel source debugger to use?
Thanks very much!!!

Jassie Tsai


_______________________________________________
Linux-IA64 mailing list
Linux-IA64@linuxia64.org
http://lists.linuxia64.org/lists/listinfo/linux-ia64



* [PATCH] remove sd_disks global array from sd.c
From: Christoph Hellwig @ 2002-10-27 16:09 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

Add a pointer to struct scsi_disk instead.  This also obsoletes
sd_dskname().


--- 1.77/drivers/scsi/sd.c	Fri Oct 25 21:20:04 2002
+++ edited/drivers/scsi/sd.c	Sun Oct 27 15:54:59 2002
@@ -85,6 +85,7 @@
 
 struct scsi_disk {
 	struct scsi_device *device;
+	struct gendisk	*disk;
 	sector_t	capacity;	/* size in 512-byte sectors */
 	u8		media_present;
 	u8		write_prot;
@@ -258,25 +259,6 @@
 	}
 }
 
-static void sd_dskname(unsigned int dsk_nr, char *buffer)
-{
-	if (dsk_nr < 26)
-		sprintf(buffer, "sd%c", 'a' + dsk_nr);
-	else {
-		unsigned int min1;
-		unsigned int min2;
-		/*
-		 * For larger numbers of disks, we need to go to a new
-		 * naming scheme.
-		 */
-		min1 = dsk_nr / 26;
-		min2 = dsk_nr % 26;
-		sprintf(buffer, "sd%c%c", 'a' + min1 - 1, 'a' + min2);
-	}
-}
-
-static struct gendisk **sd_disks;
-
 /**
  *	sd_init_command - build a scsi (read or write) command from
  *	information in the request structure.
@@ -1271,11 +1253,8 @@
 			sd_dsk_arr[k] = sdkp;
 		}
 	}
-	init_mem_lth(sd_disks, sd_template.dev_max);
-	if (sd_disks)
-		zero_mem_lth(sd_disks, sd_template.dev_max);
 
-	if (!sd_dsk_arr || !sd_disks)
+	if (!sd_dsk_arr)
 		goto cleanup_mem;
 
 	return 0;
@@ -1284,8 +1263,6 @@
 #undef zero_mem_lth
 
 cleanup_mem:
-	vfree(sd_disks);
-	sd_disks = NULL;
 	if (sd_dsk_arr) {
                 for (k = 0; k < sd_template.dev_max; ++k)
 			vfree(sd_dsk_arr[k]);
@@ -1394,8 +1371,7 @@
 
 	set_capacity(gd, sdkp->capacity);
 	add_disk(gd);
-
-	sd_disks[dsk_nr] = gd;
+	sdkp->disk = gd;
 
 	printk(KERN_NOTICE "Attached scsi %sdisk %s at scsi%d, channel %d, "
 	       "id %d, lun %d\n", sdp->removable ? "removable " : "",
@@ -1461,12 +1437,11 @@
 	sdkp->capacity = 0;
 	/* sdkp->detaching = 1; */
 
-	del_gendisk(sd_disks[dsk_nr]);
+	del_gendisk(sdkp->disk);
 	sdp->attached--;
 	sd_template.dev_noticed--;
 	sd_template.nr_dev--;
-	put_disk(sd_disks[dsk_nr]);
-	sd_disks[dsk_nr] = NULL;
+	put_disk(sdkp->disk);
 }
 
 /**
@@ -1556,13 +1531,8 @@
 	if (!SDpnt->online)
 		return 0;
 
-	if(verbose) {
-		char buf[16];
-
-		sd_dskname(index, buf);
-
-		printk("%s ", buf);
-	}
+	if (verbose)
+		printk("%s ", sdkp->disk->disk_name);
 
 	SRpnt = scsi_allocate_request(SDpnt);
 	if(!SRpnt) {
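
For reference, the naming scheme that the removed sd_dskname() produced can
be tried out in userspace with a sketch like the one below.  The name
model_sd_dskname and the buflen parameter are illustrative additions, and
snprintf is swapped in for the kernel's sprintf; the letter arithmetic
follows the removed function.

```c
#include <stdio.h>

/*
 * Disks 0..25 become sda..sdz; later disks get a two-letter suffix
 * (26 -> sdaa, 27 -> sdab, ...).  Like the original, this only makes
 * sense for indexes below 26 * 27.
 */
static void model_sd_dskname(unsigned int dsk_nr, char *buffer, size_t buflen)
{
	if (dsk_nr < 26) {
		snprintf(buffer, buflen, "sd%c", (int)('a' + dsk_nr));
	} else {
		unsigned int min1 = dsk_nr / 26;
		unsigned int min2 = dsk_nr % 26;

		snprintf(buffer, buflen, "sd%c%c",
			 (int)('a' + min1 - 1), (int)('a' + min2));
	}
}
```

With the patch applied the driver no longer needs this helper, because the
name is read from the gendisk that struct scsi_disk now points to.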


* Re: Why not CAM?
From: Matthew Jacob @ 2002-10-27 16:05 UTC (permalink / raw)
  To: Christer Weinigel; +Cc: linux-scsi
In-Reply-To: <87smyrgaxt.fsf@zoo.weinigel.se>


> Hi,
> 
> I've wanted to ask this for a long time: Why not switch to SCSI CAM
> when you are redoing the Linux SCSI layer anyways? 
> 
> This is not intended to start a flamewar of any kind, I don't know
> very much about CAM except that it is some kind of standard and
> FreeBSD uses it and they seem to be mostly happy with it.  So I'm just
> curious if any of the SCSI hackers think that CAM would be worthwhile
> and if not, why?  Too heavyweight?  Too ugly?  Won't fit well with the
> new BIO design in the 2.5 kernel?

CAM (Common Access Method) is an ANSI standard. I fought against it
bitterly back in the late '80s, but in retrospect, it's a fine if
slightly overcomplicated model.

There are two extant widespread implementations: FreeBSD and Tru64
(formerly known as Digital Unix).

It has a lot to recommend it, but its success or failure in Linux would
probably be due to whoever has the time (and/or employer's backing) to
do a fairly complete implementation and present it for review.

Like a lot of standards, the devil is in the details.

-matt




* [PATCH] fix sector_div use in scsicam.c
From: Christoph Hellwig @ 2002-10-27 16:02 UTC (permalink / raw)
  To: James Bottomley, Patrick Mansfield; +Cc: linux-scsi

sector_div has the same slightly strange calling convention do_div has:
its return value is the modulo of the two operands, the division
result is in the first parameter.  Also optimize one of the expensive
64bit divisions away (okay, okay - it's not exactly a fast path :))
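
The calling convention described above can be modelled in plain userspace C
roughly as follows.  model_sector_div is a hypothetical stand-in written as
a function taking a pointer; the kernel's sector_div is a macro that
modifies its first argument in place, so this is only a sketch of the
semantics.

```c
#include <stdint.h>

/*
 * Quotient is written back through the first argument,
 * remainder is the return value.
 */
static uint32_t model_sector_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}
```

So after "rem = model_sector_div(&cap, 64 * 32)" the quotient is in cap, and
comparing the return value against a cylinder limit tests the remainder
instead.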


--- 1.11/drivers/scsi/scsicam.c	Fri Oct 25 13:31:53 2002
+++ edited/drivers/scsi/scsicam.c	Sun Oct 27 15:18:13 2002
@@ -80,11 +80,12 @@
 	if (ret || ip[0] > 255 || ip[1] > 63) {
 		ip[0] = 64;
 		ip[1] = 32;
-		if (sector_div(capacity, ip[0] * ip[1]) > 65534) {
+		sector_div(capacity, ip[0] * ip[1]);
+		if (capacity > 65534) {
 			ip[0] = 255;
 			ip[1] = 63;
 		}
-		ip[2] = sector_div(capacity, ip[0] * ip[1]);
+		ip[2] = capacity;
 	}
 
 	return 0;


* typo in 2.4.19 free_area_init_core()?
From: Chen, Kenneth W @ 2002-10-27 16:01 UTC (permalink / raw)
  To: marcelo; +Cc: Linux Kernel Mailing List

Marcelo,

Is this a typo in function free_area_init_core()?  The information on realsize is more interesting than the size variable.



--- mm/page_alloc.c~	Sun Oct 27 00:46:10 2002
+++ mm/page_alloc.c	Sun Oct 27 00:46:35 2002
@@ -735,7 +735,7 @@
 		if (zholes_size)
 			realsize -= zholes_size[j];
 
-		printk("zone(%lu): %lu pages.\n", j, size);
+		printk("zone(%lu): %lu pages.\n", j, realsize);
 		zone->size = size;
 		zone->name = zone_names[j];
 		zone->lock = SPIN_LOCK_UNLOCKED;
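
To make the difference between the two variables concrete, here is a small
userspace sketch of the subtraction shown in the hunk context.  The function
name zone_real_pages is made up; only the "realsize -= zholes_size[j]" step
is taken from the code above.

```c
/*
 * size is the full span of the zone in pages; realsize is what is
 * left after subtracting the holes recorded for zone j, if any.
 */
static unsigned long zone_real_pages(unsigned long size,
				     const unsigned long *zholes_size,
				     unsigned long j)
{
	unsigned long realsize = size;

	if (zholes_size)
		realsize -= zholes_size[j];
	return realsize;
}
```

For a zone spanning 4096 pages with 128 pages of holes, the current printk
reports 4096 and the proposed one would report 3968.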


* Re: [PATCH] unified SysV and Posix mqueues as FS
From: Alexander Viro @ 2002-10-27 15:53 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: linux-kernel, jakub, torvalds
In-Reply-To: <3DBC075B.AF32C23@mac.com>



On Sun, 27 Oct 2002, Peter Waechtler wrote:

> I applied the patch from Jakub against 2.5.44
> There are still open issues but it's important to get this in before
> feature freeze.
> 
> While you can implement Posix mqueues in userland (Irix is doing this
> with fcntl(fd,F_SETLKW,) and shmem) a kernel implementation has some advantages:

*thud*

ioctls on _directories_, of all things?



* Re: [PATCH] unified SysV and Posix mqueues as FS
From: Jeff Garzik @ 2002-10-27 15:47 UTC (permalink / raw)
  To: Peter Waechtler; +Cc: linux-kernel, jakub, torvalds
In-Reply-To: <3DBC075B.AF32C23@mac.com>

Peter Waechtler wrote:

>I applied the patch from Jakub against 2.5.44
>There are still open issues but it's important to get this in before
>feature freeze.
>
>While you can implement Posix mqueues in userland (Irix is doing this
>with fcntl(fd,F_SETLKW,) and shmem) a kernel implementation has some advantages:
>
>a) no hassle with locks in case an app crashes
>b) guaranteed notification with signals (you can have two apps with
>	different uid that can access the queue but aren't allowed to
>	send signals)
>c) surprisingly, seems a little faster - did not test with NPT
>
>
>Open issues are:
>
>- notification not tested
>- still linear search in queues
>- I would really enhance the sys_ipc for handling posix mqueue as well
>	(yes, perhaps it's more ugly - but it fits naturally, you can't
>	specify a priority with a read() - ending up with ioctl())
>- funny "locking" in ipc/util.c 
>- check the ipc ids
>
>  
>

I won't comment on the overall concept of the patch itself, it's not my 
area of expertise and it's too early in the morning to think about it ;-)

However, there are three issues to consider in the meantime:
* Documentation/CodingStyle problems.  You need to use standard 
one-tab-for-indentation formatting, just like the code around what you 
are adding/modifying.
* There is weird text translation in the patch (short example follows). 
 It may be better if you use mutt and vi to include your patch directly, 
without word wrapping, if attachments are getting mangled.

-		msq =3D msg_lock(msqid);
-		err =3D -EIDRM;
-		if(msq=3D=3DNULL)
-			goto out_free;
-		ss_del(&s);
-		=

* Linus probably won't see your email, he has threatened to flush his entire inbox when he returns from his trip ;-)

Regards,

	Jeff







* Re: [LARTC] htb warning (error?)
From: Stef Coene @ 2002-10-27 15:38 UTC (permalink / raw)
  To: lartc
In-Reply-To: <marc-lartc-103572484819181@msgid-missing>

On Sunday 27 October 2002 14:19, Razvan Cosma wrote:
> I'm getting lots of this in the logs:
> kernel: HTB: mindelay=500, report it please !
> so - I'm reporting it :)
> I use htb3.6-020525, imq-2.4.18.diff-10.1, kernel 2.4.18, iptables
> v1.2.6a.
>  What causes this message, and what other info would be useful for
> debugging?
It's a debug message so there is no real error.
But I don't know what it means.  I think it means that htb is dequeuing
too fast.

Stef

-- 

stef.coene@docum.org
 "Using Linux as bandwidth manager"
     http://www.docum.org/
     #lartc @ irc.oftc.net

_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


* [RFC,PATCH] remove event from affs, ext3, hpfs, ntfs, ufs
From: Manfred Spraul @ 2002-10-27 15:37 UTC (permalink / raw)
  To: linux-fsdevel, zippel, sct, akpm, adilger, mikulas, aia21

[-- Attachment #1: Type: text/plain, Size: 478 bytes --]

Andrew merged the vfs change needed to get rid of 'event', the attached 
patch (vs. 2.5.44-mm5) removes 'event' from all filesystems.

The replacement for
    inode->i_version = event++;
is
    inode->i_version++;
and
    inode->i_version  = 0;
in the alloc_inode callback.
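
As a rough illustration of the replacement, here is a userspace model of a
per-inode version counter.  All names starting with model_ are hypothetical,
and the staleness check is only one plausible way a reader might compare a
cached f_version against i_version; it is not taken from the patch.

```c
/* Minimal stand-ins for the two kernel objects involved. */
struct model_inode {
	unsigned long i_version;
};

struct model_file {
	struct model_inode *inode;
	unsigned long f_version;   /* i_version seen at last sync */
};

/* What the alloc_inode callback does in the patch: start at zero. */
static void model_alloc_inode(struct model_inode *inode)
{
	inode->i_version = 0;
}

/* What replaces "i_version = ++event": bump the inode's own counter. */
static void model_dir_modified(struct model_inode *dir)
{
	dir->i_version++;
}

/* Remember the version the reader has seen. */
static void model_file_sync(struct model_file *f)
{
	f->f_version = f->inode->i_version;
}

/* Non-zero if the inode changed since the last sync. */
static int model_file_stale(const struct model_file *f)
{
	return f->f_version != f->inode->i_version;
}
```

The point of the sketch is that only changes to the same inode need to be
detected, so a counter local to the inode is enough and no global 'event'
is required.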

Additionally, the patch fixes a merge error in ext2 - one hunk is now in 
the wrong function.

Maintainers: Can I send the change to Andrew directly or do you want to 
do that yourself?

--
    Manfred

[-- Attachment #2: patch-event-big --]
[-- Type: text/plain, Size: 7807 bytes --]

diff -u -r 2.5/fs/affs/amigaffs.c build-2.5/fs/affs/amigaffs.c
--- 2.5/fs/affs/amigaffs.c	Sun Sep 22 06:25:04 2002
+++ build-2.5/fs/affs/amigaffs.c	Sun Oct 27 16:13:44 2002
@@ -69,7 +69,7 @@
 	affs_brelse(dir_bh);
 
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
-	dir->i_version = ++event;
+	dir->i_version++;
 	mark_inode_dirty(dir);
 
 	return 0;
@@ -121,7 +121,7 @@
 	affs_brelse(bh);
 
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
-	dir->i_version = ++event;
+	dir->i_version++;
 	mark_inode_dirty(dir);
 
 	return retval;
diff -u -r 2.5/fs/affs/super.c build-2.5/fs/affs/super.c
--- 2.5/fs/affs/super.c	Sat Oct 26 21:04:33 2002
+++ build-2.5/fs/affs/super.c	Sun Oct 27 16:19:55 2002
@@ -92,6 +92,7 @@
 	ei = (struct affs_inode_info *)kmem_cache_alloc(affs_inode_cachep, SLAB_KERNEL);
 	if (!ei)
 		return NULL;
+	ei->vfs_inode.i_version = 0;
 	return &ei->vfs_inode;
 }
 
diff -u -r 2.5/fs/ext2/ialloc.c build-2.5/fs/ext2/ialloc.c
--- 2.5/fs/ext2/ialloc.c	Sun Oct 27 16:05:34 2002
+++ build-2.5/fs/ext2/ialloc.c	Sun Oct 27 16:18:25 2002
@@ -598,7 +598,6 @@
 	struct buffer_head *bitmap_bh = NULL;
 	int i;
 
-	inode->i_version = 0;
 	lock_super (sb);
 	es = EXT2_SB(sb)->s_es;
 	for (i = 0; i < EXT2_SB(sb)->s_groups_count; i++) {
diff -u -r 2.5/fs/ext2/inode.c build-2.5/fs/ext2/inode.c
--- 2.5/fs/ext2/inode.c	Sun Oct 27 16:05:34 2002
+++ build-2.5/fs/ext2/inode.c	Sun Oct 27 16:18:30 2002
@@ -1015,7 +1015,6 @@
 	}
 	inode->i_blksize = PAGE_SIZE;	/* This is the optimal IO size (for stat), not the fs block size */
 	inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
-	inode->i_version = 0;
 	ei->i_flags = le32_to_cpu(raw_inode->i_flags);
 	ei->i_faddr = le32_to_cpu(raw_inode->i_faddr);
 	ei->i_frag_no = raw_inode->i_frag;
diff -u -r 2.5/fs/ext2/super.c build-2.5/fs/ext2/super.c
--- 2.5/fs/ext2/super.c	Sun Oct 27 16:05:34 2002
+++ build-2.5/fs/ext2/super.c	Sun Oct 27 16:18:48 2002
@@ -159,6 +159,7 @@
 	ei->i_acl = EXT2_ACL_NOT_CACHED;
 	ei->i_default_acl = EXT2_ACL_NOT_CACHED;
 #endif
+	ei->vfs_inode.i_version = 0;
 	return &ei->vfs_inode;
 }
 
diff -u -r 2.5/fs/ext3/inode.c build-2.5/fs/ext3/inode.c
--- 2.5/fs/ext3/inode.c	Sun Oct 27 16:05:34 2002
+++ build-2.5/fs/ext3/inode.c	Sun Oct 27 16:19:13 2002
@@ -2248,7 +2248,6 @@
 					 * (for stat), not the fs block
 					 * size */  
 	inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
-	inode->i_version = ++event;
 	ei->i_flags = le32_to_cpu(raw_inode->i_flags);
 #ifdef EXT3_FRAGMENTS
 	ei->i_faddr = le32_to_cpu(raw_inode->i_faddr);
diff -u -r 2.5/fs/ext3/namei.c build-2.5/fs/ext3/namei.c
--- 2.5/fs/ext3/namei.c	Sun Oct 27 16:05:34 2002
+++ build-2.5/fs/ext3/namei.c	Sun Oct 27 15:59:24 2002
@@ -1201,7 +1201,7 @@
 	 */
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
 	ext3_update_dx_flag(dir);
-	dir->i_version = ++event;
+	dir->i_version++;
 	ext3_mark_inode_dirty(handle, dir);
 	BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
 	err = ext3_journal_dirty_metadata(handle, bh);
@@ -1520,7 +1520,7 @@
 						    le16_to_cpu(de->rec_len));
 			else
 				de->inode = 0;
-			dir->i_version = ++event;
+			dir->i_version++;
 			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
 			ext3_journal_dirty_metadata(handle, bh);
 			return 0;
@@ -1964,7 +1964,7 @@
 		ext3_warning (inode->i_sb, "ext3_rmdir",
 			      "empty directory has nlink!=2 (%d)",
 			      inode->i_nlink);
-	inode->i_version = ++event;
+	inode->i_version++;
 	inode->i_nlink = 0;
 	/* There's no need to set i_disksize: the fact that i_nlink is
 	 * zero will ensure that the right thing happens during any
@@ -2214,7 +2214,7 @@
 		if (EXT3_HAS_INCOMPAT_FEATURE(new_dir->i_sb,
 					      EXT3_FEATURE_INCOMPAT_FILETYPE))
 			new_de->file_type = old_de->file_type;
-		new_dir->i_version = ++event;
+		new_dir->i_version++;
 		BUFFER_TRACE(new_bh, "call ext3_journal_dirty_metadata");
 		ext3_journal_dirty_metadata(handle, new_bh);
 		brelse(new_bh);
diff -u -r 2.5/fs/ext3/super.c build-2.5/fs/ext3/super.c
--- 2.5/fs/ext3/super.c	Sun Oct 27 16:05:34 2002
+++ build-2.5/fs/ext3/super.c	Sun Oct 27 16:19:23 2002
@@ -433,6 +433,7 @@
 	ei->i_acl = EXT3_ACL_NOT_CACHED;
 	ei->i_default_acl = EXT3_ACL_NOT_CACHED;
 #endif
+	ei->vfs_inode.i_version = 0;
 	return &ei->vfs_inode;
 }
 
diff -u -r 2.5/fs/hpfs/dnode.c build-2.5/fs/hpfs/dnode.c
--- 2.5/fs/hpfs/dnode.c	Sat Oct 26 21:03:07 2002
+++ build-2.5/fs/hpfs/dnode.c	Sun Oct 27 16:16:21 2002
@@ -403,7 +403,7 @@
 		c = 1;
 		goto ret;
 	}	
-	i->i_version = ++event;
+	i->i_version++;
 	c = hpfs_add_to_dnode(i, dno, name, namelen, new_de, 0);
 	ret:
 	if (!cdepth) hpfs_unlock_creation(i->i_sb);
@@ -710,7 +710,7 @@
 			return 2;
 		}
 	}
-	i->i_version = ++event;
+	i->i_version++;
 	for_all_poss(i, hpfs_pos_del, (t = get_pos(dnode, de)) + 1, 1);
 	hpfs_delete_de(i->i_sb, dnode, de);
 	hpfs_mark_4buffers_dirty(qbh);
diff -u -r 2.5/fs/hpfs/inode.c build-2.5/fs/hpfs/inode.c
--- 2.5/fs/hpfs/inode.c	Sat Oct 26 21:04:37 2002
+++ build-2.5/fs/hpfs/inode.c	Sun Oct 27 16:18:09 2002
@@ -84,7 +84,6 @@
 	hpfs_inode->i_ea_uid = 0;
 	hpfs_inode->i_ea_gid = 0;
 	hpfs_inode->i_ea_size = 0;
-	i->i_version = ++event;
 
 	hpfs_inode->i_rddir_off = NULL;
 	hpfs_inode->i_dirty = 0;
diff -u -r 2.5/fs/hpfs/super.c build-2.5/fs/hpfs/super.c
--- 2.5/fs/hpfs/super.c	Sat Oct 26 21:04:37 2002
+++ build-2.5/fs/hpfs/super.c	Sun Oct 27 16:17:45 2002
@@ -166,6 +166,7 @@
 	ei = (struct hpfs_inode_info *)kmem_cache_alloc(hpfs_inode_cachep, SLAB_KERNEL);
 	if (!ei)
 		return NULL;
+	ei->vfs_inode.i_version = 0;
 	return &ei->vfs_inode;
 }
 
diff -u -r 2.5/fs/ntfs/inode.c build-2.5/fs/ntfs/inode.c
--- 2.5/fs/ntfs/inode.c	Sun Sep 22 06:25:18 2002
+++ build-2.5/fs/ntfs/inode.c	Sun Oct 27 16:10:07 2002
@@ -495,7 +495,7 @@
 	 * This is for checking whether an inode has changed w.r.t. a file so
 	 * that the file can be updated if necessary (compare with f_version).
 	 */
-	vi->i_version = ++event;
+	vi->i_version = 0;
 
 	vi->i_uid = vol->uid;
 	vi->i_gid = vol->gid;
diff -u -r 2.5/fs/ufs/dir.c build-2.5/fs/ufs/dir.c
--- 2.5/fs/ufs/dir.c	Sat Oct 26 21:02:10 2002
+++ build-2.5/fs/ufs/dir.c	Sun Oct 27 16:21:03 2002
@@ -353,7 +353,7 @@
 void ufs_set_link(struct inode *dir, struct ufs_dir_entry *de,
 		struct buffer_head *bh, struct inode *inode)
 {
-	dir->i_version = ++event;
+	dir->i_version++;
 	de->d_ino = cpu_to_fs32(dir->i_sb, inode->i_ino);
 	mark_buffer_dirty(bh);
 	if (IS_DIRSYNC(dir)) {
@@ -463,7 +463,7 @@
 	}
 	brelse (bh);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
-	dir->i_version = ++event;
+	dir->i_version++;
 	mark_inode_dirty(dir);
 
 	UFSD(("EXIT\n"))
@@ -504,7 +504,7 @@
 				fs16_add(sb, &pde->d_reclen,
 					fs16_to_cpu(sb, dir->d_reclen));
 			dir->d_ino = 0;
-			inode->i_version = ++event;
+			inode->i_version++;
 			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 			mark_inode_dirty(inode);
 			mark_buffer_dirty(bh);
diff -u -r 2.5/fs/ufs/inode.c build-2.5/fs/ufs/inode.c
--- 2.5/fs/ufs/inode.c	Sat Oct 26 21:04:41 2002
+++ build-2.5/fs/ufs/inode.c	Sun Oct 27 16:21:17 2002
@@ -519,7 +519,7 @@
 	inode->i_mtime = fs32_to_cpu(sb, ufs_inode->ui_mtime.tv_sec);
 	inode->i_blocks = fs32_to_cpu(sb, ufs_inode->ui_blocks);
 	inode->i_blksize = PAGE_SIZE;   /* This is the optimal IO size (for stat) */
-	inode->i_version = ++event;
+	inode->i_version++;
 	ufsi->i_flags = fs32_to_cpu(sb, ufs_inode->ui_flags);
 	ufsi->i_gen = fs32_to_cpu(sb, ufs_inode->ui_gen);
 	ufsi->i_shadow = fs32_to_cpu(sb, ufs_inode->ui_u3.ui_sun.ui_shadow);
diff -u -r 2.5/fs/ufs/super.c build-2.5/fs/ufs/super.c
--- 2.5/fs/ufs/super.c	Sat Oct 26 21:04:41 2002
+++ build-2.5/fs/ufs/super.c	Sun Oct 27 16:21:31 2002
@@ -1006,6 +1006,7 @@
 	ei = (struct ufs_inode_info *)kmem_cache_alloc(ufs_inode_cachep, SLAB_KERNEL);
 	if (!ei)
 		return NULL;
+	ei->vfs_inode.i_version = 0;
 	return &ei->vfs_inode;
 }
 


* A modern RAID solution?
From: Alexy Khrabrov @ 2002-10-27 15:30 UTC (permalink / raw)
  To: linux-scsi


Now that I got all those drives spinning, I'm eager to try out 
a RAID array.  I reckon it's a better way to hedge against disk
failures than backups on tape -- even 20/40 GB dds4 or DLT is
not enough these days, and loading/rotating is tedious.

I heard an opinion that software RAID on Linux with SCSI is
"almost" as good as a hardware controller.  What is the experience
here?  Also, since I'm running LVM throughout, including the root
partition, does RAID coexist well with LVM?

But if I go the way of the hardware controller, is it better
to get a separate one, or one of those new cards from Adaptec
which say they have a Host RAID 0/1 or some such?

I'm looking at a 3-drive RAID to begin with, perhaps 4.

-- 
Cheers,
Alexy Khrabrov :: www.setup.org :: Age Quod Agis

^ permalink raw reply

* Re: The return of the return of crunch time (2.5 merge candidate list 1.6)
From: Andrew Pimlott @ 2002-10-27 15:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel
In-Reply-To: <20021027080125.A14145@wotan.suse.de>

On Sun, Oct 27, 2002 at 08:01:25AM +0100, Andi Kleen wrote:
> On Sat, Oct 26, 2002 at 03:09:06PM -0400, Andrew Pimlott wrote:
> > Would you mind spelling out the problem case?  It's usually not a
> > big deal, because when a target and dependency have the same
> > timestamp, make considers the target to be newer.
> 
> I assume you mean 'older', not 'newer'?

No (but maybe I phrased it badly):

    % cat Makefile 
    foo: bar
            echo did it
    % touch foo bar
    % ls --full-time foo bar
    -rw-r--r--    1 pimlott  pimlott         0 Sun Oct 27 09:36:26 2002 bar
    -rw-r--r--    1 pimlott  pimlott         0 Sun Oct 27 09:36:26 2002 foo
    % make
    make: `foo' is up to date.

Ie, foo is considered newer.
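
A minimal sketch of that rule in C, assuming make compares nothing but
modification times. The helper name needs_rebuild is made up for
illustration and is not taken from GNU make's source. A target is rebuilt
only when a dependency is strictly newer, so equal timestamps count as
up to date:

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper (not from GNU make): returns true only when the
 * dependency is strictly newer than the target. Equal timestamps mean
 * the target is treated as up to date.
 */
static bool needs_rebuild(time_t target_mtime, time_t dep_mtime)
{
	return dep_mtime > target_mtime;
}
```

With foo and bar both stamped 09:36:26, needs_rebuild() is false, which
matches the "is up to date" message above.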

> Any default action is wrong in some case when a rule can take less
> than a second,

I'm sure there is a case where this is true, but my imagination and
googling failed to provide one.  Even the messages to the GNU make
mailing list when Paul Eggert implemented nanosecond support didn't
include a specific rationale.

> there is no replacement for an accurate time stamp.

While I agree, I thought that a concrete example might help persuade
others.  (I think I've even run into instances where second
resolution was a real problem, I just can't recall them.)

> > I really feel strongly that you should not export resolution finer
> > than what the filesystem can store.  There is too much risk of
> > breakage (especially given the late date of submission), and if (as
> > you said) all common filesystems will be able to store sub-second
> > timestamps soon, this shouldn't be a significant drawback.  If this
> > requires a new hook into the filesystem, so be it.
> 
> You have to export in some unit and it is convenient to use the most
> fine-grained available (ns). This matches what other Unixes like
> Solaris do too. The program can always choose to ignore the ns
> part (which most will do, at least initially) or even round more.
> 
> What happens currently in my patch is that the inode in memory stores jiffies
> resolution. As long as you don't run out of inode cache and need to
> flush/reload an inode you always have the best resolution.
> 
> When an inode is flushed on an old fs with only second resolution the 
> subsecond part is truncated. This has the drawback that an inode
> timestamp can jump backwards on reload as seen by user space.

Example problem case (assuming a fs that stores only seconds, and a
make that uses nanoseconds):

- I run the "save and build" command while editing foo.c at T = 0.1.
- foo.o is built at T = 0.2.
- I do some read-only operations on foo.c (eg, checkin), such that
  foo.o gets flushed but foo.c stays in memory.
- I build again.  foo.o is reloaded and has timestamp T = 0, and so
  gets spuriously rebuilt.

> Another way would be to round on flush, but that also has some problems :-
> for example you can get timestamps which are ahead of the current
> wall clock.

Only if the flush is less than a second after the write, right?
How likely is that in Linux?

I tend to prefer the proposal to set the nanosecond field to 10^9-1.
At least my scenario above doesn't happen.
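
To make the scenario concrete, here is a rough C sketch of the two
reload policies. It assumes a filesystem that stores whole seconds only;
the helper names (reload_truncated, reload_saturated, ts_newer) are
invented for this example and do not come from any patch in this thread.

```c
#include <stdbool.h>
#include <time.h>

#define NSEC_MAX 999999999L	/* 10^9 - 1 */

/* Reload after flush with the sub-second part dropped. */
static struct timespec reload_truncated(struct timespec ts)
{
	ts.tv_nsec = 0;
	return ts;
}

/* Reload after flush with the nanosecond field set to 10^9 - 1. */
static struct timespec reload_saturated(struct timespec ts)
{
	ts.tv_nsec = NSEC_MAX;
	return ts;
}

/* True if a is strictly newer than b. */
static bool ts_newer(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec > b.tv_sec;
	return a.tv_nsec > b.tv_nsec;
}
```

With foo.c at T = 0.1 still in memory and foo.o written at T = 0.2 and
then flushed, the truncated reload gives foo.o T = 0.0, so foo.c looks
newer and foo.o is rebuilt.  The saturated reload gives foo.o
T = 0.999999999, so foo.c does not look newer.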

Andrew

^ permalink raw reply

* Why not CAM?
From: Christer Weinigel @ 2002-10-27 15:26 UTC (permalink / raw)
  To: linux-scsi

Hi,

I've wanted to ask this for a long time: Why not switch to SCSI CAM
when you are redoing the Linux SCSI layer anyways? 

This is not intended to start a flamewar of any kind, I don't know
very much about CAM except that it is some kind of standard and
FreeBSD uses it and they seem to be mostly happy with it.  So I'm just
curious if any of the SCSI hackers think that CAM would be worthwhile
and if not, why?  Too heavyweight?  Too ugly?  Won't fit well with the
new BIO design in the 2.5 kernel?

  /Christer

-- 
"Just how much can I get away with and still go to heaven?"

Freelance consultant specializing in device driver programming for Linux 
Christer Weinigel <christer@weinigel.se>  http://www.weinigel.se

^ permalink raw reply

* [PATCH] unified SysV and Posix mqueues as FS
From: Peter Waechtler @ 2002-10-27 15:33 UTC (permalink / raw)
  To: linux-kernel, jakub, torvalds

[-- Attachment #1: Type: text/plain, Size: 865 bytes --]

I applied the patch from Jakub against 2.5.44
There are still open issues but it's important to get this in before
feature freeze.

While you can implement Posix mqueues in userland (Irix is doing this
with fcntl(fd,F_SETLKW,) and shmem) a kernel implementation has some advantages:

a) no hassle with locks in case an app crashes
b) guaranteed notification with signals (you can have two apps with
	different uid that can access the queue but aren't allowed to
	send signals)
c) surprisingly, seems a little faster - did not test with NPT


Open issues are:

- notification not tested
- still linear search in queues
- I would really enhance the sys_ipc for handling posix mqueue as well
	(yes, perhaps it's more ugly - but it fits naturally, you can't
	specify a priority with a read() - ending up with ioctl())
- funny "locking" in ipc/util.c 
- check the ipc ids

[-- Attachment #2: posix-mqueue.txt --]
[-- Type: text/plain, Size: 43929 bytes --]

diff -Nur -X dontdiff vanilla-2.5.44/Documentation/ioctl-number.txt linux-2.5.44/Documentation/ioctl-number.txt
--- vanilla-2.5.44/Documentation/ioctl-number.txt	2002-04-20 18:22:08.000000000 +0200
+++ linux-2.5.44/Documentation/ioctl-number.txt	2002-10-27 15:33:23.000000000 +0100
@@ -186,6 +186,7 @@
 0xB0	all	RATIO devices		in development:
 					<mailto:vgo@ratio.de>
 0xB1	00-1F	PPPoX			<mailto:mostrows@styx.uwaterloo.ca>
+0xB2	00-1F	linux/mqueue.h
 0xCB	00-1F	CBM serial IEC bus	in development:
 					<mailto:michael.klein@puffin.lb.shuttle.de>
 
diff -Nur -X dontdiff vanilla-2.5.44/include/linux/mqueue.h linux-2.5.44/include/linux/mqueue.h
--- vanilla-2.5.44/include/linux/mqueue.h	1970-01-01 01:00:00.000000000 +0100
+++ linux-2.5.44/include/linux/mqueue.h	2002-10-23 14:48:31.000000000 +0200
@@ -0,0 +1,37 @@
+#ifndef _LINUX_MQUEUE_H
+#define _LINUX_MQUEUE_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+#include <asm/siginfo.h>
+
+struct mq_attr {
+	long	mq_flags;       /* O_NONBLOCK or 0 */
+	long	mq_maxmsg;      /* Maximum number of messages in the queue */
+	long	mq_msgsize;     /* Maximum size of one message in bytes */
+	long	mq_curmsgs;     /* Current number of messages in the queue */
+	long	__pad[2];
+};
+
+struct mq_open {
+	char            *mq_name;       /* pathname */
+	int             mq_oflag;       /* flags */
+	mode_t          mq_mode;        /* mode */
+	struct mq_attr  mq_attr;        /* attributes */
+};
+
+struct mq_sndrcv {
+	size_t          mq_len;         /* message length */
+	long            mq_type;        /* message type */
+	char            *mq_buf;        /* message buffer */
+};
+
+#define MQ_OPEN                _IOW(0xB2, 0, struct mq_open)
+#define MQ_GETATTR     _IOR(0xB2, 1, struct mq_attr)
+#define MQ_SEND                _IOW(0xB2, 2, struct mq_sndrcv)
+#define MQ_RECEIVE     _IOWR(0xB2, 3, struct mq_sndrcv)
+#define MQ_NOTIFY      _IOW(0xB2, 4, struct sigevent)
+
+#define MQ_DEFAULT_TYPE        0x7FFFFFFE
+
+#endif /* _LINUX_MQUEUE_H */
diff -Nur -X dontdiff vanilla-2.5.44/include/linux/msg.h linux-2.5.44/include/linux/msg.h
--- vanilla-2.5.44/include/linux/msg.h	2002-08-10 00:09:02.000000000 +0200
+++ linux-2.5.44/include/linux/msg.h	2002-10-25 20:06:47.000000000 +0200
@@ -2,6 +2,7 @@
 #define _LINUX_MSG_H
 
 #include <linux/ipc.h>
+#include <linux/signal.h>
 
 /* ipcs ctl commands */
 #define MSG_STAT 11
@@ -49,7 +50,7 @@
 	unsigned short  msgseg; 
 };
 
-#define MSGMNI    16   /* <= IPCMNI */     /* max # of msg queue identifiers */
+#define MSGMNI   128   /* <= IPCMNI */     /* max # of msg queue identifiers */
 #define MSGMAX  8192   /* <= INT_MAX */   /* max size of message (bytes) */
 #define MSGMNB 16384   /* <= INT_MAX */   /* default max size of a message queue */
 
@@ -63,33 +64,88 @@
 
 #ifdef __KERNEL__
 
+#define SEARCH_ANY		1
+#define SEARCH_EQUAL		2
+#define SEARCH_NOTEQUAL		3
+#define SEARCH_LESSEQUAL	4
+
+#define DATALEN_MSG	(PAGE_SIZE-sizeof(struct msg_msg))
+#define DATALEN_SEG	(PAGE_SIZE-sizeof(struct msg_msgseg))
+
+/* used  by sys_msgctl(,IPC_SET,) */
+struct msq_setbuf {
+	unsigned long	qbytes;
+	uid_t		uid;
+	gid_t		gid;
+	mode_t		mode;
+};
+
+/* one msg_receiver structure for each sleeping receiver */
+struct msg_receiver {
+	struct list_head r_list;
+	struct task_struct *r_tsk;
+
+	int r_mode;
+	long r_msgtype;
+	long r_maxsize;
+
+	struct msg_msg* volatile r_msg;
+};
+
+/* one msg_sender for each sleeping sender */
+struct msg_sender {
+	struct list_head list;
+	struct task_struct *tsk;
+};
+
+struct msg_msgseg {
+	struct msg_msgseg *next;
+	/* the next part of the message follows immediately */
+};
+
 /* one msg_msg structure for each message */
 struct msg_msg {
 	struct list_head m_list; 
 	long  m_type;          
 	int m_ts;           /* message text size */
-	struct msg_msgseg* next;
+	struct msg_msgseg *next;
 	/* the actual message follows immediately */
 };
 
-#define DATALEN_MSG	(PAGE_SIZE-sizeof(struct msg_msg))
-#define DATALEN_SEG	(PAGE_SIZE-sizeof(struct msg_msgseg))
+struct mq_link {
+	struct list_head link;
+	struct task_struct *tsk;
+	struct mq_attr *attr;
+};
 
 /* one msq_queue structure for each present queue on the system */
 struct msg_queue {
 	struct kern_ipc_perm q_perm;
-	time_t q_stime;			/* last msgsnd time */
-	time_t q_rtime;			/* last msgrcv time */
-	time_t q_ctime;			/* last change time */
-	unsigned long q_cbytes;		/* current number of bytes on queue */
-	unsigned long q_qnum;		/* number of messages in queue */
-	unsigned long q_qbytes;		/* max number of bytes on queue */
-	pid_t q_lspid;			/* pid of last msgsnd */
-	pid_t q_lrpid;			/* last receive pid */
+#define q_flags q_perm.mode
+	time_t q_stime;         /* last msgsnd time */
+	time_t q_rtime;         /* last msgrcv time */
+	time_t q_ctime;         /* last change time */
+	unsigned long q_cbytes;     /* current number of bytes on queue */
+	unsigned long q_qnum;       /* number of messages in queue */
+	unsigned long q_qbytes;     /* max number of bytes on queue */
+
+	unsigned int q_msgsize;     /* max number of bytes for one message */
+	unsigned int q_maxmsg;      /* max number of outstanding messages */
+
+	pid_t q_lspid;          /* pid of last msgsnd */
+	pid_t q_lrpid;          /* last receive pid */
+
+	int q_signo;            /* signal to be sent if empty queue with no waiting
+			                receivers should be sent */
+	pid_t q_pid;            /* to which pid */
+	sigval_t q_sigval;      /* which value to pass */
+	int id;
 
 	struct list_head q_messages;
 	struct list_head q_receivers;
 	struct list_head q_senders;
+	unsigned int q_namelen;
+	unsigned char q_name[0];
 };
 
 asmlinkage long sys_msgget (key_t key, int msgflg);
diff -Nur -X dontdiff vanilla-2.5.44/ipc/msg.c linux-2.5.44/ipc/msg.c
--- vanilla-2.5.44/ipc/msg.c	2002-10-13 23:03:57.000000000 +0200
+++ linux-2.5.44/ipc/msg.c	2002-10-27 15:42:12.000000000 +0100
@@ -13,15 +13,23 @@
  * mostly rewritten, threaded and wake-one semantics added
  * MSGMAX limit removed, sysctl's added
  * (c) 1999 Manfred Spraul <manfreds@colorfullife.com>
+ *
+ * make it a filesystem (based on Christoph Rohland's work on shmfs),
+ * (c) 2000 Jakub Jelinek <jakub@redhat.com>
+ * adapted and cleaned up for 2.5.44 by Peter Wächtler <pwaechtler@mac.com>
  */
 
 #include <linux/config.h>
 #include <linux/slab.h>
-#include <linux/msg.h>
 #include <linux/spinlock.h>
 #include <linux/init.h>
+#include <linux/fs.h>
 #include <linux/proc_fs.h>
 #include <linux/list.h>
+#include <linux/signal.h>
+#include <linux/mqueue.h>
+#include <linux/msg.h>
+#include <linux/namei.h>
 #include <linux/security.h>
 #include <asm/uaccess.h>
 #include "util.h"
@@ -30,34 +38,87 @@
 int msg_ctlmax = MSGMAX;
 int msg_ctlmnb = MSGMNB;
 int msg_ctlmni = MSGMNI;
+static int msg_mode;
+
+#define MSG_FS_MAGIC	822419456
+
+#define MSG_NAME_LEN NAME_MAX
+#define MSG_FMT ".IPC_%08x"
+#define MSG_FMT_LEN 13
+
+#define MSG_UNLK	0010000 /* filename is unlinked */
+#define MSG_SYSV	0020000 /* It is a SYSV message queue */
+
+static struct super_block * msg_sb;
+
+static struct super_block *msg_read_super(struct file_system_type *,int , char *, void *);
+static void msg_put_super(struct super_block *);
+static int msg_remount_fs(struct super_block *, int *, char *);
+static void msg_fill_inode(struct inode *);
+static int msg_statfs(struct super_block *, struct statfs *);
+static int msg_create(struct inode *,struct dentry *,int);
+static struct dentry *msg_lookup(struct inode *,struct dentry *);
+static int msg_unlink(struct inode *,struct dentry *);
+static int msg_setattr(struct dentry *dent, struct iattr *attr);
+static void msg_delete(struct inode *);
+static int msg_readdir(struct file *, void *, filldir_t);
+static int msg_remove_name(int id);
+static int msg_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
+static int msg_root_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
+static ssize_t msg_read(struct file *, char *, size_t, loff_t *);
+static ssize_t msg_write(struct file *, const char *, size_t, loff_t *);
+/* FIXME: Support poll on mq
+static unsigned int msg_poll(struct file *, poll_table *);
+ */
+static ssize_t msg_send (struct inode *, struct file *, const char *, size_t, long);
+static ssize_t msg_receive (struct inode *, struct file *, char *, size_t, long *);
+static int msg_flush (struct file *);
+static int msg_release (struct inode *, struct file *);
 
-/* one msg_receiver structure for each sleeping receiver */
-struct msg_receiver {
-	struct list_head r_list;
-	struct task_struct* r_tsk;
-
-	int r_mode;
-	long r_msgtype;
-	long r_maxsize;
+static void freeque (int id);
+static int newque (key_t key, const char *name, int namelen, struct mq_attr *attr, int msgflg);
 
-	struct msg_msg* volatile r_msg;
+static struct file_system_type msg_fs_type = {
+	.name		= "msgfs",
+	.get_sb		= msg_read_super,
+	.kill_sb	= kill_litter_super,
 };
 
-/* one msg_sender for each sleeping sender */
-struct msg_sender {
-	struct list_head list;
-	struct task_struct* tsk;
+static struct super_operations msg_sops = {
+	.read_inode=	msg_fill_inode,
+	.delete_inode=	msg_delete,
+	.put_super=	msg_put_super,
+	.statfs=		msg_statfs,
+	.remount_fs=	msg_remount_fs,
 };
 
-struct msg_msgseg {
-	struct msg_msgseg* next;
-	/* the next part of the message follows immediately */
+static struct file_operations msg_root_operations = {
+	.readdir=	msg_readdir,
+	.ioctl=		msg_root_ioctl,
 };
 
-#define SEARCH_ANY		1
-#define SEARCH_EQUAL		2
-#define SEARCH_NOTEQUAL		3
-#define SEARCH_LESSEQUAL	4
+static struct inode_operations msg_root_inode_operations = {
+	.create=		msg_create,
+	.lookup=		msg_lookup,
+	.unlink=		msg_unlink,
+};
+
+static struct file_operations msg_file_operations = {
+	.read=		msg_read,
+	.write=		msg_write,
+	.ioctl=		msg_ioctl,
+/* FIXME: Support poll on mq *
+	poll=		msg_poll,
+ */
+	.flush=		msg_flush,
+	.release=	msg_release,
+};
+
+static struct inode_operations msg_inode_operations = {
+	.setattr=	msg_setattr,
+};
+
+static LIST_HEAD(mq_open_links);
 
 static atomic_t msg_bytes = ATOMIC_INIT(0);
 static atomic_t msg_hdrs = ATOMIC_INIT(0);
@@ -67,33 +128,529 @@
 #define msg_lock(id)	((struct msg_queue*)ipc_lock(&msg_ids,id))
 #define msg_unlock(id)	ipc_unlock(&msg_ids,id)
 #define msg_rmid(id)	((struct msg_queue*)ipc_rmid(&msg_ids,id))
-#define msg_checkid(msq, msgid)	\
-	ipc_checkid(&msg_ids,&msq->q_perm,msgid)
-#define msg_buildid(id, seq) \
-	ipc_buildid(&msg_ids, id, seq)
+#define msg_get(id)   ((struct msg_queue*)ipc_get(&msg_ids,id))
+#define msg_buildid(id, seq) 	ipc_buildid(&msg_ids, id, seq)
 
-static void freeque (int id);
-static int newque (key_t key, int msgflg);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_msg_read_proc(char *buffer, char **start, off_t offset, int length, int *eof, void *data);
 #endif
 
 void __init msg_init (void)
 {
+	struct vfsmount *res;
 	ipc_init_ids(&msg_ids,msg_ctlmni);
-
+	register_filesystem (&msg_fs_type);
+	res = kern_mount(&msg_fs_type);
+	if (IS_ERR(res)) {
+		unregister_filesystem(&msg_fs_type);
+		return;
+	}
 #ifdef CONFIG_PROC_FS
 	create_proc_read_entry("sysvipc/msg", 0, 0, sysvipc_msg_read_proc, NULL);
 #endif
 }
 
-static int newque (key_t key, int msgflg)
+static int msg_parse_options(char *options)
+{
+  int blocks = msg_ctlmnb * msg_ctlmni;
+  int inodes = msg_ctlmni;
+  umode_t mode = msg_mode;
+  char *this_char, *value;
+
+  this_char = NULL;
+  if ( options )
+      this_char = strsep(&options,",");
+  for ( ; this_char; this_char = strsep(&options,",")) {
+      if ((value = strchr(this_char,'=')) != NULL)
+          *value++ = 0;
+      if (!strcmp(this_char,"nr_blocks")) {
+          if (!value || !*value)
+              return 1;
+          blocks = simple_strtoul(value,&value,0);
+          if (*value)
+              return 1;
+      }
+      else if (!strcmp(this_char,"nr_inodes")) {
+          if (!value || !*value)
+              return 1;
+          inodes = simple_strtoul(value,&value,0);
+          if (*value)
+              return 1;
+      }
+      else if (!strcmp(this_char,"mode")) {
+          if (!value || !*value)
+              return 1;
+          mode = simple_strtoul(value,&value,8);
+          if (*value)
+              return 1;
+      }
+      else
+          return 1;
+  }
+/* FIXME *
+  msg_ctlmni = inodes;
+  msg_ctlmnb = inodes ? blocks / inodes : 0;
+ */
+  msg_mode   = mode;
+
+  return 0;
+}
+
+static int
+msg_fill_super (struct super_block *sb, void *data, int silent)
+{
+  struct inode * root_inode;
+
+/* FIXME *
+  msg_ctlmnb = MSGMNB;
+  msg_ctlmni = MSGMNI;
+ */
+  msg_mode   = S_IRWXUGO | S_ISVTX;
+  if (msg_parse_options (data)) {
+      printk(KERN_ERR "msg fs invalid option\n");
+      return -EINVAL;
+  }
+
+  sb->s_blocksize = PAGE_SIZE;
+  sb->s_blocksize_bits = PAGE_SHIFT;
+  sb->s_magic = MSG_FS_MAGIC;
+  sb->s_op = &msg_sops;
+  root_inode = iget (sb, SEQ_MULTIPLIER);
+  if (!root_inode)
+      return -ENOMEM;
+  root_inode->i_op = &msg_root_inode_operations;
+  root_inode->i_sb = sb;
+  root_inode->i_nlink = 2;
+  root_inode->i_mode = S_IFDIR | msg_mode;
+  sb->s_root = d_alloc_root(root_inode);
+  if (!sb->s_root)
+      goto out_no_root;
+  msg_sb = sb;
+  return 0;
+
+out_no_root:
+  printk(KERN_ERR "msg_fill_super: get root inode failed\n");
+  iput(root_inode);
+  return -ENOMEM;
+}
+
+static struct super_block *msg_read_super(struct file_system_type *fs_type,
+	       int flags, char *dev_name, void *data)
+{
+  return get_sb_single (fs_type, flags, data, msg_fill_super);
+}
+
+static int msg_remount_fs (struct super_block *sb, int *flags, char *data)
+{
+  if (msg_parse_options (data))
+      return -EINVAL;
+  return 0;
+}
+
+static inline int msg_checkid(struct msg_queue *msq, int id)
+{
+  if (!(msq->q_flags & MSG_SYSV))
+      return -EINVAL;
+  if (ipc_checkid(&msg_ids,&msq->q_perm,id))
+      return -EIDRM;
+  return 0;
+}
+
+static void msg_put_super(struct super_block *sb)
+{
+  int i;
+  struct msg_queue *msq;
+
+  down(&msg_ids.sem);
+  for(i = 0; i <= msg_ids.max_id; i++) {
+      if (!(msq = msg_lock (i)))
+          continue;
+      freeque(i);
+  }
+  dput (sb->s_root);
+  up(&msg_ids.sem);
+}
+
+static int msg_statfs(struct super_block *sb, struct statfs *buf)
+{
+  buf->f_type = MSG_FS_MAGIC;
+  buf->f_bsize = PAGE_SIZE;
+  buf->f_blocks = (msg_ctlmnb * msg_ctlmni) >> PAGE_SHIFT;
+  buf->f_bavail = buf->f_bfree = buf->f_blocks - (atomic_read(&msg_bytes) >> PAGE_SHIFT);
+  buf->f_files = msg_ctlmni;
+  buf->f_ffree = msg_ctlmni - atomic_read(&msg_hdrs);
+  buf->f_namelen = MSG_NAME_LEN;
+  return 0;
+}
+
+static void msg_fill_inode(struct inode * inode)
+{
+  int id;
+  struct msg_queue *msq;
+  id = inode->i_ino;
+  inode->i_op = NULL;
+  inode->i_mode = 0;
+
+  if (id < SEQ_MULTIPLIER) {
+      if (!(msq = msg_lock (id)))
+          return;
+      inode->i_mode = (msq->q_flags & S_IRWXUGO) | S_IFIFO;
+      inode->i_uid  = msq->q_perm.uid;
+      inode->i_gid  = msq->q_perm.gid;
+      inode->i_size = msq->q_cbytes;
+      inode->i_mtime = msq->q_stime;
+      inode->i_atime = msq->q_stime > msq->q_rtime ? msq->q_stime : msq->q_rtime;
+      inode->i_ctime = msq->q_ctime;
+      msg_unlock (id);
+      inode->i_op  = &msg_inode_operations;
+      inode->i_fop = &msg_file_operations;
+      return;
+  }
+  inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
+  inode->i_op    = &msg_root_inode_operations;
+  inode->i_fop   = &msg_root_operations;
+  inode->i_sb    = msg_sb;
+  inode->i_nlink = 2;
+  inode->i_mode  = S_IFDIR | msg_mode;
+  inode->i_uid   = inode->i_gid = 0;
+}
+
+static int msg_create (struct inode *dir, struct dentry *dent, int mode)
+{
+  int id, err;
+  struct inode *inode;
+  struct mq_attr attr, *p;
+  struct list_head *tmp;
+
+  attr.mq_maxmsg = 32;
+  attr.mq_msgsize = 64;
+  p = &attr;
+
+  down(&msg_ids.sem);
+  list_for_each(tmp, &mq_open_links) {
+      struct mq_link *l = list_entry(tmp, struct mq_link, link);
+      if (l->tsk == current) {
+          p = l->attr;
+          break;
+      }
+  }
+  err = id = newque (IPC_PRIVATE, dent->d_name.name, dent->d_name.len, p, mode);
+  if (err < 0)
+      goto out;
+
+  inode = iget (msg_sb, id % SEQ_MULTIPLIER);
+  if (!inode){
+  	err = -ENOMEM;
+  	goto out;
+  }
+  err = 0;
+  down (&inode->i_sem);
+  inode->i_mode = (mode & S_IRWXUGO) | S_IFIFO;
+  inode->i_op   = &msg_inode_operations;
+  d_instantiate(dent, inode);
+  up (&inode->i_sem);
+
+out:
+  up(&msg_ids.sem);
+  return err;
+}
+
+static int msg_readdir (struct file *filp, void *dirent, filldir_t filldir)
+{
+  struct inode * inode = filp->f_dentry->d_inode;
+  struct msg_queue *msq;
+  off_t nr;
+
+  nr = filp->f_pos;
+
+  switch(nr)
+  {
+  case 0:
+      if (filldir(dirent, ".", 1, nr, inode->i_ino, DT_DIR) < 0)
+          return 0;
+      filp->f_pos = ++nr;
+      /* fall through */
+  case 1:
+      if (filldir(dirent, "..", 2, nr, inode->i_ino, DT_DIR) < 0)
+          return 0;
+      filp->f_pos = ++nr;
+      /* fall through */
+  default:
+      down(&msg_ids.sem);
+      for (; nr-2 <= msg_ids.max_id; nr++) {
+          if (!(msq = msg_get (nr-2)))
+              continue;
+          if (msq->q_flags & MSG_UNLK)
+              continue;
+          if (filldir(dirent, msq->q_name, msq->q_namelen, nr, nr, DT_FIFO) < 0)
+              break;
+      }
+      filp->f_pos = nr;
+      up(&msg_ids.sem);
+      break;
+  }
+
+  UPDATE_ATIME(inode);
+  return 0;
+}
+
+static struct dentry *msg_lookup (struct inode *dir, struct dentry *dent)
+{
+  int i, err = 0;
+  struct msg_queue* msq;
+  struct inode *inode = NULL;
+
+  if (dent->d_name.len > MSG_NAME_LEN)
+      return ERR_PTR(-ENAMETOOLONG);
+
+  down(&msg_ids.sem);
+  for(i = 0; i <= msg_ids.max_id; i++) {
+      if (!(msq = msg_lock(i)))
+          continue;
+      if (!(msq->q_flags & MSG_UNLK) &&
+          dent->d_name.len == msq->q_namelen &&
+          strncmp(dent->d_name.name, msq->q_name, msq->q_namelen) == 0)
+          goto found;
+      msg_unlock(i);
+  }
+
+  /*
+   * prevent the reserved names as negative dentries.
+   * This also prevents object creation through the filesystem
+   */
+  if (dent->d_name.len == MSG_FMT_LEN &&
+      memcmp (MSG_FMT, dent->d_name.name, MSG_FMT_LEN - 8) == 0)
+      err = -EINVAL;  /* EINVAL to give IPC_RMID the right error */
+
+  goto out;
+
+found:
+  msg_unlock(i);
+  inode = iget(dir->i_sb, i);
+
+  if (!inode)
+      err = -EACCES;
+out:
+  if (err == 0)
+      d_add (dent, inode);
+  up (&msg_ids.sem);
+  return ERR_PTR(err);
+}
+
+static inline int msg_do_unlink (struct inode *dir, struct dentry *dent, int sysv)
+{
+  struct inode * inode = dent->d_inode;
+  struct msg_queue *msq;
+
+  down (&msg_ids.sem);
+  if (!(msq = msg_lock (inode->i_ino)))
+      BUG();
+  if (sysv) {
+      int ret = 0;
+
+      if (!(msq->q_flags & MSG_SYSV))
+          ret = -EINVAL;
+      else if (current->euid != msq->q_perm.cuid &&
+           current->euid != msq->q_perm.uid && !capable(CAP_SYS_ADMIN))
+          ret = -EPERM;
+      if (ret) {
+          msg_unlock (inode->i_ino);
+          up (&msg_ids.sem);
+          return ret;
+      }
+  }
+  msq->q_flags |= MSG_UNLK;
+  msq->q_perm.key = IPC_PRIVATE; /* Do not find it any more */
+  msg_unlock (inode->i_ino);
+  up (&msg_ids.sem);
+  inode->i_nlink -= 1;
+  /*
+   * If it's a reserved name we have to drop the dentry instead
+   * of creating a negative dentry
+   */
+  if (dent->d_name.len == MSG_FMT_LEN &&
+      memcmp (MSG_FMT, dent->d_name.name, MSG_FMT_LEN - 8) == 0)
+      d_drop (dent);
+  return 0;
+}
+
+static int msg_unlink (struct inode *dir, struct dentry *dent)
+{
+  return msg_do_unlink (dir, dent, 0);
+}
+static int msg_setattr (struct dentry *dentry, struct iattr *attr)
+{
+  int error;
+  struct inode *inode = dentry->d_inode;
+  struct msg_queue *msq;
+
+  error = inode_change_ok(inode, attr);
+  if (error)
+      return error;
+  if (attr->ia_valid & ATTR_SIZE)
+      return -EINVAL;
+
+  if (attr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) {
+      if (!(msq = msg_lock(inode->i_ino)))
+          BUG();
+      if (attr->ia_valid & ATTR_MODE)
+          msq->q_flags = (msq->q_flags & ~S_IRWXUGO)
+              | (S_IRWXUGO & attr->ia_mode);
+      if (attr->ia_valid & ATTR_UID)
+          msq->q_perm.uid = attr->ia_uid;
+      if (attr->ia_valid & ATTR_GID)
+          msq->q_perm.gid = attr->ia_gid;
+      msq->q_ctime = attr->ia_ctime;
+      msg_unlock (inode->i_ino);
+  }
+
+  inode_setattr(inode, attr);
+  return error;
+}
+
+static int msg_root_ioctl (struct inode * inode, struct file * filp, unsigned int cmd, unsigned long arg)
+{
+  struct mq_open o;
+  struct mq_link link;
+  int ret;
+
+  if (cmd != MQ_OPEN)
+      return -EINVAL;
+  ret = -EFAULT;
+  if (copy_from_user(&o, (struct mq_open *)arg, sizeof(struct mq_open)))
+      goto out;
+  ret = -EINVAL;
+  if ((unsigned long)o.mq_attr.mq_msgsize > msg_ctlmnb ||
+      (unsigned long)o.mq_attr.mq_maxmsg > msg_ctlmnb ||
+      o.mq_attr.mq_msgsize * o.mq_attr.mq_maxmsg > msg_ctlmnb)
+      goto out;
+  link.attr = &o.mq_attr;
+  link.tsk = current;
+  down(&msg_ids.sem);
+  list_add(&link.link, &mq_open_links);
+  up(&msg_ids.sem);
+  /* FIXME: Shouldn't we check here whether mq_name is really a file within the msg filesystem?
+     Otherwise people tracing the open(2) syscall might miss this place... */
+  ret = sys_open(o.mq_name, o.mq_oflag, o.mq_mode);
+  down(&msg_ids.sem);
+  list_del(&link.link);
+  up(&msg_ids.sem);
+out:
+  return ret;
+}
+
+static int msg_ioctl (struct inode * inode, struct file * filp, unsigned int cmd, unsigned long arg)
+{
+  int ret = -EINVAL;
+  struct msg_queue *msq;
+  struct mq_sndrcv sr;
+
+  switch (cmd) {
+  case MQ_GETATTR: {
+      struct mq_attr attr;
+      memset(&attr, 0, sizeof(attr));
+      msq = msg_lock (inode->i_ino);
+      if (msq == NULL)
+          BUG();
+      attr.mq_maxmsg = msq->q_maxmsg;
+      attr.mq_msgsize = msq->q_msgsize;
+      attr.mq_curmsgs = msq->q_qnum;
+      attr.mq_flags = filp->f_flags & O_NONBLOCK;
+      msg_unlock (inode->i_ino);
+      ret = copy_to_user((struct mq_attr *)arg, &attr, sizeof(attr)) ? -EFAULT : 0;
+      break;
+      }
+  case MQ_SEND:
+      ret = -EBADF;
+      if (!(filp->f_mode & FMODE_WRITE))
+          break;
+      ret = -EFAULT;
+      if (copy_from_user(&sr, (struct mq_sndrcv *)arg, sizeof(sr)))
+          break;
+      ret = -EINVAL;
+      if (sr.mq_type <= 0)
+          break;
+      ret = msg_send (inode, filp, sr.mq_buf, sr.mq_len, sr.mq_type);
+      break;
+  case MQ_RECEIVE:
+      ret = -EBADF;
+      if (!(filp->f_mode & FMODE_READ))
+          break;
+      ret = -EFAULT;
+      if (copy_from_user(&sr, (struct mq_sndrcv *)arg, sizeof(sr)))
+          break;
+      ret = msg_receive (inode, filp, sr.mq_buf, sr.mq_len, &sr.mq_type);
+      if (!ret && put_user (sr.mq_type, &((struct mq_sndrcv *)arg)->mq_type))
+          ret = -EFAULT;
+      break;
+  case MQ_NOTIFY: {
+      struct sigevent sev;
+      struct msg_queue *msg;
+      ret = -EFAULT;
+      if (copy_from_user(&sev, (struct sigevent *)arg, sizeof(sev)))
+          break;
+      ret = -EINVAL;
+      if (sev.sigev_notify != SIGEV_SIGNAL && sev.sigev_notify != SIGEV_NONE)
+          break;
+      if (sev.sigev_signo <= 0 || sev.sigev_signo > _NSIG)
+          break;
+      msg = msg_lock(inode->i_ino);
+      if (!msg) BUG();
+      ret = 0;
+      if (msg->q_signo)
+          ret = -EBUSY;
+      else if (sev.sigev_notify == SIGEV_SIGNAL) {
+          msg->q_signo = sev.sigev_signo;
+          msg->q_sigval = sev.sigev_value;
+      } else
+          msg->q_signo = 0;
+      msg_unlock(inode->i_ino);
+      }
+  default:
+      break;
+  }
+  return ret;
+}
+
+static ssize_t msg_write(struct file * file, 
+	const char * buf, size_t count, loff_t *ppos)
+{
+  int ret = msg_send(file->f_dentry->d_inode, file, buf, count, MQ_DEFAULT_TYPE);
+  return ret ?: count;
+}
+
+static ssize_t msg_read(struct file * file, 
+	char * buf, size_t count, loff_t *ppos)
+{
+  return msg_receive(file->f_dentry->d_inode, file, buf, count, NULL);
+}
+
+static int msg_release (struct inode *ino, struct file *filp)
+{
+  struct msg_queue *msq = msg_lock(ino->i_ino);
+  if (!msq) BUG();
+  if (msq->q_signo && msq->q_pid == current->pid)
+      msq->q_signo = 0;
+  msg_unlock(ino->i_ino);
+  return 0;
+}
+
+static int msg_flush (struct file *filp)
+{
+  return msg_release(filp->f_dentry->d_inode, filp);
+}
+
+static int newque (key_t key, const char *name, int namelen, 
+	struct mq_attr *attr, int msgflg)
 {
 	int id;
 	int retval;
 	struct msg_queue *msq;
 
-	msq  = (struct msg_queue *) kmalloc (sizeof (*msq), GFP_KERNEL);
+	if (namelen > MSG_NAME_LEN)
+		return -ENAMETOOLONG;
+	msq = (struct msg_queue *) kmalloc (sizeof (*msq) + namelen, GFP_KERNEL);
+
 	if (!msq) 
 		return -ENOMEM;
 
@@ -113,18 +670,94 @@
 		kfree(msq);
 		return -ENOSPC;
 	}
+	msq->q_flags = (msgflg & S_IRWXUGO);
+	msq->q_perm.key = key;
 
 	msq->q_stime = msq->q_rtime = 0;
 	msq->q_ctime = CURRENT_TIME;
 	msq->q_cbytes = msq->q_qnum = 0;
 	msq->q_qbytes = msg_ctlmnb;
 	msq->q_lspid = msq->q_lrpid = 0;
+	msq->q_signo = 0;
+
 	INIT_LIST_HEAD(&msq->q_messages);
 	INIT_LIST_HEAD(&msq->q_receivers);
 	INIT_LIST_HEAD(&msq->q_senders);
+	msq->id = msg_buildid(id, msq->q_perm.seq);
+	if (name) {
+	  msq->q_maxmsg = attr->mq_maxmsg;
+	  msq->q_msgsize = attr->mq_msgsize;
+	  msq->q_qbytes = msq->q_maxmsg * msq->q_msgsize;
+	  msq->q_namelen = namelen;
+	  memcpy(msq->q_name, name, namelen);
+	} else {
+	  msq->q_qbytes = msg_ctlmnb;
+	  msq->q_maxmsg = msg_ctlmnb;
+	  msq->q_msgsize = msg_ctlmax;
+	  msq->q_flags |= MSG_SYSV;
+	  msq->q_namelen = sprintf(msq->q_name, MSG_FMT, msq->id);
+	}
 	msg_unlock(id);
 
-	return msg_buildid(id,msq->q_perm.seq);
+	return msq->id;
+}
+
+/* FIXME: maybe we need lock_kernel() here */
+static void msg_delete (struct inode *ino)
+{
+  int msgid = ino->i_ino;
+  struct msg_queue *msq;
+
+  down(&msg_ids.sem);
+  msq = msg_lock(msgid);
+  if(msq==NULL)
+      BUG();
+  freeque(msgid);
+  up(&msg_ids.sem);
+  clear_inode(ino);
+}
+
+static int msg_remove_name(int msqid)
+{
+  struct dentry *dir;
+  struct dentry *dentry;
+  struct msg_queue *msq;
+  int error, id;
+  char name[MSG_FMT_LEN+1];
+
+  down(&msg_ids.sem);
+  msq = msg_lock(msqid);
+  if (msq == NULL)
+      return -EINVAL;
+  id = msq->id;
+  if (msg_checkid (msq, msqid)) {
+      msg_unlock(msqid);
+      return -EIDRM;
+  }
+  msg_unlock(msqid);
+  up(&msg_ids.sem);
+  sprintf (name, MSG_FMT, id);
+  dir=msg_sb->s_root;
+  down(&dir->d_inode->i_sem);
+  dentry = lookup_one_len(name, dir, strlen(name) );
+  error = PTR_ERR(dentry);
+  if (!IS_ERR(dentry)) {
+      /*
+       * We have to do our own unlink to prevent the vfs
+       * permission check. We'll do the SYSV IPC style check
+       * inside of msg_do_unlink when we hold msg lock and
+       * msg_ids semaphore.
+       */
+      struct inode *inode = dir->d_inode;
+      down(&inode->i_sem);
+      error = msg_do_unlink(inode, dentry, 1);
+      if (!error)
+          d_delete(dentry);
+      up(&inode->i_sem);
+      dput(dentry);
+  }
+  up(&dir->d_inode->i_sem);
+  return error;
 }
 
 static void free_msg(struct msg_msg* msg)
@@ -139,7 +772,7 @@
 	}
 }
 
-static struct msg_msg* load_msg(void* src, int len)
+static struct msg_msg* load_msg(const char * src, int len)
 {
 	struct msg_msg* msg;
 	struct msg_msgseg** pseg;
@@ -191,9 +824,9 @@
 	return ERR_PTR(err);
 }
 
-static int store_msg(void* dest, struct msg_msg* msg, int len)
+static int store_msg(void* dest, struct msg_msg* msg, size_t len)
 {
-	int alen;
+	size_t alen;
 	struct msg_msgseg *seg;
 
 	alen = len;
@@ -213,7 +846,7 @@
 			return -1;
 		len -= alen;
 		dest = ((char*)dest)+alen;
-		seg=seg->next;
+		seg = seg->next;
 	}
 	return 0;
 }
@@ -272,7 +905,7 @@
 	expunge_all(msq,-EIDRM);
 	ss_wakeup(&msq->q_senders,1);
 	msg_unlock(id);
-		
+
 	tmp = msq->q_messages.next;
 	while(tmp != &msq->q_messages) {
 		struct msg_msg* msg = list_entry(tmp,struct msg_msg,m_list);
@@ -292,12 +925,12 @@
 	
 	down(&msg_ids.sem);
 	if (key == IPC_PRIVATE) 
-		ret = newque(key, msgflg);
+		ret = newque(key, NULL, MSG_FMT_LEN + 1, NULL, msgflg);
 	else if ((id = ipc_findkey(&msg_ids, key)) == -1) { /* key not used */
 		if (!(msgflg & IPC_CREAT))
 			ret = -ENOENT;
 		else
-			ret = newque(key, msgflg);
+			ret = newque(key, NULL, MSG_FMT_LEN + 1, NULL, msgflg);
 	} else if (msgflg & IPC_CREAT && msgflg & IPC_EXCL) {
 		ret = -EEXIST;
 	} else {
@@ -358,13 +991,6 @@
 	}
 }
 
-struct msq_setbuf {
-	unsigned long	qbytes;
-	uid_t		uid;
-	gid_t		gid;
-	mode_t		mode;
-};
-
 static inline unsigned long copy_msqid_from_user(struct msq_setbuf *out, void *buf, int version)
 {
 	switch(version) {
@@ -468,10 +1094,13 @@
 			return -EINVAL;
 
 		if(cmd == MSG_STAT) {
+			err = -EINVAL;
+			if (!(msq->q_flags & MSG_SYSV))
+				goto out_unlock;
 			success_return = msg_buildid(msqid, msq->q_perm.seq);
 		} else {
-			err = -EIDRM;
-			if (msg_checkid(msq,msqid))
+			err = msg_checkid(msq,msqid);
+			if (err)
 				goto out_unlock;
 			success_return = 0;
 		}
@@ -480,6 +1109,7 @@
 			goto out_unlock;
 
 		kernel_to_ipc64_perm(&msq->q_perm, &tbuf.msg_perm);
+		tbuf.msg_perm.mode &= S_IRWXUGO;
 		tbuf.msg_stime  = msq->q_stime;
 		tbuf.msg_rtime  = msq->q_rtime;
 		tbuf.msg_ctime  = msq->q_ctime;
@@ -500,7 +1130,7 @@
 			return -EFAULT;
 		break;
 	case IPC_RMID:
-		break;
+		return msg_remove_name(msqid);
 	default:
 		return  -EINVAL;
 	}
@@ -521,12 +1151,11 @@
 	    /* We _could_ check for CAP_CHOWN above, but we don't */
 		goto out_unlock_up;
 
-	switch (cmd) {
-	case IPC_SET:
-	{
+	if (cmd == IPC_SET) {
 		if (setbuf.qbytes > msg_ctlmnb && !capable(CAP_SYS_RESOURCE))
 			goto out_unlock_up;
 		msq->q_qbytes = setbuf.qbytes;
+		msq->q_maxmsg = setbuf.qbytes;
 
 		ipcp->uid = setbuf.uid;
 		ipcp->gid = setbuf.gid;
@@ -542,11 +1171,6 @@
 		 */
 		ss_wakeup(&msq->q_senders,0);
 		msg_unlock(msqid);
-		break;
-	}
-	case IPC_RMID:
-		freeque (msqid); 
-		break;
 	}
 	err = 0;
 out_up:
@@ -608,6 +1232,105 @@
 	return 0;
 }
 
+static int msg_do_send (struct msg_queue **msqp, int msqid,
+			struct msg_msg *msg, size_t msgsz, int nowait)
+{
+	struct msg_queue *msq = *msqp;
+
+	if(msgsz + msq->q_cbytes > msq->q_qbytes ||
+	   1 + msq->q_qnum > msq->q_maxmsg) {
+		struct msg_sender s;
+
+		if(nowait)
+			return -EAGAIN;
+
+		ss_add(msq, &s);
+		msg_unlock(msqid);
+		schedule();
+		current->state = TASK_RUNNING;
+
+		*msqp = msq = msg_lock(msqid);
+		if(msq==NULL)
+			return -EIDRM;
+		ss_del(&s);
+		
+		if (signal_pending(current))
+			return -EINTR;
+		return -EBUSY;
+	}
+
+	if(!pipelined_send(msq,msg)) {
+		/* no one is waiting for this message, enqueue it */
+		list_add_tail(&msg->m_list,&msq->q_messages);
+		msq->q_cbytes += msgsz;
+		msq->q_qnum++;
+		atomic_add(msgsz,&msg_bytes);
+		atomic_inc(&msg_hdrs);
+		if (msq->q_qnum == 1 && msq->q_signo) {
+			struct task_struct *p;
+			siginfo_t si;
+			read_lock(&tasklist_lock);
+			p = find_task_by_pid(msq->q_pid);
+			if (p) {
+				si.si_signo = msq->q_signo;
+				si.si_errno = 0;
+				si.si_code = SI_MESGQ;
+				si.si_pid = current->pid;
+				si.si_uid = current->euid;
+				si.si_value = msq->q_sigval;
+				if (!send_sig_info(msq->q_signo, &si, p))
+					send_sig(msq->q_signo, p, 1);
+			}
+			read_unlock(&tasklist_lock);
+			msq->q_signo = 0;
+		}
+	}
+
+	msq->q_lspid = current->pid;
+	msq->q_stime = CURRENT_TIME;
+	return 0;
+}
+
+static ssize_t msg_send (struct inode *ino, struct file *filp, const char *mtext, size_t msgsz, long mtype)
+{
+	struct msg_queue *msq;
+	struct msg_msg *msg;
+	int err = 0;
+	
+	if (mtype < 1)
+		return -EINVAL;
+	msq = msg_lock(ino->i_ino);
+	if (!msq) BUG();
+	if (msgsz > msq->q_msgsize)
+		err = -EMSGSIZE;
+	msg_unlock(ino->i_ino);
+	if (err) return err;
+
+	msg = load_msg(mtext, msgsz);
+	if(IS_ERR(msg))
+		return PTR_ERR(msg);
+
+	msg->m_type = mtype;
+	msg->m_ts = msgsz;
+
+	msq = msg_lock(ino->i_ino);
+	if (!msq) BUG();
+
+	do {
+		err = -EACCES;
+		if (msq->q_flags & MSG_SYSV && ipcperms(&msq->q_perm, S_IWUGO))
+			break;
+
+		err = msg_do_send(&msq, ino->i_ino, msg, msgsz, filp->f_flags & O_NONBLOCK);
+
+	} while (err == -EBUSY);
+
+	msg_unlock(ino->i_ino);
+	if (msg && err)
+		free_msg(msg);
+	return err;
+}
+
 asmlinkage long sys_msgsnd (int msqid, struct msgbuf *msgp, size_t msgsz, int msgflg)
 {
 	struct msg_queue *msq;
@@ -633,60 +1356,23 @@
 	err=-EINVAL;
 	if(msq==NULL)
 		goto out_free;
-retry:
-	err= -EIDRM;
-	if (msg_checkid(msq,msqid))
-		goto out_unlock_free;
-
-	err=-EACCES;
-	if (ipcperms(&msq->q_perm, S_IWUGO)) 
-		goto out_unlock_free;
-
-	if(msgsz + msq->q_cbytes > msq->q_qbytes ||
-		1 + msq->q_qnum > msq->q_qbytes) {
-		struct msg_sender s;
-
-		if(msgflg&IPC_NOWAIT) {
-			err=-EAGAIN;
-			goto out_unlock_free;
-		}
-		ss_add(msq, &s);
-		msg_unlock(msqid);
-		schedule();
-		current->state= TASK_RUNNING;
+	do {
+	  err= -EIDRM;
+	  if (msg_checkid(msq,msqid))
+    	  break;
 
-		msq = msg_lock(msqid);
-		err = -EIDRM;
-		if(msq==NULL)
-			goto out_free;
-		ss_del(&s);
-		
-		if (signal_pending(current)) {
-			err=-EINTR;
-			goto out_unlock_free;
-		}
-		goto retry;
-	}
+	  err=-EACCES;
+	  if (ipcperms(&msq->q_perm, S_IWUGO))
+    	  break;
 
-	msq->q_lspid = current->pid;
-	msq->q_stime = CURRENT_TIME;
+	  err = msg_do_send(&msq, msqid, msg, msgsz, msgflg & IPC_NOWAIT);
 
-	if(!pipelined_send(msq,msg)) {
-		/* noone is waiting for this message, enqueue it */
-		list_add_tail(&msg->m_list,&msq->q_messages);
-		msq->q_cbytes += msgsz;
-		msq->q_qnum++;
-		atomic_add(msgsz,&msg_bytes);
-		atomic_inc(&msg_hdrs);
-	}
-	
-	err = 0;
-	msg = NULL;
+	} while (err == -EBUSY);
 
-out_unlock_free:
-	msg_unlock(msqid);
+	if (msq)
+	  msg_unlock(msqid);
 out_free:
-	if(msg!=NULL)
+	if (msg && err)
 		free_msg(msg);
 	return err;
 }
@@ -710,127 +1396,169 @@
 	return SEARCH_EQUAL;
 }
 
+static struct msg_msg *
+msg_do_receive (struct msg_queue *msq, int *msqidp, size_t msgsz,
+      long msgtyp, int mode, int msgflg)
+{
+  struct msg_receiver msr_d;
+  struct list_head *tmp;
+  struct msg_msg *msg, *found_msg;
+  int msqid = *msqidp;
+
+  for (;;) {
+      if (msq->q_flags & MSG_SYSV && ipcperms (&msq->q_perm, S_IRUGO))
+          return ERR_PTR(-EACCES);
+
+      tmp = msq->q_messages.next;
+      found_msg = NULL;
+      while (tmp != &msq->q_messages) {
+          msg = list_entry(tmp,struct msg_msg,m_list);
+          if(testmsg(msg, msgtyp, mode)) {
+              found_msg = msg;
+              if(mode == SEARCH_LESSEQUAL && msg->m_type != 1)
+                  msgtyp = msg->m_type - 1;
+              else
+                  break;
+          }
+          tmp = tmp->next;
+      }
+      if (found_msg) {
+          msg = found_msg;
+          if ((msgsz < msg->m_ts) && !(msgflg & MSG_NOERROR))
+              return ERR_PTR(-E2BIG);
+          list_del(&msg->m_list);
+          msq->q_qnum--;
+          msq->q_rtime = CURRENT_TIME;
+          msq->q_lrpid = current->pid;
+          msq->q_cbytes -= msg->m_ts;
+          atomic_sub(msg->m_ts,&msg_bytes);
+          atomic_dec(&msg_hdrs);
+          ss_wakeup(&msq->q_senders,0);
+          msg_unlock(msqid);
+          return msg;
+      } else {
+          struct msg_queue *t;
+          /* no message waiting. Prepare for pipelined
+           * receive.
+           */
+          if (msgflg & IPC_NOWAIT)
+              return ERR_PTR(-ENOMSG);
+          list_add_tail(&msr_d.r_list,&msq->q_receivers);
+          msr_d.r_tsk = current;
+          msr_d.r_msgtype = msgtyp;
+          msr_d.r_mode = mode;
+          if(msgflg & MSG_NOERROR)
+              msr_d.r_maxsize = INT_MAX;
+          else
+              msr_d.r_maxsize = msgsz;
+          msr_d.r_msg = ERR_PTR(-EAGAIN);
+          current->state = TASK_INTERRUPTIBLE;
+          msg_unlock(msqid);
+
+          schedule();
+          current->state = TASK_RUNNING;
+
+          msg = (struct msg_msg*) msr_d.r_msg;
+          if(!IS_ERR(msg))
+              return msg;
+
+          t = msg_lock(msqid);
+          if(t == NULL)
+              *msqidp = msqid = -1;
+          msg = (struct msg_msg*)msr_d.r_msg;
+          if(!IS_ERR(msg)) {
+              /* our message arrived while we waited for
+               * the spinlock. Process it.
+               */
+              if (msqid != -1)
+                  msg_unlock(msqid);
+              return msg;
+          }
+          if(PTR_ERR(msg) == -EAGAIN) {
+              if(msqid == -1)
+                  BUG();
+              list_del(&msr_d.r_list);
+              if (signal_pending(current))
+                  return ERR_PTR(-EINTR);
+              else
+                  continue;
+          }
+          return msg;
+      }
+  }
+}
+
+static int msg_receive (struct inode *ino, struct file *filp, char *mtext,
+          size_t msgsz, long *msgtypp)
+{
+  struct msg_queue *msq;
+  struct msg_msg *msg;
+  long msgtyp;
+  int err, mode, msqid = ino->i_ino;
+
+  if (msgtypp)
+      msgtyp = *msgtypp;
+  else
+      msgtyp = -MQ_DEFAULT_TYPE;
+  mode = convert_mode(&msgtyp, 0);
+  msq = msg_lock(msqid);
+  if (!msq) BUG();
+  if (msgtypp && msgsz < msq->q_msgsize) {
+      msg_unlock(msqid);
+      return -EMSGSIZE;
+  }
+
+  msg = msg_do_receive (msq, &msqid, msgsz, msgtyp, mode,
+                (filp->f_flags & O_NONBLOCK) ? IPC_NOWAIT : 0);
+  if (!IS_ERR (msg)) {
+      msgsz = (msgsz > msg->m_ts) ? msg->m_ts : msgsz;
+      if (store_msg(mtext, msg, msgsz))
+          msgsz = -EFAULT;
+      else if (msgtypp)
+          *msgtypp = msg->m_type;
+      free_msg(msg);
+      return msgsz;
+  }
+  if (msqid != -1)
+      msg_unlock(msqid);
+  err = PTR_ERR(msg);
+  switch (err) {
+  case -ENOMSG: err = -EAGAIN; break;
+  case -E2BIG: err = -EMSGSIZE; break;
+  }
+  return err;
+}
+
 asmlinkage long sys_msgrcv (int msqid, struct msgbuf *msgp, size_t msgsz,
 			    long msgtyp, int msgflg)
 {
 	struct msg_queue *msq;
-	struct msg_receiver msr_d;
-	struct list_head* tmp;
-	struct msg_msg* msg, *found_msg;
-	int err;
+	struct msg_msg *msg;
 	int mode;
 
 	if (msqid < 0 || (long) msgsz < 0)
 		return -EINVAL;
-	mode = convert_mode(&msgtyp,msgflg);
+	mode = convert_mode(&msgtyp, msgflg);
 
-	msq = msg_lock(msqid);
-	if(msq==NULL)
+	msq = msg_lock (msqid);
+	if (msq==NULL)
+		return -EINVAL;
+	if (!(msq->q_flags & MSG_SYSV)) {
+		msg_unlock (msqid);
 		return -EINVAL;
-retry:
-	err = -EIDRM;
-	if (msg_checkid(msq,msqid))
-		goto out_unlock;
-
-	err=-EACCES;
-	if (ipcperms (&msq->q_perm, S_IRUGO))
-		goto out_unlock;
-
-	tmp = msq->q_messages.next;
-	found_msg=NULL;
-	while (tmp != &msq->q_messages) {
-		msg = list_entry(tmp,struct msg_msg,m_list);
-		if(testmsg(msg,msgtyp,mode)) {
-			found_msg = msg;
-			if(mode == SEARCH_LESSEQUAL && msg->m_type != 1) {
-				found_msg=msg;
-				msgtyp=msg->m_type-1;
-			} else {
-				found_msg=msg;
-				break;
-			}
-		}
-		tmp = tmp->next;
 	}
-	if(found_msg) {
-		msg=found_msg;
-		if ((msgsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) {
-			err=-E2BIG;
-			goto out_unlock;
-		}
-		list_del(&msg->m_list);
-		msq->q_qnum--;
-		msq->q_rtime = CURRENT_TIME;
-		msq->q_lrpid = current->pid;
-		msq->q_cbytes -= msg->m_ts;
-		atomic_sub(msg->m_ts,&msg_bytes);
-		atomic_dec(&msg_hdrs);
-		ss_wakeup(&msq->q_senders,0);
-		msg_unlock(msqid);
-out_success:
+	msg = msg_do_receive (msq, &msqid, msgsz, msgtyp, mode, msgflg);
+	if (!IS_ERR (msg)) {
 		msgsz = (msgsz > msg->m_ts) ? msg->m_ts : msgsz;
-		if (put_user (msg->m_type, &msgp->mtype) ||
-		    store_msg(msgp->mtext, msg, msgsz)) {
-			    msgsz = -EFAULT;
-		}
-		free_msg(msg);
-		return msgsz;
-	} else
-	{
-		struct msg_queue *t;
-		/* no message waiting. Prepare for pipelined
-		 * receive.
-		 */
-		if (msgflg & IPC_NOWAIT) {
-			err=-ENOMSG;
-			goto out_unlock;
-		}
-		list_add_tail(&msr_d.r_list,&msq->q_receivers);
-		msr_d.r_tsk = current;
-		msr_d.r_msgtype = msgtyp;
-		msr_d.r_mode = mode;
-		if(msgflg & MSG_NOERROR)
-			msr_d.r_maxsize = INT_MAX;
-		 else
-		 	msr_d.r_maxsize = msgsz;
-		msr_d.r_msg = ERR_PTR(-EAGAIN);
-		current->state = TASK_INTERRUPTIBLE;
-		msg_unlock(msqid);
-
-		schedule();
-		current->state = TASK_RUNNING;
-
-		msg = (struct msg_msg*) msr_d.r_msg;
-		if(!IS_ERR(msg)) 
-			goto out_success;
-
-		t = msg_lock(msqid);
-		if(t==NULL)
-			msqid=-1;
-		msg = (struct msg_msg*)msr_d.r_msg;
-		if(!IS_ERR(msg)) {
-			/* our message arived while we waited for
-			 * the spinlock. Process it.
-			 */
-			if(msqid!=-1)
-				msg_unlock(msqid);
-			goto out_success;
-		}
-		err = PTR_ERR(msg);
-		if(err == -EAGAIN) {
-			if(msqid==-1)
-				BUG();
-			list_del(&msr_d.r_list);
-			if (signal_pending(current))
-				err=-EINTR;
-			 else
-				goto retry;
-		}
-	}
-out_unlock:
-	if(msqid!=-1)
-		msg_unlock(msqid);
-	return err;
+    	if (put_user (msg->m_type, &msgp->mtype) ||
+    	  store_msg(msgp->mtext, msg, msgsz))
+    	  msgsz = -EFAULT;
+    	free_msg(msg);
+    	return msgsz;
+	}
+	if (msqid != -1)
+    	msg_unlock(msqid);
+	return PTR_ERR(msg);
 }
 
 #ifdef CONFIG_PROC_FS
@@ -841,16 +1569,16 @@
 	int i, len = 0;
 
 	down(&msg_ids.sem);
-	len += sprintf(buffer, "       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n");
+	len += sprintf(buffer, "       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime 	name(POSIX)\n");
 
 	for(i = 0; i <= msg_ids.max_id; i++) {
 		struct msg_queue * msq;
 		msq = msg_lock(i);
 		if(msq != NULL) {
-			len += sprintf(buffer + len, "%10d %10d  %4o  %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu\n",
+			len += sprintf(buffer + len, "%10d %10d  %4o  %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu %.*s%s\n",
 				msq->q_perm.key,
 				msg_buildid(i,msq->q_perm.seq),
-				msq->q_perm.mode,
+				msq->q_flags & S_IRWXUGO,
 				msq->q_cbytes,
 				msq->q_qnum,
 				msq->q_lspid,
@@ -861,7 +1589,10 @@
 				msq->q_perm.cgid,
 				msq->q_stime,
 				msq->q_rtime,
-				msq->q_ctime);
+				msq->q_ctime,
+				msq->q_namelen,
+				msq->q_name,
+				msq->q_flags & MSG_UNLK ? " (deleted)" : "");
 			msg_unlock(i);
 
 			pos += len;

^ permalink raw reply

* [Linux-ia64] Is there any good linux/ia64 kernel debugger?
From: Jassie Tsai @ 2002-10-27 15:21 UTC (permalink / raw)
  To: linux-ia64

I am tracing and modifying the Linux 2.4.14-ia64 kernel.
Is there a good source-level kernel debugger I can use?
Thanks very much!

Jassie Tsai



^ permalink raw reply

* [LARTC] Bad performance using source routing
From: Henrik Johansson - GlobeCom @ 2002-10-27 15:11 UTC (permalink / raw)
  To: lartc

Hi,

I have a performance problem when using source routing, following the
instructions in the Adv Linux Routing HOW-TO. I have a box with 3 NICs,
each connected to a different IP network. The box looks like this:

733MHz P3 CPU
256MB RAM
Mandrake Linux 9.0
3 3C905B NIC
                         _________
      1.2.3.93/29 ETH2 -|         |- ETH0 2.2.3.44/26
                         ---------
                             |
                           ETH 1
                        10.7.7.3/24

ETH2 gw 1.2.3.89
ETH1 is a "point-to-point" link to an NFS-server
ETH0 gw 2.2.3.6/26

I want traffic coming in on ETH2 and ETH0 to be routed back via their
respective default gateways. Routing-wise this is not a problem and works
just as it should, but uploads to the box are really slow. If I FTP to it,
I get 7-10MB/s download but only around 1-2MB/s upload. That is with
source routing enabled. As a workaround, I use the following script, which
applies source routing only to ETH2 and lets the normal default-gateway
rule apply to ETH0; that way both upload and download on ETH0 run at
basically 100Mbit.

The script I run is this:

# add instructions for eth2
/sbin/ip route add 1.2.3.88/29 dev eth2 src 1.2.3.93 table nic3
/sbin/ip route add default via 1.2.3.89 table nic3
/sbin/ip rule add from 1.2.3.93 table nic3
/sbin/ip route flush cache
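
To make the effect of these commands concrete, here is a small C model of the
lookup they set up. It is only a sketch, not kernel code: rules are tried in
order, and the first table that yields a route wins. The nic3 entries mirror
the script above. The main-table entries (2.2.3.0/26 on eth0 and a default via
2.2.3.6) are assumptions based on the addresses listed earlier, and the table
ids, struct layouts and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

enum { TABLE_MAIN = 0, TABLE_NIC3 = 1 };    /* illustrative table ids */

/* Models "ip rule add from <from> table <table>"; from == 0 means "any". */
struct rule { uint32_t from; int table; };

/* Models one route; via == 0 means "directly connected". */
struct route { int table; uint32_t net; int plen; uint32_t via; const char *dev; };

/* Netmask for a prefix length; plen == 0 is the default route. */
static uint32_t mask_of(int plen)
{
    assert(plen >= 0 && plen <= 32);
    return plen == 0 ? 0 : 0xffffffffu << (32 - plen);
}

static const struct rule rules[] = {
    { IP4(1, 2, 3, 93), TABLE_NIC3 },   /* from the posted script */
    { 0,                TABLE_MAIN },   /* fallback to the main table */
};

static const struct route routes[] = {
    { TABLE_NIC3, IP4(1, 2, 3, 88), 29, 0,                "eth2" },
    { TABLE_NIC3, 0,                 0, IP4(1, 2, 3, 89), "eth2" },
    { TABLE_MAIN, IP4(2, 2, 3, 0),  26, 0,                "eth0" }, /* assumed */
    { TABLE_MAIN, 0,                 0, IP4(2, 2, 3, 6),  "eth0" }, /* assumed */
};

/* Longest-prefix match for dst inside one table; NULL if nothing matches. */
static const struct route *table_lookup(int table, uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        const struct route *r = &routes[i];

        if (r->table != table || (dst & mask_of(r->plen)) != r->net)
            continue;
        if (best == NULL || r->plen > best->plen)
            best = r;
    }
    return best;
}

/* Walk the rules in order; the first rule whose table yields a route wins. */
static const struct route *policy_lookup(uint32_t src, uint32_t dst)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        const struct route *r;

        if (rules[i].from != 0 && rules[i].from != src)
            continue;
        r = table_lookup(rules[i].table, dst);
        if (r != NULL)
            return r;
    }
    return NULL;
}
```

In this model a reply sourced from 1.2.3.93 to an outside host is sent via
1.2.3.89 on eth2, while a reply sourced from 2.2.3.44 falls through to the
main table and leaves via 2.2.3.6 on eth0.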

I also had a similar script for ETH0, but as I said, the performance was
really bad. At least now, performance for ETH0 is okay both upstream and
downstream.

Anyone know where I should start looking?

Regards

Henrik Johansson
GlobeCom Network


_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/

^ permalink raw reply

* Re: [PATCH] [RFC] Advanced TCA SCSI Disk Hotswap
From: Rob Landley @ 2002-10-27 15:08 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley; +Cc: Steven Dake, linux-scsi, linux-kernel
In-Reply-To: <3DB887D5.5080500@pobox.com>

On Thursday 24 October 2002 18:52, Jeff Garzik wrote:
> James Bottomley wrote:
> >>In Advanced TCA (what spawned this work) a button is pressed to
> >>indicate  hotswap removal which makes for easy detection of hotswap
> >>events.  This is why there are  kernel interfaces for removal and
> >>insertion (so a kernel driver can be written to detect  the button
> >>press and remove the devices from the os data structures and then
> >>light a blue  led indicating safe for removal).
> >
> >OK, I understand what's going on now.  It's no different from those
> > hotplug PCI busses where you press the button and a second or so later
> > the LED goes out and you can remove the card.  10ms sounds rather a short
> > maximum time for a technician to wait for a light to go out....I suppose
> > Telco technicians are rather impatient.
> >
> >I really think you need to lengthen this interval.  The kernel is moving
> >towards this type of hotplug infrastructure which you can easily leverage
> > (or even help build), but it's definitely going to be mainly in user
> > space.
>
> Caveat coder -- you also have to handle the case where the device is
> already gone, by the time you are notified of the hot-unplug event.
>  Some ejections are less friendly than others...  though from a SCSI
> standpoint, hopefully that case is easier -- error out all I/Os in
> flight, and unregister the host and device structures associated with
> the recently-removed host.  The devil, of course, is in the details ;-)

Hmmm...  Not being familiar with the SCSI layer but sticking my nose in anyway 
on general block device/mount point hotplug issues:

How hard would it be to write a simple debugging function to lobotomize a 
block device?  (So that all further I/O to that sucker immediately returns an 
error.)  Not just simulating a hot extraction (or catastrophic failure) of 
a block device, but also something you could use to see how gracefully 
filesystems react.

The reason I ask is there was a discussion a while back about the new lazy 
unmount (umount -l /blah/foo) not always being quite enough, and that 
sometimes what you want is basically "umount -9 /blah/foo" (a la kill 
-9).  Close all files, reparent all process home directories and chroot mount 
points to a dummy inode, flush all I/O, drive a stake through the 
superblock's heart, and scatter the ashes at sea.  Somebody posted a patch to 
actually do this.  (Against 2.4, I think.)  I could probably dig it up if you 
were curious.  Let's see...

http://marc.theaimsgroup.com/?l=linux-kernel&m=103443466225915&q=raw
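
For reference, the lazy unmount mentioned above (umount -l) corresponds to the
umount2() system call with the MNT_DETACH flag. The following is a minimal
sketch, assuming a Linux system with glibc. The helper name lazy_unmount is
made up for illustration, and nothing here implements the hypothetical
"umount -9".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Lazily detach the filesystem mounted at `target`: the mount point
 * disappears from the namespace right away, and the filesystem itself is
 * cleaned up once it is no longer busy.  Returns 0 on success, or -1 with
 * errno preserved from umount2() on failure.
 */
static int lazy_unmount(const char *target)
{
    if (umount2(target, MNT_DETACH) == -1) {
        int saved = errno;   /* fprintf() may clobber errno */

        fprintf(stderr, "umount2(%s): %s\n", target, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Calling lazy_unmount("/mnt/cdrom") as root would detach that mount even while
processes still hold files open under it; those processes keep their existing
handles, which is exactly the gap the "umount -9" idea is meant to close.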

The eject command should certainly have an "umount with shotgun" option, so 
zombie processes can't pin your CD in the drive.  (Your average end-user is 
NOT going to be able to grovel through /proc to figure out which processes 
have an open filehandle or home directory under the cdrom mount point so it 
can kill them and get the disk out.  They're going to power cycle the machine 
and eject it while the bios is in charge.  I've done this myself a couple of 
times when I'm in a hurry.)

Anyway, if the block device under the filesystem honestly does go away for 
hotplug eject reasons, the obvious thing to do is umount -9 the sucker 
immediately so userspace can collapse gracefully (or even conceivably 
recover).  The main difference here is that the flushing would all error out 
and get discarded, and this wouldn't always get reported to the user, but 
thanks to write caching that's the case anyway.  (Use some variant of 
O_DIRECT or fsync if you care.)  The errors userspace does see switch from 
"all my I/O failed with a media error" to "all my filehandles closed out from 
under me" (and the directory I'm in has been deleted), but that's still 
relatively logical behavior.

Does this sound like it's off in left field?

>     Jeff

Rob

-- 
http://penguicon.sf.net - Terry Pratchett, Eric Raymond, Pete Abrams, Illiad, 
CmdrTaco, liquid nitrogen ice cream, and caffienated jello.  Well why not?

^ permalink raw reply

* setup firewall to allow Remote Desktop in XP???
From: Ben Tan @ 2002-10-27 15:05 UTC (permalink / raw)
  To: netfilter

Hi,
    I am considering setting up a firewall to allow Remote Desktop connections from the Internet. Which ports do I need to allow in the INPUT, FORWARD, and OUTPUT chains?

    There will be a DNAT rule for each connection request to the internal client.

Thanks.

^ permalink raw reply

* Re: rootfs exposure in /proc/mounts
From: Christoph Hellwig @ 2002-10-27 15:09 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andreas Haumer, linux-kernel, willy
In-Reply-To: <3DBC0007.8020005@pobox.com>

On Sun, Oct 27, 2002 at 10:02:31AM -0500, Jeff Garzik wrote:
> symlinks directly to /proc/mounts is fine with me -- just don't expect 
> any sympathy when userspace tools don't handle things like $subject.  :) 
>  The answer will be "fix the userspace tools" not "add special case code 
> to the kernel" :)

Well, better to link to /proc/self/mounts directly; that's where /proc/mounts
points anyway.  That's another reason why the /etc/mtab concept is broken:
different processes can have very different mounts.
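
A minimal sketch of reading the per-process mount table from C, assuming
glibc's <mntent.h> helpers (setmntent, getmntent, endmntent). The function
name list_mounts is hypothetical.

```c
#include <mntent.h>
#include <stdio.h>

/*
 * Print every mount listed in `table` and return how many entries were
 * found, or -1 if the table could not be opened.
 */
static int list_mounts(const char *table)
{
    FILE *fp = setmntent(table, "r");
    struct mntent *m;
    int count = 0;

    if (fp == NULL)
        return -1;

    while ((m = getmntent(fp)) != NULL) {
        printf("%s on %s type %s (%s)\n",
               m->mnt_fsname, m->mnt_dir, m->mnt_type, m->mnt_opts);
        count++;
    }
    endmntent(fp);
    return count;
}
```

Calling list_mounts("/proc/self/mounts") reports what the calling process
itself can see, so a tool built this way does not depend on /etc/mtab being
kept up to date.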


^ permalink raw reply

* Re: Patch(2.5.44 and 2.4.x): 6 files referenced pci_dev.driver_data instead of pci_{g,s}et_drv_data
From: Jeff Garzik @ 2002-10-27 15:06 UTC (permalink / raw)
  To: Adam J. Richter
  Cc: alan, andre, axboe, netwerk, jerdfelt, neilb, mikep, linux-tr,
	arjanv, henrique, linux-kernel
In-Reply-To: <20021027013619.A5918@baldur.yggdrasil.com>

Patch looks good to me.  I'll queue for Linus unless someone else has 
already done so.

Could you be talked into doing a similar patch for 2.4.x?



^ permalink raw reply

* Re: Swap doesn't work
From: Alan Cox @ 2002-10-27 15:21 UTC (permalink / raw)
  To: Vladimír Třebický; +Cc: Alex Riesen, Linux Kernel Mailing List, clock
In-Reply-To: <000801c27dc8$044f43f0$4500a8c0@cybernet.cz>

On Sun, 2002-10-27 at 14:48, Vladimír Třebický wrote:
> > > That's not a badblock. That's an kernel IDE bug. Andre Hedrick and Alan
> > > Cox will love to see this.
> >
> > Not on a kernel built with an untrusted hand built tool chain
> >
> Well, I don't know what could possibly cause this kind of error except
> kernel.
> No matter what application I use to read or write /dev/hda6. Which part
> of my tool chain do you have in mind?

gcc and binutils. I get so many weird, never-duplicated reports from
Linux From Scratch people, reports that don't happen to anyone else, that
I treat them with deep suspicion, especially because the problem sometimes
goes away if they instead build the same kernel with Debian/Red Hat/...
binutils/gcc.


^ permalink raw reply

