Re: RFC: Optimizing putcs()

From: Antonino Daplas <adaplas@pol.net>
To: fbdev <linux-fbdev-devel@lists.sourceforge.net>
Subject: Re: RFC: Optimizing putcs()
Date: 07 Aug 2002 13:25:46 +0800
Message-ID: <1028697994.561.3.camel@daplas>
In-Reply-To: <1028584418.556.29.camel@daplas>

[-- Attachment #1: Type: text/plain, Size: 751 bytes --]

One of the reasons why 2.4 console performance is good, especially at low
bit depths, is its ability to process more than 1 pixel per iteration and
its use of mask arrays.

I tried to generalize the above in cfbimgblt.c by incorporating the idea
from fbcon-cfb*.c. It's significantly faster but still not as fast as the
2.4 API.
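
For readers who have not seen the mask-array trick, here is a minimal
user-space sketch of it for 8 bpp, assuming four pixels per 32-bit word.
The names build_tab8() and expand_row8() are hypothetical and appear in
neither kernel tree; the table is built at run time so the sketch does not
depend on host endianness, whereas the attachment uses precomputed
per-endianness tables.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build the 16-entry mask table for 8 bpp.
 * Entry `nib` has byte k set to 0xff when pixel k (leftmost pixel is
 * the most significant bit of the nibble) is a foreground pixel.
 */
static void build_tab8(uint32_t tab[16])
{
    for (unsigned nib = 0; nib < 16; nib++) {
        uint8_t bytes[4];

        for (unsigned k = 0; k < 4; k++)
            bytes[k] = (nib & (8u >> k)) ? 0xff : 0x00;
        memcpy(&tab[nib], bytes, sizeof bytes);
    }
}

/*
 * Expand one row of a 1 bpp bitmap into 8 bpp pixels, four pixels per
 * iteration. Assumes `width` is a multiple of 4.
 */
static void expand_row8(uint8_t *dst, const uint8_t *src, unsigned width,
                        uint8_t fg, uint8_t bg, const uint32_t tab[16])
{
    uint32_t fgx = fg * 0x01010101u;   /* fg replicated into all 4 pixels */
    uint32_t bgx = bg * 0x01010101u;   /* bg replicated into all 4 pixels */
    uint32_t eorx = fgx ^ bgx;
    unsigned l = 8;                    /* bits left in the current source byte */

    for (unsigned j = width / 4; j--; ) {
        l -= 4;
        uint32_t mask = tab[(*src >> l) & 0xf];
        /* Where mask is all ones the result is fg, elsewhere bg. */
        uint32_t word = (mask & eorx) ^ bgx;

        memcpy(dst, &word, sizeof word);
        dst += sizeof word;
        if (!l) {
            l = 8;
            src++;
        }
    }
}
```

The point of the XOR form is that one table lookup, one AND and one XOR
produce four finished pixels, with no per-pixel branch.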

time cat /usr/src/linux/MAINTAINERS (40K text file)
1024x768-8bpp, y-panning disabled

2.5 old (with offscreen buffers) 

real    0m10.708s 
user    0m0.001s 
sys     0m10.707s 

2.5 new 

real    0m4.378s 
user    0m0.002s 
sys     0m4.375s 

2.4 

real    0m2.098s 
user    0m0.000s 
sys     0m2.070s 
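
To put the three "real" times above side by side, the ratios can be worked
out with a few lines of C. This is only arithmetic on the numbers quoted
above; the names speedup(), REAL_25_OLD, REAL_25_NEW and REAL_24 are
hypothetical labels for this sketch.

```c
#include <stdio.h>

/* "real" times in seconds, copied from the measurements above */
#define REAL_25_OLD 10.708
#define REAL_25_NEW  4.378
#define REAL_24      2.098

/* How many times faster `faster` is than `slower`. */
static double speedup(double slower, double faster)
{
    return slower / faster;
}

/* Print both ratios: old 2.5 vs. new 2.5, and new 2.5 vs. 2.4. */
static void print_speedups(void)
{
    printf("2.5 new vs 2.5 old: %.2fx\n", speedup(REAL_25_OLD, REAL_25_NEW));
    printf("2.4 vs 2.5 new:     %.2fx\n", speedup(REAL_25_NEW, REAL_24));
}
```

So the new code is roughly two and a half times faster than the old 2.5
code, and 2.4 is still roughly twice as fast as the new code.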

I've only tested the implementation at 8, 16, 24, and 32 bpp.  24bpp is
slightly slower than 32 bpp :( 

Tony 

[-- Attachment #2: cfbimgblt.c --]
[-- Type: text/x-c, Size: 5558 bytes --]

/*
 *  Generic BitBLT function for frame buffer with packed pixels of any depth.
 *
 *      Copyright (C)  June 1999 James Simmons
 *
 *  This file is subject to the terms and conditions of the GNU General Public
 *  License.  See the file COPYING in the main directory of this archive for
 *  more details.
 *
 * NOTES:
 *
 *    This function copies an image from system memory to video memory. The
 *  image can be a bitmap where each 0 represents the background color and
 *  each 1 represents the foreground color. Great for font handling. It can
 *  also be a color image. This is determined by image_depth. The color image
 *  must be laid out in exactly the same format as the framebuffer. Yes, I know
 *  there are cards with hardware that converts images of various depths to the
 *  framebuffer depth. But not every card has this. The width of every image
 *  must be rounded up to the nearest byte. For example, a bitmap 12 bits wide
 *  must be two bytes wide.
 *
 *  FIXME
 *  The code for 24 bit is horrible. It copies byte by byte instead of in
 *  longs like the other sizes. Needs to be optimized.
 *  
 *  Tony: 
 *  Incorporate mask tables similar to fbcon-cfb*.c in 2.4 API.  This speeds 
 *  up the code significantly.
 *  
 *  Code for depths that do not evenly divide BITS_PER_LONG is still kludgy
 *  and is still processed one bit at a time.
 *
 *  Also need to add code to deal with cards whose endianness differs from
 *  the native CPU endianness. I also need to deal with MSB position in the
 *  word.
 *
 */
#include <linux/string.h>
#include <linux/fb.h>
#include <asm/types.h>

#include <video/fbcon.h>

#define DEBUG

#ifdef DEBUG
#define DPRINTK(fmt, args...) printk(KERN_DEBUG "%s: " fmt,__FUNCTION__,## args)
#else
#define DPRINTK(fmt, args...)
#endif

static u32 cfb_tab8[] = {
#if defined(__BIG_ENDIAN)
    0x00000000,0x000000ff,0x0000ff00,0x0000ffff,
    0x00ff0000,0x00ff00ff,0x00ffff00,0x00ffffff,
    0xff000000,0xff0000ff,0xff00ff00,0xff00ffff,
    0xffff0000,0xffff00ff,0xffffff00,0xffffffff
#elif defined(__LITTLE_ENDIAN)
    0x00000000,0xff000000,0x00ff0000,0xffff0000,
    0x0000ff00,0xff00ff00,0x00ffff00,0xffffff00,
    0x000000ff,0xff0000ff,0x00ff00ff,0xffff00ff,
    0x0000ffff,0xff00ffff,0x00ffffff,0xffffffff
#else
#error FIXME: No endianness??
#endif
};

static u32 cfb_tab16[] = {
#if defined(__BIG_ENDIAN)
    0x00000000, 0x0000ffff, 0xffff0000, 0xffffffff
#elif defined(__LITTLE_ENDIAN)
    0x00000000, 0xffff0000, 0x0000ffff, 0xffffffff
#else
#error FIXME: No endianness??
#endif
};

static u32 cfb_tab32[] = {
	0x00000000, 0xffffffff
};

static u32 cfb_tabdef[2];

void cfb_imageblit(struct fb_info *p, struct fb_image *image)
{
	int pad, ppw;
	int x2, y2, n, i, j, l = 8;
	unsigned long tmp = ~0UL << (BITS_PER_LONG - p->var.bits_per_pixel);
	unsigned long fgx, bgx, fgcolor, bgcolor, eorx;	
	unsigned long end_mask, bit_mask, bytes = 4;
	u32 *tab = NULL;
	u8 *dst = NULL, *dst1;
	u8 *src;

	/* 
	 * We could use hardware clipping but on many cards you get around hardware
	 * clipping by writing to framebuffer directly like we are doing here. 
	 */
	x2 = image->dx + image->width;
	y2 = image->dy + image->height;
	image->dx = image->dx > 0 ? image->dx : 0;
	image->dy = image->dy > 0 ? image->dy : 0;
	x2 = x2 < p->var.xres_virtual ? x2 : p->var.xres_virtual;
	y2 = y2 < p->var.yres_virtual ? y2 : p->var.yres_virtual;
	image->width  = x2 - image->dx;
	image->height = y2 - image->dy;

	dst1 = p->screen_base + image->dy * p->fix.line_length + 
		((image->dx * p->var.bits_per_pixel) >> 3);

	ppw = BITS_PER_LONG/p->var.bits_per_pixel;

	src = image->data;	

	switch (ppw) {
	case 4:
		tab = cfb_tab8;
		break;
	case 2:
		tab = cfb_tab16;
		break;
	case 1:
		tab = cfb_tab32;
		break;
	}

	if (image->depth == 1) {

		if (p->fix.visual == FB_VISUAL_TRUECOLOR) {
			fgx = fgcolor = ((u32 *)(p->pseudo_palette))[image->fg_color];
			bgx = bgcolor = ((u32 *)(p->pseudo_palette))[image->bg_color];
		} else {
			fgx = fgcolor = image->fg_color;
			bgx = bgcolor = image->bg_color;
		}	

		for (i = 0; i < ppw-1; i++) {
			fgx <<= p->var.bits_per_pixel;
			bgx <<= p->var.bits_per_pixel;
			fgx |= fgcolor;
			bgx |= bgcolor;
		}

		/*
		 * BPP kludge, should be generalized/optimized later
		 */
		if (BITS_PER_LONG % p->var.bits_per_pixel) {
			bytes = (p->var.bits_per_pixel + 7) >> 3;
			tmp = ~0UL >> (BITS_PER_LONG - p->var.bits_per_pixel);
			tab = cfb_tabdef;
			tab[0] = 0;
			tab[1] = tmp;
			fgx = fgcolor;
			bgx = bgcolor;
			ppw = 1;
		}

		bit_mask = (1 << ppw) - 1;

		n = ((image->width + 7) >> 3);
		pad = (n << 3) - image->width;
		n = image->width % ppw;

		eorx = fgx ^ bgx;

		for (i = 0; i < image->height; i++) {
			dst = dst1;

			for (j = image->width/ppw; j--; ) {
				l -= ppw;
				end_mask = tab[(*src >> l) & bit_mask]; 
				fb_writel((end_mask & eorx)^bgx, (unsigned long *) dst);
				dst += bytes;
				if (!l) { l = 8; src++; }
			}

			if (n) {
				end_mask = 0;	
				for (j = n; j > 0; j--) {
					l--;
					/* test bit l of the current source byte only */
					if (*src & (1 << l))
						end_mask |= (tmp >> (p->var.bits_per_pixel*(j-1)));
					if (!l) { l = 8; src++; }
				}
				fb_writel((end_mask & eorx)^bgx, (unsigned long *) dst);
				dst += bytes;
			}
			/* skip the row's padding bits; start the next row on a fresh byte */
			l -= pad;
			if (!l) { l = 8; src++; }
			dst1 += p->fix.line_length;	
		}	
	} else {
		/* Draw the penguin */
		n = ((image->width * p->var.bits_per_pixel) >> 3);
		end_mask = 0;
	}
}
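
The byte-rounding rule from the header comment (a bitmap 12 bits wide takes
two bytes per row) and the MSB-first bit order that the loop above walks
can be sketched in user space as follows. The helper names
bitmap_row_bytes(), bitmap_row_pad_bits() and bitmap_get_pixel() are
hypothetical and are not part of the attachment.

```c
#include <stdint.h>

/* Bytes occupied by one row of a 1 bpp bitmap `width` pixels wide. */
static unsigned bitmap_row_bytes(unsigned width)
{
    return (width + 7) >> 3;
}

/* Unused padding bits at the end of each row. */
static unsigned bitmap_row_pad_bits(unsigned width)
{
    return (bitmap_row_bytes(width) << 3) - width;
}

/*
 * Return 1 if pixel (x, y) is a foreground pixel, 0 if background.
 * The leftmost pixel of each byte is its most significant bit.
 */
static int bitmap_get_pixel(const uint8_t *data, unsigned width,
                            unsigned x, unsigned y)
{
    const uint8_t *row = data + y * bitmap_row_bytes(width);

    return (row[x >> 3] >> (7 - (x & 7))) & 1;
}
```

For a 12-pixel-wide font this gives two bytes per row with four padding
bits, so a blitter has to skip those four bits before starting the next row.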
