* RFC: Optimizing putcs()
@ 2002-08-05 22:04 Antonino Daplas
2002-08-06 20:08 ` Geert Uytterhoeven
2002-08-07 5:25 ` Antonino Daplas
0 siblings, 2 replies; 6+ messages in thread
From: Antonino Daplas @ 2002-08-05 22:04 UTC
To: fbdev
[-- Attachment #1: Type: text/plain, Size: 2201 bytes --]
With fbcon-accel and the new drawing functions in linux-2.5, console
performance degraded compared to the linux-2.4 implementation. This is
because putcs() has to do one fb_imageblit() per character to be
drawn.
This can be optimized by letting putcs() first construct the row of
text to be drawn in an offscreen buffer, then do a single
fb_imageblit() at the end. Performance will increase for several
reasons:
1. Drawing can be done in "bursts" instead of "trickles".
2. For drivers that support accelerated drawing functions, the
offscreen buffer can optionally be placed in graphics (or AGP) memory,
which is better suited to most hardware that can only do blits from
video memory to video memory.
3. Some level of asynchronicity can be achieved, i.e., the hardware can
be blitting while fbcon-accel is constructing bitmaps. This would
require "walking" the offscreen buffer, and support for hardware
graphics syncing on demand.
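As a rough userspace illustration of the idea (not the kernel code itself), the row construction can be sketched as below. The function name compose_row() and its parameters are hypothetical; it assumes glyphs are stored back to back, each glyph_w bytes wide and height scanlines tall, much like fontdata.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy `count` glyphs side by side into one bitmap so the whole row can be
 * handed to a single blit. Each glyph is `glyph_w` bytes wide and `height`
 * scanlines tall; `font` holds the glyphs back to back. `d_pitch` is the
 * byte stride of the destination bitmap. All names here are illustrative.
 */
static void compose_row(unsigned char *dst, size_t d_pitch,
                        const unsigned char *font, size_t glyph_w,
                        size_t height, const unsigned char *chars,
                        size_t count)
{
    /* The destination stride must hold every glyph of the row. */
    assert(d_pitch >= glyph_w * count);

    for (size_t c = 0; c < count; c++) {
        const unsigned char *src = font + chars[c] * height * glyph_w;
        unsigned char *d = dst + c * glyph_w;   /* column of this glyph */

        for (size_t y = 0; y < height; y++) {
            for (size_t x = 0; x < glyph_w; x++)
                d[x] = *src++;
            d += d_pitch;                       /* next scanline */
        }
    }
}
```

After the loop the caller would issue one blit of width glyph_w * count bytes instead of count separate blits.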
I have included a patch for 2.5.27 that implements it in fbcon-accel.
It's preliminary, but I have tested it with cfb_imageblit and with
hardware imageblit, with buffers in system or video memory.
The code for hardware syncing on demand is also present, though it is
disabled (#if 0) and not yet implemented.
For drivers that use cfb_imageblit or similar, code such as the
following can be inserted during initialization:
info->pixmap.addr = (unsigned long) kmalloc(BUFFER_SIZE, GFP_KERNEL);
info->pixmap.size = BUFFER_SIZE;
info->pixmap.offset = 0;
info->pixmap.buf_align = 1;
info->pixmap.scan_align = 1;
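For illustration only, the way the patch rounds the buffer offset up to buf_align and wraps back to the start can be sketched in plain C. The struct pixmap and pixmap_reserve() names below are hypothetical stand-ins, and the sketch assumes buf_align is a power of two (or 0), which the mask arithmetic in the patch also relies on.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields of struct fb_pixmap used here. */
struct pixmap {
    unsigned int offset;     /* next free byte in the buffer */
    unsigned int size;       /* total buffer size in bytes */
    unsigned int buf_align;  /* start alignment, a power of two (or 0) */
};

/*
 * Reserve `len` bytes and return the offset at which they start.
 * The offset is rounded up to buf_align and wraps to 0 when the request
 * would run past the end of the buffer.
 */
static unsigned int pixmap_reserve(struct pixmap *pm, unsigned int len)
{
    unsigned int mask = pm->buf_align ? pm->buf_align - 1 : 0;
    unsigned int start;

    assert(len <= pm->size);
    assert((pm->buf_align & mask) == 0);    /* power of two or zero */

    start = (pm->offset + mask) & ~mask;
    if (start + len > pm->size)
        start = 0;   /* a hardware sync may be needed before reuse */

    pm->offset = start + len;
    return start;
}
```

With buf_align = 1 as in the initialization above, the mask is 0 and the offset is used unchanged.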
Some benchmarks:
time cat /usr/src/linux/MAINTAINERS (40K text file)
mode: 1024x768@8bpp, y-panning disabled.
cfb_imageblit - no offscreen buffer (default)
real 0m13.586s
user 0m0.001s
sys 0m13.585s
cfb_imageblit - with offscreen buffer in system memory
real 0m10.708s
user 0m0.001s
sys 0m10.707s
hardware imageblit - no offscreen buffer
real 0m6.036s
user 0m0.001s
sys 0m6.035s
hardware imageblit - with offscreen buffer in graphics memory
real 0m3.160s
user 0m0.001s
sys 0m3.160s
hardware imageblit - graphics offscreen buffer + hardware sync on demand
real 0m1.843s
user 0m0.000s
sys 0m1.843s
Tony
[-- Attachment #2: fb-pixmap.diff --]
[-- Type: text/x-patch, Size: 5107 bytes --]
diff -Naur linux-2.5.27/drivers/video/fbcon-accel.c linux/drivers/video/fbcon-accel.c
--- linux-2.5.27/drivers/video/fbcon-accel.c Mon Aug 5 21:15:56 2002
+++ linux/drivers/video/fbcon-accel.c Mon Aug 5 21:17:22 2002
@@ -70,9 +70,47 @@
image.width = fontwidth(p);
image.height = fontheight(p);
image.depth = 1;
- image.data = p->fontdata + (c & charmask)*fontheight(p)*width;
+ if (!info->pixmap.addr) {
+ image.data = p->fontdata + (c & charmask)*fontheight(p)*width;
+ info->fbops->fb_imageblit(info, &image);
+ }
+ else {
+ unsigned int d_size, d_pitch, i, j;
+ unsigned int scan_align = (info->pixmap.scan_align) ? info->pixmap.scan_align - 1 : 0;
+ unsigned int buf_align = (info->pixmap.buf_align) ? info->pixmap.buf_align - 1 : 0;
+ char *d_addr, *s_addr;
+
+ d_pitch = (width + scan_align) & ~scan_align;
+ d_size = d_pitch * image.height;
+
+ if (d_size > info->pixmap.size) {
+ BUG();
+ return;
+ }
+
+ info->pixmap.offset = (info->pixmap.offset + buf_align) & ~buf_align;
- info->fbops->fb_imageblit(info, &image);
+ if (info->pixmap.offset + d_size > info->pixmap.size) {
+#if 0
+ /* Some form of hardware sync'ing may be necessary here */
+ if (info->fbops->fb_sync)
+ info->fbops->fb_sync(info);
+#endif
+ info->pixmap.offset = 0;
+ }
+ s_addr = p->fontdata + (c & charmask)*fontheight(p)*width;
+ image.data = (char *) (info->pixmap.addr + info->pixmap.offset);
+ d_addr = image.data;
+
+ for (i = image.height; i--; ) {
+ for (j = 0; j < width; j++)
+ d_addr[j] = *s_addr++;
+ d_addr += d_pitch;
+ }
+
+ info->fbops->fb_imageblit(info, &image);
+ info->pixmap.offset += d_size;
+ }
}
void fbcon_accel_putcs(struct vc_data *vc, struct display *p,
@@ -87,15 +125,61 @@
image.bg_color = attr_bgcol(p, *s);
image.dx = xx * fontwidth(p);
image.dy = yy * fontheight(p);
- image.width = fontwidth(p);
image.height = fontheight(p);
image.depth = 1;
+ if (!info->pixmap.addr) {
+ image.width = fontwidth(p);
+ while (count--) {
+ image.data = p->fontdata +
+ (scr_readw(s++) & charmask) * fontheight(p) * width;
+ info->fbops->fb_imageblit(info, &image);
+ image.dx += fontwidth(p);
+ }
+ }
+ else {
+ unsigned int d_pitch, d_size, i, j;
+ unsigned int scan_align = (info->pixmap.scan_align) ? info->pixmap.scan_align - 1 : 0;
+ unsigned int buf_align = (info->pixmap.buf_align) ? info->pixmap.buf_align - 1 : 0;
+ char *s_addr, *d_addr, *d_addr0;
+
+ d_pitch = (width * count) + scan_align;
+ d_pitch &= ~scan_align;
+ d_size = d_pitch * image.height;
+
+ if (d_size > info->pixmap.size) {
+ BUG();
+ return;
+ }
+
+ info->pixmap.offset = (info->pixmap.offset + buf_align) & ~buf_align;
+
+ if (info->pixmap.offset + d_size > info->pixmap.size) {
+#if 0
+ /* Some form of hardware sync'ing may be necessary here */
+ if (info->fbops->fb_sync)
+ info->fbops->fb_sync(info);
+#endif
+ info->pixmap.offset = 0;
+ }
+
+ image.width = fontwidth(p) * count;
+ image.data = (char *) (info->pixmap.addr + info->pixmap.offset);
+ d_addr = image.data;
+
+ while (count--) {
+ s_addr = p->fontdata +
+ (scr_readw(s++) & charmask) * fontheight(p) * width;
+ d_addr0 = d_addr;
+ for (i = image.height; i--; ) {
+ for (j = 0; j < width; j++)
+ d_addr0[j] = *s_addr++;
+ d_addr0 += d_pitch;
+ }
+ d_addr += width;
+ }
- while (count--) {
- image.data = p->fontdata +
- (scr_readw(s++) & charmask) * fontheight(p) * width;
info->fbops->fb_imageblit(info, &image);
- image.dx += fontwidth(p);
+ info->pixmap.offset += d_size;
}
}
diff -Naur linux-2.5.27/include/linux/fb.h linux/include/linux/fb.h
--- linux-2.5.27/include/linux/fb.h Mon Aug 5 21:16:16 2002
+++ linux/include/linux/fb.h Mon Aug 5 21:17:48 2002
@@ -291,6 +291,14 @@
char *data; /* Pointer to image data */
};
+struct fb_pixmap {
+ __u32 addr; /* buffer pointer (system or video), NULL if none */
+ __u32 offset; /* offset to buffer */
+ __u32 size; /* size of buffer */
+ __u32 buf_align; /* buffer start alignment */
+ __u32 scan_align; /* scanline alignment, should be <= buf_align */
+};
+
#ifdef __KERNEL__
#if 1 /* to go away in 2.5.0 */
@@ -359,6 +367,10 @@
int (*fb_mmap)(struct fb_info *info, struct file *file, struct vm_area_struct *vma);
/* switch to/from raster image mode */
int (*fb_rasterimg)(struct fb_info *info, int start);
+#if 0
+ /* wait for blit idle, optional */
+ void (*fb_sync)(struct fb_info *info);
+#endif
};
struct fb_info {
@@ -371,6 +383,7 @@
struct fb_fix_screeninfo fix; /* Current fix */
struct fb_monspecs monspecs; /* Current Monitor specs */
struct fb_cmap cmap; /* Current cmap */
+ struct fb_pixmap pixmap; /* Offscreen pixmap */
struct fb_ops *fbops;
char *screen_base; /* Virtual address */
struct display *disp; /* initial display variable */
* Re: RFC: Optimizing putcs()
[not found] <20020806054957.44715.qmail@web13004.mail.yahoo.com>
@ 2002-08-06 18:11 ` Antonino Daplas
2002-08-08 18:31 ` James Simmons
0 siblings, 1 reply; 6+ messages in thread
From: Antonino Daplas @ 2002-08-06 18:11 UTC
To: James Simmons; +Cc: fbdev
[-- Attachment #1: Type: text/plain, Size: 556 bytes --]
On Tue, 2002-08-06 at 13:49, James Simmons wrote:
> Ah. Thank you for solving this problem for me. I
> haven't had time to figure it out. Also left to be
> done is 24 bpp support as well as drawing the penguin.
>
I took a crack at adding support for bpp24 for the cfb_* drawing
functions. I tried to keep the original code as much as possible, so
the result may not be optimal. My test shows though that bpp24 should
be as fast as (maybe a tad slower than) bpp32.
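To make the 24 bpp case concrete, here is a minimal userspace sketch of filling one scanline with packed 3-byte pixels. It is not the technique used in the attached patch, which stores a full long at a 3-byte stride; this version writes exactly three bytes per pixel. The name fill_row24() and the least-significant-byte-first layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill `width` pixels of one 24 bpp scanline with `color` (24 bits used).
 * Bytes are stored least significant first, which is an assumption about
 * the framebuffer layout. Exactly 3 * width bytes are written.
 */
static void fill_row24(uint8_t *dst, size_t width, uint32_t color)
{
    assert((color >> 24) == 0);     /* only 24 bits are meaningful */

    for (size_t i = 0; i < width; i++) {
        *dst++ = (uint8_t)(color & 0xff);
        *dst++ = (uint8_t)((color >> 8) & 0xff);
        *dst++ = (uint8_t)((color >> 16) & 0xff);
    }
}
```

Writing byte by byte never touches memory past the last pixel, at the cost of more stores per pixel than a word-wide fill.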
As for drawing the logo, will the source contain indices into the
palette?
Tony
[-- Attachment #2: bpp24.diff --]
[-- Type: text/x-patch, Size: 5232 bytes --]
diff -Naur linux-2.5.27/drivers/video/cfbcopyarea.c linux/drivers/video/cfbcopyarea.c
--- linux-2.5.27/drivers/video/cfbcopyarea.c Tue Aug 6 17:55:48 2002
+++ linux/drivers/video/cfbcopyarea.c Tue Aug 6 17:56:19 2002
@@ -83,7 +83,7 @@
lineincr = -linesize;
}
- if ((BITS_PER_LONG % p->var.bits_per_pixel) == 0) {
+ if ((BITS_PER_LONG % p->var.bits_per_pixel) == 0) {
int ppw = BITS_PER_LONG / p->var.bits_per_pixel;
int n = ((area->width * p->var.bits_per_pixel) >> 3);
@@ -103,7 +103,6 @@
n -= end_index;
}
n /= bpl;
-
if (n <= 0) {
if (start_mask) {
if (end_mask)
@@ -219,4 +218,32 @@
}
}
}
+ else {
+ int n = ((area->width * p->var.bits_per_pixel) >> 3);
+ int n16 = (n >> 4) << 4;
+ int n_fract = n - n16;
+ int rows;
+
+ if (area->dy < area->sy
+ || (area->dy == area->sy && area->dx < area->sx)) {
+ for (rows = height; rows--; ) {
+ if (n16)
+ fast_memmove(dst1, src1, n16);
+ if (n_fract)
+ fb_memmove(dst1+n16, src1+n16, n_fract);
+ dst1 += linesize;
+ src1 += linesize;
+ }
+ }
+ else {
+ for (rows = height; rows--; ) {
+ if (n16)
+ fast_memmove(dst1, src1, n16);
+ if (n_fract)
+ fb_memmove(dst1+n16, src1+n16, n_fract);
+ dst1 -= linesize;
+ src1 -= linesize;
+ }
+ }
+ }
}
diff -Naur linux-2.5.27/drivers/video/cfbfillrect.c linux/drivers/video/cfbfillrect.c
--- linux-2.5.27/drivers/video/cfbfillrect.c Tue Aug 6 17:55:54 2002
+++ linux/drivers/video/cfbfillrect.c Tue Aug 6 17:56:23 2002
@@ -57,7 +57,7 @@
else
fg = fgcolor = rect->color;
- for (i = 0; i < ppw - 1; i++) {
+ for (i = 0; i < ppw-1; i++) {
fg <<= p->var.bits_per_pixel;
fg |= fgcolor;
}
@@ -161,45 +161,31 @@
break;
}
} else {
- /* Odd modes like 24 or 80 bits per pixel */
- start_mask = fg >> (start_index * p->var.bits_per_pixel);
- end_mask = fg << (end_index * p->var.bits_per_pixel);
- /* start_mask =& PFILL24(x1,fg);
- end_mask_or = end_mask & PFILL24(x1+width-1,fg); */
-
- n = (rect->width - start_index - end_index) / ppw;
+ char *dst2;
+ int bytes = (p->var.bits_per_pixel + 7) >> 3;
+ n = rect->width;
+ fg = fgcolor;
switch (rect->rop) {
case ROP_COPY:
do {
- dst = (unsigned long *) dst1;
- if (start_mask)
- *dst |= start_mask;
- if ((start_index + rect->width) > ppw)
- dst++;
-
+ dst2 = dst1;
/* XXX: slow */
+ /* YYY: extremely slow */
for (i = 0; i < n; i++) {
- *dst++ = fg;
+ *(unsigned long *) dst2 = fg;
+ dst2 += bytes;
}
- if (end_mask)
- *dst |= end_mask;
dst1 += linesize;
} while (--height);
break;
case ROP_XOR:
do {
- dst = (unsigned long *) dst1;
- if (start_mask)
- *dst ^= start_mask;
- if ((start_mask + rect->width) > ppw)
- dst++;
-
+ dst2 = dst1;
for (i = 0; i < n; i++) {
- *dst++ ^= fg; /* PFILL24(fg,x1+i); */
+ *(unsigned long *) dst2 ^= fg; /* PFILL24(fg,x1+i); */
+ dst2 += bytes;
}
- if (end_mask)
- *dst ^= end_mask;
dst1 += linesize;
} while (--height);
break;
diff -Naur linux-2.5.27/drivers/video/cfbimgblt.c linux/drivers/video/cfbimgblt.c
--- linux-2.5.27/drivers/video/cfbimgblt.c Tue Aug 6 17:55:41 2002
+++ linux/drivers/video/cfbimgblt.c Tue Aug 6 17:56:14 2002
@@ -47,9 +47,8 @@
int x2, y2, n, i, j, k, l = 7;
unsigned long tmp = ~0 << (BITS_PER_LONG - p->var.bits_per_pixel);
unsigned long fgx, bgx, fgcolor, bgcolor, eorx;
- unsigned long end_mask;
- unsigned long *dst = NULL;
- u8 *dst1;
+ unsigned long end_mask, bytes = 4;
+ u8 *dst = NULL, *dst1;
u8 *src;
/*
@@ -64,7 +63,7 @@
y2 = y2 < p->var.yres_virtual ? y2 : p->var.yres_virtual;
image->width = x2 - image->dx;
image->height = y2 - image->dy;
-
+
dst1 = p->screen_base + image->dy * p->fix.line_length +
((image->dx * p->var.bits_per_pixel) >> 3);
@@ -88,13 +87,26 @@
fgx |= fgcolor;
bgx |= bgcolor;
}
- eorx = fgx ^ bgx;
+
+ /*
+ * BPP kludge, should be generalized/optimized later
+ */
+ if (BITS_PER_LONG % p->var.bits_per_pixel) {
+ bytes = (p->var.bits_per_pixel + 7) >> 3;
+ tmp = ~0UL >> (BITS_PER_LONG - p->var.bits_per_pixel);
+ fgx = fgcolor;
+ bgx = bgcolor;
+ ppw = 1;
+ }
+
n = ((image->width + 7) >> 3);
pad = (n << 3) - image->width;
n = image->width % ppw;
+ eorx = fgx ^ bgx;
+
for (i = 0; i < image->height; i++) {
- dst = (unsigned long *) dst1;
+ dst = dst1;
for (j = image->width/ppw; j > 0; j--) {
end_mask = 0;
@@ -105,8 +117,8 @@
l--;
if (l < 0) { l = 7; src++; }
}
- fb_writel((end_mask & eorx)^bgx, dst);
- dst++;
+ fb_writel((end_mask & eorx)^bgx, (unsigned long *) dst);
+ dst += bytes;
}
if (n) {
@@ -117,8 +129,8 @@
l--;
if (l < 0) { l = 7; src++; }
}
- fb_writel((end_mask & eorx)^bgx, dst);
- dst++;
+ fb_writel((end_mask & eorx)^bgx, (unsigned long *) dst);
+ dst += bytes;
}
l -= pad;
dst1 += p->fix.line_length;
* Re: RFC: Optimizing putcs()
2002-08-05 22:04 RFC: Optimizing putcs() Antonino Daplas
@ 2002-08-06 20:08 ` Geert Uytterhoeven
2002-08-07 0:17 ` Antonino Daplas
2002-08-07 5:25 ` Antonino Daplas
1 sibling, 1 reply; 6+ messages in thread
From: Geert Uytterhoeven @ 2002-08-06 20:08 UTC
To: Antonino Daplas; +Cc: fbdev
On 6 Aug 2002, Antonino Daplas wrote:
> With fbcon-accel and the new drawing functions in linux-2.5, console
> performance degraded compared to the linux-2.4 implementation. This is
> because putcs() has to do one fb_imageblit() per character to be
> drawn.
Yes, this will show up badly once I have ported amifb to the new
framework, since chip RAM accesses are very slow and we use bitplanes...
> This can be optimized by letting putcs() first construct the row of
> text to be drawn in an offscreen buffer, then do a single
> fb_imageblit() at the end. Performance will increase for several
> reasons:
Yes, this is very nice! I was thinking about passing an array of images to an
fb_imageblit_multiple() or so, but yours may be better.
> For drivers that use cfb_imageblit or similar, code such as the
> following can be inserted during initialization:
>
> info->pixmap.addr = (unsigned long) kmalloc(BUFFER_SIZE, GFP_KERNEL);
> info->pixmap.size = BUFFER_SIZE;
> info->pixmap.offset = 0;
> info->pixmap.buf_align = 1;
> info->pixmap.scan_align = 1;
>
> Some benchmarks:
>
> time cat /usr/src/linux/MAINTAINERS (40K text file)
> mode: 1024x768@8bpp, y-panning disabled.
[...]
Just for reference, did you run this benchmark on 2.4.x as well?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: RFC: Optimizing putcs()
2002-08-06 20:08 ` Geert Uytterhoeven
@ 2002-08-07 0:17 ` Antonino Daplas
0 siblings, 0 replies; 6+ messages in thread
From: Antonino Daplas @ 2002-08-07 0:17 UTC
To: fbdev
On Wed, 2002-08-07 at 04:08, Geert Uytterhoeven wrote:
>
> Just for reference, did you run this benchmark on 2.4.x as well?
>
> Gr{oetje,eeting}s,
>
Sort of. The functions in fbcon-cfb*.c are already very fast, because
fbcon and character drawing are tightly integrated together, and
fbcon_cfb8_putcs() is very, very efficient, processing 4 bits per
iteration, instead of 1. I'm getting numbers like this:
real 0m2.098s
user 0m0.000s
sys 0m2.070s
which was faster(!) than my hardware implementation of putcs, and 5x
faster than 2.5. Since I'm using an i810 with Video in System RAM,
direct framebuffer access does not carry much overhead. I just have to
beat fbcon-cfb8, so I thought of placing text data in offscreen graphics
memory to take full advantage of hardware blitting.
At high bit depths (32 bpp), 2.5 with an offscreen buffer is as fast as
2.4.
Tony
* Re: RFC: Optimizing putcs()
2002-08-05 22:04 RFC: Optimizing putcs() Antonino Daplas
2002-08-06 20:08 ` Geert Uytterhoeven
@ 2002-08-07 5:25 ` Antonino Daplas
1 sibling, 0 replies; 6+ messages in thread
From: Antonino Daplas @ 2002-08-07 5:25 UTC
To: fbdev
[-- Attachment #1: Type: text/plain, Size: 751 bytes --]
One of the reasons why 2.4 console performance is good, especially at low
bit depths, is its ability to process more than 1 pixel per iteration and
its use of mask arrays.
I tried to generalize the above in cfbimgblt.c by incorporating the idea
from fbcon-cfb*.c. It's significantly faster but still not as fast as the
2.4 API.
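As a sketch of the mask-array idea at 8 bpp (an illustration, not the attached code), four font bits can be expanded into four pixels with one table lookup. The names tab8, init_tab8() and expand8() are hypothetical. The table is built at run time through memcpy() so the sketch does not depend on host endianness, whereas the attached file uses precomputed per-endian tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t tab8[16];

/* Build the nibble -> 4-pixel mask table; bit 3 is the leftmost pixel. */
static void init_tab8(void)
{
    for (unsigned int nib = 0; nib < 16; nib++) {
        uint8_t bytes[4];

        for (unsigned int px = 0; px < 4; px++)
            bytes[px] = (nib & (8u >> px)) ? 0xff : 0x00;
        memcpy(&tab8[nib], bytes, sizeof(bytes));
    }
}

/*
 * Expand one byte of 1 bpp font data into eight 8 bpp pixels at `dst`,
 * four pixels per table lookup.
 */
static void expand8(uint8_t *dst, uint8_t bits, uint8_t fg, uint8_t bg)
{
    uint32_t fgx = fg * 0x01010101u;   /* color replicated in each byte */
    uint32_t bgx = bg * 0x01010101u;
    uint32_t eorx = fgx ^ bgx;

    assert(tab8[15] == 0xffffffffu);   /* init_tab8() must have run */

    for (int shift = 4; shift >= 0; shift -= 4) {
        /* mask 0xff selects fg, mask 0x00 selects bg, per pixel */
        uint32_t word = (tab8[(bits >> shift) & 0xf] & eorx) ^ bgx;

        memcpy(dst, &word, sizeof(word));
        dst += 4;
    }
}
```

The expression (mask & eorx) ^ bgx is the same selection trick used in cfb_imageblit: where the mask is all ones the result is fgx, elsewhere it is bgx.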
time cat /usr/src/linux/MAINTAINERS (40K text file)
1024x768-8bpp, y-panning disabled
2.5 old (with offscreen buffers)
real 0m10.708s
user 0m0.001s
sys 0m10.707s
2.5 new
real 0m4.378s
user 0m0.002s
sys 0m4.375s
2.4
real 0m2.098s
user 0m0.000s
sys 0m2.070s
I've only tested the implementation at 8, 16, 24, and 32 bpp. 24bpp is
slightly slower than 32 bpp :(
Tony
[-- Attachment #2: cfbimgblt.c --]
[-- Type: text/x-c, Size: 5558 bytes --]
/*
* Generic BitBLT function for frame buffer with packed pixels of any depth.
*
* Copyright (C) June 1999 James Simmons
*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file COPYING in the main directory of this archive for
* more details.
*
* NOTES:
*
* This function copies an image from system memory to video memory. The
* image can be a bitmap where each 0 represents the background color and
* each 1 represents the foreground color. Great for font handling. It can
* also be a color image. This is determined by image_depth. The color image
* must be laid out exactly in the same format as the framebuffer. Yes I know
* there are cards with hardware that converts images of various depths to the
* framebuffer depth. But not every card has this. All images must be rounded
* up to the nearest byte. For example a bitmap 12 bits wide must be two
* bytes wide.
*
* FIXME
* The code for 24 bit is horrible. It copies byte by byte size instead of
* longs like the other sizes. Needs to be optimized.
*
* Tony:
* Incorporate mask tables similar to fbcon-cfb*.c in 2.4 API. This speeds
* up the code significantly.
*
* Code for depths that do not divide BITS_PER_LONG evenly is still kludgy
* and is still processed a bit at a time.
*
* Also need to add code to deal with cards whose endianness differs from
* the native CPU endianness. I also need to deal with MSB position in the word.
*
*/
#include <linux/string.h>
#include <linux/fb.h>
#include <asm/types.h>
#include <video/fbcon.h>
#define DEBUG
#ifdef DEBUG
#define DPRINTK(fmt, args...) printk(KERN_DEBUG "%s: " fmt,__FUNCTION__,## args)
#else
#define DPRINTK(fmt, args...)
#endif
static u32 cfb_tab8[] = {
#if defined(__BIG_ENDIAN)
0x00000000,0x000000ff,0x0000ff00,0x0000ffff,
0x00ff0000,0x00ff00ff,0x00ffff00,0x00ffffff,
0xff000000,0xff0000ff,0xff00ff00,0xff00ffff,
0xffff0000,0xffff00ff,0xffffff00,0xffffffff
#elif defined(__LITTLE_ENDIAN)
0x00000000,0xff000000,0x00ff0000,0xffff0000,
0x0000ff00,0xff00ff00,0x00ffff00,0xffffff00,
0x000000ff,0xff0000ff,0x00ff00ff,0xffff00ff,
0x0000ffff,0xff00ffff,0x00ffffff,0xffffffff
#else
#error FIXME: No endianness??
#endif
};
static u32 cfb_tab16[] = {
#if defined(__BIG_ENDIAN)
0x00000000, 0x0000ffff, 0xffff0000, 0xffffffff
#elif defined(__LITTLE_ENDIAN)
0x00000000, 0xffff0000, 0x0000ffff, 0xffffffff
#else
#error FIXME: No endianness??
#endif
};
static u32 cfb_tab32[] = {
0x00000000, 0xffffffff
};
static u32 cfb_tabdef[2];
void cfb_imageblit(struct fb_info *p, struct fb_image *image)
{
int pad, ppw;
int x2, y2, n, i, j, l = 8;
unsigned long tmp = ~0UL << (BITS_PER_LONG - p->var.bits_per_pixel);
unsigned long fgx, bgx, fgcolor, bgcolor, eorx;
unsigned long end_mask, bit_mask, bytes = 4;
u32 *tab = NULL;
u8 *dst = NULL, *dst1;
u8 *src;
/*
* We could use hardware clipping but on many cards you get around hardware
* clipping by writing to framebuffer directly like we are doing here.
*/
x2 = image->dx + image->width;
y2 = image->dy + image->height;
image->dx = image->dx > 0 ? image->dx : 0;
image->dy = image->dy > 0 ? image->dy : 0;
x2 = x2 < p->var.xres_virtual ? x2 : p->var.xres_virtual;
y2 = y2 < p->var.yres_virtual ? y2 : p->var.yres_virtual;
image->width = x2 - image->dx;
image->height = y2 - image->dy;
dst1 = p->screen_base + image->dy * p->fix.line_length +
((image->dx * p->var.bits_per_pixel) >> 3);
ppw = BITS_PER_LONG/p->var.bits_per_pixel;
src = image->data;
switch (ppw) {
case 4:
tab = cfb_tab8;
break;
case 2:
tab = cfb_tab16;
break;
case 1:
tab = cfb_tab32;
break;
}
if (image->depth == 1) {
if (p->fix.visual == FB_VISUAL_TRUECOLOR) {
fgx = fgcolor = ((u32 *)(p->pseudo_palette))[image->fg_color];
bgx = bgcolor = ((u32 *)(p->pseudo_palette))[image->bg_color];
} else {
fgx = fgcolor = image->fg_color;
bgx = bgcolor = image->bg_color;
}
for (i = 0; i < ppw-1; i++) {
fgx <<= p->var.bits_per_pixel;
bgx <<= p->var.bits_per_pixel;
fgx |= fgcolor;
bgx |= bgcolor;
}
/*
* BPP kludge, should be generalized/optimized later
*/
if (BITS_PER_LONG % p->var.bits_per_pixel) {
bytes = (p->var.bits_per_pixel + 7) >> 3;
tmp = ~0UL >> (BITS_PER_LONG - p->var.bits_per_pixel);
tab = cfb_tabdef;
tab[0] = 0;
tab[1] = tmp;
fgx = fgcolor;
bgx = bgcolor;
ppw = 1;
}
bit_mask = (1 << ppw) - 1;
n = ((image->width + 7) >> 3);
pad = (n << 3) - image->width;
n = image->width % ppw;
eorx = fgx ^ bgx;
for (i = 0; i < image->height; i++) {
dst = dst1;
for (j = image->width/ppw; j--; ) {
l -= ppw;
end_mask = tab[(*src >> l) & bit_mask];
fb_writel((end_mask & eorx)^bgx, (unsigned long *) dst);
dst += bytes;
if (!l) { l = 8; src++; }
}
if (n) {
end_mask = 0;
for (j = n; j > 0; j--) {
l--;
if (test_bit(l, (unsigned long *) src))
end_mask |= (tmp >> (p->var.bits_per_pixel*(j-1)));
if (!l) { l = 8; src++; }
}
fb_writel((end_mask & eorx)^bgx, (unsigned long *) dst);
dst += bytes;
}
l -= pad;
dst1 += p->fix.line_length;
}
} else {
/* Draw the penguin */
n = ((image->width * p->var.bits_per_pixel) >> 3);
end_mask = 0;
}
}
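The byte-padding rule for 1 bpp source bitmaps described in the header comment (a 12-bit-wide bitmap occupies two bytes per row) and the pad computation in cfb_imageblit() can be sketched in isolation. The helper names below are hypothetical, and the sketch assumes the leftmost pixel is the most significant bit of each byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per scanline of a 1 bpp bitmap whose rows are padded to a byte. */
static size_t row_bytes(size_t width)
{
    return (width + 7) >> 3;
}

/* Unused bits at the end of each scanline. */
static size_t row_pad_bits(size_t width)
{
    return (row_bytes(width) << 3) - width;
}

/* Return 1 if pixel (x, y) is set; the leftmost pixel is bit 7. */
static int bitmap_pixel(const uint8_t *data, size_t width, size_t x, size_t y)
{
    assert(x < width);
    return (data[y * row_bytes(width) + (x >> 3)] >> (7 - (x & 7))) & 1;
}
```

These two quantities correspond to n = (image->width + 7) >> 3 and pad = (n << 3) - image->width in the function above.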
* Re: RFC: Optimizing putcs()
2002-08-06 18:11 ` Antonino Daplas
@ 2002-08-08 18:31 ` James Simmons
0 siblings, 0 replies; 6+ messages in thread
From: James Simmons @ 2002-08-08 18:31 UTC
To: Antonino Daplas; +Cc: fbdev, jsimmons
> I took a crack at adding support for bpp24 for the
> cfb_* drawing
> functions. I tried to keep the original code as
> much as possible, so
> the result may not be optimal. My test shows though
> that bpp24 should
> be as fast as (maybe a tad slower than) bpp32.
Thanks.
> As for drawing the logo, will the source be
> containing indices to the
> palette?
I believe so. I haven't thought much about it.