Code optimization


PJE
01-02-05, 10:57 AM
Hi Pete,

I was cross-referencing the source with the asm file and noticed that array access with a constant index generates very inefficient code:
word U60in 16
..
if (U60in[5] <> $03)
PktLen = 6
else
PktLen = 4
endif

Produces:

MOVE #U60in,X0
LEA (SP)+
MOVE X0,X:(SP)
MOVE #5,X0
LEA (SP)+
MOVE X0,X:(SP)
MOVE X:(SP)-,N
MOVE X:(SP)-,R2
NOP
MOVE X:(R2+N),X0
CMP #$03,X0
JEQ L36
MOVE #6,X:PktLen
JMP L37
L36:
MOVE #4,X:PktLen
L37:

Changing that to:
word U60in 1
word U60in_1 1
..
word U60in_15 1
..
if (U60in_5 <> $03)
PktLen = 6
else
PktLen = 4
endif

Produces:

MOVE X:U60in_5,X0
CMP #$03,X0
JEQ L36
MOVE #6,X:PktLen
JMP L37
L36:
MOVE #4,X:PktLen
L37:

Which is much simpler...

Can the assembler do inline math on addresses, so that the compiler could emit something like this?

MOVE X:(U60in+5),X0
CMP #$03,X0
JEQ L36
MOVE #6,X:PktLen
JMP L37
L36:
MOVE #4,X:PktLen
L37:

Or generate the U60in_5 on the fly as required.
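
As a rough illustration of the folding asked about above, here is a minimal Python sketch of a peephole pass that recognizes the ten-instruction sequence shown for U60in[5] and collapses it into a single move. It is not StatiC's optimizer: fold_constant_index and PATTERN are hypothetical names, the X:(symbol+constant) operand assumes the assembler accepts address arithmetic (which is exactly the open question here), and a real pass would also have to prove that N and R2 are not used afterwards.

```python
import re

# The stack-based sequence shown above for a constant array index, with the
# array symbol and the constant captured as named groups.
PATTERN = [
    r"MOVE #(?P<base>[A-Za-z_]\w*),X0",
    r"LEA \(SP\)\+",
    r"MOVE X0,X:\(SP\)",
    r"MOVE #(?P<index>\d+),X0",
    r"LEA \(SP\)\+",
    r"MOVE X0,X:\(SP\)",
    r"MOVE X:\(SP\)-,N",
    r"MOVE X:\(SP\)-,R2",
    r"NOP",
    r"MOVE X:\(R2\+N\),X0",
]


def fold_constant_index(lines):
    """Replace each occurrence of PATTERN with one direct MOVE (hypothetical pass)."""
    out = []
    i = 0
    while i < len(lines):
        window = lines[i:i + len(PATTERN)]
        captured = {}
        matched = len(window) == len(PATTERN)
        if matched:
            for text, pattern in zip(window, PATTERN):
                m = re.fullmatch(pattern, text.strip())
                if m is None:
                    matched = False
                    break
                captured.update(m.groupdict())
        if matched:
            # Assumed operand syntax: symbol plus constant, resolved by the assembler.
            out.append(f"MOVE X:({captured['base']}+{captured['index']}),X0")
            i += len(PATTERN)
        else:
            out.append(lines[i])
            i += 1
    return out
```

Fed the first listing above, the pass would leave the CMP/JEQ tail untouched and emit only the single MOVE in front of it.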

StatiC generates very nice clean and simple source code which is easy to work with, and I'd prefer not to have to add too many workarounds for code optimization.

Anyway, Happy New Year and keep up the good work.

PJE

petegray
01-02-05, 12:40 PM
Hi PJE,

Good suggestion. I'm re-working the instruction optimizer as part of the next release, and I'll be sure to address this.

Thanks,
-Pete.

PJE
01-02-05, 04:01 PM
Hi Pete,

I've been looking through the asm file again - nothing better to do on a wet Sunday afternoon...

array[index]=var

seems to generate too much code if index is a simple variable and not an expression. It generates a variable on the stack and then pops it off to set N rather than simply MOVE X:index,N.

var=array[index]

Generates even more stack manipulation as 'array' is also stored to a temp variable before being popped out for R2.

Will you be adding simple

MOVE #array,R2
MOVE X:index,N
MOVE X:var,X0
NOP
MOVE X0,X:(R2+N)

and

MOVE #array,R2
MOVE X:index,N
NOP
MOVE X:(R2+N),X0
MOVE X0,X:var

special cases for these simple arrays?

Regards,
PJE

petegray
01-02-05, 04:47 PM
Hi PJE,

You're right. The original optimizer had a very tough time identifying optimizable code due to the primary/secondary/stack mechanism used by the compiler.

So, I re-wrote the compiler to use a "register pool" allocation scheme, and that's why I'm currently re-working the optimizer. The new compiler has a ton of other enhancements too. The code you're seeing right now doesn't look anything like the optimized code of the next release.

But on the subject at hand, it's not quite as simple as it seems - because it depends upon the variables being used. Global variables are easy, but parameters and local variables are referenced via a positive or negative offset to the frame pointer.
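
To make that distinction concrete, here is a minimal Python sketch of how a code generator might pick the operand for a variable. Everything in it is an assumption for illustration: Var and operand are hypothetical names, the sign convention (locals above the frame pointer, parameters below it) is a guess that the thread does not confirm, and the operand strings simply mimic the X:(R2+N) style seen in the listings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    kind: str        # "global", "param" or "local"
    offset: int = 0  # distance from the frame pointer; unused for globals


def operand(var):
    """Return an illustrative operand string for reading var."""
    if var.kind == "global":
        # Fixed address known at assembly time: a direct MOVE X:name,N is possible.
        return f"X:{var.name}"
    if var.kind == "local":
        # Assumed convention: locals sit at positive offsets from R3.
        return f"X:(R3+{var.offset})"
    if var.kind == "param":
        # Assumed convention: parameters sit at negative offsets from R3.
        return f"X:(R3-{var.offset})"
    raise ValueError(f"unknown variable kind: {var.kind}")
```

Only the "global" branch yields a plain symbol, which is why the short MOVE X:index,N form suggested earlier is straightforward for globals but not for frame-relative variables.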

In any case, thanks for the feedback. If you have any code you'd like me to run through the new optimizer, feel free to send it to me and I'll let you know the results.

Thanks,
-Pete.

PJE
01-03-05, 07:48 AM
Hi Pete,

Will the new optimizer have special cases for global variables? My code is using global variables exclusively, and it would be nice for the program to be 50x faster than it needs to be rather than just 25x ;)

Also, I assume R3 is used for passing parameters. If a function has no parameters and merely manipulates global variables, does R3 still need to be pushed before the call and popped after it? It would be nice to have a lightweight function call for this kind of function.

Finally, how does the interrupt function work? I'd like to create a couple of routines to catch the serial input and put the data into FIFOs, but there isn't any documentation on its use. Do I need to modify the vec805.asm to add the interrupt function name in the interrupt table and then add code to enable the relevant interrupts?

I'll send you a sample of my code when I get to work so you can see what I'm trying to do.

Regards,

PJE

petegray
01-03-05, 02:58 PM
Hi PJE,

Yes, the new optimizer has special cases for global variables.

R3 is actually used as the frame pointer. Not to be confused with the stack pointer, which is dynamic, the frame pointer is static for a particular frame - but changes from frame to frame. To ensure frame pointer integrity, the current frame pointer is pushed prior to a call, and restored after the call. All routines set the frame pointer equal to the stack pointer before they do anything - then any space required for local variables is created simply by growing the stack. Prior to returning, a called routine shrinks the stack (removing the space used for local variables), and returns. Upon returning, the calling routine pops the previously pushed frame pointer, and shrinks the stack depending upon the parameters previously passed to the called routine.
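
The sequence described above can be sketched as a small Python simulation. This is only a model of the description, not of the actual generated code: Machine and call are hypothetical names, the stack is a plain list, the return address that a real subroutine call would also place on the stack is deliberately left out, and the order "parameters first, then the saved frame pointer" is inferred from the order in which they are removed.

```python
class Machine:
    """Toy model of the frame-pointer protocol (return address omitted)."""

    def __init__(self):
        self.stack = []  # top of stack is the end of the list
        self.fp = 0      # frame pointer, R3 in the thread

    @property
    def sp(self):
        return len(self.stack)

    def call(self, args, n_locals, body=None):
        # Caller: push the parameters, then save the current frame pointer.
        self.stack.extend(args)
        self.stack.append(self.fp)
        # Callee entry: frame pointer = stack pointer, then grow for locals.
        self.fp = self.sp
        self.stack.extend([0] * n_locals)
        if body is not None:
            body(self)
        # Callee exit: shrink the stack to remove the locals.
        del self.stack[self.fp:]
        # Caller: restore the saved frame pointer, then drop the parameters.
        self.fp = self.stack.pop()
        del self.stack[len(self.stack) - len(args):]
```

In this model a call with no parameters and no locals still pays for one push and one pop of the frame pointer, which is the overhead the lightweight-call question is about.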

You're 100% correct about how the interrupts work.

I'll check for your email this evening, when I'm back at my real desk.

Thanks,
-Pete.

petegray
01-09-05, 11:00 AM
Speaking of optimization, I compiled and assembled the same program using Small C, StatiC (demo) and StatiC (v1), with the following results...

Small C = 411 instructions
StatiC (demo) = 130 instructions
StatiC (v1) = 111 instructions

...which really blows Small C out of the water.

As a better comparison between StatiC versions, I compiled and assembled the source code you sent me, with the following results...

StatiC (demo) = 867 instructions
StatiC (v1) = 725 instructions
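
For a sense of scale, the relative reductions can be worked out from the counts above. The helper name reduction_percent is just for illustration; the numbers are the ones quoted in this post.

```python
def reduction_percent(before, after):
    """Percentage by which the instruction count fell."""
    return 100.0 * (before - after) / before


counts = [
    ("Small C -> StatiC (v1), first program", 411, 111),
    ("StatiC (demo) -> StatiC (v1), first program", 130, 111),
    ("StatiC (demo) -> StatiC (v1), PJE's program", 867, 725),
]

for label, before, after in counts:
    print(f"{label}: {reduction_percent(before, after):.1f}% fewer instructions")
```

That works out to roughly 73.0%, 14.6% and 16.4% fewer instructions respectively.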

Regards,
-Pete.

PJE
01-10-05, 10:53 AM
Originally posted by petegray
As a better comparison between StatiC versions, I compiled and assembled the source code you sent me, with the following results...

StatiC (demo) = 867 instructions
StatiC (v1) = 725 instructions

Regards,
-Pete.

That program had some coding changes with regard to array access to reduce the code size, so the real difference would be 900+ down to 725.

Regards,

Peter

petegray
01-10-05, 11:17 AM
Even better!

Did a lot of optimizer testing this weekend (the optimization level can be varied with the new -o switch). Didn't see any problems. Must be time for a Pre-Release ...