Execution speed of ServoPod


gtiani
09-16-08, 03:27 PM
I'm debating whether it is worth writing some functions in assembly code rather than FORTH to make them run faster. But before I download an assembler and spend a lot of time, could you give me a very rough guess about how much faster a routine will run if written in assembly rather than FORTH?

The FORTH word contains a bunch of DO LOOPs, IF THENs, compares and a lot of memory moves. I'm just looking for a wild guess. Could I expect to see a 10-fold or possibly 50-fold increase, or would a factor of 2 be more likely?

When I run a simple routine like the one shown below, I get about 600,000 FORTH operations per second. Since the DSP56F807 runs at around 40 million instructions per second, it looks like assembly code would run about 66 times faster. Is that really possible?
Thank you,
Gary
: QQ
20000 0 DO
XXX @ DROP
XXX @ DROP
XXX @ DROP
XXX @ DROP
XXX @ DROP
XXX @ DROP
XXX @ DROP
XXX @ DROP
XXX @ DROP
XXX @ DROP
LOOP
;
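
As a quick sanity check on the numbers above, the arithmetic can be sketched in Python. This is only a back-of-the-envelope model: it assumes each of XXX, @, DROP and LOOP counts as one FORTH operation, and it takes the 600,000 and 40 million figures straight from the post. The function names are made up for illustration.

```python
# Rough count of FORTH words executed by one call of QQ, and the
# native-to-FORTH speed ratio implied by the figures quoted in the post.
WORDS_PER_LINE = 3      # XXX, @, DROP (assumed one operation each)
LINES_PER_PASS = 10     # ten identical lines inside the loop
PASSES = 20_000         # 20000 0 DO ... LOOP


def words_executed():
    # +1 per pass for the LOOP word itself
    return PASSES * (WORDS_PER_LINE * LINES_PER_PASS + 1)


def native_to_forth_ratio(native_ips=40_000_000, forth_ops=600_000):
    return native_ips / forth_ops


print(words_executed())                    # 620000
print(round(native_to_forth_ratio(), 1))   # 66.7
```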

RMDumse
09-16-08, 08:51 PM
Your original estimate of 66x is a fair guess. If you'd asked me flat-footed, I'd have said 40. But, as you might have suspected, it's a bit more complicated.

The Forth virtual machine has to be maintained, but often you are running pure machine-coded primitives. Words written in machine code are called primitives. For many words, such as DUP, DROP, @ and +, you go straight from the name into the machine code. But you can also get some ringers in there, where a word you don't think about is actually a high-level word, so you make multiple passes through the virtual machine even though you think you're only executing a single word.

For instance, in your example loop, you are putting a number on the stack. What gets compiled is the word LIT followed by your number. So LIT has to execute, change the data stack pointer, move the data from the program stack to the indexed stack, etc., and then also go through a version of NEXT that bumps the instruction pointer past the constant value to the next location.

Likewise, LOOP doesn't compile LOOP, but a primitive, (LOOP). It has to diddle with the return stack to maintain the loop index there, so there's more indirection than you might be thinking about, which adds to the total cycle count.
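
To make the hidden work concrete, here is a toy threaded-code interpreter in Python. It is only a sketch of the general idea and not the ServoPod's actual inner interpreter: the class ToyVM, its method names, and the cell layout (an inline operand after LIT, an inline branch target after (LOOP)) are all assumptions made for illustration. It runs the equivalent of `3 0 DO 7 DROP LOOP` and counts the passes through NEXT.

```python
class ToyVM:
    """Toy model of a Forth inner interpreter (hypothetical layout)."""

    def __init__(self):
        self.data = []        # data stack
        self.ret = []         # return stack (holds loop limit and index)
        self.ip = 0           # instruction pointer into the thread
        self.next_count = 0   # how many times NEXT dispatched a word

    def lit(self, thread):
        # LIT: push the inline cell, then step the IP past it
        self.data.append(thread[self.ip])
        self.ip += 1

    def do(self, thread):
        # (DO): move start and limit from the data stack to the return stack
        start = self.data.pop()
        limit = self.data.pop()
        self.ret.append(limit)
        self.ret.append(start)

    def loop(self, thread):
        # (LOOP): bump the index; branch back or clean up the return stack
        index = self.ret.pop() + 1
        if index < self.ret[-1]:
            self.ret.append(index)
            self.ip = thread[self.ip]   # inline cell holds the branch target
        else:
            self.ret.pop()              # discard the limit
            self.ip += 1                # step past the branch target

    def drop(self, thread):
        self.data.pop()

    def run(self, thread):
        self.ip = 0
        while self.ip < len(thread):
            word = thread[self.ip]      # NEXT: fetch the word...
            self.ip += 1                # ...advance the IP...
            self.next_count += 1
            word(self, thread)          # ...and execute it
        return self.next_count


# Compiled form of:  3 0 DO 7 DROP LOOP   (loop body starts at cell 5)
thread = [ToyVM.lit, 3, ToyVM.lit, 0, ToyVM.do,
          ToyVM.lit, 7, ToyVM.drop, ToyVM.loop, 5]

vm = ToyVM()
print(vm.run(thread))   # 12
```

Six words of source turn into twelve dispatches here, and every one of them also pays for the fetch-and-advance step in run, which is the overhead a hand-written assembly loop would avoid.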

Also, because of the way we first came up with our Forth, we inherited some inefficiency. When we started, the only development tool built for the DSP series was the CodeWarrior C package. Motorola (later Freescale) even made it terribly difficult to find out what the machine code opcodes were for the DSPs. So we inherited about a 5x inefficiency from having to write Forth in C.

We've done tests since we got our assembler done, and have found we can probably pick up a 5x general improvement in our language speed if we write in assembler rather than their C. We are very close to done with that conversion, by the way.

So originally we conservatively said about 200K high-level instructions per second. With our assembler version, we expect about 1 million high-level instructions per second.
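
Those two figures are related by the 5x factor mentioned above. A small Python sketch of the arithmetic follows; the 40 million native instructions per second is the figure from the first post in this thread, and the variable names are just for illustration.

```python
ORIGINAL_RATE = 200_000      # conservative high-level instructions/s (C-based Forth)
ASSEMBLER_GAIN = 5           # expected improvement from the assembler rewrite
NATIVE_RATE = 40_000_000     # DSP56F807 figure quoted in the first post

expected_rate = ORIGINAL_RATE * ASSEMBLER_GAIN
remaining_gap = NATIVE_RATE / expected_rate

print(expected_rate)   # 1000000
print(remaining_gap)   # 40.0
```

So even after the rewrite, a tight loop in hand-written assembly could still be roughly 40 times faster than the same loop in high-level Forth, under these rough assumptions.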

But yes, for tight loops, you could really pick up speed writing a few words in assembler.

It is very typical in Forth development to come back and rewrite 10 to 20% of the words in assembler if you want to wring most of the speed out of a system, but redoing the rest seldom gains much, since most of the written code is seldom executed.
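
One way to see why is Amdahl's law, sketched below in Python. The numbers are purely hypothetical: it assumes the frequently executed words account for 90% of the run time and that rewriting any word in assembler makes it 40 times faster. Neither figure comes from measurements on the ServoPod.

```python
def overall_speedup(fraction, speedup):
    """Amdahl's law: 'fraction' of the run time becomes 'speedup' times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


# Rewrite only the hot words (assumed to be 90% of the run time)
print(round(overall_speedup(0.9, 40), 1))   # 8.2

# Rewrite only the rarely executed words (the other 10% of the run time)
print(round(overall_speedup(0.1, 40), 1))   # 1.1
```

Under these assumptions, the hot words give about an 8x gain on their own, while putting the same effort into the seldom-executed code alone buys about 10%.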

Remember these are generalities and rules of thumb I've been giving. "Your mileage may vary."

gtiani
09-17-08, 04:21 PM
Thank you very much. That was a very detailed response.
I hope the ServoPod gets updated to your faster code sometime.
For now, I will try writing the most time critical function in assembly code and see how it runs.
Sincerely,
Gary