1. What impact does running in protected mode have on my program?
There are several reasons why your program can run slower in protected mode:
To reduce the number of mode switches in your program, look for the places in your code where you signal the same interrupt or DOS function call repeatedly. Try to pass more information in each call, which reduces the number of calls and therefore the number of switches to real mode. For example, if your program sends output to the screen a character at a time, you can save mode switches by sending output a line at a time.
Some older C compilers have their printf function make a DOS call for each character of output, leading to a noticeable delay in filling the screen with text. One of several ways to reduce the switching overhead with these compilers is to use sprintf to create an entire line (or screen) of formatted output, and then call fputs to do the output (eliminating all but one switch).
The Tenberry kernel does some optimization to reduce the number of mode switches. For example, if you have a tight loop that continually calls DOS function 2Ch to request the time, the kernel doesn't send each call to real mode. Instead, it tests whether the timer has ticked since the program last called DOS function 2Ch. If not, it returns the same time that DOS returned from the last call.
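The sprintf-then-fputs approach described above can be sketched as follows. This is a minimal illustration, not code from the DOS/4GW documentation: the helper names `format_row` and `print_row`, the row layout, and the 128-byte line buffer are all assumptions. It uses `snprintf` for bounds checking, which assumes a C99-capable runtime; with an older compiler that lacks it, `sprintf` into a buffer known to be large enough is the fallback.

```c
#include <stdio.h>

#define LINE_MAX_LEN 128   /* hypothetical upper bound on one output line */

/* Format one report row into buf instead of printing it field by field.
   Returns the number of characters stored (excluding the terminating NUL),
   or -1 if formatting failed or the line did not fit. */
int format_row(char *buf, size_t size, int id, const char *name, double value)
{
    int n = snprintf(buf, size, "%4d  %-20s %10.2f\n", id, name, value);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Build the whole line in memory first, then hand it to the runtime in a
   single fputs call. Returns 0 on success, -1 on failure. */
int print_row(FILE *out, int id, const char *name, double value)
{
    char line[LINE_MAX_LEN];

    if (format_row(line, sizeof line, id, name, value) < 0)
        return -1;
    return (fputs(line, out) == EOF) ? -1 : 0;
}
```

A call such as `print_row(stdout, 7, "alpha", 3.5)` then emits the row as one string. How many DOS calls that one `fputs` turns into still depends on how the compiler's runtime buffers the stream, so it is worth measuring on the compiler in question.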
Some important instructions run more slowly in protected mode than in real mode. In particular, any instruction that loads a segment register has to access the descriptor table. These instructions are LDS, LES, MOV segreg, INT, and the intersegment CALL, JMP, and RET instructions. Use a compiler that makes the best use of these instructions, and limit their use in your own code.
Using the BIOS for screen output can be slow because of mode switches. Instead, write directly to video RAM. If your program positions the cursor for each character output to the screen, consider changing your code to reposition the cursor only when necessary - that is, when waiting for mouse or keyboard input. This can make sluggish displays snappy again.
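As a rough sketch of what writing directly to video RAM looks like, the routine below stores character/attribute pairs into an 80x25 text-mode screen buffer without calling the BIOS or moving the cursor. The function name `put_text` and the fixed 80x25 geometry are assumptions for illustration, and `unsigned short` is assumed to be 16 bits wide. The screen pointer is passed in as a parameter; in a 32-bit flat-model DOS/4GW program, color text memory is commonly reached at linear address 0xB8000, but verify that against your compiler and video mode before relying on it.

```c
#include <stddef.h>

#define SCREEN_COLS 80   /* assumed text-mode width  */
#define SCREEN_ROWS 25   /* assumed text-mode height */

/* Write a NUL-terminated string at (row, col) into a text-mode screen
   buffer. Each cell is 16 bits: character in the low byte, attribute in
   the high byte. Output is clipped at the end of the row. Returns the
   number of cells written. */
size_t put_text(volatile unsigned short *screen, int row, int col,
                unsigned char attr, const char *s)
{
    volatile unsigned short *cell;
    size_t written = 0;

    if (row < 0 || row >= SCREEN_ROWS || col < 0 || col >= SCREEN_COLS)
        return 0;

    cell = screen + row * SCREEN_COLS + col;
    while (*s != '\0' && col < SCREEN_COLS) {
        *cell++ = (unsigned short)((attr << 8) | (unsigned char)*s++);
        col++;
        written++;
    }
    return written;
}
```

Under the assumption above, a call might look like `put_text((volatile unsigned short *)0xB8000, 0, 0, 0x07, "Ready")`, with the hardware cursor updated separately only when the program is about to wait for input.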
The Tenberry kernel transfers I/O buffers to and from extended memory through the transfer buffer. If an I/O buffer is in low memory, the kernel does not have to copy the data to or from extended memory. But since even the slowest AT-compatible machines can transfer data to and from extended memory at over 3 MB per second, you would have to read and write massive amounts of data before you'd notice a difference in performance. Modern machines can typically read or write 10 to 20 MB per second. The more important performance benefit of using low memory I/O buffers is that the kernel does not have to separate data into blocks that fit into the transfer buffer. If the transfer buffer is 8KB, and you write 40KB from extended memory to disk, the kernel performs the operation as five 8KB writes. However, if the 40KB is in low memory, the kernel does not need to copy to the transfer buffer, and only one write occurs.
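The arithmetic behind that example can be written out as a small model. This is only a sketch of the behavior described above, not kernel code; the function name `write_count` and its parameters are hypothetical.

```c
/* Model of how many separate writes one I/O request becomes.
   bytes         - size of the request
   xfer_buf      - size of the transfer buffer (must be nonzero)
   in_low_memory - nonzero if the caller's buffer is already in low memory
   Returns the number of writes, or 0 for an empty request or a zero-sized
   transfer buffer. */
unsigned long write_count(unsigned long bytes, unsigned long xfer_buf,
                          int in_low_memory)
{
    if (bytes == 0 || xfer_buf == 0)
        return 0;
    if (in_low_memory)
        return 1;                    /* no copy needed, so no splitting */
    /* Round up: a partial final block still costs one write. */
    return (bytes + xfer_buf - 1) / xfer_buf;
}
```

With an 8KB transfer buffer, `write_count(40UL * 1024, 8UL * 1024, 0)` gives 5, while the same request from a low memory buffer gives 1.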
If you have an interrupt-driven application (e.g., asynchronous communications with an interrupt per character), handle interrupts in both protected and real modes, since the switch time could well be as long as the time between interrupts.
Unfortunately, the answer is: it depends!
The PPro is quite a bit better at running 32-bit code than is the Pentium -- perhaps as much as 30-40% faster.
The Pentium is faster at running 16-bit code than is the PPro -- maybe 10-25%.
There are a few instructions for which the PPro is quite a bit slower than the Pentium, particularly segment overrides (ES:, etc.). These will rarely, if ever, occur in 32-bit code, but can be frequent in 16-bit code.
Optimizing specifically for the Pentium can get results closer to the PPro. General optimization or optimization for 486 or 386 will favor the PPro.
So, while running a DOS/4GW application: