Machine Performance for Grand Prix Legends

Recently I've noticed quite a few folks in the GPL community upgrading their systems. I've had a few experiences and a lot of equipment that I think will shed some light on the values associated with GPL, so I'm writing them down. A bit of background. My Lotus has a 900 MHz AMD Thunderbird engine powering an nVidia GeForce GTS2 via an ASUS A7V transmission (uh, motherboard). I run GPL at 1280x1024 resolution using Direct3D and use the standard nVidia reference drivers (version 6.31) on Win98 SE. Sound is provided by a Creative Sound Blaster Live!, and the system has DVD and CD-RW. There are several hard disks, including an old 8 GB 5400 RPM IBM DeskStar, a 30 GB Quantum Fireball Plus and a brand new 46 GB 7200 RPM IBM DeskStar.

Video Card Has Little Impact

Last week my (home) office was the recipient of the prestigious California Rolling Blackout Award, and when the dust settled, it turned out that the GTS2 was dead. In desperation, I pulled its predecessor - a GeForce 256 SDR - out of cold storage. Curiously, I discovered that, for GPL, the old SDR performed pretty much identically to the GTS2! This is not what one would expect, since the SDR was the original GeForce implementation, and in all of the typically published gaming benchmarks, the SDR is the bottom of the GeForce pile.

In this case, I found that both boards would run 36 fps at Brands Hatch, except near the Brabham straight when there are a significant number of cars pitted, when the rates drop to 21 fps. Note that both boards drop from 36 fps to 21 fps at this location. It was clear that the video board wasn't the limiting factor in this configuration, since both achieved the same results. This caused me to wonder a lot about what contributes to GPL performance, so I ran some experiments.

GeForce Anti-Aliasing Causes Significant Degradation 2/19/01

After seeing some posts regarding anti-aliasing, I decided to try it out myself. My GeForce GTS2 is capable of full-screen antialiasing (FSAA) even when the software (GPL in this case) doesn't know what's going on. I first enabled FSAA on the Video Properties -> Settings -> Advanced -> GeForce GTS2 -> Direct3D Settings -> More D3D -> Anti-Aliasing tab. (Yes Virginia, that's seven levels deep!) There are quite a few options, mostly relating to the antialiasing size (1x2, 2x2, 3x3, 4x4, etc) and special algorithms. I found that the special algorithms, whatever they are, are quite incompatible with GPL. The display did weird things, including "swimming" and "smearing" when in special mode. I subsequently found a program called "GeForceAAset" that dives into your system tray and makes this much easier. 

Enabling AA basically means that the card has to operate at much higher resolution. A 1024x768 display with 2x2 anti-aliasing means that the card has to render at 2048x1536, and 4x4 means 4096x3072! However, based on the previous video results, I had guessed that this might not have any material impact on GPL, since nothing seems to faze the midrange to high end video boards. In fact, this is far from the case.
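The resolution arithmetic above can be sketched in a few lines of Python. This assumes the driver does plain supersampling (render large, then scale down), which is how the modes are described here rather than anything confirmed from nVidia documentation. The function names are made up for illustration, and treating the first number of a mode as the horizontal factor is also an assumption.

```python
def supersampled_resolution(width, height, factor_x, factor_y):
    """Internal render size if each screen pixel is sampled factor_x by factor_y times."""
    return width * factor_x, height * factor_y


def pixel_load(factor_x, factor_y):
    """How many times more pixels the card fills compared with no anti-aliasing."""
    return factor_x * factor_y


if __name__ == "__main__":
    # Modes named in the text; (horizontal, vertical) ordering is an assumption.
    for fx, fy in [(1, 2), (2, 2), (3, 3), (4, 4)]:
        w, h = supersampled_resolution(1024, 768, fx, fy)
        print(f"{fx}x{fy}: renders {w}x{h}, {pixel_load(fx, fy)}x the pixels")
```

At 4x4 the card is filling sixteen times as many pixels per frame, which goes some way toward explaining why FSAA can hurt even on a board that is otherwise loafing.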

I ran a full-field race at Bremgarten with 4x4 anti-aliasing set up and the video was pretty choppy. At the start, I read the minimum 21 fps as usual, but through the forests the frame rate dropped from 36 fps to 28 fps. The visual difference looked bigger than that, since it wasn't a smooth 28 fps - it was pretty jerky. I went back and ran the same full-field race with FSAA turned off, and other than at the start, there was pretty well nowhere around the track where I saw anything less than about 35 fps. I progressively backed off from 4x4 to 1x2 (said to be "good for car games") and found that I could basically never use FSAA at Bremgarten without significantly compromising the simulation.

The issues might have to do with texture map swapping--I have a 32 MB board with 28 MB dedicated to textures. But I use all of Bruce Johnson's (and others') hi-res wheel textures and various stuff like that, so I could well be using up all of the texture map space, especially at Bremgarten, which is texture-intensive to begin with. The biggest problem was not the video speed. I've run 28 fps in other GPL configs before and had little trouble. The problem here is that it compromises the responsiveness of the wheel, making the car very hard to control.

At the end of the experiment, I'm turning FSAA off. I personally don't find the visible difference dramatic--I don't think it can improve my driving. From what I've read elsewhere on the 'Net (for example, at AnandTech), I doubt that users of the older GeForce boards will be able to use FSAA, either. I have heard people say that they can run GeForce Ultras with FSAA. They have 64MB memory, which might be enough. But for me, the visual difference isn't even remotely enough for me to spring for the $400+ video board.

CPU Power: Over-Clocking the CPU

Since it was pretty obvious that the video card wasn't the limiting factor, I concentrated on the CPU. Fortunately, both the Asus A7V and the Thunderbird are very good at overclocking, so I whipped out my pencil and went to work. First I unlocked the clock multiplier on the CPU. (Read about the details here.) This permits the motherboard to provide a clock signal to the processor internals, rather than using the internally provided clock. The A7V provides a wide variety of options to set the key CPU parameters: the internal clock, the bus clock, and the CPU voltage. Actually, the A7V provides too many options in one regard. There's a "jumper-free" mode, which permits selecting the options in the BIOS; there's also a set of DIP switches that perform the same function. My experience matches that of others, which is that jumper-free mode is not very reliable, while the DIP switches are. I did all this with DIP switches.

I first cranked up the internal clock from 900 MHz to 1066 MHz by changing the multiplier from 9x to 10.5x. (Why it's reading 1066 MHz instead of 1050 MHz I'm not sure. I think the provided clock isn't precisely 100 MHz.) Now the same test at Brands bottoms out at 24 fps rather than 21 fps, which is about what one would expect. To make the system run without locking up, I had to increase the core voltage to 1.80v from the default 1.65v. Be very careful about increasing voltage. My understanding is that 1.85v is the limit, and that extended running over that level is fatal. I was glad I checked using the AsusProbe program. I had set the voltage to 1.85v using the DIP switches, but AsusProbe reported that the voltage was actually 1.904v! I'll probably go back and try to see how far I can drop the voltage and stay stable, but 1.80v is definitely in the stable range.
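The multiplier arithmetic is simple enough to check in Python. This is only a sketch: the core clock is taken to be the multiplier times the bus clock, and the helper names are made up for illustration. Working backward from the reported 1066 MHz gives the bus clock the board would have to be supplying, which supports the guess that it isn't precisely 100 MHz.

```python
def core_clock_mhz(multiplier, bus_mhz):
    """CPU core clock as multiplier times the bus (system) clock."""
    return multiplier * bus_mhz


def implied_bus_mhz(reported_core_mhz, multiplier):
    """Bus clock that would produce the reported core clock at this multiplier."""
    return reported_core_mhz / multiplier


if __name__ == "__main__":
    print(core_clock_mhz(9, 100))                   # stock setting
    print(core_clock_mhz(10.5, 100))                # what a true 100 MHz bus would give
    print(round(implied_bus_mhz(1066, 10.5), 1))    # bus clock implied by the 1066 MHz reading
```

The last line comes out at about 101.5 MHz, so a bus running a percent or two fast would account for the difference.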

I tried going higher than 1066 MHz using the multiplier. 1100 MHz was possible but the system locked up every time within about ten minutes. 1150 MHz didn't even boot. This is apparently the limit on my physical chip, and most people I know have been able to extract fairly similar results. Durons are even hardier in some ways. Most Durons, even those with 600 MHz ratings, will run at 800 MHz or so, and a few have been clocked over 1000 MHz. One of my other systems has a 600 MHz Duron that I've overclocked to 850 MHz using only the CPU multiplier.

Getting a Little More: Tweaking the System Clock

Before I quit on this machine, I tried using the other clock, namely the system bus clock. Unlike the CPU multiplier, the system bus clock affects more than the CPU. It affects memory, because memory transfers are timed by the system clock. It affects the video card, because the AGP is dependent on the system clock. It also affects PCI bus stability, because it too relies on the system clock through a divider. PCI and AGP are usually the worst problems, because these busses are spec'ed at 33 MHz and 66 MHz respectively. Raising the system clock from (say) 100 MHz to 120 MHz results in a PCI clock of 40 MHz and an AGP clock of 80 MHz, which is pretty far beyond the rated speeds of most components. 7 MHz may not seem like a lot these days until one considers that it's 21% over the spec...
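Here is a rough Python sketch of how the derived bus clocks move with the system clock. The divider values are assumptions (PCI at one third of the system clock and AGP at two thirds, the usual arrangement for a 100 MHz bus, with AGP nominally rated at 66 MHz), and the names are illustrative, not anything read from the A7V manual.

```python
PCI_SPEC_MHZ = 33.0   # nominal PCI rating
AGP_SPEC_MHZ = 66.0   # nominal AGP rating (assumed: twice the PCI rate)


def derived_clocks(system_mhz, pci_divider=3.0, agp_ratio=2.0 / 3.0):
    """Return (pci_mhz, agp_mhz) for a given system clock; dividers are assumptions."""
    return system_mhz / pci_divider, system_mhz * agp_ratio


def percent_over(actual_mhz, spec_mhz):
    """How far a clock runs above its nominal rating, in percent."""
    return (actual_mhz / spec_mhz - 1.0) * 100.0


if __name__ == "__main__":
    for system_mhz in (100, 103, 105, 120):
        pci, agp = derived_clocks(system_mhz)
        print(f"{system_mhz} MHz system: PCI {pci:.1f} MHz "
              f"({percent_over(pci, PCI_SPEC_MHZ):+.0f}%), "
              f"AGP {agp:.1f} MHz ({percent_over(agp, AGP_SPEC_MHZ):+.0f}%)")
```

Because everything hangs off the same clock, a few MHz on the system bus pushes every card in the machine out of spec by the same percentage.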

The system clock default is nominally 100 MHz, and when I tried 105 MHz the system often failed to boot; even when it did, things broke shortly after Windows started. My memory is certified for PC-133, so that shouldn't have been a problem. I was able to back off to 103 MHz (104 MHz did not work), and the system seems to work. This results in the CPU running at 1098 MHz. At this speed, the system bottoms out at 24-25 fps in the Brands problem area. So far this has been OK. It's not at all clear to me why the CPU wouldn't run at 1100 MHz using the CPU multiplier but does run 2 MHz slower using the seemingly more difficult 103 MHz system clock, but I'm not asking too many questions at the moment...
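One way to sanity-check the idea that GPL is CPU-bound is to compare the observed worst-case frame rates with what a straight linear scaling from the 900 MHz baseline would predict. The sketch below does that in Python. Linear scaling is an assumption, not a measured law, and the function name is made up for illustration.

```python
def predicted_fps(base_fps, base_mhz, new_mhz):
    """Frame rate expected if fps scaled in direct proportion to CPU clock (an assumption)."""
    return base_fps * new_mhz / base_mhz


if __name__ == "__main__":
    # (CPU MHz, worst-case fps observed at Brands Hatch), figures taken from the text
    observations = [(900, "21"), (1066, "24"), (1098, "24-25")]
    for mhz, seen in observations:
        guess = predicted_fps(21, 900, mhz)
        print(f"{mhz} MHz: linear prediction {guess:.1f} fps, observed {seen} fps")
```

The observed numbers land a little under the linear prediction, which fits a simulation that is mostly, though perhaps not entirely, limited by the CPU.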

Note that my new GTS2 video board is still broken. The old SDR video board was used for all of the overclocking tests, and it was able to keep up with the CPU in all cases. In this last test, the video board also got a mild overclock (3%), but this is well within the capabilities of most GeForce boards. In fact, many of the GeForce boards are shipped with an overclocking utility. (My SDR board is an Elsa Erazor-X, which has such a utility. My GTS2 is a Creative, which doesn't have one.)

1/30/01... Now I've got my GTS2 back, although of course now it's running in the overclocked system. As one would expect from the previous results, upgrading from GeForce SDR to GeForce GTS2 has no effect whatever on frame rates. I've still got 21 fps at the start at Bremgarten, and I still drop to 24 fps on the Brabham straight at Brands Hatch.

2/19/01... I fiddled with overclocking the GTS2, as well--the Detonator drivers make this really easy. As one would expect from the previous discussion (excepting FSAA), cranking up the GTS2 clock from 200 MHz to 224 MHz had no material impact on performance. (I also sped up the memory clock, from 350 MHz to 366 MHz.)

Caution: CPU Voltage

There are a few things you need to pay close heed to in doing this overclocking. First, you may have to increase the CPU voltage, as I did. Don't exceed 1.85v by much for any period longer than a couple of minutes, or you'll be shortening the life of your CPU, quite possibly by a large margin. The maximum 1.85v is a pretty safe bet, though, because that is the specification for the 1200 MHz Thunderbird. It's OK for the lower clock samples too, because they're all made on the same line with the same process. After they're made, they're rated and "speed-binned" into different categories for sale. All of them, therefore, are designed to run indefinitely at 1.85v. And as you can see from the speed tests above, the lines of demarcation between what works and what doesn't are pretty sharp.

Caution: Thermal Measures

The other thing that demands attention to detail is cooling of the processor. All of the current CPUs are built in a semiconductor process called CMOS. This process has historically been used for "low" power consumption applications, because CMOS gates only consume power when changing state. Other processes, such as ECL, consume power regardless of state. But now, with clock rates in the 600-1500 MHz range and climbing rapidly, even CMOS parts are consuming power on a pretty much constant basis. As a result, current processors need to shed 30-55 watts of power in the form of heat.
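Since CMOS burns power mainly when gates switch, the usual first-order estimate is that switching power goes as capacitance times voltage squared times clock frequency. The Python sketch below uses that rule of thumb to compare the overclocked settings described earlier with stock. It ignores leakage and everything else that isn't switching, and the function name is illustrative.

```python
def relative_dynamic_power(v_old, f_old, v_new, f_new):
    """Ratio of new to old switching power, using P ~ C * V^2 * f with C unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Stock: 1.65v at 900 MHz. Overclocked: 1.80v at 1098 MHz (figures from the text).
    ratio = relative_dynamic_power(1.65, 900, 1.80, 1098)
    print(f"Switching power rises by a factor of about {ratio:.2f}")
```

Under those assumptions the overclocked chip dissipates roughly 45% more switching power than stock, and the voltage bump contributes nearly as much as the clock increase. That is why the heat sink matters so much.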

To ensure a reasonable life for the CPU, be absolutely sure that the heat sink and fan are designed for your processor. AMD processors (both Athlons and Durons) are generally somewhat more physically fragile than their Intel counterparts. I use an Alpha 6035, which is very effective. It keeps the T-bird at 47C or less, which is pretty good--especially at 1098 MHz. I've seen some other heatsink/fan combos that let the processors hit 55C. This isn't a good thing. The major drawback of the Alpha unit is that it makes a pretty high-pitched scream that is audible in the surrounding rooms and unfortunately sounds like neither a Ferrari nor a Lotus. Combined with the two other fans in the box and the power supply fan, the system howls even when GPL isn't running! But it stays cool and lets the CPU run FAST.

GPL and Hard Disks

GPL tends to eat disk space for lunch. In fact, it is consuming more disk space on my system than anything else except video capture data! I've got somewhere north of 1.4 GB worth of replays stored on hard disk, and at the moment this even exceeds digital photos by a small margin. The files, especially race replays, tend to be very large. I've got lots of them that are more than 25 MB, and virtually every single one that I save is at least 5-6 MB.

I figured that this would be precisely the sort of application that would benefit from fast hard disks, but after having tried all three of my disks with the same files, I'm pretty convinced that the extra money spent on faster disks is not worthwhile for GPL. Timing by hand, loading a 28 MB replay into Replay Analyzer 3.5 takes very nearly the same time from each disk. Note that my 46 GB drive is connected to the latest ATA-100 interface, while the 30 GB and the 8 GB are on the primary and secondary ATA-33 interfaces. Even this doesn't make much of a difference: amazingly, times to load replays are about the same. I'd say that the best GPL hard disk is the biggest, least expensive one that's available.
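For anyone who would rather not use a stopwatch, a small Python script can time the raw read of a file from each disk. This is only a sketch: it measures reading the bytes, not Replay Analyzer's own loading and parsing, and the operating system's disk cache will flatter any file that has been read recently. The function name is illustrative.

```python
import time


def time_read(path, chunk_size=1 << 20):
    """Read the whole file in chunks; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Copy the same replay to each disk, call time_read on each copy right after a reboot so the cache is cold, and compare the elapsed times.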

My Recommendations for a GPL Computer

After all of this, I have some pretty clear notions as to what GPL'ers should be driving to get the most out of those Ford and Weslake engines. The higher resolution video is enabled by using Direct3D or OpenGL, and their consequent dependence on the geometry engines and other acceleration provided by current-generation graphics processors. But once this is offloaded from the main CPU, it certainly doesn't seem to make much difference which of the boards you use. Given the current (January 2001) crop of boards, I'd pick the least expensive of the available ones, the GeForce 2 MX. Its capabilities exceed those of the SDR I'm currently running and it costs very little. This stands to reason, since GPL was designed with the TNT and TNT2 generation of graphics processors in mind. Remember that the recommended CPU for GPL was a 266 MHz Pentium-II (the "minimum" was 166 MHz!), and you can see that the standards should be far below what we have today. If you're buying for more than just GPL, Sharky Extreme has a good comparison of the high-end graphics cards.

Take the money that you save by not buying a top-of-the-line graphics processor and invest it in the fastest processor you can afford. Rather than investing in the GeForce 2 Ultra at $450, I'd advise getting the $120 GeForce 2 MX and putting the other $330 into other components.

Right now the AMD processors are the fastest at running old code such as GPL. This is especially true on a clock-cycle-for-clock-cycle basis, because the AMD processors have faster floating point units. GPL does a lot of floating-point calculation to compute all of the physics equations that we value so greatly. (Tom's Hardware has many pages of discussion about this, including this page about floating point results.) If you're thinking solely of GPL, there's no question that the AMD Duron is the value champion and that the AMD Thunderbird is the performance leader. Intel's CPUs work fine, just slower and at a higher cost. And they may run other applications better (although not in my experience).

The Pentium-4 is a particularly poor choice for GPL because it uses a new internal architecture. This causes the P4 to be dependent on recompiled and reoptimized code to go fast. New applications certainly will be tailored to go fast on the majority P4, but GPL was compiled and optimized several years ago, and I do not foresee it being re-issued to take advantage of the P4 architecture. On existing code, even the 1500 MHz P4 is slower than the 1200 MHz Thunderbird. Given that the Thunderbird goes for around $375, and the 1500 MHz P4 is running in excess of $1050 in most places, it's hardly what the GPL driver wants. (The T-bird itself usually runs around $250, but the P4 includes 128 MB memory so that Intel can be assured that you match it up with the right components, and I added the cost of 128 MB into the T-bird for comparison purposes. I checked out current pricing at Sharky Extreme.)

In the end, I think the best system is:

AMD Thunderbird 1200 MHz (overclocked as far as you can go)
an effective CPU fan
ASUS A7V-133 motherboard or other overclocker's equivalent (I've heard good things about the ABit offerings in this space)
GeForce 2 MX video card
128 MB PC-133 memory
Inexpensive, large hard disk (30 GB is pretty cheap now) - skip any premium for ATA66 or ATA100.

Confused? Disagree? email me.