byuu.org

This article was originally published on 2007-08-05, the day after bsnes v022 was released, which was the first version to boast 100% compatibility with commercially-released games that didn’t include special chips.

The State of Emulation, pt. II

Almost three years ago, I wrote an article on the state of emulation. I focused primarily on the SNES, as it is the system I am most familiar with. Unfortunately, that article has been lost to the sands of time. To summarize, I covered what I perceived as the primary problem with emulators: their focus on only worrying about whether software for said hardware ran properly inside the emulators. It didn’t matter if the emulation was hardware accurate, so it would seem. Most emulators took things even further, adding game-specific hacks to raise compatibility, something I felt created more problems than it solved.

After writing this article, I decided to put my money where my mouth was. Realizing I was in the minority, I started work on my own SNES emulator to achieve my ideals – bsnes. I’ve since spent nearly three years learning how to program an emulator, and I’ve learned a lot during that time. Finally, as of yesterday, I have achieved my primary goal: I’ve managed to get 100% of all known games running with zero known bugs. And now, I feel a need to reflect on my achievements, their implications, and what I’ve learned. People’s opinions change over time, and after directly getting involved with emulation, mine certainly have. So now, I’m writing a sequel to my previous article.

What does 100% compatibility mean?

Right off the bat, I would like to apologize if this section appears arrogant or condescending to the hard work of others. I admit to being a bit too prideful of my claim here. I’m still on a bit of a high from having pulled it off.

True 100% compatibility is something that is both impossible to achieve, as well as impossible to verify.

First, it cannot be achieved. Emulation of actual hardware can get very, very close to perfect; but those last fine details can never be fully emulated. Eventually, you start to find cases where even the hardware itself returns different outputs for the exact same inputs. Randomization, essentially. But randomization from what? Physics. Things like hardware component tolerances and interference. Even essential components like the crystal clocks that drive the frequencies of processors can vary from console to console. Usually, the tolerances are so slight that they can be ignored, as any software that sensitive to change can be considered defective by design. Attempts to emulate these variances will fail for two reasons: first, how can you accurately emulate things affected by physics principles such as electromagnetism?; and second, randomization in emulation is inherently bad, and eliminates the possibilities for things such as movie playback, netplay, etc which rely very heavily on receiving the same output from the same input. But this issue isn’t a showstopper. You can simply average things like clock speeds, static values to set hardware uninitialized memory to, etc. The point here is that absolute hardware emulation is a misnomer.

Second, it cannot be verified. Even if a thousand people played through every game ever released, you can never verify that no bugs remain in emulation. What if a certain event in a certain game was simply never triggered through testing? What if there are bugs that exist, where there is simply no software that relies on that portion of emulation to work correctly? There is no litmus test to prove an emulator is perfect. Especially not for a system as complex as the SNES.

So then, what do I mean when I say bsnes has reached 100% compatibility? I mean that multiple people have tested every commercially released SNES game, looked for any bugs they could find, and I’ve fixed every last one found. It’s not the same thing as perfection, or 100% accuracy. It just means that I am not aware of any bugs. And unfortunately, this is the most verification we can possibly perform. So claiming 100% compatibility is the best any emulator can ever hope to achieve. This is the milestone I have achieved, and my emulator is the first SNES emulator to be able to claim this.

You will note that I conveniently exclude SNES cartridges that contain additional hardware inside of them, such as the SuperFX and SA-1 coprocessors. I can justify that, because I do not claim to have achieved a 100% compatible SNES game emulator, but a 100% compatible SNES hardware emulator. The truth is, virtually anything can be placed inside an SNES cartridge. Take a look at SNES copiers. Very amazing units with floppy drives, CD-ROM drives and even parallel interfaces to a PC. Would you consider an emulator incomplete because it did not emulate all known SNES copiers? Of course not. That’s not to say that it is not desirable to emulate these additional coprocessors; just that their emulation is separate to the emulation of the actual SNES hardware itself.

Finally, I don’t even claim to have true 100% compatibility. Merely, known compatibility. You see, I fully expect more bugs to be found with time. In fact, it’s practically inevitable that someone will discover a bug eventually. And I will do my best to fix said bugs at that time. I’ve merely claimed that I have achieved this compatibility rate at this specific point in time. Which is, as far as I know, a first for SNES emulation. Of course, it would be much easier to claim no known bugs if no software was tested on an emulator. The thing that stands out here is that all known software has been tested, and by two separate people not involved in the development of bsnes.

So, that’s what I mean by 100% compatibility. Certainly, other emulators for different systems have managed this in the past. The reason this is such a big deal is the sheer size of the SNES library. Unlike an arcade machine with 1 - 20 games running on its hardware, the SNES has a library of around 3,000 unique commercially released games. As the number of games increases, the chances of fixing one bug and having that break other games also increase. With around 3,000 games, every emulator before now has fought a constant struggle of fixing game A, only to break game B. Fixing game B only to break games C and D. Fixing C and D, only to break both A and B. This accomplishment (hopefully) shows off the value of my approach to emulation. Again I offer my apologies for the shameless bragging above.

What does ‘no hacks’ mean?

A hack is a means of cheating with emulation. A way to artificially boost compatibility, while not actually emulating the hardware correctly. There are two types of hacks: global hacks and game-specific hacks.

A game-specific hack is the worst of all. Imagine a game that seems to crash in an emulator when the CPU runs at its normal speed. By analyzing the code, it is determined that the CPU’s timing is basically off from real hardware. A cheap workaround is to speed up the CPU, rather than to fix the timing problem itself. But this will break other software, so the solution is to make the emulator detect when a specific game is loaded, and adjust the CPU speed as necessary. This creates a myriad of problems. From inconsistent results with newly developed software, to putting off the issue of actually emulating the hardware correctly, to having the hacks break as emulation is changed and improved upon later on. It is by far the most shameless way to raise claimed compatibility of an emulator.
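To make the idea concrete, here is a minimal sketch in Python of what such a hack might look like. Everything in it is an assumption made up for illustration: the cartridge titles, the speed factor and the function names do not come from any real emulator.

```python
# Hypothetical illustration only: none of these names come from a real emulator.
NORMAL_CPU_SPEED = 1.00

# A game-specific hack: a table of per-game overrides, keyed here by an
# invented cartridge title. A real emulator might key on a checksum instead.
SPEED_OVERRIDES = {
    "EXAMPLE GAME THAT CRASHES": 1.10,  # made-up title and made-up factor
}

def cpu_speed_with_game_hack(cartridge_title: str) -> float:
    """Pick the CPU speed based on which game is loaded (the hack)."""
    return SPEED_OVERRIDES.get(cartridge_title, NORMAL_CPU_SPEED)

def cpu_speed_without_hack(cartridge_title: str) -> float:
    """Same speed for every cartridge; the timing bug has to be fixed properly."""
    return NORMAL_CPU_SPEED

for title in ("EXAMPLE GAME THAT CRASHES", "NEW HOMEBREW WITH SAME BUG"):
    print(title, cpu_speed_with_game_hack(title), cpu_speed_without_hack(title))
```

The second title is the point of the example: new software that trips over the same timing flaw gets no override, so it behaves differently from the game the hack was written for.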

A global hack is also very bad, but an unfortunate necessity. As I have explained above, 100% perfect emulation is impossible. Further, there are times when emulating something as the hardware does it becomes too complex to be worthwhile, even for someone as obsessed with accuracy as myself. I’m talking about minute edge cases that would require months, if not years, of research, to emulate a quirk that no software would ever attempt to make use of. There are also times when proper emulation would be so hardware demanding as to make an emulator completely useless for actually running software with. How fun would an emulator be if you could only run it at 1-2fps, right? So, eventually tradeoffs must be made. But global hacks can be abused, too. They can be used as game-specific hacks when only one game attempts to use a feature that is not properly emulated. The only advantage to using a global hack here is that it does not result in different output for the same input, depending on the video game in question loaded. This is perhaps the only redeeming quality of global hacks.

Unfortunately, while bsnes contains no game-specific hacks, it does contain global hacks as a matter of necessity. The most important global hack is the video renderer. A real SNES renders each pixel in real time, whereas bsnes, along with every other SNES emulator ever released, renders an entire scanline at a time. The reason is that while a scanline renderer runs at a mere ~15.75kHz, a clock-based renderer would run at ~10.75MHz. That may not seem like a lot, but when you are writing an emulator that claims absolutely flawless interprocess communication, that means synchronizing two separate processors over ten million times a second. And even with my cooperative multithreading library, ten million context switches back and forth result in an astounding overhead of over 1200ms per emulated second on a mid-range Pentium IV processor. Not accounting for any actual emulation, we have already limited the maximum speed to less than full speed. I was able to use cooperative threading with the ~3.58MHz S-CPU and ~1.024MHz S-SMP very effectively as it was possible to run these processors out of sync, minimizing context switches to around 20,000 per emulated second. Unfortunately, I have yet to come up with a model where I can run the S-CPU and S-PPU out of order. They are so intertwined, as the S-CPU constantly monitors the S-PPU’s video render position to trigger IRQs, that it is extremely difficult to run the two out of order. I am still working on it, and I’m sure there’s a way to do it … but the solution has eluded me for the past few years now. Therefore, I was forced to utilize a scanline-based renderer. To allow all software to run with this, I had to implement several global hacks.
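To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The rates and the 1200ms overhead are the ones quoted above; the per-switch cost is derived from them under the assumption that overhead scales linearly with the number of context switches, which is a simplification. The variable and function names are only illustrative.

```python
# Figures quoted in the text above.
CYCLE_SYNCS_PER_SEC = 10_750_000     # ~10.75 MHz clock-based renderer
SCANLINE_SYNCS_PER_SEC = 15_750      # ~15.75 kHz scanline renderer
CPU_SMP_SYNCS_PER_SEC = 20_000       # S-CPU / S-SMP switches when run out of sync
OVERHEAD_MS_AT_CYCLE_RATE = 1200.0   # overhead per emulated second at the cycle rate

# Assumption: overhead is proportional to the number of context switches.
NS_PER_SWITCH = OVERHEAD_MS_AT_CYCLE_RATE * 1e6 / CYCLE_SYNCS_PER_SEC

def overhead_ms(switches_per_sec: float) -> float:
    """Estimated switching overhead in ms per emulated second."""
    return switches_per_sec * NS_PER_SWITCH / 1e6

print(f"cost per switch:      ~{NS_PER_SWITCH:.0f} ns")
print(f"cycle-based S-PPU:    ~{overhead_ms(CYCLE_SYNCS_PER_SEC):.0f} ms")
print(f"scanline-based S-PPU: ~{overhead_ms(SCANLINE_SYNCS_PER_SEC):.2f} ms")
print(f"S-CPU / S-SMP:        ~{overhead_ms(CPU_SMP_SYNCS_PER_SEC):.2f} ms")
```

Under that linear assumption each switch costs a little over 110ns, so the 20,000 switches per second between the S-CPU and S-SMP cost only a couple of milliseconds, while synchronizing at the cycle rate eats more than the entire emulated second.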

ppu.hack.render_scanline_position

The first hack is to work around games that write to S-PPU registers during active display. Many games will miss the horizontal blanking period and write to the registers too late, as part of the scanline is already rendered. Since bsnes et al render the entire scanline at once, if the register is not set to what the game expects, the entire scanline can potentially be rendered incorrectly, whereas the result on real hardware can be invisible because only the scanline up to that point in time would be rendered incorrectly, and that portion may have been transparent or otherwise not mattered. A good example of this phenomenon is with flickering scanlines. Ever noticed how many emulators have problems with Super Metroid’s status display in-game having a flickering line at the top? How about in Dai Kaijuu Monogatari II’s battles? Yeah, too obscure. Okay, how about in Super Mario Kart? Ever notice a flickering line in various emulators between the two split screens? This is the cause. But we have a global hack for this: we don’t actually render the scanline at the start or at the finish of the scanline. We emulate in the middle, and we hand-tailor the exact cycle position in an attempt to get as many games working as possible. But we avoid the real problem: our PPU renderers do not have enough precision, or accuracy, to properly emulate the software. bsnes’ magic value for this is to render scanlines at cycle position 512. Move that value two cycles to the left and Battle Blaze breaks. Move it four to the right, and another game will break.
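The effect of the render position can be shown with a toy model in Python. This is only a sketch under simplifying assumptions, not how bsnes is actually written: a single register, 1364 master cycles per scanline, and a "reference" renderer that simply picks up a write from the cycle it lands on. All function names are hypothetical.

```python
CYCLES_PER_SCANLINE = 1364  # master clocks per line, assumed for this toy model
RENDER_POSITION = 512       # the magic value mentioned in the text

def scanline_render(initial, writes, render_position=RENDER_POSITION):
    """Scanline renderer: one register value is used for the entire line.

    `writes` is a list of (cycle, value) pairs within this scanline. Only
    writes that have landed by `render_position` are seen.
    """
    value = initial
    for cycle, new_value in sorted(writes):
        if cycle <= render_position:
            value = new_value
    return value

def per_cycle_render(initial, writes, cycles=CYCLES_PER_SCANLINE):
    """Reference model: the register value in effect at every cycle."""
    pending = sorted(writes)
    value, index, line = initial, 0, []
    for cycle in range(cycles):
        while index < len(pending) and pending[index][0] <= cycle:
            value = pending[index][1]
            index += 1
        line.append(value)
    return line

# A game misses horizontal blanking and writes 40 cycles into the line.
late_write = [(40, 1)]
print(scanline_render(0, late_write, render_position=0))  # whole line uses the old value
print(scanline_render(0, late_write))                     # whole line uses the new value
print(per_cycle_render(0, late_write).count(0))           # cycles that saw the old value
```

Rendering at the very start of the line gets every pixel wrong for this game, while the reference model is only wrong for the first 40 cycles. Sampling at cycle 512 hides the problem, but a write landing at, say, cycle 600 would be missed for the whole line, which is why the position ends up hand-tuned.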

ppu.hack.obj_cache

The second hack is with the OAM (sprite) tiledata register. Many games attempt to draw an entire background from sprites, but there is not enough room for this. The trick is to modify the OAM tiledata offset in the middle of the display to point to another section of video RAM. Unfortunately, many games write to this register, and at different times. But there is no magic value for where to cache this value. To emulate all games that rely on this behavior would require, in the best case, twice the precision of a scanline renderer. Trying to create a hybrid scanline/cycle PPU renderer could be an interesting approach to solving this problem, but I have no interest in taking this approach myself. It’s either pure cycle renderer or pure scanline renderer. So, what did I do to help with bsnes’ scanline renderer? Another global hack: this one determines whether or not to cache the OAM tiledata offset one scanline in advance. It creates a see-saw effect. Enabled will cause a miss of one scanline of tiledata in some games, and disabled will cause a miss in other games. A scanline renderer simply lacks the precision necessary to work in all cases: it cannot be done.
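The see-saw can be sketched in a few lines of Python. This is a toy model, not the actual bsnes code: the base addresses, the line number and the function names are all made up, and the only thing modelled is whether the tiledata offset for a line is taken from the current scanline or from the one before it.

```python
OLD_BASE = 0x0000   # hypothetical tiledata base before the mid-frame write
NEW_BASE = 0x4000   # hypothetical tiledata base after it
SWITCH_LINE = 100   # hypothetical first line on which the register holds NEW_BASE

def register_value(line: int) -> int:
    """Register contents as the scanline renderer sees them at `line`."""
    return NEW_BASE if line >= SWITCH_LINE else OLD_BASE

def tiledata_base(line: int, obj_cache: bool) -> int:
    """Base address used to fetch sprite tiles for `line`.

    With obj_cache enabled, the value cached one scanline earlier is used.
    """
    source_line = max(line - 1, 0) if obj_cache else line
    return register_value(source_line)

for line in (99, 100, 101):
    print(line, hex(tiledata_base(line, True)), hex(tiledata_base(line, False)))
```

The two settings disagree on exactly one scanline (line 100 here). Whichever answer a given game expects for that line, the other setting shows one line of the wrong tiledata, and a single global switch cannot satisfy both kinds of game.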

ppu.hack.oam_address_invalidation and ppu.hack.cgram_address_invalidation

The third hack involves writing to OAM (sprite) or CGRAM (palette) during active display, or while the screen is rendering. Technically, the PPU itself attempts to read back this data to render the screen. It was forbidden to write to this data while the screen was drawing, yet that didn’t stop some developers from doing it anyway. Unfortunately, exactly where the writes go during active display is unknown. We have attempted to determine this in the past and failed. Why? We were attempting to determine very low level information, based on a lot of internal state, when all we had were scanline-based PPU renderers. To give a bad analogy, it was like trying to determine how capacitors work in a fuse box by poking at the metal box from the outside with a stick. It’s possible to determine where the writes go, but we need to understand more of the higher level concepts before we start mucking around so deeply into the internals of the PPU. That is, we need a cycle-based PPU renderer. Something nobody as yet has attempted. So then, how does bsnes handle this case? Yet another global hack. But this one is the most insidious of them all. As we do not know how to determine where the writes should go, I have mapped the writes to a specific, static memory address. This is not at all hardware accurate, but is much more accurate than the approach taken by any other emulator thus far: to simply allow the writes to go where the programmer expects. Doing that would cause people developing software under emulation to assume it was possible to write to OAM and CGRAM directly during active display with no consequences. By blocking this, even if the writes go to the wrong address, they are at least not going to the addresses a programmer would expect, and thusly they will expose the problem in the software, and the programmer can take appropriate action. Now, the thing that is so bad with bsnes is the addresses chosen to write to. 
Uniracers is the only commercial game known to write to OAM during active display. Every time, it expects the write to go to offset 0x0218. We do not know why the writes always go to this address. It could be due to other sprites already onscreen, the time when the writes occur, the way the PPU registers are set up, or all of these reasons and more. I very strongly doubt the Uniracers developers knew the reason, either. I believe the developers observed where the writes were going, and got it slipped past the quality assurance departments. Now, bsnes had two options: to map writes to OAM during active display to an arbitrary address, say, 0x0000, and leave Uniracers broken, or map them to 0x0218, and fix Uniracers. Neither would be more accurate. But you can see the moral dilemma. This is essentially a game-specific hack disguised as a global hack, something I talked about before. It was a very tough decision for me to set the write address to 0x0218, so I asked everyone who has supported me over the years what they thought I should do, and the decision was unanimous: I should use 0x0218, as it’s the best knowledge we have, and the only known example of this behavior on real hardware anyway. But make no mistake, I do consider it to be a hack. But it really isn’t game-specific. The same address is used regardless of the game loaded.
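In code, the idea boils down to something like the following Python sketch. It is an illustration under assumptions rather than the real implementation: the class and its `active_display` flag are hypothetical, and only the 0x0218 offset comes from the discussion above. OAM is modelled as a flat 544-byte array.

```python
OAM_SIZE = 544                       # 512-byte low table plus 32-byte high table
ACTIVE_DISPLAY_OAM_ADDRESS = 0x0218  # the static address discussed above

class Oam:
    """Toy OAM that redirects writes made while the screen is being drawn."""

    def __init__(self) -> None:
        self.data = bytearray(OAM_SIZE)

    def write(self, address: int, value: int, active_display: bool) -> None:
        if active_display:
            # Not hardware accurate: the real destination is unknown, so every
            # such write lands on one fixed address regardless of the game.
            address = ACTIVE_DISPLAY_OAM_ADDRESS
        self.data[address % OAM_SIZE] = value & 0xFF

oam = Oam()
oam.write(0x0010, 0xAB, active_display=True)   # the program asked for 0x0010
print(hex(oam.data[0x0010]), hex(oam.data[0x0218]))
```

The write never reaches the address the program asked for, so software that assumes it can freely update OAM during active display breaks visibly under emulation instead of silently working.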

What can be done to eliminate these hacks?

The only thing I can do to remove the above hacks is to emulate the PPU with more precision. To implement a cycle-based renderer. As nobody else shows any interest in this, I will pretty much be on my own. Worse yet, I’ll potentially kill bsnes by attempting this. If I fail, and I’m not certain I can succeed, I’ll be left with a completely broken emulator. If I succeed, speed will be reduced so drastically that there will exist no computer in the world capable of running bsnes at full speed. So, no matter how you look at it, proceeding means the death of bsnes as a useful alternative to existing emulators. It will become nothing more than a reference to SNES hardware internals.

But I look back, and I realize I’ve achieved 100% compatibility already. What more can I really hope to improve? It’s no big surprise that the progress of bsnes has stagnated over the past year or so. With the S-CPU, S-SMP and S-DSP virtually flawless and already emulated to bus-level precision, the S-PPU is the only chip remaining. But it’s also the worst of them all. There’s a reason I saved it for last. So, my options are to proceed and continue to innovate, at the cost of destroying bsnes’ usefulness to gamers; or to continue to stagnate while other emulators continue to attempt to catch up to the accuracy of bsnes, all while optimizing much more than I have. In other words, to sit idly by watching as bsnes is slowly made irrelevant. The former stays true to my roots of pursuing accuracy at all costs. I’ll admit that I was distracted by the unexpected event of actual people actually using my software. Before bsnes, xkas was my most popular software, with a userbase of maybe ten people. I would feel really bad about ruining bsnes for everyone who has come to enjoy using it, by selfishly pursuing my own obsessive goals on accuracy.

At the moment, I’m hoping to split bsnes to allow both a scanline and a cycle based PPU renderer. The former for playing games, the latter for reference. Unfortunately, I don’t know if I can pull this off at all, and even if I do, I don’t know how long I will be able to maintain the two forks in unison. I believe that eventually, the two code paths will differ so greatly as to become two separate emulators. And I absolutely do not have the time to maintain two emulators as a hobby. But I’m going to try the best I can.

What have I learned over the years?

Back in 2004, I believed the issue of emulation accuracy to be black and white. A little speed lost for more accuracy? Go for the accuracy! To hell with people and their continued use of 25MHz 486 DX2 processors! Unfortunately, as I discovered, it’s really not that simple. As stated twice now, obtaining 100% perfect emulation is impossible. But yes, you can get damn close. But speed and accuracy are direct tradeoffs. Some will say that they are not related. That you can have both speed and accuracy. Sure, optimizations can take you a long way, but I ultimately strongly disagree. They are in direct conflict with each other. You trade one for the other. But the tradeoff is not linear. It is much more of a half bell curve. That is, at one extreme, you can have a vast amount of speed, but with no compatibility at all. UltraHLE was a great example of this, with the only official release having a compatibility of only two games, and even then with countless game-specific hacks. But it was damn fast. But you give up just a little speed, and all of a sudden your compatibility grows by a lot. But you notice, the more speed you give up, the less accuracy you actually gain. Suddenly, the tables turn and you’re now giving up massive amounts of speed for very small compatibility gains. The effect is exponential, and the next thing you know, you’re struggling to get full speed on a top-of-the-line computer. But you still don’t have a fully accurate emulator! The truth is, you never can. I was able to achieve 100% compatibility in the end, but at the monstrous cost of requiring a Core 2 or better processor to achieve full speed. Something most people find completely unbelievable, as they look at emulation accuracy as being linear … as I did in 2004. And this isn’t even the worst of it. Now, with 100% compatibility, to take the next step up, I have to quite literally raise system requirements by an order of magnitude. And worst of all? It won’t fix a single game!! 
I can already run them all! So why the hell would I even consider a more accurate PPU? Because just as most of you undoubtedly love video games, I share that same love, but for the hardware itself. It is an intellectual pursuit for me, and I have to continue. I want to know everything there is to know about this hardware, and I want to follow that rabbit hole down as deeply as I possibly can, no matter the cost. Eventually, I’ll reach my limits, and won’t be able to proceed because, perfect emul … yeah, you know. But I’m not at my limit yet. I think I can take things further. And that’s why I want to proceed with a cycle-based renderer. I can handle your jeers about the ungodly system requirements – because I know the truth about emulation, and why my emulator is so slow. True, part of it may be my programming … but I challenge anyone out there to try and make an emulator with a compatibility rate equal to bsnes v0.022, with no game-specific hacks and with more than double its speed. I’ll even make it easy on you: all of my source code is available for reference. Until then, yeah. I can tolerate the armchair jabs at the speed of my software from people who still believe that you can determine the maximum processing power needed to emulate a system simply by multiplying the fastest processor in it by a static value.

But I’m going off on a tangent now, aren’t I? Yes … what have I learned? That there is no one single ‘right’ approach to emulation. There are just a myriad of tradeoffs at all different points on the spectrum. ZSNES and SNES9x have aimed at maximizing speed, and by doing so, have enabled millions of people to enjoy and relive childhood memories of playing their favorite SNES games, and they deserve the utmost respect for that, regardless of how ‘accurate’ the emulator itself is. Sadly, it seems my general outspokenness and attitude have encouraged some rather unfriendly things to be said about these emulators, and for that I am truly sorry. I realize my obsessive desire for accuracy puts me in the minority, and that most simply wish to replay their favorite games again through emulation. And really, there’s nothing wrong with that at all. Choice is the only really important thing. The lack of an accurate emulator at all would be quite saddening, indeed. But I’ve spent the last few years trying to fill that niche.

And yet now, I see my work has influenced the work of the entire community. Maybe not solely, but at least in part. After bsnes took the plunge at implementing the first cycle-accurate SNES CPU and SMP emulators, the community followed. Now, you can find these emulators in both SNEeSe and Super Sleuth. And I hear that both ZSNES and SNESGT are implementing cycle-accurate emulators as well. I’ll be honest that I’m a bit disappointed, as this will certainly raise all of their system requirements substantially. Any claim otherwise is simply dishonest. But if we look on the bright side, hardware is evolving. I’ll bet nobody back in 1996 expected an emulator like bsnes would ever be playable on any hardware. And now today, the hardware exists and is relatively inexpensive. The accuracy gains in all of these emulators will come at little cost to today’s hardware, which can already run all of these emulators dozens of times faster than the real hardware would. So, in a way … I guess I should be happy. I’m seeing my dream slowly spread to the community as a whole. And who knows, maybe in ten years, hardware will exist that can run even a version of bsnes with a cycle-based PPU at full speed, and v0.022’s high system requirements will seem similarly laughable in comparison. Only time will tell, right?

Even if this cycle-accurate S-PPU emulator ends up destroying bsnes’ usefulness to end users due to completely unrealistic hardware requirements … hopefully the information I uncover can at least live on by means of aiding in the development of other emulators.