## 3/29/2015

### Rendering Signed Distance Fields, Part 3

So Part 1 and Part 2 laid the groundwork for rendering signed distance fields. Now I’ll present a general shader for doing so. This is a reimplementation that retraces the work originally presented by Qin et al. (basically, I missed that paper during my original research).

(more…)

## 3/19/2015

### Rendering Signed Distance Fields, Part 2

So in Part 1 of this series, I talked about how to create a shader to render a distance field texture that only has uniform scale, rotation or translation applied — no shear or perspective transformations. So now one might expect (as I promised) that I’d talk about how to create a general shader for distance fields. After looking at what I wrote, however, I realized that a) I need to explain *why* you might want to use a signed distance field texture and b) to explain the rest I need to bring everyone up to speed on multivariable derivatives — namely gradients and the Jacobian. So I’ll cover those in this post, and finish up with the full answer next time. That said, I did fix the bug in Skia, so if you know all this or are impatient, you can skip ahead and look at the source code.

(more…)

## 3/15/2015

### Rendering Signed Distance Fields, Part 1

So at the end of my GDC 2015 presentation “How to Draw an Anti-Aliased Ellipse” I mentioned that you could extend the same techniques to signed distance fields (SDFs), in particular for text. However, some details might be helpful. Because SDFs encode distance directly, the approach is slightly different than for an implicit field like ellipses. I also made a grandiose statement that you should use the shaders in Skia because all the others don’t work properly. After prepping this post, I realized that was half right — the case that we see 99% of the time in Chrome is correct, but the case that you might see in a 3D world — e.g., a decal on a wall — is not. I’ll talk about a better way to handle that in Part Two.

So first, go look at the presentation linked above, just as a quick review. I’ll wait.

(more…)

## 10/21/2013

### New Thoughts on Random Numbers, Part Two

Sorry about the long delay on this. I originally intended to post this right away, but then realized I should do some further testing… and for whatever reason that never happened until last week. In any case, as I mentioned in my previous post, I recently added a new pseudo-random number generator to the Skia test suite. The last post was about verifying that a new PRNG was needed, this one is about what I used to replace it.

My initial thought about how to replace the linear congruential generator was to jump right to the Mersenne Twister. However, the Mersenne Twister requires a buffer of 624 integers. While this is not terrible, it’s probably overkill for what we need, and the large buffer is not great for low-memory situations. In addition, part of my research showed that it’s possible to get large random periods with other, simpler methods.

To cut the story short, because we do a lot of testing on mobile platforms I wanted something low-memory and super simple. I ended up taking one of Marsaglia’s generators and modifying it slightly. This is the “mother-of-all” generator that I mention in the random number generator chapter:

- k = 30345*(k & 65535) + (k >> 16);
- j = 18000*(j & 65535) + (j >> 16);
- return (k << 16) + j;

where k and j are unsigned 32-bit integers, initialized to non-zero values.

When I ran this through the Tuftests suite, it passed the GCD and Birthday Spacings tests, but failed the Gorilla test on half of the bits. So I figured, what the hey, I’ll just combine the high bits of k into the low bits of the result as well, or

- k = 30345*(k & 65535) + (k >> 16);
- j = 18000*(j & 65535) + (j >> 16);
- return ((k << 16) | (k >> 16)) + j;

Surprisingly, this passed all three tests. In addition, I ran a test to try to see what its period is. It ran over a weekend and still didn’t hit 2^64 values. While a good part of that time was running the test on each new value, it does show that in a game or your average application, we probably don’t need these large periods because practically we’ll never generate that many values (the exception might be a game server, which is up for days). For Skia’s particular bench and unit tests, we might generate a few thousand random values — as long as our period is more than that we should be okay.
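For reference, here’s the combined generator above wrapped into a self-contained class. The class name and default seeds are illustrative only, not Skia’s actual implementation:

```cpp
#include <cstdint>

// Sketch of the modified "mother-of-all"-style generator described above.
// The class name and seeding scheme are illustrative, not Skia's actual code.
class TwoMWCRandom {
public:
    explicit TwoMWCRandom(uint32_t seedK = 0x12345u, uint32_t seedJ = 0x6789au)
        : fK(seedK ? seedK : 1u), fJ(seedJ ? seedJ : 1u) {}  // state must be non-zero

    uint32_t nextU() {
        // Two 16-bit multiply-with-carry steps...
        fK = 30345u * (fK & 0xffffu) + (fK >> 16);
        fJ = 18000u * (fJ & 0xffffu) + (fJ >> 16);
        // ...combined with a 16-bit rotate of k, so k's high bits also
        // reach the low half of the result.
        return ((fK << 16) | (fK >> 16)) + fJ;
    }

private:
    uint32_t fK, fJ;
};
```

Same seed, same sequence, so it’s suitable for reproducible test baselines.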

I’ve since run the TestU01 suite on this, and it passes SmallCrush, fails one of the Birthday Spacing tests (#11) on Crush, and two other Birthday Spacing tests (#13 and #15) on BigCrush. Other than that it passes. So while it’s not as random as other generators, at this point it doesn’t seem worth it to change it and rebaseline all of our results again, just for those failures.

If that doesn’t work for you, an alternative worth trying is Marsaglia’s KISS generator, which combines four different random number generators by mixing their bits together. He claims (or *claimed*: Dr. Marsaglia unfortunately passed away in 2011) that the 2011 version has a period of greater than 10^40000000. Whatever its quality, though, the 2011 version requires a large buffer, which is probably not practical for low-memory mobile platforms. The 1993 version, on the other hand, requires only six values and has a period of 2^123. It does still fail one Crush test and one BigCrush test, but is probably good enough for games.

One important note: KISS is not cryptographically secure, as proven here, and I strongly doubt the generator I describe above is either. If you do need a cryptographically secure generator, it’s best to find an approved implementation and use that, as even slight timing differences from a naive implementation can give an attacker enough information to break your code. But of course, if the NSA was involved, you might want to be wary…

In any case, to get results suitable for game logic, you really don’t need the Mersenne Twister. One of these simple generators will probably do, and they are leaps and bounds better than any linear congruential generator.

## 4/8/2013

### New Thoughts on Random Numbers, Part One

I recently (December 2012) switched jobs from being an Engine Programmer at Insomniac Games, to being a Software Engineer at Google. At Google, I’m now working on Skia, which is the 2D graphics library that underlies Chrome, Firefox and parts of Android. One of the first things that came to my attention when I arrived was the pseudorandom number generator (PRNG) used to generate test cases for regressions — namely, that there was some concern that it wasn’t very random. In particular, the least significant bit flips back and forth, and possibly other lower-order bits weren’t very random. “Aha,” I thought, “Here’s something small and isolated that I can take on as a new hire. And it won’t take long; I can just draw on the random number chapter from our book.” Turns out that was partially right.

The first step was determining how good or bad the original PRNG was. Looking at it, it was clear that it was a linear congruential generator (LCG), a class of generators with notoriously poor randomness in their lower-order bits. LCGs also fail the spectral test: if you generate points in space, they tend to fall along visible planes. Not so good for generating tests for a graphics library. But how to verify this?

In the book I discuss some batteries of tests, such as Marsaglia’s Diehard and Brown’s DieHarder. In doing some research I found a couple more. First, there’s TestU01, which is a more extensive suite released in 2009. And Marsaglia and Tsang have created a smaller set of tests called tuftests, which they claim are nearly as rigorous as the large suites — i.e. if a PRNG passes tuftests, it’s highly likely to pass Diehard. For our case, we aren’t trying to find the ultimate PRNG, just something fast and reasonably random. So tuftests should suffice.

Tuftests consists of only three tests: the GCD test (as opposed to the GDC test, which probably involves giving a coherent talk after a night of parties), the Gorilla test, and the Birthday Spacings test. The GCD test generates a large series of pairs of pseudorandom variables, finds their greatest common divisors, and then compares the distribution of cycles necessary to compute the GCDs to the distribution expected for truly random variables. The Gorilla test is a variant of the Monkey test: random strings of bits are generated using a single bit position of a random variable, this is done multiple times, and the count of strings not generated should fall in a normal distribution. The Birthday Spacings test generates a set of birthdays in a “year”, and then determines the spacings between them, which should fall in a Poisson distribution. In each case, the generated distribution is compared with the expected one using a chi-square, Anderson-Darling, or other such method.
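As a concrete illustration of the quantity the GCD test tallies (as I understand it, the number of Euclid iterations per pair), here’s a minimal sketch of my own, not code from the tuftests suite:

```cpp
#include <cstdint>
#include <utility>

// Returns the greatest common divisor of u and v along with the number of
// iterations Euclid's algorithm took to find it. The GCD test compares the
// distribution of these iteration counts (over many pseudorandom pairs)
// against the distribution expected for truly random inputs.
std::pair<uint32_t, int> gcdWithSteps(uint32_t u, uint32_t v) {
    int steps = 0;
    while (v != 0) {
        uint32_t r = u % v;
        u = v;
        v = r;
        ++steps;
    }
    return {u, steps};  // {gcd, iteration count}
}
```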

Testing the LCG used by Skia, it was clear that it failed the Gorilla test for all but the highest-order bits (it may have also failed the Birthday test; I don’t recall now). So the belief that Skia’s PRNG needed improvement was verified. The next step was to find a decent replacement that doesn’t require a lot of overhead (important for testing on mobile devices). And I’ll cover my search for that next time.

## 5/6/2012

### ECGC 2012

I’ve spoken at the East Coast Game Conference in the past, mostly doing a version of our physics tutorial (cut down to 1 hour, if you can believe it). This year I wanted to do something different and talk about something new, in this case the work I did to get stereo up and running in Ratchet & Clank: All 4 One. Attendance was a little lower than I expected, but it was still a good audience. I have to admit the title of the talk might have scared some people off: Hybrid Stereoscopy in Ratchet & Clank: All 4 One. Maybe the description will be less foreboding:

This talk will cover Insomniac’s hybrid 3D stereo technique, using a combination of reprojection for opaque objects and standard stereo rendering for transparent objects. This gives something close to the speed of reprojection with fewer artifacts. The talk will cover the process followed to create the technique, equations used to match the reprojected and standard data, and a list of mistakes to avoid for anyone interested in stereo.

I do cover some math that a lot of other stereo talks brush over — so if you’re interested in stereo please click on the link and check it out.

## 3/11/2012

### More Errata

So Jacob Shields, a student at Ohio State, added a set of errata to the comments on another post. To address them, I’ve pulled them out one by one:

I came across this on page 117 of the Second Edition: “There is a single solution only if the rank of A is equal to the minimum of the number of rows or columns of A.”

If I’m interpreting this correctly, it follows that a 2×3 coefficient matrix with a rank of 2 has a single solution. In reality, it has 1 free variable and thus infinitely many solutions. Shouldn’t there be a single solution only if the rank of A is equal to the number of unknowns, i.e. columns?

Yeah, I was probably only thinking of square matrices when I wrote that. I’ll have to think of another way to put that. Saying that it’s equal to the number of columns doesn’t work if you’re solving xA = b.

Also, on page 126 it says: “The first part of the determinant sum is u_[0,0]*U~_[0,0].” Below, it displays this equation: “det(U) = u_[0,0]*U~_[0,0]”

Assuming I’m interpreting this correctly, this must be wrong. The determinant of a real matrix should be a real number, not another matrix. In both cases I think it means to say u_[0,0]*det(U~_[0,0]) — right?

Yes, that’s correct, it’s a typo. The notation should follow from that used on page 123.

[…] on page 130 it says: “It turns out the columns of R are the eigenvectors of A, and the diagonal elements of D are the corresponding eigenvectors.” Shouldn’t it be that “the diagonal elements of D are the corresponding eigenvalues”?

Yup, another typo.

Could you please explain what is meant by the first matrix multiplication shown on page 138 in Section 4.2.3, Formal Representation [of affine transformations]?

For example, where did the matrix come from? Its entries are vectors, and the dimension of the product is (m+1)x1. What is the significance of the product, and how does it apply to what was just said?

I understand the matrix multiplication below it. In this case, the dimension of the product is 1×1 and, when expanded out, is equal to T(P). On initial reading it seems like the first matrix multiplication and the second one are supposed to be equivalent (before the second it says, “We can pull out the frame terms to get…”), but they’re clearly different, so how did you go from one to the other?

I’m not sure what the issue is; it’s similar to the linear transformation discussion on 104-106. The first n columns of the first matrix are the T(v_j) terms and the last is the T(O_b) term. The product of the two matrices is the transformed result T(P). The last sentence on 138 sums it all up.

On page 180, the description in the first whole paragraph about rotating an object to demonstrate gimbal lock says to “rotate the object clockwise 90 degrees around an axis pointing forward … [then] rotate the new top of the object away from you by 90 degrees … [then] rotate the object counterclockwise 90 degrees around an axis point up …. The result is the same as pitching the object downward 90 degrees (see Figure 5.4).”

However, if I’m interpreting this correctly, the rotation described appears to be a fixed angle x-y-z (-pi/2,-pi/2,pi/2) rotation. The result of this is the same as pitching the object _upward_ 90 degrees. Furthermore, Figure 5.4 demonstrates a different rotation: a fixed angle x-y-z (pi/2,pi/2,pi/2) rotation.

If the x-axis is pointing away from you and you rotate clockwise (from your perspective) by 90 degrees, that’s the same as rotating around the x-axis by positive 90 degrees, using the right-hand rule. Similarly, rotating the top of the object away from you is rotating around the y-axis by positive 90 degrees. The result should be the same as the diagram.

And this is why math uses symbols rather than trying to explain it using natural languages…

I might be missing something here, but on pages 181-182 it says this: “Near-parallel vectors may cause us some problems either because the dot product is near 0, or normalizing the cross product ends up dividing by a near-zero value.”

For near-parallel vectors, wouldn’t the dot product be near +/- 1, not 0?

I’m not sure what I meant there. Possibly that the dot product magnitude being near 1 (not 0, as you point out) can, through floating point error, lead to a result with magnitude slightly greater than 1, which is not valid input to acos.
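A concrete guard for that acos issue, as a minimal sketch (my naming, not the book’s code):

```cpp
#include <algorithm>
#include <cmath>

// Floating-point error can push the dot product of two near-parallel unit
// vectors slightly outside [-1, 1], which is invalid input to acos (it
// returns NaN). Clamping into acos's domain first avoids that.
float angleBetweenUnitVectors(float dot) {
    dot = std::max(-1.0f, std::min(1.0f, dot));  // clamp into [-1, 1]
    return std::acos(dot);
}
```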

On page 178, under the heading “5.3.3 Concatenation” it says this: “Applying (pi/2, pi/2, pi/2) twice doesn’t end up at the same orientation as (pi, pi, pi).”

However, for fixed angle z-y-x (the most-recently mentioned convention in the book prior to this section), applying (pi/2, pi/2, pi/2) twice does appear to end up at the same orientation as (pi, pi, pi). This isn’t the case for fixed angle x-y-z, though (and presumably other conventions). That had me confused for a while!

I believe that should be: Applying (pi/4,pi/4,pi/4) twice doesn’t end up at (pi/2,pi/2,pi/2). Your confusion is understandable. Though as (very) weak defense, when I multiply it out on the computer, floating point error does give me slightly different results (some of the “zero” terms end up being very small f.p. values).

On page 191, about halfway down the page, it mentions vectors w_0 and w_1, and then says that if we want to find the vector that lies halfway between them, we need to compute (w_1+w_2)/2. I’m assuming that it really means (w_0+w_1)/2?

As an aside, what is the significance of the division by 2? That is, using this method, what is the expected length of the resultant vector (without normalizing)? For instance, let w_0 = (1,0,0) and w_1 = (0,1,0). Then the halfway vector is (1+0,0+1,0+0)/2 = (0.5,0.5,0), whose length is 1/sqrt(2). Clearly this points in the correct direction, but its length is not the length of w_0, the length of w_1, or the average length of them, so what is it used for?

Correct, it should be w_0 and w_1. And the division by two is just to represent the average. As I point out, you don’t need to divide by 2 if you want a normalized result — you can skip that and do the normalize step. The length is never used.
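To make that concrete, here’s a minimal sketch (my own naming) showing that only the direction of the sum matters once you normalize:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Halfway direction between two unit vectors w0 and w1. Dividing the sum by
// 2 only averages the length, and any scale washes out in the normalize
// step, so summing alone suffices. (Assumes w0 != -w1, otherwise the sum is
// the zero vector and there is no unique halfway direction.)
Vec3 halfwayDirection(const Vec3& w0, const Vec3& w1) {
    Vec3 h{ w0.x + w1.x, w0.y + w1.y, w0.z + w1.z };
    float len = std::sqrt(h.x * h.x + h.y * h.y + h.z * h.z);
    return { h.x / len, h.y / len, h.z / len };
}
```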

Equation 5.7 on page 191 has a hat over the q (i.e. unit quaternion notation). However, I don’t believe that it’s actually a unit quaternion; specifically, I think its length is something like 2*sqrt(2*cos(theta) + 2).

Yes, that’s a typo. No hat.

On page 191, when talking about converting from a rotation matrix to a quaternion, it says that “if the trace of the matrix is less than zero, then this will not work.” I was wondering if you could explain why?

While trying to figure out, I found another way of performing this conversion which may or may not be equivalent to your own method:

http://www.euclideanspace.com/maths/geometry/rotations/conversions/matrixToQuaternion/index.htm

In that case, it involves taking sqrt(Trace+1), so it actually fails if Trace+1<0. What is the difference between that method and your method which makes the difference between Trace<0 and Trace+1<0? Is it related to the fact that if the rotation was in a 4×4 affine transformation matrix then you’d get +1 when you take the trace?

The trace+derived axis+normalization technique will fail as theta gets closer to pi, because r approaches a zero vector as theta approaches pi (see page 183). In this case, you’d end up with something close to (-1,0,0,0), which is the inverse of the identity quaternion and probably not the correct result. So if the trace is close to -1, you need to do more work.

So why check against 0? In an abstract math world with infinite precision, doing this with a trace between -1 and 0 should be fine. But with floating point your derived rotation axis vector effectively becomes noise as it gets smaller and smaller. It avoids a lot of floating point headaches just to do the zero check.

The website also does a zero check; it’s basically equivalent to what I have but the normalization is handled by the 0.5f/sqrt(trace+1.0f) term (modified slightly for w). Theirs is probably faster as it doesn’t require squaring and summing the components to do the normalization.
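For reference, here’s a hedged sketch of the positive-trace branch being discussed. The function name, the 3×3 row-major layout, and the column-vector convention are my assumptions, not the book’s exact code (swap the subtraction orders for row vectors):

```cpp
#include <cmath>

struct Quat { float w, x, y, z; };

// Trace-based rotation-matrix-to-quaternion conversion, valid only while the
// trace stays well above -1. Near trace == -1 (rotation angle near pi) the
// derived axis terms below degenerate into floating point noise, which is
// why a separate branch is needed there.
Quat fromRotationMatrixTracePositive(const float m[3][3]) {
    float trace = m[0][0] + m[1][1] + m[2][2];
    float s = std::sqrt(trace + 1.0f);   // s = 2*cos(theta/2) = 2*w
    float recip = 0.5f / s;              // normalizes the axis terms
    return { 0.5f * s,
             (m[2][1] - m[1][2]) * recip,
             (m[0][2] - m[2][0]) * recip,
             (m[1][0] - m[0][1]) * recip };
}
```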

## 2/21/2011

### A Pair of Pertinent Podcasts

These have been in the queue for a while, but the BBC podcast In Our Time recently (as in the past six months) covered two more mathematical topics related to the book. The first is http://www.bbc.co.uk/programmes/b00tt6b2, which is useful if you’re interested in having a deeper understanding of quaternions. The second is on random and pseudorandom numbers, which parallels one of the chapters in the second edition. I highly recommend the podcast in general, as well. Each show is about 45 minutes long, and features a panel of three experts on each topic moderated by Melvyn Bragg.

## 8/6/2009

### Mike Acton Takes on C++ Math Libraries

A couple weeks ago, Mike Acton, head of Core Technologies at Insomniac (where I’m now working, btw) posted this reaming of some standard C++ math code. I think it’s an excellent read if you’re interested in deeply optimizing your math code, in particular for consoles, and in particular for the PS3.

Of course, posting this at all is a bit uncomfortable from where I’m sitting. I think it’s safe to say that if you were to compare our library code to what he is, er, analyzing, you’d find some significant similarities. However, our libraries were mainly designed to teach *mathematical* concepts rather than optimization concepts. As such, you end up with classes that represent mathematical entities, e.g. Planes, Spheres, Lines, etc. and simple code that emphasizes clarity. That said — as Mike’s slides aptly demonstrate — if you’re trying to squeeze every cycle out of the machine there are other ways to organize your data and code, and in fairness we didn’t address that much at all. In retrospect this was an unfortunate oversight. I’d say more emphasis on optimization would be a good addition to the Third Edition, but I’m not sure what my family would think of another book…

## 6/10/2008

### Indexed

As part of the book review (see previous article), they have images from the Indexed weblog. If you like graphical math jokes, you might find them amusing.

### Probability Resources

For those who are not aware, there is a new chapter in the book about Random Numbers. Because it is only a 50-page chapter in a much larger work, I don’t go into as much detail as I would have liked about probability. So I recommend a couple of longer works dedicated to the subject. One is Grinstead and Snell’s *Introduction to Probability*, which I had the honor of being taught from (in a pre-release edition) by John Kemeny. The other is Larry Gonick’s *Cartoon Guide to Statistics*, which is a surprisingly informative little book. However, after seeing a review of *The Drunkard’s Walk: How Randomness Rules Our Lives*, by Leonard Mlodinow, I may have a third. I recommend at least reading the review, and possibly checking out the book.

And if you’re into podcasts (and who isn’t, really? But I jest), the BBC’s In Our Time podcast has an episode about Probability. I haven’t had a chance to listen to it yet, but if it’s like their other programmes, I’m sure it will be quite interesting and informative.

## 6/7/2005

### Reparameterizing a Curve, Part Three

Now we come to the final part: merging Newton-Raphson and bisection together, and choosing the correct one depending on the situation.

(more…)

## 5/11/2005

### Reparameterizing a Curve, Part Two

Due to preparing for a week of vacation, and then going on a week of vacation, and then recovering from a week of vacation, this blog has fallen by the wayside a bit. But I’m back, and I’ll try to wrap up this topic in a couple of postings.

When last we left our intrepid heroes, we were discussing Newton-Raphson root finding, how it’s used for reparameterizing a curve, and when it fails (namely when the speed on the curve approaches zero). The solution hinted at was the bisection method.

(more…)

## 4/11/2005

### Reparameterizing a Curve

I’m still optimizing memory and speed so nothing exciting to report under the lighting topic. Maybe later this week or next week.

So let’s look at a bit of code from the book, shall we?

(more…)

## 3/25/2005

### Angular Dynamics: Matrices or Quaternions?

One of the questions asked during the tutorial was whether matrices or quaternions were more efficient for angular rigid body dynamics. My response was that a matrix would be more efficient, because a quaternion requires a conversion to a matrix to compute the transformed inertial tensor matrix. If we analyze the operation count, this certainly appears to be true:

(more…)

## 3/22/2005

### Post GDC Notes: Part 3

Having promised to post about center-of-mass calculations, and subsequently having put it off for a week, I’m going to punt.

First, you can read the seminal Brian Mirtich paper on the subject. Then Dave Eberly responds with a more efficient method. Both rely on computing solid integrals across a polytope of constant density by treating them as surface integrals across the polytope’s faces. In the end they end up with the total mass (assuming a density of 1), the center of mass, and the inertial tensor matrix for the polytope.

If those haven’t scared you off, Jonathan Blow has a more intuitive approach, noting that the solid integral is basically a volume calculation. By breaking the object into tetrahedrons, we can compute the total solid integral as a weighted sum of tetrahedral solid integrals. The tetrahedrons can be computed by selecting a single point — the origin or the centroid will work — and using that point together with the points on each triangular face. As he notes, it’s likely that the Eberly and Mirtich approaches are more efficient, but not as easy to understand from a geometric standpoint. Blow doesn’t provide an implementation, but you can grab a similar one from Stan Melax here.

Finally, Erin Catto pointed out to me after the tutorial that you can compute the center of mass by computing the centroids of the tetrahedrons and then performing a weighted sum of all the centroids, where each weight is the volume of that centroid’s tetrahedron divided by the total volume of the polytope. Blow covers this in his paper as well. To get an intuitive sense of this, I recommend messing around with some origin-centered triangles (e.g. (1,0),(-.5,-.5),(0,-.5),(.5,-.5)) and using areas instead of volumes.

## 12/29/2004

### Errata: Tensor Product

William Brennan points out that on page 84, the equation

is not correct. There are two ways to rewrite this. In one the intended order is correct, but is missing the transpose operator to indicate that it’s a row vector:

Alternatively, you can generate the same result with a column vector by doing:

In Chapter 3, we simplify the Rodrigues formula (page 123) and general planar reflections (page 128) by using the tensor product. Since the arguments of the tensor product are the same in these cases, the ordering doesn’t matter. However, the ordering of the arguments needs to be reversed in the shear matrices on page 132, so

and

Other tensor product arguments may need to be reversed elsewhere in the text, though I can’t find any at this time.

## 6/29/2004

### Getting Axis and Angle from Rotation Matrix

There is a bug in the code that gets the axis and angle information from a rotation matrix. When the angle is 180 degrees (the last else statement), it computes the reciprocal value incorrectly. The original code is:

- float recip = 0.5f/s;

The fixed code should be:

- float recip = 1.0f/s;

This should be corrected in both IvMatrix33 and IvMatrix44.

## 5/15/2004

### Correction: Quaternion Creation

This is both a correction and an improvement to the creation of a quaternion from two vectors, or as it is commonly known, the shortest rotation arc.

The current code doesn’t handle the case where the two vectors are opposing: there’s an if statement to detect it, and a comment, but no recovery code. The problem is that there are an infinite number of orthogonal rotation axes, and we just need to pick one. The solution is to check which coordinate axis the start vector (it could be the finish vector, it doesn’t matter) is generally pointing along. If it’s pointing generally along the x-axis, we take the cross product with the y-axis to get our rotation axis. Otherwise we take the cross product with the x-axis.
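A hedged sketch of the whole routine with that recovery in place. The names are mine, not the book’s actual code, and the axis-selection threshold is one reasonable choice among several:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Shortest-arc rotation taking unit vector u to unit vector v.
Quat shortestArc(const Vec3& u, const Vec3& v) {
    float d = u.x * v.x + u.y * v.y + u.z * v.z;
    if (d < -0.99999f) {
        // Opposing vectors: any axis orthogonal to u gives a valid
        // 180-degree rotation, so cross u with whichever coordinate axis
        // it is not already pointing along.
        Vec3 axis = (std::fabs(u.x) > 0.5f) ? cross(u, Vec3{0.0f, 1.0f, 0.0f})
                                            : cross(u, Vec3{1.0f, 0.0f, 0.0f});
        float len = std::sqrt(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
        return { 0.0f, axis.x / len, axis.y / len, axis.z / len };
    }
    Vec3 c = cross(u, v);                     // sin(theta) * axis
    float s = std::sqrt(2.0f * (1.0f + d));   // 2*cos(theta/2)
    return { 0.5f * s, c.x / s, c.y / s, c.z / s };
}
```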

(more…)

## 5/4/2004

### Correction: Quaternions

Ah, our first errata. How bittersweet.

Eric Mumau sent me this one. On page 499, equation 10.11 is wrong. It should read:

Correspondingly, the following equation should read:

The rest of the page follows as before.

This affects the code which starts at the bottom of page 499. At the top of 500, the first three lines should be rewritten as:

- float vMult = 2.0f*(x*vector.x + y*vector.y + z*vector.z);
- float crossMult = 2.0f*w;
- float pMult = crossMult*w - 1.0f;

Note that the code on the CD is not the same as that given in the text, and in fact will actually work. However, it is slightly less efficient than what is given here.