Optimizing Fresnel reflection code

This is the Fresnel reflection code from WinOSi renderer:
float R_fnf(float n1, float n2, double ce1)
{
  if(fabs(n2-n1)<0.001) return 0.0f;
  double e1 = acos(fabs(ce1));
  if(e1<0.01f) return (float)sqr((n1-n2)/(n1+n2));
  double c1 = sin(e1) * n1/n2;
  if(c1 < 1.0) {
    double e2 = asin(c1);
    return 0.5f*(float)( sqr(sin(e1-e2))/sqr(sin(e1+e2)) + sqr(tan(e1-e2))/sqr(tan(e1+e2)) );
  } else return 1.0f;
}

I make it 2 times faster.
How? Lets go, I show you ;)


because sqr(a)/sqr(b)=sqr(a/b) then
return 0.5f*(float)( sqr(sin(e1-e2))/sqr(sin(e1+e2)) + sqr(tan(e1-e2))/sqr(tan(e1+e2)) )
can be rewriten like
return 0.5f*(float)( sqr(sin(e1-e2)/sin(e1+e2)) + sqr(tan(e1-e2)/tan(e1+e2)))
minus 2 calls to sqr()!

lets go further!
sin(e1+e2)=sin(e1)*cos(e2)+cos(e1)*sin(e2)
sin(e1-e2)=sin(e1)*cos(e2)-cos(e1)*sin(e2)

but because in R_fnf()
e1 = acos(fabs(ce1));
e2 = asin(c1);
fabs(c1)=cos(e1) and c1=sin(e2)
i. e. cos(e1)*sin(e2)=fabs(ce1)*c1

sin(e1)=sqrt(1-sqr(cos(e1))=sqrt(1-fabs(c1)*fabs(c1))
cos(e2)=sqrt(1-sqr(sin(e2))=sqrt(1-c1*c1)

let
A=sin(e1)*cos(e2)=sqrt((1-fabs(c1)*fabs(c1))*(1-c1*c1))
B=cos(e1)*sin(e2)=fabs(ce1)*c1
C=sin(e1-e2)=A-B
D=sin(e1+e2)=A+B
sqr(C)=sqr(A-B)=sqr(A)+sqr(B)-2*A*B
sqr(D)=sqr(A+B)=sqr(A)+sqr(B)+2*A*B

in my implementation
C2=sqr(C)
A2B2=sqr(A)+sqr(B)
AB2=2*A*B
D2=sqr(D)

let's go to our sqr(tan(e1-e2)/tan(e1+e2))!
tan(e1-e2)=sin(e1-e2)/cos(e1-e2)
tan(e1+e2)=sin(e1+e2)/cos(e1+e2)
tan(e1-e2)/tan(e1+e2)=(sin(e1-e2)/cos(e1-e2))/(sin(e1+e2)/cos(e1+e2))

but (var1/var2)/(var3/var4)=(var1/var2)*(var4/var3)=(var1*var4)/(var2*var3)
in our situation
var1=sin(e1-e2)=C
var2=cos(e1-e2)=sqrt(1-sqr(C))
var3=sin(e1+e2)=D
var4=cos(e1+e2)=sqrt(1-sqr(D))
(var1/var2)/(var3/var4)=(C*sqrt(1-sqr(D)))/(D*sqrt(1-sqr(C)))
sqr(tan(e1-e2)/tan(e1+e2))=sqr(C*sqrt(1-sqr(D)))/(D*sqrt(1-sqr(C)))=(sqr(C)*(sqr(sqrt(1-sqr(D))))/(sqr(D)*(sqr(sqrt(1-sqr(C)))=
=(sqr(C)*(1-sqr(D))/(sq r(D)*sqr(1-sqr(C))=(C2*(1-D2))/(2*D2*(1-C2))

... and simple optimizations!!!

my source code:
float R_fnf1(float n1, float n2, double ce1)
{
double c12,z,A2B2,AB2,C2,D2,ZC12,face1;

if(fabs(n2-n1)<0.001) return 0.0f;
face1=fabs(ce1);
if(acos(face1)<0.01f) return (float)sqr((n1-n2)/(n1+n2));
z=1-face1*face1;
c12=z*sqr(n1/n2);
if(c12<1.0)
   {
   ZC12=z*c12;
   A2B2=z+c12-ZC12-ZC12;
   AB2=sqrt((z-ZC12)*(c12-ZC12));AB2=AB2+AB2;
   C2=A2B2-AB2;D2=A2B2+AB2;
   return (float) (C2*(1+(1-D2)/(1-C2))/(D2+D2));
  }
else return 1.0f;
}

-5 function calls, but +2*/ what faster then original R_fnf() in 2 times!
another bottleneck in raytracers - intersection finding code.

---------------------------------------
Be optimized!
tigra
mail me
icq 429048706

Hosted by uCoz