| 
							- Speed:
 -     * To use multiple cores, compile with -openmp or -fopenmp (see your compiler docs).
 -       Larger FFTs benefit more from this than smaller FFTs. Multithreading generally uses more
 -       total CPU time, but less wall-clock time.
 - 
 -     * Experiment with compiler flags.
 -         Special thanks to Oscar Lesta, who suggested some compiler flags
 -         for gcc that make a big difference. They shave 10-15% off
 -         execution time on some systems.  Try some combination of:
 -                 -march=pentiumpro
 -                 -ffast-math
 -                 -fomit-frame-pointer
 - 
 -     * If the input data has no imaginary component, use the kiss_fftr code under tools/.
 -       Real FFTs are roughly twice as fast as complex FFTs.
 - 
 -     * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
 -       then you might want to experiment with the USE_SIMD code.  See README.simd.
 - 
 - 
 - Reducing code size:
 -     * Remove some of the butterflies. There are currently butterflies optimized for radices
 -         2, 3, 4, and 5.  You can still use FFT sizes that contain other factors;
 -         they just won't be quite as fast.  You can decide for yourself
 -         whether to keep radix 2 or 4.  If you do some work in this area, let me
 -         know what you find.
 - 
 -     * For platforms where ROM/code space is more plentiful than RAM,
 -      consider creating a hardcoded kiss_fft_state. In other words, decide which 
 -      FFT size(s) you want and make a structure with the correct factors and twiddles.
 - 
 -     * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
 -       on embedded targets.  He writes: "I'm happy to help anyone who is trying to implement KISSFFT on a micro."
 - 
 -     Some of his suggestions were rolled into the mainline code base:
 -         - using long casts to promote intermediate results of short*short multiplication.
 -         - delaying allocation of buffers that are sometimes unused.
 -     In some cases, it may be desirable to limit capability in order to better suit the target:
 -         - predefining the twiddle tables for the desired FFT size.
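
Two of the embedded-target points above (long casts on short*short products, and a
predefined twiddle table) can be sketched in C roughly as follows. This is only an
illustration and is not taken from the KISS FFT sources: every name in it (sketch_cpx,
sketch_twiddles, sketch_twiddle, sketch_cmul, SKETCH_NFFT) is hypothetical, and the
FFT size of 8, the Q15 scaling, and the forward-transform sign convention
exp(-2*pi*i*k/N) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the KISS FFT API. */
#define SKETCH_NFFT 8u              /* assumed fixed FFT size */

typedef struct {
    int16_t r;                      /* real part, Q15 */
    int16_t i;                      /* imaginary part, Q15 */
} sketch_cpx;

/* Twiddles exp(-2*pi*i*k/8) in Q15, computed offline. Being const, the
 * table can be placed in ROM instead of being built in RAM at start-up. */
static const sketch_cpx sketch_twiddles[SKETCH_NFFT] = {
    {  32767,      0 },
    {  23170, -23170 },
    {      0, -32767 },
    { -23170, -23170 },
    { -32767,      0 },
    { -23170,  23170 },
    {      0,  32767 },
    {  23170,  23170 },
};

/* Bounds-checked lookup into the fixed table. */
static sketch_cpx sketch_twiddle(unsigned k)
{
    assert(k < SKETCH_NFFT);
    return sketch_twiddles[k];
}

/* Q15 complex multiply. The (long) casts promote each short*short product
 * to at least 32 bits, so it cannot overflow on targets where int is only
 * 16 bits wide. Adding 1<<14 rounds to nearest before the shift.
 * Assumes arithmetic right shift of negative values, which is
 * implementation-defined in C but what common compilers do. */
static sketch_cpx sketch_cmul(sketch_cpx a, sketch_cpx b)
{
    long re = (long)a.r * (long)b.r - (long)a.i * (long)b.i;
    long im = (long)a.r * (long)b.i + (long)a.i * (long)b.r;
    sketch_cpx out;
    out.r = (int16_t)((re + (1L << 14)) >> 15);
    out.i = (int16_t)((im + (1L << 14)) >> 15);
    return out;
}
```

For example, multiplying the sample (16384, 0) by sketch_twiddle(2), which is
approximately -i, gives (0, -16383). Without the casts, the products would be
computed in int, which on a 16-bit-int target overflows before the shift.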
 
 
  |