I unfortunatelly have never done DSP before, but read a lot of Articles on it by now. But I cannot get to the point of really understanding what I can do to solve my Problem, so here is what I got:

I used the Java Sound API to open a TargetDataLine through a Mixer, which is the line-in or mic port of my soundcard. What I get from that is a byte[].

And here is what I want to achieve:

I want an equalizer like App. which shows me the Volume of the single bands like an 8 band EQ. Visualizing the Data later on is not the Problem. But how can I get to the data that gives me for example the volume of the frequencies 30-80hz? I know I have to apply an Fast Fourier Transition to the Input Stream somehow, but how?

I found implementations of FFTs but they all just show itself by a generated Signal, not an input Stream, like I have. So how can I do a FFT on the byte[] Stream I get off my AudioSystem and how can I get the volume like for the example I named above?

Please Help me.

Greetings,

Sebastian

CR

Originally posted by Sebastian Herrlinger:

Sorry, missed that. Corrected.

I would be thinking in terms of 3 separate Threads (hurrah for Java Threads!)

1. input stream to byte[] objects sent to a queue of byte[] objects for 2.

(discard byte[] objects if queue is too full)

2. FFT or other algorithm processes blocks from the queue and deposits result objects in an output queue (likewise discard if queue is too full)

3. display Thread tries to keep up with output of 2

Let us know what you come up with, sounds fascinating.

Bill

Here is what I got:

How can I compute bytes?

Sorry if this sounds stupid...

Rancher

int v = 0xff & b[n] ; // masking so int values stay in 0 - ff range

float f = ((float) ( v - 0x80 ))/ 127.0 ;

x[n] = f ;

// guessing that byte values representing negative part of wave form are less than 0x80 ( 127 decimal )while positive part is 0x80 - 0xff

Bill

(0,0)(0,1)(1,0)(1,1)(2,0)(2,1)...

where, for example, (2,0) is the third sample in the first channel. Like most things to do with arrays, we index from 0 in both cases.

So your first job is to interpret the samples and channels. In the simple example above, this is easy. But it gets more complex if you have 16-bit stereo PCM (this, or 24-bit, is usual for high audio)---then you have:

(0a,0)(0b,0)(0a,1)(0b,1)...

where a and b are the two bytes which make up each sample (16 bits = 2 bytes). But depending on the PCM format, this could be big or little endian. Java is big endian; the most common PCM is little endian. So when you extract the data into Java ints/doubles/whatever, you have to swap the byte order (note that Java NIO can help you here). Second thing: PCM values can either be signed (the usual) or unsigned. So you have to take account of this conversion too, and for a 2-byte value you would need to use an int rather than a short if it's unsigned (unless you want lots of conversions using 0xFFFF masks everywhere that is).

The FFT is actually a simple transformation, but is fairly computationally expensive. As pointed out earlier, you need to do it on blocks of the above samples. Small enough blocks that your data doesn't take too long to buffer out (avoiding high latency), long enough that you don't have to run the FFT too often, as it is O(N log N) whereas running it on small blocks too often approaches the naive O(N^2). Don't be fooled by the simplicity of the equations: this can make considerable difference on blocks of even reasonable sizes (e.g. 512 or 1024). BTW, if you aren't already familiar with it, the Cooley-Tukey algorithm is your safest bet as it is O(N log N). In addition, rather than computing "(float) Math.cos (arg)" every time, you can use some elementary trig identities to reduce this to multiplications and additions only. Using cos and sin functions is computationally very expensive; simple times and addition are not.

Using the three threads sounds overkill. Rather than blocking to wait for input as you'd have to in that model (or risk out of memory since you'll read an entire uncompressed file into the queue at once), you can easily input, process and output in a single queue. All sound cards have an output buffer which helps avoid the jitter caused by delays in processing. You won't notice any problems even using a small buffer. Also if you have only 44100Hz (CD quality) as the sample rate, then your 2GHz processor has plenty of cycles spare to do the FFT computations! This is the "normal" way to do it... no need for multi-threading in a simple sound app.

Finally, there's something quite subtle you may have missed: if you intend to manipulate the results in Fourier (spectral) space, for example to do EQ processing, and output the results as samples, you'll need to use the IFFT to get back again. However, this will degrade the signal giving you crackles at the boundaries between frames. It's called Gibb's phenomenon and is caused by discontinuities you create. To avoid this, you'd need to use the overlap-add method or similar, or the audio will sound awful and you'll wonder why!

Hope that helps. This is a long and complicated subject which really takes weeks or months of study to understand fully. I think a strong background in mathematics makes it considerably easier

I have those two functions for the Input:

So, what exactly do I have to do with the byte[] I have to link it to an EQ Visualisation that visualizes the Volume of 8 frequency specs?

Sorry, I looked into tutorials and I ordered two books on the subject to really get into it, but I need to get this done somehow.

Ranch Hand

Right, so you've chosen W. This now needs to be the number of samples in all channels in your input. As you've done, it's best to start with a mono input (1 channel) or it'll just get complicated since you'll need to combine the channels in some way (or process them via FFT separately, but that's really costly). So the size of the input byte[] = W * (number of bytes in a sample) = W*S/8 since sample size, S, is usually expressed in bits. Overall therefore the reading loop takes the form:Now, the key here is to choose W optimally so that doing everything including the drawing (though the FFT is usually the expensive bit) is within the ability of your CPU. Otherwise data will start backing up and the sound card's buffer will overflow so input samples are lost. Swing will run graphs in a different thread, so you can just supply the graph with the double[] and call it's repaint() or whatever. Try to convert the fftMag method you've shown above to using double[] in and out as it'll be much more accurate. One thing we find quickly with FFT is that it is prone to rounding errors - sometimes float just isn't good enough, so double is best if possible.

Right, the final bit is to implement the decode(byte[],double[]) method which does the conversion you were after since the first post... This will depend on data extracted from the AudioFormat. Of course a general method can be written; in your case you need to observe:

And that's pretty much it I think... Let me know if that works for you!

Usual disclaimer: I have tested none of the above and it is all from memory. It may be slightly incorrect and may not compile. Use with care!

[ October 12, 2008: Message edited by: Charles Lyons ]

Ranch Hand

[ October 12, 2008: Message edited by: Charles Lyons ]

My brain smokes as I am trying to understand the formulas on wikipedia...

Originally posted by Sebastian Herrlinger:

Woaa... that is knowledge. Thanks for your help. I will try to get this to work. What still bugs me, is the fact that I don't really know what to do with the set of complex numbers... Are those the frequencies? How do I draw a Graph with them? damn... I sound stupid...

The results of an FFT is a set of complex pairs representing the real and imaginary parts of the Discrete Fourier Transform of the input data. How you are presented with these pairs depends on the FFT implementation but that is just an implementation detail. For example, my FFT implementations present the results as an array of alternating (real, imag) doubles.

Each complex result has a related frequency. Assuming you have taken the transform of N data points with a sample rate of S samples per second then the frequency resolution is S/N so the nth complex pair is associated with frequency n*S/N .

An estimate of the Power Spectral Density of the input data is given by the modulus squared of the complex values.

The fact that the FFT is taken over a finite number of points causes a distortion of the spectral density. In order to reduce the distortion it is necessary to use a 'window' function to pre-process the data before taking the FFT. There are a large number of 'window' functions (Hanning, Hamming, Poisson etc). Google will find you further information on this but I would suggest you first use 'Hamming' and only worry further about windowing if you have very large single frequency component in your data.

Retired horse trader.

Note: double-underline links may be advertisements automatically added by this site and are probably not endorsed by me.

Mhhh... looks nicer, if all the curves are drawn, but the screenshot didn't work. What I did was:

So thats the Waveform of the Signal? A single Signal?

How can I link it to a Form like:

Thank you really much to help me through this. I think I am on the way of understanding, though it is hard. Also I'm german and the english vocabulary confuses me sometimes

[ October 13, 2008: Message edited by: Sebastian Herrlinger ]

Ranch Hand

This is all true, but to add one further thing: it turns out that for a real input signal (such as your audio, though sometimes using a complex input in engineering is useful), the complex frequencies above S/2 are complex conjugates of those below S/2. You can prove this mathematically in just a couple of lines. Hence the modulus of the nth complex frequency is the same as that of the (S-n)th (up until n=S/2). This aliasing is the reason for Nyquist's theorem---that you sample at twice the maximum frequency contained in the signal. So at your 8kHz, you can examine only up to 4kHz in your signal. The other moduli (above S/2) you have to discard because they are redundant information. Your FFT algorithm does this already.Each complex result has a related frequency. Assuming you have taken the transform of N data points with a sample rate of S samples per second then the frequency resolution is S/N so the nth complex pair is associated with frequency n*S/N.

So in your case, you have the output out[n] from the FFT algorithm, which is already of size S/2 to take account of what I mentioned above. Each of these values is a modulus squared of the corresponding frequency n*S/W. So in your case, each value is n*8000/1024 ~= n*7.81 for n in [0,S/2] corresponding to [0,4kHz]. So your output increases in roughly 7.8Hz increments, which is fairly high resolution. You can see that on a 44100Hz (CD quality) signal, using 1024 as the window only gives you 43Hz increments, considerably lower resolution but often good enough nonetheless.

Yes, I suggest you read a lot into the topic. It can be fascinating. Really you need the know the mathematics behind this though, and there is substantial background in techniques (complex numbers, integrals and contour integrals, and some engineering principles) to understand it all fully. A scholarly book on DFTs or DSP will set you up. DSP itself then diverges from this point: you can look at filtering either in Fourier space or using convolution in time space (surprisingly, this is usually more costly than doing FFT then IFFT!). You'll also meet things like the windowing we mentioned, FIR filters, and the complex z-transform. You don't really need to know much detail about the inverse transform, though if you want the challenge, it's good to demonstrate it using contour integrals in the complex plane---indeed, this is where the z-transform originates.

Ranch Hand

This is something on the power density in the spectrum. At the moment you are pretty much there with it. To get a histogram like you have, you could simply add together the powers of the frequency components inside the ranges you require. That should do as a quick fix (you can poke around with the FFT algorithm a bit to get an actual power spectrum as described in the previous link). Alternatively you can use a smaller window size (W) and the DFT will do the work for you, though it'll be less accurate near the boundaries of your frequency blocks.How can I link it to a Form like:

But mathematics is universal? I understand what you mean though---grasping these concepts is difficult without a language barrier, and you're doing well to get this far!Also I'm german and the english vocabulary confuses me sometimes

I managed to get what I want... in a way. But the CPU is very busy for just having a visual EQ. Is there anything I can do to speed things up with the FFT or make it use less CPU? You wrote something about using log instead of sin() and cos(). How do I do that? Haven't read anything about that in the Articles I have here right now. Sorry again for my bad Math

I did a little Video and uploaded it (Big day, first youtube video =))

http://www.youtube.com/watch?v=MCJNoDE2vsM

There you go... you can see what I have so far. I also know now how I could make it 32 or 64 bands to display. I just let the "FFT Loop" call the custom paint()-Method of my EQ-AWT-Canvas-Object every cycle. Without the recording Tool I get about 60 fps on my 2Ghz/1GB (Single Core, about 7 years).

Another Question that came up was: How do I measure the volume of each frequency spectrum I have? Decibell? All I did for now, was to multiply my doubles with 100 and (int) them to have usable numbers for animation. Is there any kind of normalization I could do?

Sorry to bother you with another bunch of questions...

Thanks very much!!!

Ranch Hand

Actually, this kind of optimisation is a computer science problem; the mathematics is sound and it makes no difference if you use sin and cos or the trig trick. To optimise, we note sin and cos are expensive functions; multiplications and additions are relatively cheap however. So you need to use the addition/subtraction trig formulae shown on Wikipedia (the angle sum and difference identities). Given your algorithm and those identities, can you think how you might be able to replace the sin() and cos() in your FFT method with a single sin() or cos() evaluation and a set of multiplications and additions? I'll leave you to see if you can work it out.Is there anything I can do to speed things up with the FFT or make it use less CPU? You wrote something about using log instead of sin() and cos(). How do I do that? Haven't read anything about that in the Articles I have here right now. Sorry again for my bad Math

What you discover quickly is there is no single definition of what volume is. It has to be relative to some fixed value. The link I gave in my previous post gives you an example of a particular display. It is usual to display an EQ in terms of the power of the signal too, not the volume. Again, the link I gave in my previous post covers that. Decibel (dB) is only a logarithmic scale---which better represents how we hear volume---and again, it is relative so you have to decide what you need it relative to. Unless you have a specific need for units on your scale, you may only want to scale the axes logarithmically to give a more perceptive display.How do I measure the volume of each frequency spectrum I have? Decibell?

Ranch Hand

Also, I didn't look carefully at your algorithm. Is this a naive FFT implementation? If so, it'll be O(n^2). You can make it O(n log n) which is a massive improvement*. The Cooley-Tukey algorithm is the most common example of the latter. Google for some code examples if you can't figure out how to do it from the equations. You can still use the sin/cos trig identity trick above to reduce cycles even further.Is there anything I can do to speed things up with the FFT or make it use less CPU?

* As an example, I once used a naive FFT to try to solve a PDE---it took over 20 minutes. The equivalent O(n log n) FFT took under 1 minute! Some careful thought and attention to detail pays wonders when it comes to optimising algorithms!

[ October 14, 2008: Message edited by: Charles Lyons ]

**[Charles]: Is this a naive FFT implementation? If so, it'll be O(n^2).**

One could argue that's not really an FFT then. It's just an FT.

Ranch Hand

**D**FT to be precise). Hardly anyone uses a plain DFT due to its inefficiencies, so it's a slip of my tongue to refer to everything as an FFT... my mistake! Sometimes I just write IFT too when I really mean IFFT (and probably vice-versa occasionally). Indeed I should have said the naive DFT is O(n^2) while any FFT algorithm is O(n log n). Thanks for the correction.

Originally posted by Mike Simmons:

[Charles]: Is this a naive FFT implementation? If so, it'll be O(n^2).

One could argue that's not really an FFT then. It's just an FT.

Indeed. The Fast Fourier Transform (FFT) is just a fast method O( nlog(n))of implementing the Discrete Fourier Transform (DFT) O(n^2).

There are many optimizations one can make to make the FFT even faster. For example, one can save a load of time by pre-computing all the sin() and cosine() values and pre-computing the reorder indexes. With some clever indexing, the tables are only half the length of the data being transformed.

If the OP is willing to include some native code in his system then he should take a look at fftw which is written in C and assembler. This is 8 times faster than my Java based FFT and my Java based FFT is pretty fast. fftw is also about 8 times faster than my pure C++ based FFT and 6 times faster than my FORTRAN based FFT.

Retired horse trader.

Note: double-underline links may be advertisements automatically added by this site and are probably not endorsed by me.

**[Charles]: You're absolutely correct (though it's a DFT to be precise).**

Sure - but as FFT is discrete too, the D seemed irrelevant to the point I wanted to make, which was that an O(n^2) transform is not Fast.

**[Charles]: Hardly anyone uses a plain DFT due to its inefficiencies, so it's a slip of my tongue to refer to everything as an FFT**

Yeah, that's probably what happened in the original post as well, talking about FFT as a general synonym for Fourier Transform.

I need to calculate the volume in decibel unit of selected audio wave file under the time domain. It is ok to print the volume in decibel unit with time. But I have no idea about how to implement and have no knowledge of formula.

I read this but i can't understand.

Any help with source code is appreciated.

If you have to make a calculation of the volume in decibels but you don't have any idea of the theory and mathematics behind what you need to program, then it's going to be very hard. How can you write a program if you don't understand what it's supposed to do? Before you start programming, do some research to learn exactly what your program is supposed to calculate.