Tuesday, August 15, 2006

Complexity and Big O

Ok so you are a programmer. And you have coded thousands of lines. But did you really need to write all those lines? In computer science, the question is often not how to solve a problem, but how to solve it well.

Whenever you are studying algorithms, you find that there are always two issues to be addressed:
1. Efficiency / Performance
2. Complexity

Efficiency or performance deals with CPU time, memory, disk, and network usage when the program is run. It therefore depends on the machine hardware, the code generated by the compiler, and the user code.

Complexity, on the other hand, deals with how the program's resource requirements scale with the size of the problem. In other words, what happens as the input gets larger.

As complexity increases, CPU and memory usage increase, thereby affecting performance. But a change in machine hardware or an increase in memory will not change complexity. A much faster CPU may execute the algorithm faster, but it is doing the same amount of work as a slower CPU. Hence we can safely say that:
Complexity affects performance but not vice versa

Measuring Complexity
Complexity analysis attempts to characterize the relationship between the number of data elements and resource usage (time or space) with a simple formula approximation.
Why is the relation between the size of the problem and resource usage important? Well, ever since the first program was written, programmers have had ugly surprises when they moved from small test data to large data sets.
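To see this for yourself, here is a minimal Java sketch (class and method names are my own invention) that times a deliberately quadratic routine at two input sizes; grow the input 10x and the run time jumps roughly 100x.

public class ScalingSurprise {
    // Deliberately quadratic: the inner loop runs n times for each
    // of the n outer iterations, so total work is n*n.
    static long quadratic(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                ops++;
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        for (int n : new int[]{1000, 10000}) {
            long start = System.nanoTime();
            long ops = quadratic(n);
            long ms = (System.nanoTime() - start) / 1000000;
            // 10x the data, roughly 100x the work: the ugly surprise.
            System.out.println("n=" + n + " ops=" + ops + " time=" + ms + " ms");
        }
    }
}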

The Big O Notation
The definition from NIST is as follows:
Definition: A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = O(g(n)) means it is less than some constant multiple of g(n). The notation is read, "f of n is big oh of g of n".
Ok I admit, I had a tough time understanding that. A simpler explanation is that Big O is concerned with what happens to the algorithm for very large values of N. Hence it will NEVER have constants or lower-order terms; these are dropped, because when N becomes large they no longer have a significant effect on the total complexity.
For example, if the number of operations is N^2 - N, then for large values of N the single N term is insignificant compared to N^2, so the algorithm is described as O(N^2). Similarly, constant multipliers are ignored, so an O(4*N) algorithm is equivalent to O(N).
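To make the N^2 - N example concrete, here is a small sketch of my own (not from any of the referenced sources): it counts ordered pairs (i, j) with i != j, so the counter is incremented exactly N^2 - N times, yet we still just call the whole thing O(N^2).

public class DroppedTerms {
    // Visits all N*N ordered pairs but skips the N cases where i == j,
    // so the increment below runs exactly N^2 - N times.
    static long countDistinctPairs(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 1000;
        // Prints 999000; the -N barely dents the N^2 = 1000000.
        System.out.println(countDistinctPairs(n));
    }
}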
The common orders, from best to worst growth, are O(1) (constant), O(log N) (logarithmic), O(N) (linear), O(N log N), O(N^2) (quadratic), and O(2^N) (exponential).

Can we blindly trust Big O?
The answer is NO!! There are some problems with complexity analysis:
1. Too hard to analyze. Many algorithms are simply too hard to analyze mathematically.
2. Average case unknown. There may not be sufficient information to know what the most important "average" case really is, in which case analysis is impossible.
3. Unknown constants. Both walking and traveling at the speed of light have a time-as-function-of-distance Big-O complexity of O(N). Although they have the same Big-O characteristics, one is rather faster than the other. Big-O analysis only tells you how an algorithm's cost grows with the size of the problem, not how efficient it is in absolute terms (see the sketch after this list).
4. Small data sets. If there are no large amounts of data, algorithm efficiency may not matter.
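Here is point 3 in code (again, my own illustration): both methods below are O(N), but one quietly does 100 times more work per element, and Big-O notation treats them as identical.

public class HiddenConstants {
    // O(N): one addition per element.
    static long sumFast(int[] a) {
        long s = 0;
        for (int x : a) {
            s += x;
        }
        return s;
    }

    // Also O(N): the constant factor of 100 disappears in Big-O,
    // but the wall-clock difference is very real.
    static long sumSlow(int[] a) {
        long s = 0;
        for (int x : a) {
            for (int k = 0; k < 100; k++) {
                s += x;
            }
        }
        return s / 100;
    }

    public static void main(String[] args) {
        int[] data = new int[1000000];
        java.util.Arrays.fill(data, 1);
        System.out.println(sumFast(data) + " == " + sumSlow(data));
    }
}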

References:
1. Complexity and Big O Notation
2. UW-Madison lecture notes
3. Algorithms and Big O Notation

Monday, June 12, 2006

UnsupportedEncodingException

Java is an exceptional language. While working on my main project on search engines, I was drowning in the various exceptions that Java mercilessly threw at me.
The latest exception I encountered was UnsupportedEncodingException: cp437.
It took me some time to find the cause.

If you have encountered this exception and are still wondering what to do about it, all you have to do is install the Java Runtime Environment (JRE) with support for all languages and fonts.
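If you would rather fail gracefully than be ambushed at runtime, here is a minimal sketch (class name is my own) that asks the JRE whether it ships an encoding before using it:

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String encoding = "Cp437";
        // Charset.isSupported reports whether this JRE ships the charset,
        // so we can sidestep the UnsupportedEncodingException entirely.
        if (Charset.isSupported(encoding)) {
            byte[] bytes = "hello".getBytes(encoding);
            System.out.println(encoding + " works: " + bytes.length + " bytes");
        } else {
            System.out.println(encoding + " is missing; install the JRE with full language support");
        }
    }
}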

I think I got an endofblog exception now.

Monday, May 29, 2006

Meet pagefile.sys

My first acquaintance with the infamous pagefile.sys was when Windows rudely announced that I had no more free space on my C:\ drive. I was working in Photoshop and had just finished processing a new wallpaper for the college. A quick scan with showman told me that my pagefile.sys was around 1.5 GB.
My first reaction was to delete that space hogger from the face of my hard disk. But wait, I had a bad feeling about this. My geek instincts told me that I had to check on this first.

What is pagefile.sys
Do you know what RAM is?
Do you know what a hard disk is?
Do you know what a processor is?
If you answered "No!!" to all the questions mentioned above, you may stop reading this page NOW. May the gods of computers strike ye down with lightning from a shorted SMPS.
If you answered "Yes!!" to any one of the questions, please treat yourselves to a cup of coffee and read on.

The simplest explanation of the purpose of pagefile.sys is that it acts as a swap file: it is how Windows implements virtual memory using demand paging. Okay, now you want to know what demand paging and virtual memory are...

Virtual Memory and Demand Paging
Virtual memory is an attempt to fool the computer into thinking it has more RAM than it does.
The Memory Management Unit (MMU) on the CPU can substitute space on the hard disk for actual RAM. This space is called swap space, and the mechanism is called demand paging. The swap space is contained in a file called a swap file.
Hence swap files allow the operating system to simulate extra memory.

Physical memory + swap file = virtual memory
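You can even check that little equation from Java. This sketch relies on the com.sun.management extension of OperatingSystemMXBean, which is specific to Sun/HotSpot JVMs, so treat it as a best-effort probe rather than a portable API:

import java.lang.management.ManagementFactory;

public class SwapInfo {
    public static void main(String[] args) {
        // This cast only works on Sun/HotSpot JVMs; other JVMs
        // may return a bean without these methods.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        long mb = 1024 * 1024;
        long ram = os.getTotalPhysicalMemorySize() / mb;
        long swap = os.getTotalSwapSpaceSize() / mb;
        System.out.println("Physical memory: " + ram + " MB");
        System.out.println("Swap space:      " + swap + " MB");
        System.out.println("Virtual memory:  " + (ram + swap) + " MB");
    }
}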

Why extra memory, you greedy operating system?
I will share a profound truth that dawned on me while in deep contemplation with lomax.
"Memory is like clean underwear. You never get to have enough of it."

Suppose you don't have enough RAM on your machine: you can create a large swap file and thereby get a larger virtual memory. The advantage is that you can load larger programs into memory and run more programs concurrently.

The downside is that serious memory-hogging programs will drag performance down due to frequent swapping of pages between RAM and the swap file, a condition known as thrashing.

Can I configure pagefile.sys settings?
Sure you can!!
Please read: Tweaking and optimizing pagefile.sys


Sunday, May 28, 2006

Hello World

/**
* I am starting this blog as a part of my venturing into the programming world.
* I will share my code and terrible...umm terrific insight on various other codes
* that I come across.
*/

public class HelloWorld {
    public static void main(String[] args) {
        HelloWorld helloworld = new HelloWorld();
        helloworld.programming(); // let the programming begin
    }

    void programming() {
        System.out.println("Hello World");
    }
}