Wednesday, May 2, 2012

Strings are from earth and StringBuilder from mars.

Introduction


I was happily married to string for a long time until I came to know the reality that “Strings are immutable” and not suitable for all scenarios. Recently I was working on a heavy HTML parser application and the program used to go out of memory frequently. The completely HTML parsing logic was using string variables.

After reading around I came to know the main reason was the immutable behavior of string. Immutable means once the data is assigned cannot be changed.

For instance if you are looping using a string variable like the code given below. Every assignment to the string creates new copies of variables and the previous copy is sent for garbage collection. So the below for loop generates different memory copies of data and the recently created is the current value.

   

Now you must be wondering why this absurd behavior. Any lame person (like me) can conclude this is not efficient and neither looks logical.

The sacrifice for thread safety


Before I start with the solution I wanted to understand why Microsoft team thought about this weird behavior. Thanks to http://stackoverflow.com/questions/2365272/why-net-string-is-immutable things started looking logical.

If you are using string variables in multithreaded scenarios every thread modification will create new copy of memory ensuring that you do not land in to multi-threaded issues. In other words thread safety is built-in by itself when new copies of data are created.

Not all work on ships


The next thing which started itching me is what if my application is not multi-threaded. What if my main motive is to save memory resources and ensure that I do not go out of memory issues?. Here’s comes the hero from mars “StringBuilder”.

“Stringbuilder” are not immutable, in other words if you change the variable data the same memory location is modified. VOW, that looks lot of memory saving during heavy concatenation operation as compared to string.

 

I wanted to see for myself that earth is flat


As a curios developer it was difficult for me to digest that internally string creates different copies of data. Out of curiosity I downloaded the CLR Profiler and ran two test of code as shown below.

One for string as the below.

string x ="";

for (inti = 0; i< 10000; i++)
{
x = "Shiv"+ x;

}


 

One for string builder.

StringBuilder x = newStringBuilder();

for (inti = 0; i< 10000; i++)
{
x.Append("Shiv");

}



Watch the allocated bytes, 400235631 bytes is way greaterthan 136597bytes.

Watch the video below for the real demo


If you do not believe what I have written see the actual video demo as follows

No comments: