At Sun's suggestion, I wanted to get some feedback from the developer community regarding an implementation enhancement to the StringBuffer class. Please tell me what you think. Here's the issue:
The current implementation of StringBuffer maintains an internal char[] to hold the characters. StringBuffer will automatically increase the size of its char[] as new characters are added. It does this by allocating a new char[] whose size is ALWAYS at least twice that of the previous char[]. This behavior can be hugely wasteful, especially for large StringBuffers. Here's an example:
StringBuffer sb = new StringBuffer(50000);
System.out.println(sb.capacity());
for(int i = 0; i < 50001; ++i)
sb.append('M');
System.out.println(sb.capacity());
This code inserts 50,001 characters into a StringBuffer whose length is 50,000 characters. The StringBuffer class responds by DOUBLING the size of its internal character array (actually, it doubles the current size plus one) to accommodate the 50,001st character. In the above example we end up with a character array of 100,002 when only 50,001 are needed.
What I am proposing is that the StringBuffer implementation be modified so that it does not automatically and wastefully double the size of the internal char[]. I believe we need a more refined, controlled mechanism for increasing the internal char[] size. Simply doubling the char[] is a very primitive, wasteful and brut force approach (IMHO).
I don't want to discuss possible solutions at this point, I simply want your feedback as to whether or not a more elegant approach is needed.
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 8, 2001 12:37 PM
(reply 2
of 25) (In reply to
#1 )
I was just thinking this, although I wasn't aware that it doubles the original capacity.
I was wondering why there isn't a double argument constructor like Vector's:
public StringBuffer(int initialCapacity, int capacityIncrement);
Multiplying the storage capacity by a constant (2 in the case of doubling) is needed to ensure that adding N characters isn't an O(N^2) operation. It's exactly this doubling in capacity that makes a StringBuffer so much faster than a String when you need to build up a string in a loop.
Perhaps StringBuffer could be changed so that it's more like ArrayList, which increases storage by 50% each time it runs out of space. However, people have complained that ArrayList is nearly twice as slow as Vector, which doubles in capacity just like StringBuffer usually does.
As is often the case, it's a tradeoff between space and time -- saving storage space will mean that programs run more slowly.
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 8, 2001 1:45 PM
(reply 4
of 25) (In reply to
#3 )
How come it dosent just get the size of the string that
is being appended to it and increase the buffer size
according to that+1. I guess something like this:
publicvoid append(String appStr)
{
int nSize = capacity+(appStr.length()+1);
/* then set the char[] to this size instead of wasting proc speed.
*/
}
To make this method more efficient you may want to remove the allocation that happens everytime this method is called. You can do this by sharing the StringBuffer object across calls to this method:
This does save an allocation, however, interanally the StringBuffer will re-allocate the internal char[] when you set the length to 0. This defeats the purpose of saving the allocation.
Perhaps the StringBuffer should keep the char array around and re-use it, instead of re-allocating it? Does this make any sense?
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 8, 2001 10:42 PM
(reply 6
of 25) (In reply to
#5 )
You are 100% correct, but what you describe is a flat out bug. I recently submitted this bug (ID 4524848) and am waiting for it to appear in the "Bug Parade".
The issue surrounding the doubling of the size of the char[], however, is not a bug. It's simply a very inefficient implementation (IMHO).
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 9, 2001 7:38 AM
(reply 7
of 25) (In reply to
#4 )
How come it dosent just get the size of the string
that
is being appended to it and increase the buffer size
according to that+1. I guess something like this:
publicvoid append(String appStr)
{
int nSize = capacity+(appStr.length()+1);
/* then set the char[] to this size instead of
wasting proc speed.
*/
}
Just an idea.
Because that would mean appending characters to a StringBuffer in a loop would take O(N^2) time instead of O(N) time. Read my post above.
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 9, 2001 7:43 AM
(reply 8
of 25) (In reply to
#5 )
Perhaps the StringBuffer should keep the char array
around and re-use it, instead of re-allocating it?
Does this make any sense?
Not really. When you call toString() on a StringBuffer, the String uses the char[] that the StringBuffer was using. The next time you want to use that StringBuffer, it will have to create a new char[] anyway because it can't change the one that it's sharing with the String.
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 9, 2001 7:46 AM
(reply 9
of 25) (In reply to
#6 )
The issue surrounding the doubling of the size of the
char[], however, is not a bug. It's simply a very
inefficient implementation (IMHO).
To be precise, it's inefficient in terms of the memory used, but very efficient in terms of the time used. As I point out above, there's a time-space tradeoff. If you make StringBuffer more memory efficient, it will become slower.
Because more people complain about Java being slow than about Java taking too much memory, that's probably the way StringBuffer should probably be implemented. Perhaps there could be a StringBuffer constructor added that takes a "growth factor" such as 50% or 20% so that a programmer could choose a slower but more memory efficient StringBuffer growth pattern.
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 9, 2001 9:50 AM
(reply 12
of 25) (In reply to
#11 )
Perhaps there should be more constructors for StringBuffer to allow finer control over its allocation. Just think of all the threads we could have arguing about which was the most efficient!
Re: PROPOSED StringBuffer ENHANCEMENT -- COMMENTS???
Nov 9, 2001 10:28 AM
(reply 13
of 25) (In reply to
#10 )
schapel, I see your point here.
As was mentioned in another post Vector has a second
constructor that allows a capacity increment
argument.
Are you suggesting that in general we shouldn't use
that constuctor with values larger than 1?
I'm not trying to be being argumentative. I take your
advice very seriously.
No, I'm not saying that... if you do use Vector and use the capacityIncrement, then you need to be aware that adding many elements to the Vector one at a time could become an O(N^2) operation rather than an O(N) operation. If you're using a Vector with thousands of elements, using capacityIncrement could be orders of magnitude slower than letting the Vector double in size each time it needs to grow.
The default behavior of a data structure that uses an array to store elements and that can have elements added to it (Vector, ArrayList, StringBuffer, BitSet, ...) should be to grow in size by a significant factor (2 or 1.5 for all of the data structures mentioned) each time it needs to become larger. When you add N elements, it will take O(N) time and O(N) wasted space.
It makes sense to allow a programmer to fine-tune this growth to a different factor, or to cause the data structure to grow by a constant number of elements instead. Depending on exactly how the data structure will be used, and whether memory or execution time is being optimized, any one of those growth patterns will be most efficient. A smaller growth factor will cause it to take more time to add elements, but will result in less wasted space. Using a capacity increment will cause the time to explode to O(N^2) but the wasted space will be only O(1).
Yes, definitely a new approach is needed. I ran into a
problem with this already, and it seems to be a big problem
if this can not be resolved. One solution that I am
thinking which may not be the best or even possible (but
seem very feasible) is to have an array of pointers to
fixed (or variable if you want more complex) pieces of memories.
The array of pointers can grow double up just like now.
Each item in the array points to a fixed piece of memory.
This approach:
-very fast appending: because no coping of more memory is
needed(every time you double, you have to copy the content
to a new one, and destroy the old one, very slow, a big
problem right now).
- Very efficient use of memory space: I have 512 Megs of physical RAM, but can
not have a 40 Megs string. Pitiful for modern computer
(shouldn't it be able to allocate much more than 512 megs of
ram String?) Why more efficient? Because it can use small
pieces of memory instead of a large continuous one. (of
course, a good operating system may do this well, and we
don't need this, but with Microsoft's 97% of desktop and
their screw up OS, you have to invent a better solution).
- Fast indexing. Just need an
extra operation to find which pointers it is. Simply a
divide a nd a mod.
- Need to give users control of increment size
Ofcourse, there could be a variation of this that provide even better architecture.
This topic has
25
replies
on
2
pages.
1
|
2
|
Next »