Tuesday, July 17, 2007

Re: How to write Iterators really REALLY fast (in Java)

Browsing through the last week entries of my subscriptions, I stumbled across an interesting post entitled "How to write Iterators really REALLY fast". It is about simulating the yield functionality available in C# iterator blocks as of .NET 2.0 - as you might have guessed: it allows to write iterators really fast. =)

The author presents an example featuring a utility class Yielder, which provides the same ease-of-use completely without pre-compilers and keyword overrides/additions. However, the implementation details were not published with it - but promised for next week (as of now 'this' week). Feeling eager how this can be done, I tried implementing the interface based on the descriptions in the text and the mentioned examples (one repeated here for convenience):

public Iterator iterator() {
return new Yielder() {
public void yieldNextCore()
{
for (Object nextItem : coll) {
if (pred.evaluate(nextItem)) {
yieldReturn(nextItem);
}
}
}
};
}

However, it turned out that the task was not as easy as I thought. Actually, it took me over one hour to write one and I didn't succeed in making it as efficient as a true "yield".


Probably, the author will soon present a smart solution to the two challenges I encountered:

1. How to create more than one Iterator from a single Yielder instance?

Obviously, yieldReturn is a method of the Yielder class, so it is not (automatically) aware of the a specific Iterator instance. Therefore, I moved the method (along with a yieldBreak equivalent) to an interface which will be passed to the iterator-block (yieldNextCore).


2. How to avoid multiple iterations without an additional collection as storage?

Maybe my intuition is wrong, but the Next in yieldNextCore implies to me that the method will be called repeatedly. Probably until it exits after a yieldBreak call or without one to yieldReturn. Hence, in the example above it would be the number of elements the predicate pred evaluates to true. Taking the worst case: One call for each of the n elements in the collection coll. Ergo the time scale rises from O(n) to O(n^2)!

Anyway, I might be wrong and it is only called once. Thus however, an element passed to yieldReturn has be stored until the corresponding Iterator.next call releases it. In a single threaded environment, the means all of the (predicated) elements since there is no way to suspend the for-loop. This may be practicable for iterations over a few elements. But again, the worst cast the complexity rises from O(n) to O(n^2) - this time in memory consumption!

As a result, the only possible implementation idea I got was utilizing a dedicated thread to execute the iterator-block (yieldNextCore). This can block the iteration while the iterators' method can return. However, it turned out one thread is not enough, since it could "wait forever", if the iteration is not completed. Another had to monitor a weak reference to it. But again, the the problem is that the loop will always be executed, which leads to O(n) even if the iteration is not continued after retrieving the first element.

1 comment:

Unknown said...

Ok, it seems the author cheated a bit using bytecode manipulations. Anyway, the result is quite nice, have a look at the project!