blog://micha.monoid.net: July 2007

Browsing through the last week entries of my subscriptions, I stumbled across an interesting post entitled "How to write Iterators really REALLY fast". It is about simulating the yield functionality available in C# iterator blocks as of .NET 2.0 - as you might have guessed: it allows to write iterators really fast. =)

The author presents an example featuring a utility class Yielder, which provides the same ease-of-use completely without pre-compilers and keyword overrides/additions. However, the implementation details were not published with it - but promised for next week (as of now 'this' week). Feeling eager how this can be done, I tried implementing the interface based on the descriptions in the text and the mentioned examples (one repeated here for convenience):


public Iterator iterator() {
  return new Yielder() {
    public void yieldNextCore() {
      for (Object nextItem : coll) {
        if (pred.evaluate(nextItem)) {
          yieldReturn(nextItem);
        }
      }
    }
  };
}

However, it turned out that the task was not as easy as I thought. Actually, it took me over one hour to write one and I didn't succeed in making it as efficient as a true "yield".

Probably, the author will soon present a smart solution to the two challenges I encountered:

1. How to create more than one Iterator from a single Yielder instance?

Obviously, yieldReturn is a method of the Yielder class, so it is not (automatically) aware of the a specific Iterator instance. Therefore, I moved the method (along with a yieldBreak equivalent) to an interface which will be passed to the iterator-block (yieldNextCore).

2. How to avoid multiple iterations without an additional collection as storage?

Maybe my intuition is wrong, but the Next in yieldNextCore implies to me that the method will be called repeatedly. Probably until it exits after a yieldBreak call or without one to yieldReturn. Hence, in the example above it would be the number of elements the predicate pred evaluates to true. Taking the worst case: One call for each of the n elements in the collection coll. Ergo the time scale rises from O(n) to O(n^2)!

Anyway, I might be wrong and it is only called once. Thus however, an element passed to yieldReturn has be stored until the corresponding Iterator.next call releases it. In a single threaded environment, the means all of the (predicated) elements since there is no way to suspend the for-loop. This may be practicable for iterations over a few elements. But again, the worst cast the complexity rises from O(n) to O(n^2) - this time in memory consumption!

As a result, the only possible implementation idea I got was utilizing a dedicated thread to execute the iterator-block (yieldNextCore). This can block the iteration while the iterators' method can return. However, it turned out one thread is not enough, since it could "wait forever", if the iteration is not completed. Another had to monitor a weak reference to it. But again, the the problem is that the loop will always be executed, which leads to O(n) even if the iteration is not continued after retrieving the first element.

Yesterday, I browsed once again through Neal Gafter's Blog... It turned out to be a great mistake, as I got an idea which kept floating in my mind almost the whole night and day: A way to abstract loops or other code blocks without closures, but by inlining special methods around them (seems like the inverse of a closure to me).
Still not being 100% convinced by function types, especially their syntax, I tried to write myself: A proposal for extending the Java Language Specification

Actually, I think the idea behind it is great, although the proposal itself might not - I didn't sleep much last night and have/had plenty other things to work, so please forgive the small mistakes in the pseudo code and text. However, if there is interest, I'll try to find the time to revise it. So please let me know what you think about it!
[ Collaborators are welcome - its on Google Docs you know ;-) ]

Abstract:
This proposal shows that it is not necessary to add full blown closures or function types to the Java programming language in order to allow control abstraction.
The main motivation behind it is the agreement, that control abstraction cannot be implement with the current language constructs, such as anonymous classes - only syntactic improvements won't help. On the other side however, there is the fear that the introduction of nameless functions will remove one of the languages strongest points: its readability - raw type sequences everywhere. The proposed solution is to reuse existing syntax and add a single new semantic, which is simple-to-use, (relatively-)simple-to-implement, but yet very powerful. In short, the main idea is to inline special methods surrounding a (closure-)block, which remains in the lexical scope where is called.

Please have a look at the full document.

Usage-Example: Map enhanced for-loop.
(see document for the declaration of 'forEach')

Usage-Example: with a.k.a. closeAfter.
(see document for the declaration of 'with')

Usage-Example: blocked asynchronous reading.
(see document for the declaration of 'readFully')

blog://micha.monoid.net

Tuesday, July 17, 2007

Re: How to write Iterators really REALLY fast (in Java)

Saturday, July 07, 2007

Inline-Methods and Closure-Blocks