问题描述:

I'm implementing something similar to a Spreadsheet engine in Haskell.

There are ETables, which have rows of cells containing expressions in the form of ASTs (e.g. BinOp + 2 2), which can contain references to other cells of ETables.

The main function should convert these ETables into VTables, which contain a fully resolved value in the cell (e.g. the cell BinOp + 2 2 should be resolved to IntValue 4). This is pretty easy when cells have no external references, because you can just build the value bottom up from the expression AST of the cell (e.g. eval (BinOpExpr op l r) = IntValue $ (eval l) op (eval r), minus unboxing and typechecking) all the way to the table (evalTable = (map . map) eval rows)

However, I can't think of a "natural" way of handling this when external references are thrown into the mix. Am I correct to assume that I can't just call eval on the referenced cell and use its value, because Haskell is not smart enough to cache the result and re-use it when that cell is independently evaluated?

The best thing I came up with is using a State [VTable] which is progressively filled, so the caching is explicit (each eval call updates the state with the return value before returning). This should work, however it feels "procedural". Is there a more idiomatic approach available that I'm missing?

网友答案:

Haskell doesn't memoization by default because that would generally take up too much memory, so you can't just rely on eval doing the right thing. However, the nature of lazy evaluation means that data structures are, in a sense, memoized: each thunk in a large lazy structure is only evaluated once. This means that you can memoize a function by defining a large lazy data structure internally and replacing recursive calls with accesses into the structure—each part of the structure will be evaluated at most once.

I think the most elegant approach to model your spreadsheet would be a large, lazy directed graph with the cells as nodes and references as edges. Then you would need to define the VTable graph in a recursive way such that all recursion goes through the graph itself, which will memoize the result in the way I described above.

There are a couple of handy ways to model a graph. One option would be to use an explicit map with integers as node identifiers—IntMap or even an array of some sort could work. Another option is to use an existing graph library; this will save you some work and ensure you have a nice graph abstraction, but will take some effort up front to understand. I'm a big fan of the fgl, the "functional graph library", but it does take a bit of up-front reading and thinking to understand. The performance isn't going to be very different because it's also implemented in terms of IntMap.

Tooting my own horn a bit, I've written a couple of blog posts expanding on this answer: one about memoization with lazy structures (with pictures!) and one about the functional graph library. Putting the two ideas together should get you what you want, I believe.

相关阅读:
Top