Category: Code

Pages: 1 2 3 4 5 6 7 8 9 10 11 ... 35 >>

17/06/10

Permalink 09:04:36 am, 1223 words
Categories: Code

F# Warts

Now that I’ve been using F# for real code for almost a month, I’ve noticed some warts. That’s pretty fast for a new language, but I am coming from Haskell, which is very similar. So some of these are warts on Haskell too, and others are complaints that F# is not Haskell.

Warts

  1. let for top-level functions instead of top-level pattern matching. Haskell’s pattern matching is the right default, but F# is too much an ML to change it. (I bet that Don Syme thought seriously about it at some point, though.)
  2. F# is a huge language because of .NET/ML dual heritage. There is massive redundancy so that both .NET people and ML people can pick up the language easily. It evokes memories of C++. At least F# 3.0 is unlikely to bolt on a whole NOTHER programming paradigm. I hope.
  3. As a side effect of the dual heritage, type syntax is not iconic like Haskell’s. Again, I’m sure the F# designers considered a change, but .NET programmers are already used to list<int> and, unlike Haskell, you also have the related types int[] and seq<int>.
  4. Another side effect of the dual heritage is that there are LOTS of keywords. There are a lot to memorise and usually two or three matching keywords to type every time you use them. while/do, for/in/do, match/with, etc.
  5. Infix functions and prefix operators are not as convenient as Haskell. This is really minor, but F# doesn’t have operator sections, and you have to fake the infixing backquotes with a reverse/pipe operator pair. Given the other syntax borrowed from Haskell, I had hoped that at least operator sections might make it in.

Those are mostly syntactic warts that I noticed. Nothing major, and CERTAINLY much improved from O’Caml, whose designers have paid very little attention to syntax. The semantic warts are a little more serious.

  1. The top-to-bottom, left-to-right compilation and baroque type system* restrict the order of expression somewhat.
  2. The monomorphism restriction means a lot of lambda-lifts (or type annotations).
  3. Even more lambda lifts occur because methods and properties are not functions (and the type checker gets dramatically dumber around OO constructs).
  4. A baroque OO type system instead of type classes, plus a love of explicitness, means that monads are much less generic/polymorphic than in Haskell. Theoretically, this is a huge demerit, because it makes it hard to write, for example, a truly generic mapM or sequence. Practically, however, monads are used less in F# anyway, and F# prefers explicitness—witness how frequently modules are used for namespacing compared to Haskell. So this is a matter of taste.. Have you heard of the BASIC monad? Typeclass abuse in Haskell can be quite scary.**

Examples

  1. No top-level pattern matching:
    let rec merge firstArg secondArg =
      match firstArg,secondArg with
      | [],[] -> ...
  2. Dual .NET/ML heritage:

    // struct
    type Point3D =
      struct
        val x: float
        val y: float
        val z: float
      end
    // record
    type Point3D = { 
      x : float; 
      y: float; 
      z: float; 
    }
  3. Non-iconic types: list<int> instead of [int] and int * string instead of (int,string).
  4. Too many keywords: while/do, for/in/do, match/with, class, struct, do, do!, yield, yield!, let, let!, return, return!, fun, function, try/finally, try/with…it’s a good thing that the indentation-based light syntax at least gets rid of the block-ending keywords.
  5. No infix/prefix flexibility:
    Seq.append (Seq.filter ((=) foo) x) seq2
  6. Strict order of compilation:
    let f l = l |> Seq.filter ((<>) foo)
    let main = printfn "%s" (f [1;2;3])
    // `main` has to appear after `f` since it references `f`. 
    // But that means the type of `l` can't be (completely) inferred 
    // `because `f` has no context for it.
    
  7. Monomorphism restriction:

    // both fail because generic functions as values are not allowed
    let f = g >> h >> i
    let csv = sepBy (pchar ',') (many anyChar)
  8. No 1st-class methods/properties:

    // both fail because neither properties nor methods 
    // can be used as functions
    let xvalues = Seq.map Point3D.x points
    let allsplits = Seq.map (string.Split [| ',' |]) filelines
  9. No generic monads:
    let sequence : ??? = ???
    I’m pretty sure even if you could write generic code for sequence, the type is impossible to express in .NET: list<’m<’a>> -> ‘m<list<’a>>.

Workarounds

  1. No top-level pattern matching:
    let rec map f = function
    | [] -> []
    | (x::xs) -> f x :: map xs
    // NOTE: Example only! Do NOT write List.map this way.
        

    The last argument is the most likely to be pattern-matched anyway, probably about 90% of the time.

  2. Dual .NET/ML heritage: Same as C++: Learn the whole language, then subset.
  3. Non-iconic types: Get used to it. It’s not that bad. Or just use arrays everywhere; you don’t usually write array<int>.
  4. Too many keywords: Subset! But this is not much of a workaround.
  5. No infix/prefix flexibility:
    Seq.filter ((=) foo) x |>Seq.append<| seq2
    But that only gives you infix functions, it doesn’t make prefix operators any prettier.
  6. Strict order of compilation: Put everything in a class—I think? Even if this solves the problem, it’s not a natural solution for some problems.
  7. Monomorphism restriction:

    // make a syntactic function or annotate type
    let f x = x |> g |> h |> i
    let csv : Parser<char,unit> = sepBy (pchar ',') (many anyChar)
  8. No 1st-class properties or methods:

    let xvalues = points |> Seq.map (fun p -> p.x)
    let allsplits = filelines |> Seq.map (fun s -> s.Split [| ',' |])

    Note that type inference doesn’t work unless the sequence being mapped are to the left of the function wrapping the property or method. There has been a tiny bit of discussion about specific syntax for this, so that something like this would be possible:

    let xvalues = points |> Seq.map Point3D#x
    let allsplits = filelines |> Seq.map (#Split [| ',' |])
    
  9. No generic monads: I don’t know that there is a workaround for this. I thought I had blogged earlier about a workaround, but I can’t find it now. In any case, this may not be a practical problem, since F# is generally used for larger systems with more developers than Haskell. There, a extra explicitness/verbosity is a decent tradeoff.

So far I like F# quite a bit. It’s really easy to pick up if you already know Haskell and/or O’Caml, and state is convenient at times, especially in an eager language. The semantics are solid. I’m really happy that Microsoft includes F# with Visual Studio now. They’re pushing the idea that the mainstream language doesn’t have to be a over-simplified lowest-common-denominator.

*It’s a perfectly good type system—for an OO type system. For a functional language (Hindley-Milner style), though, an OO type system adds a lot of complexity without giving as much as power as type classes (for example).

**I can’t find it now, but the basic idea is this: make a Num instance that wraps the Continuation monad, storing the “line number” in a map Int -> Continuation. Define a function goto that takes an int and looks up that continuation, calling it. I think. That’s conjecture, because I never saw the code, and I can’t find anything online. I think it was a joke going on the one time I stepped into #haskell.

main = do
  100 (putStrLn "Hello World")
  200 (goto 100)

16/06/10

Permalink 08:33:58 am, 876 words
Categories: Code

Developing Fing, part 4: command line interface

The previous entry in this series finished (sort of) the parser. Two things have been hurting my motivation to continue blogging. First is that I needed to blog about the F# version of the parser, since the version I’ve already written about is Haskell. I don’t really want to write about the same thing twice. Second, and more important, is that I was busy working on the actual code instead of blogging about it. There is a decent version online at http://github.com/sandersn/fing. It’s still missing three major features (web app, subtyping and type aliases), but it works for everything in FSharp.Core.

Now I’m at the point of bringing tests up-to-date with the code, so, you guessed it, I have more motivation to blog again. But I still don’t have the gumption to talk about the parser. I’m sick of parsing for now, so maybe I’ll write it up in six months, or when I have time to revamp the core token parser, which is super ugly.

Instead I’m going to write about simple stuff: the command line interface. It’s straightforward, but there is at least one interesting principle in there. I’ll work up to the more complicated parts of the code later.

module Main

[<EntryPoint>]
let main args = 
  let args,argmap = Opt.parse args
  let getrefs s = Seq.choose id (Opt.mapGet s argmap Seq.empty)
  let references = getrefs "r" |>Seq.append<| getrefs "reference"
  Fing.addReferences references
  match args with
  | [| t |] -> Fing.textSearch t
  | _ -> printfn "Fing is F# API Search. (rest of usage message...)"

There isn’t much surprising here: first I call Opt.parse to produce a list of arguments and keyword arguments. Then I define a function to arguments matching a string s, and use it to build a sequence of “-r” and “–reference” arguments. Well, wait a minute, that’s kind of weird, a nested function. I could have done it this way:

  let rs = Seq.choose id (Opt.mapGet s argmap Seq.empty)
  let references = Seq.choose id (Opt.mapGet s argmap Seq.empty)
  Fing.addReferences (rs |>Seq.append<| references)

And that’s not bad style, it just isn’t ideal. In the second way, I name the two values, repeating the code that generates them. In the first way, I name the code that generates the two values and repeat only the name I gave it. There’s a little more repetition the second way, with not much benefit.

(I also get a chance to show off the infix-function trick that I discovered. I’m still very proud of that.)

This illustrates one of the subtler aspects of functional programming. The three aspects of functional programming that I think of first are:

  1. Immutability
  2. Functions as data
  3. Functions everywhere

Well, ‘functions as data’ looks like a subset of ‘functions everywhere’, but both have a specific meaning. ‘functions as data’ refers to uses of functions that doesn’t occur at all in imperative or OO style: passing them around, saving them in data structures. ‘functions everywhere’ refers to the size and frequency of functions, which is far higher than in imperative or OO styles. Even though the ideal for all styles is many small functions (or methods), functional languages make it natural to achieve this ideal. It is definitely not natural in normal imperative, structured code, and all the “short methods are good for you” advice books indicate to me that it’s not natural in OO either. In any case, if you compare functional languages and OO languages, I think you will find a much greater percentage of the former make short function definition natural.

So, to finish up:

  match args with
  | [| t |] -> Fing.textSearch t
  | _ -> printfn "Fing is F# API Search. (rest of usage message...)"

There’s not much to say here, except that I am glad pattern matching works on arrays and I wonder why it doesn’t work on sequences. I can think of two reasons: first, seq (aka IEnumerable) is an interface, and pattern matching is a concrete operation. Second, it might be inefficient to pattern match the tail of a sequence. I don’t grok the efficiency model of iterators, but I think that, because they’re not defined recursively, a pattern-match, which treats them as recursive, must create a new enumerator to point to the tail of the matched sequence.

Regardless, it’s an inconvenience and likely one that can be fixed with an active pattern. That’s another F# feature I still need to learn.


Post-script: I had to write Opt.parse myself. I couldn’t find one built-in to .NET. Maybe I was just didn’t search hard enough, but all I found was half-baked code online. I started with one example, intending to translate it to F# in order to understand it. Once I understood it, I saw that it was buggy and incomplete, and wrote my own from scratch. (I just want back, read the comments below the code, and found a link to a reflection/annotation-based library called Consolery. Nothing in the standard library though.

Anyway, continuing with my earlier theme of “I’ll blog about whatever I want to blog about", I’ll probably talk about Opt.parse next. You might be able to help me iron out some of its ugly parts.

05/06/10

Permalink 09:16:53 pm, 899 words
Categories: Code, Linguistics

Stack Overflow, The Death of Words and Type Classes

C S Lewis has a perceptive essay called “The Death of Words". You can find the title screen here; you can poke around and probably convince Google to let you read most of the essay. Anyway, the problem that he’s complaining about is that the meaning of words tend to change over time. A lot of words that meant something just mean “good” now, partly because we liked the original thing. This is well-known in historical linguistics, so maybe Lewis picked it up from hanging around Tolkien. Lewis captures this in the quote:

“And as long as most people are more anxious to express their likes and dislikes, this must remain a universal truth about language.”

He starts giving examples and a couple of paragraphs hits upon the word which, in the 21st century, is almost beyond dead. People have almost stopped using it, even to mean “good", except in computer science. Lewis says, “Modern … has ceased to be a chronological term … and often means little more than ‘efficient’".

A recent question on Stack Overflow about dynamic scope has an answer by Eli Bendersky. The answer is great, but it’s also a good example of how dead ‘modern’ is. Here’s the end of his post (emphasis mine).

That said, lexical binding is IMHO much better for 99% of the cases. Note that modern Lisps are not dynamic-binding-only like Emacs lisp.

  • Common Lisp supports both forms of binding, though the lexical one is used much more
  • The Scheme specification doesn’t even specify dynamic binding (only lexical one), though many implementations support both.

In addition, modern languages like Python and Ruby that were somewhat inspired by Lisp usually support lexical-binding in a straightforward way, with dynamic binding also available but less straightforward.

Now, Bendersky frequently gives insightful answers about all kinds of Lisps and the P-languages descended from Lisp. I am sure he knows that Scheme (1975) is 10 years older than Emacs Lisp (1985) and Common Lisp is a year older (1984). I am sure he also knows that Python (1991) is only 6 years younger and Ruby (1995) 10 years. The reason E-Lisp didn’t have lexical scope is not because it’s less modern than the other languages. It’s because it was a mediocre scripting language from its inception. The fact that it hasn’t added lexical scope in the intervening 25 years means that it’s now a bad scripting language. Look: Python didn’t have any nested scope until 2.0. But at least Guido added it. No such luck with E-Lisp.

The word “modern” is dead. It just means “good” here, specifically “has good features". But this meaning doesn’t survive from previous usage; it’s picked up from context. And it really annoys me. It would be a lot better just to say “Note that good Lisps…” and “In addition, “good languages like Python and Ruby…". Even better, be substantive in your dislike: “In addition, “Note that full-featured Lisps…” and “In addition, well-designed languages like Python and Ruby…”

By the way, those substantive reasons mean that Emacs would be better off with any of the languages in the preceding paragraph besides E-Lisp, by the way. They’re all better languages, even Common Lisp, which hasn’t changed since 1994. If somebody could port all the millions and millions lines of E-Lisp, or least provide backward compatibility, it would probably extend Emacs’ popularity ten-fold. (I was going to say “life span", but I’m convinced that’s converging on co-equal with that of the human race at this point.)

In conclusion, Lispers, don’t mince words: E-Lisp is a bad Lisp, not because it’s not modern, but because it’s badly designed and feature-free.


The second question I noticed today is related to my recent post trying to emulate some uses of typeclasses with “standard” (ie Java) OO. Here, the question is “how similar are Go’s interfaces and Haskell’s type classes?”

Answer: just as similar as standard OO’s interfaces, because Go’s interfaces are no more powerful than Java’s. They’re just type-checked structurally. That’s cool, because implementation is implicit and can be done by anybody, not just the original author. But it’s nowhere near as powerful as typeclasses.

Of course systems with delayed typing (like Ruby’s mixins and C++’s templates) can be as powerful as typeclasses; by the time the type is checked, the type variables have been fully instantiated so the checker sees nothing but concrete types. But…it also means your error messages also contain nothing but concrete types; that’s the trade-off.

I guess the only reason for this question is that Go only has two tutorial documents, and the concept of structural typing is so unfamiliar that people stop to re-examine their assumptions about what you can and can’t do. I did, but after re-reading the Go tutorial I’m convinced that you don’t get any extra power above Java.

Update: Norman Ramsey chimed in a few minutes ago to point out that Haskell’s type system is equivalent in power to cut-free Prolog, ie “pure” Prolog, ie mini-Kanren. So if I ever want to subject myself to that pain again, I know I can do it without leaving Haskell.


Edit:

Lispers have been abusing ‘modern’ for a long time. While packing for my upcoming move, I ran across Eugene Charniak’s AI book from 1985. The Lisp tutorial chapter refers to Scheme as a “modern” Lisp and Franz Lisp as a “less modern” Lisp. Yup, still backwards: Scheme: 1975; Franz Lisp: 1978. (Although it is based on MacLisp, which is an older design.)

03/05/10

Permalink 10:39:41 am, 459 words
Categories: Code

Scaling simple file I/O

I previously came to the conclusion that Haskell and Python are about the same in terms of proximity to the (non-existent) One True Programming Language. Where one expresses things better, the other suffers, and vice versa.

Still, I find that, in terms of sheer brevity, Haskell scales a little better than Python. (about 15% better, in case you’re wondering. I measured it last time.) This is true even where Python wins initially. Take this simple script for example:

# Python
print(withFileLines(scale, sys.argv[0]))
-- Haskell
getArgs >>= head & withFileLines scale >>= print

(assume appropriate definitions for the undefined functions withFileLines and scale*)

Now we improve the formatting to print line-at-a-time.

for line in withFileLines(scale, sys.argv[0]):
    print(line)

getArgs >>= head & withFileLines scale >>= mapM_ putStrLn

Now we improve the script to combine more than one input file:

for line in scale(concat(withFileLines(cdr, arg) for arg in sys.argv)):
    print(line)

getArgs >>= mapM (withFileLines tail) >>= concat & scale & mapM_ putStrLn

(The first line of each file is the title and must be dropped, although id would work almost as well. Functional utilities for Python are below.*)

Haskell requires a lot more up-front knowledge, but once you’ve learned all the combinators, they are more powerful, so they require less restructuring when expanding. The principal difference here is that the Haskell code stays almost as flat as it was when it started, the pipeline just gets longer. In Python the shape of the code jumps around because things don’t combine quite as well. (Though it’s really close.)

In 2003, when I first started using it, I tried to explain to a classmate why Python is better than Java. The main reason is that it gives the programmer a larger vocabulary, a larger set of abstractions to use in building programs. A larger vocabulary makes for shorter programs. It turns out I have a nearly unlimited hunger for new vocabulary, and a lot of the people coming up with new words are working in Haskell.

PS: Yes, these stupid posts will continue until I finish my dissertation. I don’t have time to do a serious write-up of parsing until then. Fortunately it’s only two more weeks.

*Here are some example implementations:

def withFileLines(f, file):
  return f(iter(open(file)))
def scale(lines):
  l = list(lines)
  return (str(float(line) / len(l)) for line in l)

def cdr(it):
  it.next()
  return it
def concat(it):
  return (y for x in it for y in x)
def id(x):
  return x

and Haskell:

withFileLines f = readFile >=> lines & f & return
scale lines = lines |> map ((read :: Double) & (/ len) & printf "%.5f")
  where len = fromIntegral (length lines)

23/04/10

Permalink 10:10:28 am, 1217 words
Categories: Code

Faking type classes in C#

Note:This is a post that derived from my attempt to understand how and why type classes are more powerful than interfaces. I am posting even though I haven’t double-checked the C# code—most of the samples are rewritten from my memory of the examples I wrote last weekend. Once I finish my dissertation, I’ll come back and test the sample code to make sure what I say fails actually fails. In the meantime, if I say anything particularly egregious, you can correct me in the comments.

Interfaces and type classes are pretty similar, but interfaces require implementation via inheritance, so their signatures are not as strict.

Specifically, if you know about interfaces but don’t know about type classes, then this:

interface Showable {
  public abstract string toString();
}
class Whatever : Showable {
  public override string toString() {
    return "whatever, man";
  }
}

is equivalent to this:

class Show a where
  show :: a -> String
data Whatever = Man
instance Show Whatever where
  show Man = "whatever, man"

This is an exact equivalence between these two pieces of code. Notice, however, that the typeclass doesn’t declare a data type like the interface does. It just defines the methods needed for data types to implement the type class. So I define the data type first, then declare that it’s an instance of Show. This is not inheritance, interface or implementation; data Whatever is not related to any other Show instances in the way that class Whatever is to other Showable implementers. This will be crucial later.

(This instance declaration can be written anywhere by anyone, which is another difference from interfaces, where inheritance forces the author of the type to implement it.)

However, more complex uses of type classes can’t be translated directly, because interface inheritance makes implementers too interchangeable. For example:

class Plus a where
  plus :: a -> a -> a
  zero :: a
instance Plus Int where
  plus a b = a + b
  zero = 0
instance Plus (Set a) where
  plus = Set.union
  zero = Set.empty
-- legal --
> plus 1 2
3
> plus (Set.fromList [1,2]) (Set.fromList [2,3])
Set.fromList [1,2,3]
> plus 1 zero
1
> plus (Set.fromList [1,2]) zero
Set.fromList [1,2]
> zero :: Int
0
-- illegal --
> plus 1 (Set.fromList [1,2])
    Couldn't match expected type `Int' against inferred type `Set t'
> zero
    Ambiguous type variable `a' in the constraint:
    Plus a

But this doesn’t translate to an interface equivalent:

interface Plus {
  public abstract Plus plus(Plus other);
  public abstract Plus zero; // error, abstract fields not allowed
  public abstract Plus zero(); // make it a thunk then...
}
// ok, soldiering on
class Z : Plus {
  private int i = 0;
  public Z(int i) {
    this.i = i;
  }
  public override Plus plus(Plus b) {
    return i + b; // error, how do we know b can be used with +?
  }
  // hey, uh...shouldn't this be static?
  public override Plus zero {
    return new Z(0);
  }
}

(Note: I use the class name ‘Plus’ because I am not mathemetician enough to know the right name. It might be Monoid? It’s the one with mplus and mzero…)

This is actually a case much more similar to generics. In fact, it’s something like “generics with ad-hoc polymorphism” or something. So there’s no direct support for it in the “standard” OO languages. Although C++20x6’s concepts were basically this feature, before the Good Features Purge of x09.

At any rate, this means that an indirect translation based mostly on generics rather than interfaces will be the right answer. You might be able to patch the Plus example above to Plus<T>, but that still doesn’t solve the problem of overriding zero. A better approach is to implement type classes the way that Haskell does. This unfortunately does not come with the syntax sugar whereby the Haskell compiler automatically passes around the type class dictionary.

abstract class Plus<T> {
  public abstract T plus(T a, T b);
  public abstract T zero();
}

Notice that plus’s arguments are now of type T, not Plus (or Plus<T>).

class PlusInt : Plus<int> {
  public override int plus(int a, int b) {
    return a + b;
  }
  public override int zero() { return 0; }
}
class PlusSet<T> : Plus<Set<T>> {
  public override Set<T> plus(Set<T> a, Set<T> b) {
    return a.Union(b); // assume a version of set with immutable Union
  }
  public override Set<T> zero() { return new Set<T>(); }
}

This gets us the simple uses of type classes in Haskell. To use it, though, you have to get an instance of the correct concrete type class:

PlusInt intinstance = new PlusInt();
PlusSet<int> setinstance = new PlusSet<int>();
// legal //
intinstance.plus(1, 2);
intinstance.plus(1, intinstance.zero());
setinstance.plus(new Set<int>(new {1, 2}), new Set<int>(new {2, 3}));
setinstance.plus(new Set<int>(new {1, 2}), setinstance.zero());
// illegal //
intinstance.plus(1, new Set<int>(new {1, 2}));
intinstance.plus(1, setinstance.zero());

Because the generics system allows us to say a function takes two arguments that are exactly type T, instead of some subclass of type Plus, which is what you get with an interface.

Unfortunately, this technique doesn’t extend to more complicated uses of type classes because C# doesn’t support parametrised type variables. F# does, I think, and C++ does sort of by mistake because its generics are just text-based replace, which means type-checking happens a little later than in a real generics system: by the time the C++ type checker sees the program, all the type arguments have been instantiated. (That’s also why the error messages are so flagrantly unreadable.)

For example, here’s a monad type class in Haskell (note: these are not the Haskell names, but ones used elsewhere which don’t interfere with C# keywords):

class Monad m where
  bind :: m a -> (a -> m b) -> m b
  unit :: a -> m a

And here’s a translation to the type class encoding from above:

abstract class Monad<M> {
  M<A> Bind<A,B>(M<A> m, Func<A, M<B>> f);
  M<A> Unit<A>(A a);
}

But we get compile errors because M<A> is not legal in C#. I admit that Bind’s signature is pretty complicated, but here’s the catch: Bind is SelectMany from LINQ. This complicated type is how you can select from lists in one Linq statement, from XML in the next, and from SQL in another. So how does C# support Linq when interfaces don’t have the necessary power?

As far as I know, it just special cases monads, (that is, Linq implementations), doing ad-hoc checking when you use Linq to make sure that the object being selected from implements all the necessary methods. It just skips the normal interface machinery.

This is too bad. Monads are important, but there are plenty of other uses for M<A> where both M and A are type variables. Still, if you try hard enough, I think you can find other encodings. One attempt is here: I haven’t read and understood the code, but my understanding is there is an interface for M which is not parametrised by T. Then the implementation of M for some concrete type, say List, can then dynamically cast its parameter to List<T>.

1 2 3 4 5 6 7 8 9 10 11 ... 35 >>

blog soft