I just started work at Bing, so don’t expect many (any) blog posts for the next month or two.
Actually, the real reason is that I have stopped working on Fing until I can finish my dissertation revisions. There aren’t many—I estimate about a month’s worth—but who knows how much furniture/clothes/computer/phone shopping will interfere on the weekends. If there’s one thing I’ve learned during the move, it’s that serious shopping takes FOREVER. Also that it’s really tiring.
But that means that I won’t have anything super interesting to write about in the form of Fing code analysis. Or even anything at all, since I guess “super interesting” is about three notches about the normal fare for this blog.
Oh, and I just went through the Mandatory Online Legal Training today. So let me remind you: I do not speak for Bing or Microsoft on this blog. Even if I do discuss Bing, I do not do it in an official capacity, and my statements should not be taken to be indicative of any Bing policies or future directions. I’m not even going to tell you my microsoft e-mail address, except to say that it’s like 10 times better than the alphabet soup I got from Indiana University. I must have been at least the 23rd Nathan Sanders to go to IU.
And now that I know all about trademark law, who knows whether Fing will continue to be called Fing. I may have to change the name to F#$`ing after all.
It’s about time some changes were made around here.
The first you’ll notice is that the title of this site is no longer Journal, but Blogg. I didn’t like the word ‘blog’ when it was new, but I eventually got used to it. If I’m going to use it, though, it’s going to be the Scandinavian version.
Why Scandinavian? I just finished my doctorate with a dissertation on Swedish dialects. I don’t actually speak Swedish*, but I’ve picked up a little while getting my code to process it. So the change commemorates the fact that I am now (almost) officially Dr Sanders. Or Herr Doktor Sanders, a title bestowed upon students whose advisors are German.
The reason it’s (almost) official is that I have a few revisions to make on the dissertation. When those are done, I’ll upload the final version here. If you really want to see it before then, contact me.
Second, Fing 0.1 is out at github. There’s a binary in case you don’t want to download the source. There’s also a script to load Fing into fsi for interactive use. 0.1 can find functions even if you get the order of arguments wrong. You can also have it search your own assemblies, not just FSharp.Core. But type aliasing doesn’t work right yet, so if you rely on aliases with type variable arguments, you are out of luck until the next version.
The last announcement is that (if you didn’t already know), I’m starting a job at Bing at the beginning of July. Now is a good time to remind everyone that I don’t represent any of my employers, past, present or future, and that this blog is not connected officially or unofficially to Microsoft.
*Jag talar inte Svenska, med jag talar Swedish Chef, darfor vi har de har valdigt har bork bork bork!
So, the only parser left undefined in the previous parsers is typeSuffixP. It's used in typeP, which is the top-level single-type parser.
typeSuffixP = choice [arrayP <?> "an array type"
, tok "lazy" >> return (Generic (Id "lazy") . list)
, tok "when" >> constraintP
, (do id <- identP; return (Generic (Id id) . list))
<?> "a stupid suffix type"]
typeP = do
t <- termP
option t (return . nestTypes t . reverse =<< many typeSuffixP)
Type suffixes break down into four categories: .NET-style arrays, lazy values, constraints and Caml-style "stupid suffix types". Yeah. One reason I don't like them is that they were hard to parse in the first version of my code: it's hard to see where they end. Lazy values return the same structure as generics, because they really are just a special case of a generic type. They just get a keyword for some reason (probably left over from O'Caml). So that leaves only arrays and constraints.
arrayP = do dims <- between (tok "[") (tok "]") (many (tok ",")) return $ Array (1+length dims)
Arrays are simple, just brackets around some commas. A 2D array of ints is written int[,], a 4D one is int[,,,]. That leaves the most complicated for last: constraints. Constraints are a rabbit hole. Like I said last time, they take about 3 times the code of the rest of the parser, for types that will not occur the majority of the time. Oh well. I should stop complaining and show you constraintP. Remember, from typeSuffixP, we know this is a normal type followed by 'when' and then a constraint. For example, list<'a> when ....
constraintP = typevarChoiceP <|> singletypevar
where singletypevar = do
Var var <- typevarP
subtype var <|> (tok ":" >>
choice [tok "null"
>> return (Constraint (Null var))
, tok "struct"
>> return (Constraint (Struct var))
,try (toks (words "( new : unit -> ' T )")
>> return (Constraint (DefaultConstructor var)))
, toks (words "not struct")
>> return (Constraint (NotStruct var))
, enum var
, traitP var
, delegate var])
subtype var = do
tok ":>"
t <- arrowP
return $ Constraint (Subtype var t)
enum var = do
tok "enum"
t <- between (tok "<") (tok ">") arrowP
return $ Constraint (Enum var t)
delegate var = do
tok "delegate"; tok "<"
leftT <- arrowP
tok ","
rightT <- arrowP
tok ">"
return $ Constraint (Delegate var leftT rightT)
Yowza! That's kind of long. Uh...OK. The first choice is a high level choice between a single constraint or multiple constraints. I still don't understand the semantics of multiple constraints, but it looks like
list<'a> when ('a or 'b) : (read: 'a -> 'b)
Maybe. I'll get back to multiple constraints later. For now, look at singletypevar. It first parses a typevar (eg 'a, ^a or _), then parses the constraint on that typevar. There are eight kinds of constraints; the simple ones are included inside constraintP. The simplest is the subtype constraint, for example
list<'a:> when 'a :> Microsoft.Collections.IComparable
All the other constraints have an intervening tok ":". Most of these are quite simple, like the null, struct, not-struct and default-constructor constraints. (Disclaimer: I'm still not sure of the semantics of some of these, but they are definitely easy to parse). Enum and delegate constraints are almost as simple to parse. They look like
list<'a> when 'a : enum<int>
or
list<'a> when 'a : delegate<int,int>
That leaves one constraint that leads us further down the rabbit hole: traitP. Traits are structural typing shoehorned into a nominal type system, so they are naturally pretty complicated. In fact, the grammar calls for attributes, which can contain arbitrary expressions, to be allowed in the grammar at one point. At this point this parser does not support that, but I suppose some day it will have to in order to be correct.
The basic idea of trait is that you specify a method that a type must contain. The types do not need be explicitly related by an inheritance relation; all subtypes are implicitly related because they support the same set of methods. For example:
^a when ^a : (toString : ^a -> string)
Is the structural specification for "all types that have a toString method". O'Caml's object system works exactly this way, and Haskell's type classes are pretty close. It's not used much in F# because of the already existing nominal type system used for inheritance. It's basically type-checked duck typing, although from my fading memory of TAPL, there are some drawbacks in complexity of compile times and in the details of inheritance.
Aside: At least the F# compiler actually checks its structural types. C++ appears to support structural typing for its generics, but really it's just untyped text substitution. Type errors happen in the SECOND stage of compilation, where everything is a concrete type and has the longest, ugliest name in the world. C++ was supposed to get traits, but -sigh- they wanted to make sure it remained the unintentionally least friendly language in the world.
So anyway, that's two paragraphs explaining traits. Here is the parser for them:
traitP var =
between (tok "(") (tok ")") (memberSigP var)
memberSigP var = do
id <- identP
(vars,constraint) <- option ([],Nothing) typevarDefnsP
tok ":"
sig <- curriedSigP
property <- option Function accessorsP
let t = if null vars then Id id else Generic (Id id) vars
case constraint of
Nothing -> return $ Constraint (Sig var t sig property)
Just con -> return $ Constraint (Sig var
(Constraint (TyparConstraint con) t)
sig
property)
Let's see...a trait is a member sig for some var, surrounded by parentheses. The actual member sig starts off with an identifier, something like toString in the above example. Then there's the option to introduce MORE type variables, which can themselves have constraints. (More on that below.) Then there's the actual sig (called curriedSigP, you'll see why in a second), followed by an option modifier that specifies that the whole thing is actually a property instead of a method. In other words, you can say this:
^a when ^a : (Stringed : ^a -> string with get)
and the with get means that Stringed is a property, not a function. This is necessary because of the sugary nature of properties, hiding two functions behind one, etc. I suspect this makes it hard to pass properties around as first-class functions, but I don't have an F# REPL running in front me right now to check. Anyway, here accessorsP to show you how the with get business is parsed. It's super simple.
accessorsP = do
tok "with"
choice [try (toks (words "get , set") >> return GetSet)
, try (toks (words "set , get") >> return GetSet)
, tok "get" >> return Get
, tok "set" >> return Set]
Anyway, the real heart of the thing is curriedSigP. I feel a little bad about it, because it's basically a copy of the main grammar loop of arrowP/tupleP/typeP. The difference is that, um, instead of "arrowSigP", I named it "curriedSigP", which is a different way to say the same thing...oh no wait, it's the spec's fault. The spec calls it that. ANYWAY, here is the code:
curriedSigP = do
arg <- argsSpecP
tok "->"
args <- sepBy1 argsSpecP (tok "->")
return $ Arrow (arg:args)
argsSpecP = do
args <- sepBy1 argSpecP (tok "*")
case args of
[arg] -> return arg
args -> return $ Tuple args
argSpecP = do
argspec <- optionMaybe (try argNameSpecP)
t <- typeP
case argspec of
Just (optional,name) -> return $ NamedArg name t optional
Nothing -> return t
argNameSpecP = do
optional <- optionMaybe (tok "?")
argname <- identP
tok ":"
return $ (isJust optional, argname)
The first real difference from the main loop is that curriedSigP requires the toplevel type to be a function. It parses one argsSpecP, an arrow, then one or more argsSpecPs. That's because the structural typing of the F# compiler can't guarantee the presence of data members? I guess? That's a good thing from an OO perspective at least.
The second difference is that you can name the args, and named arguments are allowed to be optional. So, to continue with the toString example, you could name the argument:
^a when ^a : (toString : ?zelf : ^a -> string with get)
And I made zelf optional for kicks, even though that makes no sense here and probably wouldn't really compile.
Commentary: coming from strict and minimal languages like Haskell and Scheme, I think that optional arguments are kind of over-complicated, especially in a language that already has ad-hoc polymorphism via overloading. I'd rather have a smaller language and a smarter type inferencer. Because, you know, Haskell is sort of OO, and I can count on one hand the times I've had to supply type annotations. In F#, which is OO plus so much more, I have to drop in type annotations at least once a file, more if I interact with .NET code a lot.
OK, that leaves one last crazy constraint. Seriously, I'm not sure why you'd need a type this constrained. Here's an example:
^a when ^a : (something<'b,'c> : ^a * 'c -> 'b)
So, yeah, you can introduce new type variables. But that's not all! You can also introduce recursive constraints! Like so:
^a when ^a : (something<^b
when ^b : (ohNoNotAgain : ^b -> ^a)> : ^a -> ^b)
It's fully recursive, all right. You can structurally constrain the types that are bound in the scope of your structurally constrained methods. (Don't worry if you didn't understand that, you might not be human.)
typevarDefnsP = do
tok "<"
defns <- sepBy1 typevarP (tok ",")
constraints <- optionMaybe typevarConstraintsP
tok ">"
case (defns,constraints) of
(_,Nothing) -> return (defns, Nothing)
([defn],Just cs) -> return (defns,Just $ nestTypes defn cs)
(defns,Just cs) ->
return (defns,Just$nestTypes (Var (Choice (map devar defns))) cs)
typevarConstraintsP = do
tok "when"
sepBy1 constraintP (tok "and")
devar (Var var) = var
list x = [x]
The code is pretty straightforward, just brackets around some comma-separated type variables (typevarDefnsP), optionally followed by when and some and-separated constraints. It does some fancy processing when you do have constraints, but that's because I need a unique way to represent each one of these structures internally. It's a representation that's probably wrong: best not get too attached to them; they'll probably all have to change when I start actually USING them.
By the way, if you're hoping to get in there on a beta of Fing and start with some CRAZY RUDE constraints on structurally constrained types, you may be disappointed by the first few versions. My priority is to be able to search for obvious things first, like int -> seq<'a> -> 'a, then for subtype relations so that you can also search for list<'a>, ResizeArray<'a>, etc.
OK, one last thing: type var choices. These are, as far as I can tell, genuinely ambiguous. You can write something like
map<^a,^b> when (^a or ^b) : (toString : ^a -> string)
I think we can all agree that's pretty messed up. Is it 'a or 'b that has to have a toString member? Who knows? Put an or between 'em and call it done! There is one quirk, though: only structural constraints are allowed. So this is illegal (hilarious though it may be):
map<^a,^b> when (^a or ^b) : (^a :> ^b)
The code should be easy to read; it's a lot like the other parsers.
typevarChoiceP = do
typars <- between (tok "(") (tok ")") (sepBy typevarP (tok "or"))
let typars' = map devar typars
tok ":"
Constraint (Sig _ id member prop) _ <- liftM ($ Id "DUMMY") (traitP Anonymous)
return $ Constraint (Sig (Choice typars') id member prop)
So that wraps up the Parsec version of the parser. The entire thing is available to download. The FParsec version is a lot longer, because I had to write the low-level backtracking parser tok by hand there. (I might be able to fix it now, but I'm also sick of parsing now.) I will post the FParsec version at the top of my next post, where I'll start to look at the data structure that the parser builds.
That may be a little while, because my next step is to get a simple version running. This data structure is quite a bit more complicated than the toy version I got from kvb.
I may have reviewed Daxter in passing in my PSP review, but here’s a complete look at it.
Daxter is a PSP-based gaiden to the Jak and Daxter trilogy, which was supposed to have ended after three games. Certainly Naughty Dog’s involvement ended; the creators of Daxter are Ready At Dawn: Daxter is their first game. However, they did well enough with Daxter to go on to make the PSP port of God of War, which I have heard is quite good.
In the time-honoured tradition, Daxter is a cut-down portable clone, so if you’ve played Jak and Daxter, you have a really good guess what the game is like without Jak. I’ll just cover what’s missing. First, because the PSP isn’t as powerful as the PS2, the overworld is missing the people and vehicles that made navigation so fun in the second two games. Second, presumably because Ready At Dawn is a newish company without the funding of Naughty Dog, the gameplay is not as fun.
Why not? The biggest reason is that the tasks aren’t as varied or interesting; there are a couple of hovercraft levels, but most of the game is platforming much more in the style of the first Jak game than the second two. Even then, the level design is much more cramped and linear than the first Jak game—the levels reminded me a lot of Crash Bandicoot (as did the hero, come to think of it). A distant third is interface gaffes that are a sign of (1) new developers, (2) rushed developers or (3) Bioware developers*—a lot of interactions are perfunctory and assume a deep capacity to guess what a game designer might do. Basically, the completely opposite of playing a Mario game, where everything is well-lit and given the proper affordances.
As an aside: This contrasts greatly with the second two Jak games, which were front-runners in the kitchen-sink school of game design. Sure, the world was basically a rip-off of GTA, and none of the activities were new either, but crammed into one smoothly paced, well executed package, nobody could resist it. (At least, I couldn’t.)
While playing, I kept thinking “this is pretty impressive for a portable game", and it is, doubly so since the budget was probably miniscule compared to the PS2 Jak games. But that doesn’t change the fact that the game is a bit boring. It was a good diversion while visiting my parents for Thanksgiving, but that’s just a sign that I need to buy a laptop that can play any game besides Starcraft.
*Or any RPG designer that has never met a human. Origin games are even worse (but a lot older).
This post comes with a ton of disclaimers.
Here are the diffs in question:
Apply them by saying
patch viper-keym.el viper-keym.diff && patch viper-cmd.el viper-cmd.diff
So. I switched to viper-mode a couple of years ago after my vim-using friend told me the main reason vim is better is the vi keyset. I didn’t believe him, but I’ve mellowed in my old age: I like to check out both sides of the argument before arguing that one is right. After using vim to edit a few config files, I switched viper-mode on at Level 5 (Supreme Wizard), printed a cheat sheet and spent the next week learning the various keys. I haven’t gone back except for 3 months at Microsoft because Visual Studio doesn’t have a built-in vi emulation (although viemu is a commercial plugin I’ve heard good things about).
It turns out that for non-Lisp languages, the vi keyset works better. There are two reasons: first, most key combinations are sequential, not simultaneous. You type d then e, not Alt-d. This is easier on your hands and provides an illusion of speed. Second, and more important, is that the Emacs keys revolve mostly around s-expressions. This is Good and Right if you are using a s-exp language, like Emacs Lisp, Common Lisp, Scheme, Clojure, Arc, or even something similar like an XML-based language. You can approximate structured editing by using commands that cut/paste/transpose s-expressions.
In contrast, vim thinks of documents as an ad-hoc collection of text. It knows about s-expressions, but its primary operations are much closer to regular expressions. Vi’s approach is Bad and Right: it turns out that as soon as you introduce complex syntax, the complexity of editing goes way up, and the only way to make edit operations fast again is to make them complex. Hence vi’s fearsome learning curve.
Since these days I mostly use Python, Haskell and C++, the vi learning curve is worthwhile because it speeds up editing the complex syntaxes of these languages. After a year or so, though, I noticed some inconsistencies. This shouldn’t be surprising; vi has SO many closely related ad-hoc commands. At least some of those fit together in sub-optimal ways, similar to any other complex system.
But! I don’t use vim, I use viper-mode. That means that (1) I don’t care about compatibility with other instances of vim and (2) I can dig around in the source and fix the inconsistencies myself. Of course you can do (2) in vim too, but I don’t think vim users expect to carry a customisation directory around for life. Emacs users do, largely because vanilla emacs has only slightly better defaults than vanilla Windows. (Now that’s an idea…carry around a directory full of .reg files forever in order to make new instances of Windows behave)
So I fixed three things: swap p and P, make ftFT go one character further and make eE go one character further.
The first thing I fixed was to swap p and P. I don’t see why the default pastes one character AFTER where the cursor is. The only good feature of p is line-based: ddp deletes the current line and pastes it after the one below, effectively giving you a transpose-line command. Of course that’s just C-x C-t in emacs (I just found out), so even that feature is dubious in viper. In any case, you can still get it via ddP.
Swapping p and P is easy. In viper-keym.el, swap the symbols ‘viper-put-back and ‘viper-Put-back. There’s probably a customisation hook for this but I didn’t read enough of the documentation to find out.
--- viper-keym.original.el 2009-12-18 09:18:38.000000000 -0500 +++ viper-keym.el 2009-12-18 09:18:57.000000000 -0500 @@ -400,7 +400,7 @@ (define-key viper-vi-basic-map "M" 'viper-window-middle) (define-key viper-vi-basic-map "N" 'viper-search-Next) (define-key viper-vi-basic-map "O" 'viper-Open-line) -(define-key viper-vi-basic-map "P" 'viper-Put-back) +(define-key viper-vi-basic-map "P" 'viper-put-back) (define-key viper-vi-basic-map "Q" 'viper-query-replace) (define-key viper-vi-basic-map "R" 'viper-overwrite) (define-key viper-vi-basic-map "S" 'viper-substitute-line) @@ -435,7 +435,7 @@ (define-key viper-vi-basic-map "m" 'viper-mark-point) (define-key viper-vi-basic-map "n" 'viper-search-next) (define-key viper-vi-basic-map "o" 'viper-open-line) -(define-key viper-vi-basic-map "p" 'viper-put-back) +(define-key viper-vi-basic-map "p" 'viper-Put-back) (define-key viper-vi-basic-map "q" 'viper-nil) (define-key viper-vi-basic-map "r" 'viper-replace-char) (define-key viper-vi-basic-map "s" 'viper-substitute)
The second was to change f and t to move one character further. ftFT do the right thing in combination with a region command like dcyr: dt; deletes everything before the semi-colon in C++, while df; deletes the whole statement including the semi-colon. However, t; does not move you to the semi-colon; it moves you one character before the semi-colon. So if you are parenthesising an expression, t;r) does the wrong thing. You need t;lr) or f;r). So t is basically useless. It would be much more useful to make tf each go one character further so that f; takes you past the C++ statement while t; takes you to the end of it.
This change is not trivial, although it’s still pretty easy. First, modify the central function that runs ftFT. It adjusts the caret forward or backward by some offset after the find is done. Before, it was hard-coded. Now the individual ftFT functions pass in the offset.
@@ -3197,6 +3193,10 @@
;; Find ARG's occurrence of CHAR on the current line.
;; If FORWARD then search is forward, otherwise backward. OFFSET is used to
;; adjust point after search.
+;; original type :: int * char * bool * bool
+;; modified type :: int * char * bool * int
+;; offset is now primarily determined in the calling functions varying on
+;; whether the function was called interactively
(defun viper-find-char (arg char forward offset)
(or (char-or-string-p char) (error ""))
(let ((arg (if forward arg (- arg)))
@@ -3243,8 +3243,8 @@
(error "Command `%s': `%c' not found" cmd char))))
(goto-char point)
(if (> arg 0)
- (backward-char (if offset 2 1))
- (forward-char (if offset 1 0)))))
+ (backward-char (1+ offset))
+ (forward-char offset))))
Next, modify the individual ftFT functions. This is a matter of changing viper-f-offset from nil to (if com 0 -1) for f or t to (if com 1 0) for t. See the diff for all four functions.
@@ -3259,7 +3259,7 @@
;; this means that the function was called interactively
(setq viper-f-char (read-char)
viper-f-forward t
- viper-f-offset nil)
+ viper-f-offset (if com 0 -1))
;; viper-repeat --- set viper-F-char from command-keys
(setq viper-F-char (if (stringp cmd-representation)
(viper-seq-last-elt cmd-representation)
@@ -3268,7 +3268,7 @@
(setq val (- val)))
(if com (viper-move-marker-locally 'viper-com-point (point)))
(viper-find-char
- val (if (> (viper-p-val arg) 0) viper-f-char viper-F-char) t nil)
+ val (if (> (viper-p-val arg) 0) viper-f-char viper-F-char) t (if com 0 -1)
)
(setq val (- val))
(if com
(progn
The third change was to make e move one character further. This problem is similar to ftFT: e would complement w much better if it actually moved past the end of the word. I guess somebody might want to move to the last letter of each word, but I’m not one of them, so e is useless to me as shipped.
This change is easy to implement: just remove the viper-backward-char-carefully at the end of the viper-end-word. In other words, viper’s author had to add the extra command to make e useless so that it would be backward compatible with vi.
@@ -2925,8 +2925,7 @@
(viper-skip-all-separators-forward))
(cond ((viper-looking-at-alpha) (viper-skip-alpha-forward "_"))
- ((not (viper-looking-at-alphasep)) (viper-skip-nonalphasep-forward)))
- (viper-backward-char-carefully))
+ ((not (viper-looking-at-alphasep)) (viper-skip-nonalphasep-forward))))
(defun viper-end-of-word-p ()
(or (eobp)
@@ -2949,7 +2948,6 @@
(viper-loop val (viper-end-of-word-kernel))
(if com
(progn
- (forward-char)
(viper-execute-com 'viper-end-of-word val com)))))
(defun viper-end-of-Word (arg)
@@ -2961,11 +2959,9 @@
(if com (viper-move-marker-locally 'viper-com-point (point)))
(viper-loop val
(viper-end-of-word-kernel)
- (viper-skip-nonseparators 'forward)
- (backward-char))
+ (viper-skip-nonseparators 'forward))
(if com
(progn
- (forward-char)
(viper-execute-com 'viper-end-of-Word val com)))))
(defun viper-backward-word-kernel (val)
You should also remove corresponding forward-char inside viper-end-of-word and viper-end-of-Word. This call undoes the backward movement so that e once again works properly with the region commands.
You may notice a common theme here; all three are off-by-one errors. They may have arisen for historical reasons; I think cursor handling worked differently on old terminals. It may be that the pipe cursor used to display to the right of the current character and that it displays to the left now. In that case, the behaviour of everything except t and e makes more sense. t and e are still useless as shipped. Of course I first suspected that the difference might be viper, but I checked all three behaviours in vim and they exist there too.