This post comes with a ton of disclaimers.
Here are the diffs in question:
Apply them by saying
patch viper-keym.el viper-keym.diff && patch viper-cmd.el viper-cmd.diff
So. I switched to viper-mode a couple of years ago after my vim-using friend told me the main reason vim is better is the vi keyset. I didn’t believe him, but I’ve mellowed in my old age: I like to check out both sides of the argument before arguing that one is right. After using vim to edit a few config files, I switched viper-mode on at Level 5 (Supreme Wizard), printed a cheat sheet and spent the next week learning the various keys. I haven’t gone back except for 3 months at Microsoft because Visual Studio doesn’t have a built-in vi emulation (although viemu is a commercial plugin I’ve heard good things about).
It turns out that for non-Lisp languages, the vi keyset works better. There are two reasons: first, most key combinations are sequential, not simultaneous. You type d then e, not Alt-d. This is easier on your hands and provides an illusion of speed. Second, and more important, is that the Emacs keys revolve mostly around s-expressions. This is Good and Right if you are using an s-exp language, like Emacs Lisp, Common Lisp, Scheme, Clojure, Arc, or even something similar like an XML-based language. You can approximate structured editing by using commands that cut/paste/transpose s-expressions.
In contrast, vim thinks of documents as an ad-hoc collection of text. It knows about s-expressions, but its primary operations are much closer to regular expressions. Vi’s approach is Bad and Right: it turns out that as soon as you introduce complex syntax, the complexity of editing goes way up, and the only way to make edit operations fast again is to make them complex. Hence vi’s fearsome learning curve.
Since these days I mostly use Python, Haskell and C++, the vi learning curve is worthwhile because it speeds up editing the complex syntaxes of these languages. After a year or so, though, I noticed some inconsistencies. This shouldn’t be surprising; vi has SO many closely related ad-hoc commands. At least some of those fit together in sub-optimal ways, similar to any other complex system.
But! I don’t use vim, I use viper-mode. That means that (1) I don’t care about compatibility with other instances of vim and (2) I can dig around in the source and fix the inconsistencies myself. Of course you can do (2) in vim too, but I don’t think vim users expect to carry a customisation directory around for life. Emacs users do, largely because vanilla emacs has only slightly better defaults than vanilla Windows. (Now that’s an idea…carry around a directory full of .reg files forever in order to make new instances of Windows behave)
So I fixed three things: swap p and P, make ftFT go one character further and make eE go one character further.
The first thing I fixed was to swap p and P. I don’t see why the default pastes one character AFTER where the cursor is. The only good feature of p is line-based: ddp deletes the current line and pastes it after the one below, effectively giving you a transpose-line command. Of course that’s just C-x C-t in emacs (I just found out), so even that feature is dubious in viper. In any case, you can still get it via ddP.
Swapping p and P is easy. In viper-keym.el, swap the symbols ‘viper-put-back and ‘viper-Put-back. There’s probably a customisation hook for this but I didn’t read enough of the documentation to find out.
--- viper-keym.original.el 2009-12-18 09:18:38.000000000 -0500
+++ viper-keym.el 2009-12-18 09:18:57.000000000 -0500
@@ -400,7 +400,7 @@
 (define-key viper-vi-basic-map "M" 'viper-window-middle)
 (define-key viper-vi-basic-map "N" 'viper-search-Next)
 (define-key viper-vi-basic-map "O" 'viper-Open-line)
-(define-key viper-vi-basic-map "P" 'viper-Put-back)
+(define-key viper-vi-basic-map "P" 'viper-put-back)
 (define-key viper-vi-basic-map "Q" 'viper-query-replace)
 (define-key viper-vi-basic-map "R" 'viper-overwrite)
 (define-key viper-vi-basic-map "S" 'viper-substitute-line)
@@ -435,7 +435,7 @@
 (define-key viper-vi-basic-map "m" 'viper-mark-point)
 (define-key viper-vi-basic-map "n" 'viper-search-next)
 (define-key viper-vi-basic-map "o" 'viper-open-line)
-(define-key viper-vi-basic-map "p" 'viper-put-back)
+(define-key viper-vi-basic-map "p" 'viper-Put-back)
 (define-key viper-vi-basic-map "q" 'viper-nil)
 (define-key viper-vi-basic-map "r" 'viper-replace-char)
 (define-key viper-vi-basic-map "s" 'viper-substitute)
The second was to change f and t to move one character further. ftFT do the right thing in combination with a region command like dcyr: dt; deletes everything before the semi-colon in C++, while df; deletes the whole statement including the semi-colon. However, t; does not move you to the semi-colon; it moves you one character before the semi-colon. So if you are parenthesising an expression, t;r) does the wrong thing. You need t;lr) or f;r). So t is basically useless. It would be much more useful to make tf each go one character further so that f; takes you past the C++ statement while t; takes you to the end of it.
This change is not trivial, although it’s still pretty easy. First, modify the central function that runs ftFT. It adjusts the caret forward or backward by some offset after the find is done. Before, it was hard-coded. Now the individual ftFT functions pass in the offset.
@@ -3197,6 +3193,10 @@
;; Find ARG's occurrence of CHAR on the current line.
;; If FORWARD then search is forward, otherwise backward. OFFSET is used to
;; adjust point after search.
+;; original type :: int * char * bool * bool
+;; modified type :: int * char * bool * int
+;; offset is now primarily determined in the calling functions varying on
+;; whether the function was called interactively
(defun viper-find-char (arg char forward offset)
(or (char-or-string-p char) (error ""))
(let ((arg (if forward arg (- arg)))
@@ -3243,8 +3243,8 @@
(error "Command `%s': `%c' not found" cmd char))))
(goto-char point)
(if (> arg 0)
- (backward-char (if offset 2 1))
- (forward-char (if offset 1 0)))))
+ (backward-char (1+ offset))
+ (forward-char offset))))
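Next, modify the individual ftFT functions. This is a matter of changing viper-f-offset from nil to (if com 0 -1) for f, and from t to (if com 1 0) for t. With a region command pending (com), the offsets reproduce the old behaviour; without one, the cursor lands one character further. See the diff for all four functions.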
@@ -3259,7 +3259,7 @@
;; this means that the function was called interactively
(setq viper-f-char (read-char)
viper-f-forward t
- viper-f-offset nil)
+ viper-f-offset (if com 0 -1))
;; viper-repeat --- set viper-F-char from command-keys
(setq viper-F-char (if (stringp cmd-representation)
(viper-seq-last-elt cmd-representation)
@@ -3268,7 +3268,7 @@
(setq val (- val)))
(if com (viper-move-marker-locally 'viper-com-point (point)))
(viper-find-char
- val (if (> (viper-p-val arg) 0) viper-f-char viper-F-char) t nil)
+ val (if (> (viper-p-val arg) 0) viper-f-char viper-F-char) t (if com 0 -1)
)
(setq val (- val))
(if com
(progn
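The net effect of the f/t change on plain motions (no region command pending) can be modelled outside Emacs. This is only a toy sketch in Python, not viper's code: find_motion, till and patched are names made up for the illustration, and the cursor is treated as a 0-based index into a single line.

```python
def find_motion(line, cursor, char, till=False, patched=False):
    """Return the cursor index after a plain f (till=False) or t (till=True).

    Toy model of the behaviour described in the post, not viper's code.
    """
    hit = line.find(char, cursor + 1)   # search strictly after the cursor
    if hit == -1:
        raise ValueError("%r not found" % char)
    landing = hit - 1 if till else hit  # stock vi: t stops one short of the hit
    if patched:
        landing += 1                    # the patch: go one character further
    return landing


line = "x = a + b;"
start = 4  # cursor on "a"
for name, till in (("f;", False), ("t;", True)):
    stock = find_motion(line, start, ";", till)
    moved = find_motion(line, start, ";", till, patched=True)
    print(name, "stock:", stock, "patched:", moved)
```

In this model the patched t; lands on the semi-colon (index 9) and the patched f; lands just past it (index 10), which is the behaviour the diff is after.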
The third change was to make e move one character further. This problem is similar to ftFT: e would complement w much better if it actually moved past the end of the word. I guess somebody might want to move to the last letter of each word, but I’m not one of them, so e is useless to me as shipped.
This change is easy to implement: just remove the viper-backward-char-carefully call at the end of viper-end-of-word-kernel. In other words, viper’s author had to add the extra command to make e useless so that it would be backward compatible with vi.
@@ -2925,8 +2925,7 @@
(viper-skip-all-separators-forward))
(cond ((viper-looking-at-alpha) (viper-skip-alpha-forward "_"))
- ((not (viper-looking-at-alphasep)) (viper-skip-nonalphasep-forward)))
- (viper-backward-char-carefully))
+ ((not (viper-looking-at-alphasep)) (viper-skip-nonalphasep-forward))))
(defun viper-end-of-word-p ()
(or (eobp)
@@ -2949,7 +2948,6 @@
(viper-loop val (viper-end-of-word-kernel))
(if com
(progn
- (forward-char)
(viper-execute-com 'viper-end-of-word val com)))))
(defun viper-end-of-Word (arg)
@@ -2961,11 +2959,9 @@
(if com (viper-move-marker-locally 'viper-com-point (point)))
(viper-loop val
(viper-end-of-word-kernel)
- (viper-skip-nonseparators 'forward)
- (backward-char))
+ (viper-skip-nonseparators 'forward))
(if com
(progn
- (forward-char)
(viper-execute-com 'viper-end-of-Word val com)))))
(defun viper-backward-word-kernel (val)
You should also remove the corresponding forward-char calls inside viper-end-of-word and viper-end-of-Word. In the stock code those calls undo the backward movement so that e works properly with the region commands; once the backward movement is gone, they have to go too.
You may notice a common theme here; all three are off-by-one errors. They may have arisen for historical reasons; I think cursor handling worked differently on old terminals. It may be that the pipe cursor used to display to the right of the current character and that it displays to the left now. In that case, the behaviour of everything except t and e makes more sense. t and e are still useless as shipped. Of course I first suspected that the difference might be viper, but I checked all three behaviours in vim and they exist there too.
After working at Bing, I understand the scientific computing term “embarrassingly parallel”. It means that when you describe your dissertation code to somebody, you’re embarrassed to admit that it’s not parallelised.
My dissertation measures syntactic distances between different villages in Sweden. The distance between each pair of villages is calculated independently of all the rest, so the second question the other Microsofties asked me was always, “How do you parallelise the problem?” Well, I didn’t. My excuse was that our experimental machine only has two processors and I wanted to leave one free for other people to use.
But it preyed on my mind all through the internship at Bing. I didn’t have time to do anything about it in November because I was busy with my proposal. Finally, running out of things I could do unwired over Christmas vacation at my parents’, I looked up my offline copy of the Python documentation and wrote a function to run multiple files. Here it is.
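import subprocess
import time

def multirun(n, tasks, files):
    "Run tasks, at most n at a time, sending each one's stdout to its file."
    n = min(n, len(tasks))
    processes = [subprocess.Popen(tasks[i], stdout=open(files[i], 'w'))
                 for i in range(n)]
    i = n
    while processes != []:
        time.sleep(1)
        # keep the processes that are still running; dones have exited
        processes, dones = partition(lambda p: p.poll() is None, processes)
        for _ in dones:
            # start one new task per finished one, while any remain
            if i < len(tasks):
                print("Starting", ' '.join(tasks[i]))
                processes.append(subprocess.Popen(tasks[i],
                                                  stdout=open(files[i], 'w')))
                i += 1

def partition(f, l):
    "Split l into (items where f is true, items where f is false)."
    yes = []
    no = []
    for x in l:
        if f(x):
            yes.append(x)
        else:
            no.append(x)
    return (yes, no)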
You pass multirun the number of processors to use, a list of tasks, and a list of output files to redirect stdout to. Each task is a list of strings, with the first item the command and subsequent items its arguments. (I strongly suspect that this is already built into Python 3, or at least a library, but I didn’t have internet access, so I just wrote multirun based on the 2.5 documentation on my laptop. If you know the standard equivalent, please tell me in the comments.)
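For what it’s worth, a standard-library version might look something like the sketch below. It assumes Python 3.5 or later (for subprocess.run) and uses concurrent.futures; run_one and multirun_pool are names invented for this sketch, not anything from the original code. Threads are enough here because each worker just waits on a child process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(task, path):
    """Run one command with stdout redirected to path; return its exit code."""
    with open(path, "w") as out:
        return subprocess.run(task, stdout=out).returncode


def multirun_pool(n, tasks, files):
    """Run tasks with at most n running at once.

    Returns the exit codes in the same order as tasks.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_one, tasks, files))
```

It takes the same arguments as multirun, so multirun_pool(6, tasks, files) would be the equivalent call; unlike multirun it also closes the output files and hands back the exit codes.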
Now that I’m back at work on my experiment, I ran this code on our new 8-core Mac Pro server ("if you have to ask how much, you can’t afford it"). It’s a late 2008 machine, 5 years newer than the 2-core Power Mac that the CL group usually runs experiments on.
The difference is amazing: what took 2 hours is down to 2 minutes. My entire program runs in 4 minutes. The new machine is 60 times faster. A factor of 10 comes from the speed of each chip; the other factor of 6 comes from my code using 6 of the 8 cores. That means, if you write parallel code, chips are still doing better than the popular version of Moore’s law: in 5 years speed doubled about 6 times (60 is just under 2^6 = 64), for a doubling period of about 10 months, not 18.
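The doubling-period arithmetic is easy to check. This is just a back-of-the-envelope sketch using the figures quoted above (a 60x speedup over 5 years); the variable names are made up for the illustration.

```python
import math

speedup = 60   # overall speedup reported above (2 hours down to 2 minutes)
years = 5      # age gap between the two machines

doublings = math.log2(speedup)                 # how many times speed doubled
months_per_doubling = years * 12 / doublings   # average doubling period

print("doublings: %.2f" % doublings)
print("months per doubling: %.1f" % months_per_doubling)
```

The result comes out a little under 6 doublings, so the doubling period is a shade over 10 months.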
In case you are curious, here is the code that uses multirun. (swediaSites is a list of sites in the swedia corpus, imported from a module of constants.)
import os

def pairwise(l):
    "All unordered pairs of distinct positions in l."
    return [(x,y) for i,x in enumerate(l) for y in l[i+1:]]

### runner ###
def runner(feature):
    "Write params.h, compile the C++ program, and return (tasks, files)."
    params = open('params.h','w')
    params.write('#define ITERATIONS 100\n')
    params.write('#define SAMPLES 1000\n')
    params.write('#define R_MEASURE r')
    params.close()
    os.system('g++ -O2 -o ctrl.out params.h icectrl.cpp')
    suffix = '-' + feature + '.dat'
    ctrl = "nice -n 6 ./ctrl.out".split()
    pairs = pairwise(swediaSites)
    tasks = [ctrl + [sq(fro+suffix), sq(to+suffix)] for (fro,to) in pairs]
    files = ['dist-%s-%s-tmp.txt' % (fro,to) for (fro, to) in pairs]
    return (tasks,files)

def sq(s):
    return "'" + s + "'"

def combine(feature):
    "Combine the disparate output files into one"
    out = 'dist-100-1000-r-%s-interview.txt' % (feature,)
    pairs = pairwise(swediaSites)
    files = ['dist-%s-%s-tmp.txt' % (fro,to) for (fro,to) in pairs]
    outf = open(out, 'w')
    for file in files:
        outf.write(open(file).read())
    outf.close()

def syntaxDist():
    multirun(6, *runner('path'))
    combine('path')
    multirun(6, *runner('trigram'))
    combine('trigram')
    multirun(6, *runner('dep'))
    combine('dep')
I’ve been using Haskell for a little over a year and I’ve used Python (off and on) since 2003. Both feature heavily in my dissertation experiment. Now that I’ve used Haskell enough, I think I can write idiomatic programs in both languages. Since, as a linguist, one of my prurient interests is comparative linguistics, I enjoy writing the same thing in both Python and Haskell to see which is better.
The annoying part is that the comparison doesn’t work between Python and Haskell. In the old days I could write something in Java and Python and Python would clearly smoke Java. Or I could write a functional program in Python and Scheme and Scheme would win, except for the parts of functional programming that Guido likes and has added to Python. ("Python: As much Lisp as a C programmer can understand.") Even then I considered that Python cheated because I could write list comprehensions in Scheme if I wanted—everybody else had certainly tried.
But then a couple of things happened. Dan Friedman ruined my faith in Scheme AND Python AND any language, really. In Friedman’s class, you have to get into the head of a language designer. That’s different from the approach of a linguist or a programmer. A linguist tries to describe how the language operates. A programmer critiques how well the parts fit together. Neither has to worry about how to make the language work, or the process of building it.
Have I told this story before? I feel like I have. Maybe I should search backwards in my blog and just post a link. Or maybe not. I’m stuck at my parents’ house without internet access, so you get to suffer through it again. This is the Internet; you can leave if you want to.
Anyway, making yourself a language designer inverts everything. Everything else becomes the toy language and Scheme is the bedrock tool. All the language features that programmers rely on and linguists analyse are useless distractions because you’re writing your own language and you need to CONCENTRATE. All the effort other language implementers spent on inheritance or backtracking or whatever is wasted when it could have been better spent eliminating tail calls: making the compiler/runtime handle byzantine generated code. In the end you are basically left with Scheme and C. (Note how they’re both old, simple languages.) Since Scheme is functional, it’s a better starting place than C for building a functional language**.
The other thing that Friedman’s class did was push me ever further into writing pure functional programs. I had been moving in that direction since 2002, ever since reading On Lisp, but 2005 was the year that main was the only code with read and write in it. After Friedman’s class, I switched back to Python because of Scheme’s culture and libraries* and because of the IU comp ling group’s Python culture. I wrote really functional programs in Python. And they were super ugly. Slow, too. Not many bugs, though, and once you understood them, very easy to modify.
So here I was, writing purely (?) functional programs in Python. Very dense, very fast to write, but ugly and hard to pick up after 6 months. I knew something was wrong. The two biggest pain points were the complex data structures and ugliness of advanced functional features. The functional code was dense, but hard to parse visually. But I couldn’t switch back to Scheme because it just pretends the outside world doesn’t exist. You could write an HPSG parser (for example) in either language, but a file-munging script only in Python–it would be 1/4 the size of the Scheme script, which would have to use all sorts of partly documented distro-specific functions.
The tipping point was when I wrote a type-checking wrapper. The second time I spent half a day figuring out the inputs and outputs for 6-month-old functions, I wrote down all the types in comments. The syntax was close to Haskell; I could read Haskell ever since the beginning of 2004 when I got a bunch of free books at a conference, so I knew the type annotations. Then I realised that, because classes are first-class Python objects, I could write the types in my program and have them evaluated. My code changed from
# read :: file * [str] -> [int]
def read(f, regions):
:
to
@check(file, [str], [int])
def read(f, regions):
:
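The wrapper itself isn’t shown here, so the following is only a guess at how such a check decorator could work, not the original code. It assumes Python 3 (where file is no longer a built-in, so the example uses str), treats the last argument of check as the return type, and reads a one-element list like [str] as “list of str”. The helper name matches and the example function lengths are invented for the sketch.

```python
import functools


def matches(value, spec):
    """True if value fits spec: a class, or [elem_spec] meaning a list of them."""
    if isinstance(spec, list):
        (elem_spec,) = spec  # exactly one element type per list spec
        return (isinstance(value, list)
                and all(matches(v, elem_spec) for v in value))
    return isinstance(value, spec)


def check(*specs):
    """Decorator: the last spec is the return type, the rest are argument types."""
    *arg_specs, ret_spec = specs

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            if len(args) != len(arg_specs):
                raise TypeError("%s: expected %d arguments, got %d"
                                % (fn.__name__, len(arg_specs), len(args)))
            for i, (arg, spec) in enumerate(zip(args, arg_specs)):
                if not matches(arg, spec):
                    raise TypeError("%s: argument %d does not match %r"
                                    % (fn.__name__, i, spec))
            result = fn(*args)
            if not matches(result, ret_spec):
                raise TypeError("%s: result does not match %r"
                                % (fn.__name__, ret_spec))
            return result
        return wrapper
    return decorate


# lengths :: str * [str] -> [int]
@check(str, [str], [int])
def lengths(prefix, words):
    return [len(prefix + w) for w in words]
```

Because the specs are ordinary Python objects, they are evaluated when the function is defined and checked on every call, which is exactly the part that makes it feel like the wrong language for the job.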
Naturally, I knew this was horrifying. And I knew that I should probably just use Haskell. But I didn’t want to switch because I knew Haskell had the same problems as Scheme; no built-ins for scripting and weak libraries besides.
So I didn’t switch that summer. I growled and grumbled and wrote more dense functional Python. Finally, at the end of the summer, I was looking into an alternate measure of distance between trees. It had been a while since I had to write tree code. And you know there is really only one way to write tree code, and that’s recursively. But primitive recursion is so ugly in Python. I couldn’t bring myself to do it, so I wrote out, on paper, the Haskell to implement this measure. It was so nice. Like Scheme, except with destructuring that I could trust! I still had ghc installed on peregrin so I typed it in when I got back to the office.
At that point I decided I had to bite the bullet and switch. I had the perfect project coming up, too: I was starting on a qualifying paper in computational phonology, which is about as far from the outside world as you can get. It’s just complex list processing, which plays to Haskell’s strengths rather than its weaknesses. I had been worried about how to present the algorithm in the paper—I don’t like math notation much, and I didn’t think I could get away with Python. I didn’t want to use Python in any case, because the existing phonology code I had was the same dense, ugly stuff I had been writing for the last few years. Even if I put it in there, I wouldn’t blame my advisors for not wanting (or being able) to read it.
Not only is functional code pretty in Haskell, Haskell has type annotations. So I could annotate each snippet with its type, making it easier (in my opinion) to follow than math notation. I ported the existing algorithms to Haskell and started to work on my new algorithm.
The rest is history. (I always wanted to say that.)
Of course I still write a good bit of Python; my syntax distance code uses it for glue code, but not complex algorithms. And the CL group at IU still uses it as a lingua franca. (Markus and Mike like it and Sandra has learnt it, and is now teaching it, so all the new students know it.) For that reason, when I tried implementing the Chu-Liu-Edmonds dependency parser over Christmas weekend, I made a quick translation to Python after I finished the Haskell draft. I thought, “this will be like the old days when I back-translated code to Java to see how much better Python was”.
But it wasn’t like that at all. It was very disappointing. The Python code was longer, but only by 20-30% more lines, and the byte count was nearly the same. Admittedly, the code was a Haskell translation, but I tried to be as idiomatic as I could. I even had a class! More than anything, the languages are just different: things that are short in Haskell are long in Python, and vice versa. The Python is easier to understand—from an imperative standpoint. The Haskell is easier to understand from a functional standpoint. So there’s no clear winner like I was hoping for.
I guess those days really are gone forever. And it’s all Dan Friedman’s fault. He’s the one who forced us all in 511 to see that all languages are broken and inadequate, each in their own way.
*PLT has them, but they’re low quality–everybody else is minimalist because that’s the intention of Scheme. Maybe R6RS will someday fix this. I doubt it. Scheme is perfect the way it was.
**Also it has a REPL.
Now that Josh, a fellow CL student admin, has been using git for at least six months, I feel confident in doing so myself, because now I can just offload complaints and questions to him. Anyway, I've been using subversion for a year or so for very simple one-person source control, but it's a real pain to work from my parents' house with limited internet, and I have no interest in installing RIM's USB tethering drivers on my ancient Powerbook. The other option is to come back to school and commit a giant patch that touches every file.
So I want to get back to distributed source control. I used darcs for a few years, but it lost the ability to pull patches to very out-of-date repos in a "reasonable" amount of time. That's scary, so I dropped it completely and went back to subversion. I knew git and hg were both gaining wide support at the time, but I wanted something that was stupidly reliable, and as a one-person dissertation-writing team, I don't need anything beyond revision control really.
Warning: These instructions worked for me on an Intel Mac running OS X 10.5 and a PPC Mac running 10.4 Server. Both have MacPorts installed and working as well as it ever does. In other words, I am not an expert. These instructions may be wrong. This is just the information I used to get git working.
I learned about git reset --hard from the warning git printed and this discussion.
I also found a very nice git crash course for svn users, and instructions on how to use opendiff (aka FileMerge) for viewing diffs on OS X.
As a public service, here is the metric time of day. You can also type in the text boxes to convert metric time to English time or back again.
There are 100,000 metric seconds in 1 earth day. Therefore 1 metric second is 0.864 English seconds. Metric time is useful to avoid needless unit conversions when measuring time and to assist in the transition to off-planet living. The current notation for time-of-day is kiloseconds: 50.000ks is noon, 45ks is time for a mid-morning snack. 1 kilosecond is 14 minutes 24 seconds (14:24), or almost a quarter of an hour.
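The conversion is simple enough to sketch in a few lines of Python. The function names to_kiloseconds and from_kiloseconds are made up for this example; the only facts used are the two above, 86,400 English seconds and 100,000 metric seconds per day.

```python
SECONDS_PER_DAY = 86400            # English seconds in one earth day
METRIC_SECONDS_PER_DAY = 100000    # metric seconds in one earth day


def to_kiloseconds(hours, minutes=0, seconds=0):
    """Convert an English time of day to metric kiloseconds since midnight."""
    english = hours * 3600 + minutes * 60 + seconds
    return english * METRIC_SECONDS_PER_DAY / SECONDS_PER_DAY / 1000


def from_kiloseconds(ks):
    """Convert metric kiloseconds to (hours, minutes, seconds), rounded."""
    english = round(ks * 1000 * SECONDS_PER_DAY / METRIC_SECONDS_PER_DAY)
    hours, rest = divmod(english, 3600)
    minutes, seconds = divmod(rest, 60)
    return (hours, minutes, seconds)


print(to_kiloseconds(12))      # noon
print(from_kiloseconds(45))    # mid-morning snack
print(from_kiloseconds(1))     # length of one kilosecond
```

Noon comes out as 50.0 ks, 45 ks as 10:48:00, and one kilosecond as 14 minutes 24 seconds.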