Everything's Beta

things I don't get to do at work :)

Archive for the ‘code’ Category

Quick starting Scalatra or In which I discover Scalatra and sbt

without comments

Over the weekend, I was researching various frameworks for implementing a REST API. Although I had already started the implementation using Tornado, I wanted to see what else was out there.

And am I glad I looked. I discovered Scalatra which seems to be exactly what I was looking for; a lightweight, sinatra-esque way to map URLs to actions that easily lends itself to testing. I especially like the uber-readable way the tests are written.

Who wouldn’t want to write tests like this?

// taken from http://github.com/alandipert/step
class MyScalatraServletTests extends FunSuite with ShouldMatchers with ScalatraTests {
  // `MyScalatraServlet` is your app which extends ScalatraServlet
  route(classOf[MyScalatraServlet], "/*")
 
  test("simple get") {
    get("/path/to/something") {
      status should equal (200)
      body should include ("hi!")
    }
  }
}

I cloned the repo, ran the examples and decided my search had ended.

However, while running the example was easy enough, I wasn’t sure of how to get started with an actual app. It looked especially cryptic since I haven’t ever used maven or sbt. I even considered bailing on scalatra for the well-known shores of tornadoweb. But, since I recently started working with Java at work, I decided to stick it out.

I’m glad I did because sbt is a pleasure to use, especially if you take the time to RTFM.

Anyway, here are the basic steps to help cut down the 0 to 60 time when starting scalatra :)

Pre-reqs:
Install java, I have java 1.6.
Setup sbt as per these instructions.

Good, now we are ready to start.
Create a new sbt project “HelloScalatra”

 mkdir HelloScalatra
 cd HelloScalatra
 sbt
# fill out the other inputs as you want but make sure you enter 2.8.0 as scala version.

Create a project definition for sbt and save it under project/build. Here’s a barebones one, refer to the docs if you want more info on creating sbt build configs.

// save as project/build/HelloScalatraBuild.scala
import sbt._
class HelloScalatraBuild(info: ProjectInfo) extends DefaultWebProject(info)
{
  // scalatra
  val sonatypeNexusSnapshots = "Sonatype Nexus Snapshots" at
"https://oss.sonatype.org/content/repositories/snapshots"
  val sonatypeNexusReleases = "Sonatype Nexus Releases" at
"https://oss.sonatype.org/content/repositories/releases"
  val scalatra = "org.scalatra" %% "scalatra" % "2.0.0-SNAPSHOT"
 
  // jetty
  val jetty6 = "org.mortbay.jetty" % "jetty" % "6.1.22" % "test"
  val servletApi = "org.mortbay.jetty" % "servlet-api" %
"2.5-20081211" % "provided"
}

Tell sbt to account for the new dependencies:

#from project root
sbt update

We have the basic dependencies taken care of, so let’s create our class that will serve requests. Create “HelloScalatra.scala” under src/main/scala/com/helloscalatra with the following content:

// save as src/main/scala/com/helloscalatra/HelloScalatra.scala
package com.helloscalatra
 
import org.scalatra._
 
class HelloScalatra extends ScalatraServlet with UrlSupport {
 
 before {
   contentType = "text/html"
 }
 
 get("/") {
   <html>
     <head>
       <title> My first scalatra webapp</title>
     </head>
     <body>
       <h1> Hello Scalatra </h1>
     </body>
   </html>
 }
 
 protected def contextPath = request.getContextPath
}

The last thing we need to do is setup web.xml to tell jetty what to do:

// save as src/main/webapp/WEB-INF/web.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE web-app
 PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
 "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">
<web-app>
 
 <servlet>
   <servlet-name>HelloScalatra</servlet-name>
   <servlet-class>com.helloscalatra.HelloScalatra</servlet-class>
 </servlet>
 
 <servlet-mapping>
   <servlet-name>HelloScalatra</servlet-name>
   <url-pattern>/*</url-pattern>
 </servlet-mapping>
</web-app>

Great! We should be good to go :)
Go to project root and startup the sbt console:

sbt

and start jetty:

>jetty-run

TADA!

Navigate over to localhost:8080 to see the webapp in action.

Written by srijak

July 18th, 2010 at 4:43 pm

Posted in code

Tagged with

Get updates when your server’s ip changes

without comments

It’s a simple problem; I want to get updates when my home server’s ip address changes. It’s such a simple and pervasive problem that I’m sure there are loads of solutions out there. But, I figured it would be faster/easier to roll my own rather than evaluate+integrate w/e else is out there.
My first instinct at a solution comprised of 2 parts:

  1. Poll for changes to self.ip
  2. Publish updated ip using/to w/e

Let’s tackle part1. The main question here is how the server should get its external ip. The first thing that came to mind is a web page that echoes the visitor’s ip. So I wrote a simple Google appengine app that echoes your ipv4 address when you hit it. My script on the server GETs its external ip by hitting http://my-ipaddr.appspot.com/ and if its changed publishes it using the passed-in function. Here’s the code:

# ipwatcher.py
import urllib2
import socket
import time
 
root_url = 'http://my-ipaddr.appspot.com'
 
def get_ip():
    ip = urllib2.urlopen(root_url).read()
    try:
        socket.inet_aton(ip)
        return ip
    except socket.error:
        return None
 
def watch(publisher):
    if not callable(publisher):
        raise Exception('The publisher needs to be a callable function')
 
    last_ip = None
    while 1:
        current_ip = get_ip()
        if current_ip != last_ip:
            publisher(current_ip)
            last_ip = current_ip
 
        time.sleep(5*60)
 
 
def publish(ip):
    # publish however
    print ip
 
if __name__ == '__main__':
    watch(publish)
 
 
# google app engine part
# simpler than simple. It's just here for completeness
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
 
class MainPage(webapp.RequestHandler):
    def get(self):
        # echo user ip
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write(self.request.remote_addr)
 
application = webapp.WSGIApplication(
                                     [('/', MainPage)],
                                     debug=True)
def main():
    run_wsgi_app(application)
 
if __name__ == "__main__":
    main()

Yeah, it’s crummy that I do GET requests every 5 minutes, but with the GAE cap at 500requests/s I’m not too worried yet. I don’t know if there is a better way to get the external IP, but as this works and it’s a means to an end, I’m going to resist getting distracted.

So, part1 done with no sweat. Let’s move on to part2 where we actually publish the ip. My first instinct was just to use email. But I get too many emails anyway. Second I thought of scp or ftp to push the ip out. Meh. Been there done that.

Since I have something running on GAE, why not use it? Let’s do it.
GAE’s datastore makes it brain dead easy way to persist data. We could just do this:

 
name = "my-ip"
class IpAddress(db.Model):
    ip = db.StringProperty(required=True)
    name =  db.StringProperty(required=True)
 
class MainPage(webapp.RequestHandler):
    def get(self):
        # echo user ip
        ip = IpAddress.gql("where name = :1", name).get()
        if not kv:
           ip = IpAddress(ip = self.request.remote_addr, name = name)
           ip.put() 
        else:
            ip.ip =  self.request.remote_addr
            ip.put()
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write(ip.ip)

This would probably work just fine. At least until random people/bots start hitting it. But, with maybe a hundred views a week on my blog, I don’t forsee too many people hitting the random appspot site.

However, since I am playing around with the GAE, might as well make it a little more interesting. Let’s try to make a simple key value store.

class KeyValue(db.Model):
    name = db.StringProperty(required=False)
    passwd = db.StringProperty(required=False)
    value = db.TextProperty(required=False)
    modified = db.DateProperty(auto_now=True)

The password field is there just so random people who guess a key wont be able to edit the data. Yup, it’s not encrypted. Why? Because I already know the password of everyone who’s going to use it (me :) )
If you are dying to use it and don’t want me to see your password, let me know and I can hash it or something.

Anyway, adding simple CRUD is easy too:

class KVStoreUpdate(webapp.RequestHandler):
    def post(self):
        # post to an existing key
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
        content = self.request.get('content')
        kv = KeyValue.gql("where name = :1 and passwd = :2", name, passwd).get()
        if not kv:
            self.response.out.write('-1')
        else:
            kv.value = content
            kv.put()
            self.response.out.write(kv.name)
 
    def get_name_passwd(self, request):
       return 
 
class KVStoreCreate(webapp.RequestHandler):
    def post(self):
        # reserve a new key or confirm old key.
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
 
        kv = KeyValue.gql("where name = :1", name).get()
        if not kv:
            newKv = KeyValue (name = name, passwd = passwd)
            newKv.put()
            self.response.out.write(name)
        elif kv.passwd == passwd:
            self.response.out.write(name)
        else:
            self.response.out.write('-1')
 
class KVStoreRead(webapp.RequestHandler):
    def post(self):
        # return stored value for the given key
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
 
        kv = KeyValue.gql("where name = :1 and passwd = :2", name, passwd).get()
        if not kv:
            self.response.out.write('-1')
        else:
            self.response.out.write(kv.value)
 
class KVStoreDelete(webapp.RequestHandler):
    def post(self):
        # delete kv 
        self.response.headers['Content-Type'] = 'text/plain'
        (name, passwd) =  (self.request.get('name'), self.request.get('passwd'))
 
        kv = KeyValue.gql("where name = :1 and passwd = :2", name, passwd).get()
        if not kv:
            self.response.out.write('-1')
        else:
            kv.delete()
            self.response.out.write(name)

Now all we need is the client code:

import urllib2
import urllib
import random 
import string
 
# inbox_id is the just the key. I was going to have versioning + 
# addressing but decided not to since it was starting to look
# like a hybrid of a queuing system and a key value store.
class KeyValuePublisher:
    def __init__(self, inbox_id, passwd, url):
        self.inbox_id = inbox_id
        self.passwd = passwd
        self.url = url
        self.reserve_inbox()
 
    def reserve_inbox(self):
        self.create_inbox()
 
    def create_inbox(self):
        response = self.make_request({}, "/c")
        if (response == '-1'):
            raise Exception('Name already taken, or password invalid')
 
    def publish(self, content):
        response = self.make_request({'content': content}, "/u")
        if (response == '-1'):
            raise Exception('Invalid update')
 
    def read(self):
        return self.make_request({}, "/r")
 
    def delete(self):
        response = self.make_request({}, "/d")
        if (response == '-1'):
            raise Exception('Can\'t delete')
 
 
    def make_request(self, data, resource = "", method = 'POST'):
        data['name'] = self.inbox_id
        data['passwd'] = self.passwd
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        request = urllib2.Request(self.url + resource, data=urllib.urlencode(data))
        request.add_header('Content-Type', 'application/x-www-form-urlencoded') 
        request.get_method = lambda: method
        return opener.open(request).read()        
 
if __name__ == "__main__":
 
    # publish strings to a valid mailbox
    kvs = KeyValuePublisher("srijak0", "password", 'http://my-ipaddr.appspot.com')
    for i in range(20):
        len = random.randint(0,10000)
        content = ''.join(random.choice(string.letters) for i in xrange(len))
        kvs.publish(content)
        assert content == kvs.read()
 
    # initialize with taken mailbox name
    passed = False
    try:
        kvs = KeyValuePublisher("srijak0", "not_password", 'http://my-ipaddr.appspot.com')
    except:
        passed = True
    assert passed
 
    # delete mailbox
    kvs = KeyValuePublisher("srijak0", "password",  'http://my-ipaddr.appspot.com')
    kvs.delete()
    assert kvs.read() == '-1'
 
    print "[Tests passed]"

Pretty self explanatory.

And there you have it. Not the prettiest code I’ve ever written but it gets the job done well enough for <1hour of work :)

Written by srijak

February 24th, 2010 at 11:18 pm

Posted in code

Tagged with , ,

Massaging a Tornado web pain point: restart requirement

without comments

I have been playing around with the Tornado web framework and I really like it: the shallow learning curve paired with everything it brings to the table makes for a very good framework to at least kick off your next real time app.

One of the core assumptions in tornado is that request are handled quickly: if there are any blocking calls in your request handlers, it will cause other requests to queue. So, I have http wrappers around my dbs and queues so that I can handle these blocking calls from my request handlers asynchronously.

Well and good you say. What is the problem? The pain point is that nodes need to be restarted in order for code changes to propagate. And though it’s not a huge problem, its started to bug me that I have to waste seconds(!) restarting three nodes every time I make a change.  So, I wrote a quick python script to do so. It takes the simplest approach; polling for changes in current directory and restarting nodes if required. I was going to use inotify, but as OSX apparently doesn’t have it (has something called FSEvents), I decided to put off learning a new lib for another day so I could keep hacking on my project.

# Simple script to poll all files of interest below the current working directory
# for changes. On change, it will run w/e commands you want. For me, it has
# been helpful in restarting tornado web nodes.
import os
import re
import time
import signal
from subprocess import Popen
 
# define what you want to run here:
# each task is a list of command/arguments to run, popen style
tasks = [['python','httpDatabase.py'],['python','main.py']]
 
# define the file types you want to trigger on
# I opted for .py and html files.
file_regexp = re.compile("(.py$|.html$)")
 
def files_have_changed(old_stats, new_stats):
    if len(old_stats) != len(new_stats):
        return True
    for k in old_stats:
        if new_stats[k] != old_stats[k]:
            return True
    return False
 
def get_stats():
    stats = {}
    f = []
    for root, folders, files in os.walk(os.getcwd()):
        f.extend([os.path.join(root,x) for x in files if file_regexp.search(x)])
    for file in f:
        try:
            stats[file] = time.localtime(os.stat(file)[8])
        except:
            pass
    return stats
 
 
handles = []
def stop_current():
    if len(handles) > 0:
        for h in handles:
            print "Killing %d" % (h.pid)
            os.kill(h.pid, signal.SIGTERM)
        del handles[:]
 
def restart():
    print "Files changed. Restarting"
    stop_current()
    for t in tasks:
        p = Popen(t)
        print "Started %s. PID:%d" % (" ".join(t),p.pid)
 
        handles.append(p)
 
 
last_stats = {} 
while 1:
    current_stats = get_stats()
 
    if (files_have_changed(last_stats, current_stats)):
        last_stats = current_stats
        restart()
    time.sleep(1)

Written by srijak

February 16th, 2010 at 1:08 am

Posted in code

Tagged with ,

Jiggly goodness

without comments

This is barely worth the effort of making a post for. However, since it took me more than a couple doc. references to figure this out, I figured it will help someone else.

Situation:

I have a text field inside a custom UITableViewCell. When a user tries to edit the text when it is disabled, I want to make it obvious. So, I decided to have the whole cell jiggle back and forth prettily when this happens.

Fix:

Add jiggle animation to the pertinent layer when the cell is selected.

- (void)setSelected:(BOOL)selected animated:(BOOL)animated {
    [super setSelected:selected animated:animated];
    if (!textField.enabled && !first){
        [self jiggle:self.layer]; // send in textField.layer if you just want the text to jiggle
    }
    first = false;  //hack: setSelected is called when the table is loaded...
}
 
-(void) jiggle:(CALayer*) layer{
 
    CABasicAnimation *animation = [CABasicAnimation animationWithKeyPath:@"position"];
    [animation setFromValue:[NSValue valueWithCGPoint:CGPointMake(layer.position.x-2, layer.position.y)]];
    [animation setToValue:[NSValue valueWithCGPoint:CGPointMake(layer.position.x+2, layer.position.y)]];
    [animation setDuration:0.1f];
    animation.repeatCount = 3;
    animation.autoreverses = YES;
 
    [layer addAnimation:animation forKey:@"positionAnimation"];
 
}

If you want to know more about CABasicAnimation, start here.

Written by srijak

October 26th, 2009 at 12:30 am

Posted in code

Tagged with ,

Automatic Language Detection using Scala

without comments

For a personal project of mine I want to be able to identify the input text’s language. Although this feature is just a nice-to-have, rather than a must-have, its addition would make the results produced much more meaningful. So let’s see if we can hack out a prototype

I know that n-gram models have something to do with language detection so let’s dig around on wikipedia.

The basic idea with using n-gram models for language detection is follows:

  1. Generate n-gram profiles for supported languages (train on corresponding corpa).
  2. Generate n-gram profile for text to identify.
  3. Compare results from # 2 to #3. If the profiles match, so do the languages.

So what is a language’s n-gram profile? It’s just the frequency distribution of n-(character|word) sequence for that language. This tells us how n-grams are chained in a given language and this data is generated using a corpa of text from the language.

So how does comparison work? If n-grams are chained the same between the unknown text and any other language, we know there is a match.

Does this make sense intuitively? Yup. Languages sound different because of the way the consonants and vowels are chained together. And, what we are doing is creating a sort of a fingerprint of how a given language is written.

How well does this work? We’ll see.

Since I have done the last few posts in Erlang, I am going to try to do this one using Scala.

import collection.mutable.HashMap
 
class DeBabel {
	val n : Int = 4
 	var fd : HashMap[String, CountMap[String]] =  new HashMap[String, CountMap[String]]()
 
	def train(pathToCorpus: String, languageId: String) = {
	  fd(languageId) =  new CountMap(getNgrams(scala.io.Source.fromFile(pathToCorpus).mkString))
	}
 
	private def getNgrams(text: String) : Iterable[String] = {
	  //remove non-word chars and trim whitespace.
	  val normalizedText : String = text.replaceAll("\\W+"," ").replaceAll("\\s", " ")
	  (0 to normalizedText.length - n).map(i => normalizedText.substring(i,i+n))
	}
 
	//compare the given text to all languages we can identify
	def compareToLanguages(text: String) : Iterable[Tuple2[String, Double]]  ={
		val ngrams = new CountMap(getNgrams(text))
		fd.map( kv => (kv._1, getSimilarityIndex(ngrams,kv._2)) )
	}
 
	// get similarity between 2 frequency tables as a value between 0 to 1
	def getSimilarityIndex(map1: CountMap[String], map2: CountMap[String]) : Double ={
		var total = 0.0
		var diff = 0.0
 
		map1.foreach{ kv =>
			diff += Math.abs(kv._2 - map2(kv._1))
			total += Math.max(kv._2, map2(kv._1))
		}
		(total-diff)/total	
	}
}
// from http://blog.lostlake.org/index.php?/archives/61-A-spell-checker-in-Scala.html
class CountMap[K](start: Iterable[K]) extends HashMap[K, Int] {
	override def default(key: K) = 0
	start.foreach(v => this(v) = this(v) + 1)
 }

And a simple driver:

object Driver {
	def main(args:Array[String]) = {
		var detector = new DeBabel();
		detector.train("click10.txt","english")
		detector.train("19380.txt","german")
		detector.train("720kc10.txt","french")
		println(detector.compareToLanguages(scala.io.Source.fromFile("25019.txt").mkString)
		          .toList.sort(_._2 > _._2))
	}
}

Note: Texts used were Jules Verne and P.G Wodehouse novels from Project Gutenberg

$ scalac mrd.scala 
$ List((german,0.2715568746868134), (english,0.1184407499488911), (french,0.05424252275682705))

And the text was german so it works. The margins could use a little work, but given the sparse training data I am happy with the results at this stage.

What next ?

I only tested against Romance languages. And it would probably help to train with larger corpa :) I could also have tested to see whether character or word grams give better results(intuitively, it seems that word level n-grams would perform better). I also ignored char-set information.

However, in spite of these limitations, the results were more than satisfactory for the amount of work I put in. Even by training it on just one book, this small program was able to correctly identify the language used. If we tune this using additional heuristics (for example a Language Recognition Chart) we should be able to make the results much better.

Although I haven’t verified this, I feel like this method will work well for languages with a “phonetic” script (that may be the wrong term, what I mean is graphemes represent consonants and vowels in that language). However, languages like chinese/japanese where a graphemes represent whole words, going may be tough. Might this be mitigated using word level n-grams rather than character level ones? Maybe, I don’t know enough (any) Chinese/japanese/Korean to be able to test that.

Disclaimer: As you are probably painfully aware by now, I am not even close to being a computational linguist. I haven’t even sat next to one on a long flight. As such, I apologize for anything in this post that was completely off mark. Feel free to correct me. As this was my first Scala program, the same thing applies there too :)

Written by srijak

August 30th, 2009 at 2:41 am

Posted in code

Tagged with ,

Heapsort Pt 1

without comments

I happened to see one of my colleague’s passwords, “h3apS0r7″. As I asked him whether he actually knew how to implement it, I realized that I had never had to implement a Heapsort. And as we all (should) know: knowing how to do X and actually doing X are vastly different things.

So this post is my attempt at fixing that.

What’s Heapsort anyway? Like the name suggests it consists of

  1. Building a (max/min)heap using the data
  2. Plucking out the root(which is either the greatest or smallest value depending on whether you built a max or min heap) continuously such that you have the data in order.

Remind you of something? If you read just #2 you will see that Heapsort is just Selection sort powered by a heap :)

Let’s start off by implementing Selection sort. First we need a function pop_max(List) that returns/removes the maximum element from a list:

-module(heapsort).
-export([pop_max/1]).
 
% pop_max(List) returns {MaxValue, RestOfList}
pop_max([])->{nil,[]};
pop_max(L) -> 
     Max = lists:max(L),
     {Max,lists:delete(Max,L)}.

Integrating this into Selection sort is pretty easy too:

selectionSort([])-> [];
selectionSort(L) ->
    {Max,Rest} = pop_max(L),
    selectionSort(Rest) ++ [Max].
 
% Note: Yeah it's not tail recursive, but I think it looks cleaner.

Let’s try it out:

8> c(heapsort).                    
{ok,heapsort}
9> heapsort:selectionSort([1,5,2,77,8,3,2,5,8,12,53,1,5,87,1,4573,2,969]).
[1,1,1,2,2,2,3,5,5,5,8,8,12,53,77,87,969,4573]

Looks good to me.

Anyway, ideally we should be able to just swap out pop_max with pop_heap.

Let’s figure out how we want to implement the binary heap now. We need to support adding and removing the root from the heap.
Adding to the heap works as follows:

  1. Add the item at the bottom.
  2. Swap the item with it’s parent until the parent is greater than the item (a.k.a sift-up)

Removing from the heap is the inverse .i.e:

  1. Pop off the root and set the last item on the lowest level as the root.
  2. Swap the root with it’s children until the item is greater than it’s children(a.k.a sift-down).

Something else we want to determine is the underlining data structure for the Heap. We could use the binary tree we developed earlier, or we could use a list. Let’s go with the latter.

Anyway, I set myself a time limit of 20 minutes when I started this and it’s kinda late, so I’m going to implement the binary Heap in the next post :)

Written by srijak

August 22nd, 2009 at 10:34 am

Posted in code

Tagged with

Shared + Concurrent BST

without comments

As promised, today we will try to re-implement the BST from last post and make it erlang-y. Once again, I am going to just implement inserts and leave the rest as as exercise for the reader. I am going to try to put down my thought process as I try to solve this.

To recap here’s what the BST insert looks like :

-record(node,{kvTuple,left,right}).
 
binsert(#node{kvTuple={CK,CV},left=L,right=R},{K,V}) when K < CK ->
    #node{kvTuple={CK,CV},left=binsert(L,{K,V}),right=R};
binsert(#node{kvTuple={CK,CV},left=L,right=R},{K,V}) ->
    #node{kvTuple={CK,CV},left=L,right=binsert(R,{K,V})};
binsert(null,{K,V}) ->
    #node{kvTuple={K,V},left=null,right=null}.

My first instinct is to give this process a “mailbox” so that we have a shared BST that can be updated by multiple processes concurrently. However, this will process the messages sequentially. To mitigate this we can spawn a process to handle each message and so parallelize it. Is this the best solution though ? As we are going utilizing parallelization, can we break this up into smaller/more processes?

What about if each node is a process? Would that work? Would this incur too much overhead ? Hopefully not. Erlang’s supposed to have really lightweight processes so we are going to assume it’s ok (maybe a future post about exactly what the overhead is).

So at this point I did a quick google search on concurrent BSTs to see if there were other ways I was missing and I ran into this post by Faud AlTabba which totally messed up this post :)
His post has an implementation for each of the ideas I talked about and covers everything I hoped too. So go read it. I’m going to go try to find something else to practice Erlang on.

Written by srijak

July 29th, 2009 at 8:59 pm

Posted in code

Tagged with

Quick and dirty erlang BST

without comments

The book I have been reading implements a Set using a BST. Implementing a naive BST
in Erlang proved much easier than I expected. Maybe thats because I just implemented inserts :)

I started with a function that does sorted inserts on lists :

insert([],V) -> V;
insert([H|T],V)  when V > H ->
    [H|insert(T,V)];
insert([H|T],V)  when V < H ->
    [insert(H,V)|T];
insert(H,V) ->
    [V|H].

Once I defined a node record, converting to a bst was pretty easy:

%% what our node looks like
-record(node,{kvTuple,left,right}).
 
binsert(#node{kvTuple={CK,CV},left=L,right=R},{K,V}) when K < CK ->
    #node{kvTuple={CK,CV},left=binsert(L,{K,V}),right=R};
binsert(#node{kvTuple={CK,CV},left=L,right=R},{K,V}) ->
    #node{kvTuple={CK,CV},left=L,right=binsert(R,{K,V})};
binsert(null,{K,V}) ->
    #node{kvTuple={K,V},left=null,right=null}.

Adding a “memberOf” function is pretty much exactly like the insert so I wont bother showing it here.

Anyway, since that was trivial to implement, lets take it a step further: Erlang’s strength is concurrency so next post will be about adding concurrency to this BST.

Written by srijak

July 26th, 2009 at 8:58 pm

Posted in code

Tagged with

A little more Erlang

without comments

I have been meaning to read Purely Functional Data Structures for a while. So, I grabbed a copy earlier and started to work through the book using Erlang.

Translating1 the first couple of examples to Erlang was easy and looked pretty much exactly like the ML implementations:

%concatenate 2 lists
concat([],X) -> X;
concat(Y,[]) -> Y;
concat([H|T],X) -> [H|concat(T,X)].
 
%update value at given index
update([_H|T],0,N) -> [N|T];
update([H|T],C,N) -> [H|update(T,C-1,N)].



The first exercise (2.1) was to write a function suffixes that takes a list xs and returns a list of all suffixes in decreasing order of length.

%get all suffixes for the list
suffixes([]) -> [];
suffixes([H|T]) -> [[H|T]|suffixes(T)].

These are generated in O(n) time and represented in O(n) space.


Next up implementing a Binary Search Tree…



1.Translating: Not strictly translating. I write my own from the description, then match it with the author’s ML implementation to make sure they are close enough.

Written by srijak

July 26th, 2009 at 5:24 pm

Posted in code

Tagged with

HTTP PUT Rails gotcha

without comments

So, I was pretty surprised to learn that Rails PUT requests are required to be  “application/x-www-form-urlencoded”. Definitely not expected behavior when I am just consuming someone else’s API. If that isn’t completely true, feel free to enlighten me.

Basically here’s what happened: I was doing a HTTP PUT using C# to an API on Rails. C# looked pretty much like this:

request = (HttpWebRequest) WebRequest.Create(uri);
request.Method = "PUT";
request.ContentType = "text/plain";
request.ContentLength = postData.Length;
using(var writeStream = request.GetRequestStream())
{
      var bytes = (new UTF8Encoding()).GetBytes(postData);
      writeStream.Write(bytes, 0, bytes.Length);
}

But the code no worky.

After trying a couple of different things, I talked to the people who implemented the API. They said the Rails log indicated that I wasn’t PUTing anything .i.e. they could see the PUT request, but not the data. So, my first instinct was that it must be the content type, but was told it very definitely couldn’t be that.

Mystified, I wrote a quick Ruby script to do the same request. It worked!

Out came Wireshark.

Guess what the differences were ?

  1. C# has keep-alive set, Ruby didn’t.
  2. Ruby content-type was “application/x-www-form-urlencoded”.

I figured 1 really wasn’t the problem and since 2 supported my gut reaction, I went with it.
Sure enough, once I set content-type to “application/x-www-form-urlencoded” the c# code worky.

Moral of story:

  • Remember Rails requires a PUT to be of Content-type:”application/x-www-form-urlencoded”. Maybe it takes other things, but it doesn’t take text/plain.
  • Be very aware how the different layers/libraries interact in your app. All that magic is a double edged sword. There are 3rd party libs out there that do everything. But when you use one, please be sure you know what it does :)
  • When debugging take all assumptions (esp. other people’s :) ) with a grain of salt.

Written by srijak

July 22nd, 2009 at 7:12 pm

Posted in code

Tagged with , , , ,