I recently built this toy command line interface (CLI) app for fun. You can see a higher resolution video of it in action by clicking the image below.

Twitch Chat Filterer

The video above is a snapshot of the CLI in action during the middle of a speed run of the Nintendo 64 game The Legend of Zelda: Ocarina of Time, hosted on Twitch.tv. It shows the filtering in action, in the middle of a burst of the WutFace spam, which you can see in the lower left of the above video.

Twitch.tv Overview

For the uninitiated, Twitch.tv is a streaming video website that allows anyone to share live video with viewers over the internet, with twitch. The most popular use case by a wide margin for streaming players playing video games, although it is also sometimes used for sharing live music, where the video streamer might be playing an instrument (or instruments, if it’s a band.

Several thousand people can be watching the same video stream at the same time — it’s not uncommon to see streams that have 100,000 simultaneous viewers. Since viewers who watch the same video stream are also put into the same chat room (there’s no chat room splitting that happens automatically), the chat room messages can become incredibly difficult to parse.

To (sort of) solve this problem, I wrote a simple CLI Twitch/IRC chat client that supports the ability to filter out duplicate messages within a sliding time window in realtime.

Implementation Details

Although this was built specifically for Twitch, it works with any chat that supports IRC as a protocol.

The fully functional IRC client was written in Golang. The choice of Golang was purely out of my desire to learn the language. If I was to build something beyond for toy usage, I’d probably write it as a Chrome extension, since native Twitch chat is in the browser, and it’s where users are accustomed to chatting.

I used termui for the UI. The usecase for termui is pretty niche — I’d say it’s useful for interfaces when a browser is out of the question, such as while SSHed onto a remote server. Unless performance is a problem or golang is absolutely necessary, blessed-contrib using JavaScript looks to be a better option for building CLI UIs. I’d avoid building CLI UIs over browser based ones in general though — these frameworks for building CLI UIs is not like working with the DOM in the browser, and the kinds of UIs you can build using termui or blessed-contrib are limited in scope.

There are the following widgets in this client:

1) Message Rate Monitor (Rate of messages over sliding window)

2) Message Stats Monitor (Min, Max, Avg messages over sliding window)

3) Duplicate Message Aggregator (List of messages sorted by number of occurrences over sliding window)

The trickiest implementation detail was the duplicate message aggregator. It involved maintaining a sorted list of (message, count) pairs in realtime as messages arrived, and then decrementing / removing the message from the sorted list of (message, count) pairs, while also preventing concurrent updates from malforming the sorted list counts.

Despite the complexity, Golang made this process quite easy. Concurrent updates were handled by serializing all updates through a queueing channel, and goroutines with time.Sleep(duration) were used to decrement counts.

Further Thoughts on Golang

Golang was a bit of a pain at times — the lack of generics in the language shows. I had to duplicate methods like math.Max for int types, since the builtin math.Max only works with float64. Also, the extra overhead of thinking about when to pass a pointer to a struct vs just the struct got in the way of thinking about the higher level functionality of the application.

On the flip side, I spent almost no time learning Golang and was able to build something functional almost immediately. This hasn’t been the case with other languages I’ve learned, such as TypeScript or Scala. The expediency experienced with Golang comes from two things — firstly, the amazing tooling that has gone into the language ecosystem, and secondly, the sheer simplicity of the language and lack of language features.

I wouldn’t use Golang as my first language choice for prototyping a product, but for system level programming, network intensive applications, or performance critical backends, Golang seems like the perfect choice for development. I’ll stick to Python, Scala, or TypeScript for prototyping products quickly, and use Golang when the extra performance is necessary.

Using the Client

The source code and instructions for usage of the project described earlier can be found at my github page here. All that is needed is a golang installation and a Twitch account.

This article is a collection of runtime and memory usage numbers that might be useful to keep in mind when building performance critical software.

Big-O Absolute Numbers

This chart shows how well or poorly certain algorithm complexities scale when n grows.

Function   | n=8 (2^3) | n=64 (2^6)  | n=1024 (2^10) | n=2^32 (~ 4 billion)
-----------|-----------|-------------|---------------|---------------------
log(n)     | 3         | 6           | 10            | 32
n          | 8         | 64          | 1024          | 4294967296
n*log(n)   | 24        | 384         | 10240         | 1.374...E11
n^2        | 64        | 4096        | 1048576       | 1.844...E19
n^2*log(n) | 192       | 24576       | 1.048...E7    | -
n^3        | 512       | 262144      | -             | -
2^n        | 256       | 1.844...E19 | -             | -
n!         | 40320     | 1.268...E89 | -             | -

Runtime Performance Numbers

Big-O Runtime Performance Numbers

This chart shows how well or poorly certain algorithm complexities scale when n grows, in the context of CPU runtime. Exponentials and factorial algorithms clearly do not scale to values of n of any significant size.

Function   | n=8 (2^3) | n=64 (2^6)       | n=1024 (2^10) | n=2^32 (~ 4 billion)
-----------|-----------|------------------|---------------|---------------------
log(n)     | 3   ns    | 6   ns           | 10 ns         | 32  ns
n          | 8   ns    | 64  ns           | 1  µs         | 4.2 seconds
n*log(n)   | 24  ns    | 384 ns           | 10 µs         | 137 seconds
n^2        | 64  ns    | 4   µs           | 1  ms         | 584 years
n^2*log(n) | 192 ns    | 24  µs           | 10 ms         | forever
n^3        | 512 ns    | 262 µs           | 1  second     | forever
2^n        | 256 ns    | 584 years        | forever       | forever
n!         | 40  µs    | forever          | forever       | forever

ns      = Nanoseconds, about the time it takes for a CPU instruction cycle to run
µs      = 1,000 ns
ms      = 1,000 µs
forever = way way longer than the lifetime of the universe of 13.798 billion years

Latency Numbers

This chart shows the round trip latency for certain computing actions.

 Action                             | Time     | Comparisons
------------------------------------|----------|-----------------------------
 L1 cache reference                 | 0.5   ns |
 Branch mispredict                  | 5     ns |
 L2 cache reference                 | 7     ns | 14x L1 cache
 Mutex lock/unlock                  | 25    ns |
 Main memory reference              | 100   ns | 20x L2 cache, 200x L1 cache
 Compress 1K bytes with Zippy       | 3,000 ns |
 Send 1K bytes over 1 Gbps network  | 10    µs |
 Read 4K randomly from SSD*         | 150   µs |
 Read 1 MB sequentially from memory | 250   µs |
 Round trip within same datacenter  | 500   µs |
 Read 1 MB sequentially from SSD*   | 1     ms | 4X memory
 Disk seek                          | 10    ms | 20x datacenter roundtrip
 Read 1 MB sequentially from disk   | 20    ms | 80x memory, 20X SSD
 Send packet CA->Netherlands->CA    | 150   ms |

+ By Jeff Dean: http://research.google.com/people/jeff/
+ Originally by Peter Norvig: http://norvig.com/21-days.html#answers
+ Retrieved from Jonas Bonér: https://gist.github.com/jboner/2841832

Memory and Storage Numbers

Big-O Storage size

This chart shows how well or poorly certain algorithm complexities scale when n grows, in the context of memory and storage.

Function   | n=8 (2^3) | n=64 (2^6)    | n=1024 (2^10) | n=2^32 (~ 4 billion)
-----------|-----------|---------------|---------------|---------------------
log(n)     | 3   B     | 6   B         | 10 B          | 32 B
n          | 8   B     | 64  B         | 1  KB         | 4 GB
n*log(n)   | 24  B     | 384 B         | 10 KB         | 128 GB
n^2        | 64  B     | 40  KB        | 1  MB         | 16 exabytes
n^2*log(n) | 192 B     | 24  KB        | 10 MB         | ...
n^3        | 512 B     | 256 KB        | 1  GB         | ...
2^n        | 256 B     | 16  exaBytes  | ...           | ...
n!         | 40  KB    | ...           | ...           | ...

B       = 1 Byte, or 8 Bits
KB      = kilobyte (1024 B)
MB      = megabyte (1024 KB)
GB      = gigabyte (1024 MB)
TB      = terabyte (1024 GB)
TB      = petabyte (1024 TB)
exabyte = 1024 petabytes, or 1 million computers with 1 terabyte HDs
...     = there isn't enough room on earth to store the required computers.

Powers of Two Table

This chart shows how much memory is required to store bits in powers of two. For example, it takes 4GB of memory to store 232 bits in memory, and would require 16GB of memory to store an array of 32-bit integers. This is useful for knowing when a hash table is appropriate for a problem.

Power of 2 | Exact Value       | Storage size
-----------|-------------------|-------------
7          | 128               | 128 B
8          | 256               | 256 B
10         | 1024              | 1   KB
16         | 65,536            | 64  KB
20         | 1,048,576         | 1   MB
30         | 1,073,741,824     | 1   GB
32         | 4,294,967,296     | 4   GB
40         | 1,099,511,627,776 | 1   TB
64         | 1.844...E19       | 16  exabytes

+ Modified from _Cracking the Coding Interview_ By Gayle L. McDowell (p. 47)

OverClocked Remix (OCRemix) is a website dedicated to serving high quality remixes of music from video games. The RSS feed is occasionally updated with the latest ten new songs that are posted on the site.

This post is about a server daemon project I wrote a year ago — it polls the RSS feed of OCRemix periodically, and converts new results to a Twitter feed. The results can be seen at @newOCRemixes on Twitter. It’s been running on Heroku 24/7 for the past year without any problems. Source code is here.

The rest of this post goes into the design and development of this project, so if that doesn’t sound interesting to you, now is a good time to hop off this train.

Motivation and design

I’ve been listening to music off of OCRemix ever since 2002. The way I used to find new songs was to repeatedly visit their website, look for new songs, and then click through three or four links for every new song that was posted.

I got tired of checking the website, only to end up finding out there were no new songs, or finding out there are ten new songs, and having to click 30 times to hear all of them.

So I set out to figure out how to make it much, much easier to hear new songs and download them if I liked them.

Existing OCRemix Feeds

OCRemix has it’s own Twitter feed, which they will use to post new remixes, but it’s not exclusive to providing just new songs:

Official OCRemix Twitter Feed picture

Twitter happens to have this nice feature that they call cards. Cards are essentially a feature that detect certain types of URLs, and display information within the tweet that would only be accessible from visiting the URL. One of the supported card types are Youtube links, which will embed a Youtube video into a Tweet whenever a Youtube link is detected.

Sometimes the links to new songs include a Youtube link to the remix for easy listening…

Official OCRemix Twitter Feed picture, embedded Youtube

and sometimes they don’t:

Official OCRemix Twitter Feed picture, no Youtube

They also have their own Youtube channel where new songs are posted, but similarly to their Twitter feed, it is not exclusive to new songs.

My OCRemix Feed

Since I was already using Twitter as a news feed, I chose that as my target platform for receiving new song notifications. The following is a list of some of the things I wanted out of my Twitter feed system.

  • Only new songs should be posted — no reposts.

  • Only songs should be posted — nothing else but new OCRemix songs.

  • Every single song must contain a link to the Youtube song link.

  • Show the video game, remixer, and composer the song is remixed from, if it fits in 140 characters along with the Youtube link.

  • This system should not require any input from me (automated).

  • If something is broken, I should be notified of the breakage.

Initially, I wanted to implement direct download links within each Tweet in order to simplify the process of downloading the mp3 for new songs. However, because ocremix.org is non-profit and receives advertising money from the main website to pay for bandwidth, I chose to just link to their site to download out of politeness.

This is what the final version of my implementation looks like:

My Unofficial Twitter Feed picture

Assuming this information fits into the 140 character limit, each one of these song tweets is composed of the following:

songId     - Integer, official remix number (currently at 2827 at the time of writing). 
title      - Song title, given to the song by the remixers.
remixers   - The remix artists that created the remix.
composers  - The original artists that created the original song.
youtubeUrl - Link to the remix on Youtube.
writeupUrl - Official OCRemix link with remix information and a download link.

Programming Details

The actual project was implemented using Scala. The main libraries used were Dispatch for OAuth and talking with Twitter’s API, and Akka for the periodic polling of the RSS feed.

Fitting Info into a Tweet

Trying to detect whether or not a Tweet will fit in 140 characters is tricky, especially if there are URLs in the Tweet. Twitter will automatically shorten URLs for you — the caveat is that the length of the shortened URL is what counts to the 140 character limit, and not the size of the original URL.

So how do you figure out what the shortened URL size will be? Twitter happens to have a configuration API that will tell you the current length of URLs (https links are one character longer than http). Since this number can change, it can’t be hardcoded into the system. It’s also not a number that is likely to change often.

To take this varying URL length into consideration, the previously known shortened URL length is cached in memory for quick usage. I poll the configuration API every 24 hours to check for and update the URL length cache if necessary.

This polling is done using Akka’s scheduler:

1
2
3
4
5
6
7
8
val twitterConfigUpdaterSchedule = MySystem().scheduler.schedule(
  initialDelay = 0 seconds,
  frequency    = 1 day,
  receiver     = configUpdater,
  // If I was writing this again, I probably wouldn't use a "doit" message
  // for initiating an update.
  message      = "doit"
)

Sometimes there’s just too much information to include in 140 characters. Going back to my design goal of making it easy to hear new songs, the Youtube link is the most important thing to include in the Tweet.

Here’s a shortened version of the code that figures out what to put into a tweet:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// The youtube URL is in every possible one of these.
lazy val v0 = Tweet(songId, title, remixers, composers, youtubeUrl, writeupUrl)
lazy val v1 = Tweet(songId, title, remixers, youtubeUrl, writeupUrl)
lazy val v2 = Tweet(songId, title, youtubeUrl, writeupUrl)
lazy val v3 = Tweet(songId, title, youtubeUrl)
lazy val v4 = Tweet(songId, youtubeUrl)

// Start with the longest possible, then slowly go to the shortest.
if (v0.isTweetable)
  v0
else if (v1.isTweetable)
  v1
else if (v2.isTweetable)
  v2
else if (v3.isTweetable)
  v3
else
  v4

Unless Twitter plans on upping the shortened URL length to ~135 characters, or the ids start incrementing in several orders of magnitude per new remix, the smallest tweet content (songId,youtubeUrl) is good enough.

There’s a reason songId is included in all of the possibilities as well, more on that later.

Receiving Error Notifications

In order to make sure I received a notification whenever something breaks, every single method that could cause an error returns an Either[_, String], where the Right case stores the successful result, and the Left case stores a string explaining what went wrong. Whenever a method is completed with the Left case, a direct message is sent to my own personal Twitter handle, with information about the failure.

Here’s what error messages look like when they reach my Twitter inbox:

Error Messages in my Twitter Inbox

Some examples of methods that might fail that I want to be alerted about if they do:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
  // Sends a private message to someone on Twitter
  def directMessage(user: TwitterHandle, message: String): Future[Either[String,String]]

  // Tweets a new post
  def statusesUpdate(tweet: Tweetable): Future[Either[String,String]]

  // Retrieves the last few Tweets from a user's timeline
  def userTimeline(
    userId:     String,
    screenName: String,
    count:      Int
  ): Future[Either[String,String]]

  // Retrieves configuration information about Twitter
  def helpConfiguration: Future[Either[String, TwitterConfiguration]]

(In retrospect, just using Future[String] would have been enough, since it stores the exception information in case of failure… but keep in mind I did write this code as a novice just as I started learning Scala).

Polling and Parsing the RSS Feed

There’s nothing particularly fancy about this part — it mostly consists of periodically polling OCRemix’s RSS and parsing out new songs to post on Twitter.

Here’s the code that schedules an RSS polling check every half hour:

1
2
3
4
5
6
val rssPollerSchedule = MySystem().scheduler.schedule(
  initialDelay = 0 seconds,
  frequency    = 30 minutes,
  receiver     = rssPoller,
  message      = "doit"
)

In order to figure out whether or not a song is new, I send a request to Twitter’s API to retrieve the @newOCRemixes Twitter Timeline, check the last Tweet’s songId, and make sure any new song I post happens to have a songId greater than the one on the timeline.

Things to Improve

All of this mostly works. There are a few problems that were more complicated to solve than the time I wanted to spend on this.

One of these problems happens to be that the RSS feed only stores the 10 latest new songs — if there’s ever more than 10 songs posted within a 30 minute interval, any songs prior to the last 10 will not get posted. This has only happened once since I’ve launched this project, but there’s not much that can be done about it, short of asking for the OCRemix owners to not post so many songs at once, or up the number on their RSS feed.. or polling the website itself, and parsing the song URLs from raw HTML.

Another problem (that hasn’t actually been much trouble so far) is that the error message reporting is dependent on Twitter’s API working. If there’s a problem with the directMessage call to send me a message on errors, I won’t be notified. I could have solved this by setting up a secondary notification system such as email, but errors happen so rarely that it wasn’t worth the trouble.

The third and last problem that I’ve run into is actually not one I can do much about. The Twitter card feature for Youtube videos sometimes won’t work — a few Tweets just won’t embed a Youtube video into the post, even if there’s a valid Youtube URL in the message.

Having said that, I’m pretty happy with the results — it’s been working without a hitch for the last year since it’s been deployed.

Scalariform is a Scala source code formatter, originally written by Matt Russell (big thanks to him for writing it).

It’s much easier to show you what this does than it is to try and explain it, so that’s what I’ll do.

This is some poorly formatted code before running Scalariform:

1
2
3
4
5
6
7
8
9
10
11
class Coffee {
val sugarCubes = 20
val isCaffeinated = false

def energyBoost = {
if (caffeinated)
100 * sugarCubes
else
0
}
}

After running Scalariform:

1
2
3
4
5
6
7
8
9
10
11
class Coffee {
  val sugarCubes = 20
  val isCaffeinated = false

  def energyBoost = {
    if (caffeinated)
      100 * sugarCubes
    else
      0
  }
}

Pretty cool, right? Unfortunately, there haven’t been very many updates to the official version of this super awesome project lately, so I decided to fork the project and start improving upon it myself. Since I use Scala quite often, I’ve got some pretty strong motivation to work on improving it.

Here’s a quick summary of what I’ve added to Scalariform so far:

Previous Scalariform Formatting
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def showInput[A](
  parent: Component = null,
  message: Any,
  title: String = uiString("OptionPane.inputDialogTitle"),
  messageType: Message.Value = Message.Question,
  icon: Icon = EmptyIcon,
  entries: Seq[A] = Nil,
  initial: A
): Option[A]

case class Cake(
  icingFlavor: Flavor = Vanilla,
  cakeFlavor: Flavor = Chocolate,

  candles: Int = 1,
  layers: Int = 3,
  iceCream: Boolean = False
)

o.manyArguments(abc = 0,
  abcOne = 1,
  abcTwo,
  abcThree = 3,
  abcFour = 4,
  abcFive = 3
)
Scalariform Formatting with My Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// Parameter names, types, and defaults are aligned into three separate columns
def showInput[A](
  parent:      Component     = null,
  message:     Any,
  title:       String        = uiString("OptionPane.inputDialogTitle"),
  messageType: Message.Value = Message.Question,
  icon:        Icon          = EmptyIcon,
  entries:     Seq[A]        = Nil,
  initial:     A
): Option[A]

// Two newlines will result in separate alignment groups 
case class Cake(
  icingFlavor: Flavor = Vanilla,
  cakeFlavor:  Flavor = Chocolate,

  candles:  Int     = 1,
  layers:   Int     = 3,
  iceCream: Boolean = False
)

// Same feature working with method calls
o.manyArguments(
  abc    = 0,
  abcOne = 1,
  abcTwo,
  abcThree = 3,
  abcFour  = 4,
  abcFive  = 3
)

And here’s how to use my version:

Scalariform Formatting with My Changes
1
2
3
4
// Add this to .../project/plugins.sbt
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"

addSbtPlugin("com.danieltrinh" % "sbt-scalariform" % "1.3.0-SNAPSHOT")

See the plugin for how to configure formatting options, and the Scalariform readme for available formatting options.

Since this is an ongoing project, there will be more updates to come.