I recently built this toy command line interface (CLI) app for fun. You can see a higher resolution video of it in action by clicking the image below.
The video above shows a snapshot of the CLI in action during the middle of a speedrun of the Nintendo 64 game The Legend of Zelda: Ocarina of Time, hosted on Twitch.tv. The filtering kicks in during a burst of WutFace spam, which you can see in the lower left of the video.
For the uninitiated, Twitch.tv is a streaming video website that allows anyone to share live video with viewers over the internet. The most popular use case, by a wide margin, is streaming players playing video games, although it is also sometimes used for sharing live music, where the streamer might be playing an instrument (or instruments, if it’s a band).
Several thousand people can be watching the same video stream at the same time — it’s not uncommon to see streams with 100,000 simultaneous viewers. Since viewers of the same stream are all put into the same chat room (no chat room splitting happens automatically), the chat messages can become incredibly difficult to parse.
To (sort of) solve this problem, I wrote a simple CLI Twitch/IRC chat client that supports the ability to filter out duplicate messages within a sliding time window in realtime.
Although this was built specifically for Twitch, it works with any chat that supports IRC as a protocol.
The fully functional IRC client was written in Golang, a choice made purely out of my desire to learn the language. If I were to build something beyond toy usage, I’d probably write it as a Chrome extension, since native Twitch chat lives in the browser, and that’s where users are accustomed to chatting.
The client contains the following widgets:
1) Message Rate Monitor (rate of messages over a sliding window)
2) Duplicate Message Aggregator (list of messages sorted by number of occurrences over a sliding window)
The trickiest implementation detail was the duplicate message aggregator. It involved maintaining a sorted list of (message, count) pairs in realtime as messages arrived, then decrementing counts and removing messages from the list as they aged out of the window, all while preventing concurrent updates from corrupting the counts.
Despite the complexity, Golang made this process quite easy. Concurrent updates were handled by serializing all updates through a queueing channel, and goroutines with time.Sleep(duration) were used to decrement counts.
Further Thoughts on Golang
Golang was a bit of a pain at times — the lack of generics in the language shows. I had to duplicate methods like math.Max for int types, since the builtin math.Max only works with float64. Also, the extra overhead of thinking about when to pass a pointer to a struct vs just the struct got in the way of thinking about the higher level functionality of the application.
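For instance, an int version of math.Max is only a few lines, but it had to be written by hand. (Go has since gained generics and a builtin max, so this duplication is no longer necessary in modern Go.)

```go
package main

import (
	"fmt"
	"math"
)

// maxInt duplicates math.Max for ints; the builtin math.Max only accepts
// (and returns) float64, so calling it with ints means lossy conversions.
func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func main() {
	fmt.Println(maxInt(3, 7))       // 7
	fmt.Println(math.Max(3.0, 7.0)) // 7 (the float64 version)
}
```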
On the flip side, I spent almost no time learning Golang and was able to build something functional almost immediately. This hasn’t been the case with other languages I’ve learned, such as TypeScript or Scala. The expediency experienced with Golang comes from two things — firstly, the amazing tooling that has gone into the language ecosystem, and secondly, the sheer simplicity of the language and lack of language features.
I wouldn’t use Golang as my first language choice for prototyping a product, but for system level programming, network intensive applications, or performance critical backends, Golang seems like the perfect choice for development. I’ll stick to Python, Scala, or TypeScript for prototyping products quickly, and use Golang when the extra performance is necessary.
Using the Client
The source code and usage instructions for the project described above can be found at my github page here. All that is needed is a Golang installation and a Twitch account.
This chart shows how well or poorly certain algorithm complexities scale when n grows,
in the context of CPU runtime. Exponentials and factorial algorithms clearly
do not scale to values of n of any significant size.
Function | n=8 (2^3) | n=64 (2^6) | n=1024 (2^10) | n=2^32 (~ 4 billion)
--- | --- | --- | --- | ---
log(n) | 3 ns | 6 ns | 10 ns | 32 ns
n | 8 ns | 64 ns | 1 µs | 4.2 seconds
n*log(n) | 24 ns | 384 ns | 10 µs | 137 seconds
n^2 | 64 ns | 4 µs | 1 ms | 584 years
n^2*log(n) | 192 ns | 24 µs | 10 ms | forever
n^3 | 512 ns | 262 µs | 1 second | forever
2^n | 256 ns | 584 years | forever | forever
n! | 40 µs | forever | forever | forever
ns = Nanoseconds, about the time it takes for a CPU instruction cycle to run
µs = 1,000 ns
ms = 1,000 µs
forever = way, way longer than the age of the universe (13.798 billion years)
This chart shows the round trip latency for certain computing actions.
Action | Time | Comparisons
--- | --- | ---
L1 cache reference | 0.5 ns |
Branch mispredict | 5 ns |
L2 cache reference | 7 ns | 14x L1 cache
Mutex lock/unlock | 25 ns |
Main memory reference | 100 ns | 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy | 3,000 ns |
Send 1K bytes over 1 Gbps network | 10 µs |
Read 4K randomly from SSD* | 150 µs |
Read 1 MB sequentially from memory | 250 µs |
Round trip within same datacenter | 500 µs |
Read 1 MB sequentially from SSD* | 1 ms | 4X memory
Disk seek | 10 ms | 20x datacenter roundtrip
Read 1 MB sequentially from disk | 20 ms | 80x memory, 20X SSD
Send packet CA->Netherlands->CA | 150 ms |
+ By Jeff Dean: http://research.google.com/people/jeff/
+ Originally by Peter Norvig: http://norvig.com/21-days.html#answers
+ Retrieved from Jonas Bonér: https://gist.github.com/jboner/2841832
Memory and Storage Numbers
Big-O Storage size
This chart shows how well or poorly certain algorithm complexities scale when n grows,
in the context of memory and storage.
Function | n=8 (2^3) | n=64 (2^6) | n=1024 (2^10) | n=2^32 (~ 4 billion)
--- | --- | --- | --- | ---
log(n) | 3 B | 6 B | 10 B | 32 B
n | 8 B | 64 B | 1 KB | 4 GB
n*log(n) | 24 B | 384 B | 10 KB | 128 GB
n^2 | 64 B | 4 KB | 1 MB | 16 exabytes
n^2*log(n) | 192 B | 24 KB | 10 MB | ...
n^3 | 512 B | 256 KB | 1 GB | ...
2^n | 256 B | 16 exabytes | ... | ...
n! | 40 KB | ... | ... | ...
B = 1 Byte, or 8 Bits
KB = kilobyte (1024 B)
MB = megabyte (1024 KB)
GB = gigabyte (1024 MB)
TB = terabyte (1024 GB)
PB = petabyte (1024 TB)
exabyte = 1024 petabytes, or 1 million computers with 1 terabyte HDs
... = there isn't enough room on earth to store the required computers.
Powers of Two Table
This chart shows how much memory is required to store 2^n bytes of data. For example, it takes
4 GB of memory to hold 2^32 bytes, and 16 GB of memory to hold
an array of 2^32 32-bit integers. This is useful for knowing when a hash table is appropriate for a problem.
Power of 2 | Exact Value | Storage size
--- | --- | ---
7 | 128 | 128 B
8 | 256 | 256 B
10 | 1024 | 1 KB
16 | 65,536 | 64 KB
20 | 1,048,576 | 1 MB
30 | 1,073,741,824 | 1 GB
32 | 4,294,967,296 | 4 GB
40 | 1,099,511,627,776 | 1 TB
64 | 1.844...E19 | 16 exabytes
+ Modified from _Cracking the Coding Interview_ By Gayle L. McDowell (p. 47)
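The two worked examples above (2^32 bytes vs. an array of 2^32 32-bit integers) can be sanity-checked in a few lines. This is a throwaway sketch; the `gigabytes` helper exists only for this example:

```go
package main

import "fmt"

// gigabytes returns how many whole gigabytes it takes to store `elems`
// elements of `bytesPerElem` bytes each (1 GB = 2^30 bytes).
func gigabytes(elems, bytesPerElem uint64) uint64 {
	return elems * bytesPerElem >> 30
}

func main() {
	n := uint64(1) << 32               // 2^32 elements
	fmt.Println(gigabytes(n, 1), "GB") // 2^32 bytes -> 4 GB
	fmt.Println(gigabytes(n, 4), "GB") // 2^32 32-bit ints -> 16 GB
}
```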
OverClocked Remix (OCRemix) is a website dedicated to serving high quality remixes of music from video games. The RSS feed is occasionally updated with the latest ten new songs that are posted on the site.
This post is about a server daemon project I wrote a year ago — it polls the RSS feed of OCRemix periodically, and converts new results to a Twitter feed. The results can be seen at @newOCRemixes on Twitter. It’s been running on Heroku 24/7 for the past year without any problems. Source code is here.
The rest of this post goes into the design and development of this project, so if that doesn’t sound interesting to you, now is a good time to hop off this train.
Motivation and design
I’ve been listening to music off of OCRemix ever since 2002. The way I used
to find new songs was to repeatedly visit their website, look for new songs,
and then click through three or four links for every new song that was posted.
I got tired of checking the website only to find there were no new songs —
or that there were ten new songs, and I’d have to click 30 times to hear all of them.
So I set out to figure out how to make it much, much easier to hear new songs and download
them if I liked them.
Existing OCRemix Feeds
OCRemix has its own Twitter feed, which they use
to post new remixes, but it isn’t limited to just new songs:
Twitter happens to have a nice feature called cards. Cards detect certain types of URLs and display information within the tweet that would otherwise only be accessible by visiting the URL. One of the
supported card types is Youtube links, which embed a Youtube video into a Tweet
whenever a Youtube link is detected.
Sometimes the links to new songs include a Youtube link to the remix for easy listening…
and sometimes they don’t:
They also have their own Youtube channel
where new songs are posted, but similarly to their Twitter feed, it is not exclusive to new songs.
My OCRemix Feed
Since I was already using Twitter as a news feed, I chose that as my target platform for
receiving new song notifications. The following is a list of some of the things I wanted
out of my Twitter feed system.
Only new songs should be posted — no reposts.
Only songs should be posted — nothing else but new OCRemix songs.
Every single song must contain a link to the Youtube version of the song.
Show the video game the song comes from, the remixers, and the original composers,
if they fit in 140 characters along with the Youtube link.
This system should not require any input from me (automated).
If something is broken, I should be notified of the breakage.
Initially, I wanted to include direct download links in each Tweet to simplify downloading the mp3 for a new song. However, because ocremix.org is a non-profit that relies on advertising revenue from its main website to pay for bandwidth, I chose to link to their site for downloads out of politeness.
This is what the final version of my implementation looks like:
Assuming this information fits into the 140 character limit, each one of these song tweets is composed of the following:
songId - Integer, official remix number (2827 at the time of writing).
title - Song title, given to the song by the remixers.
remixers - The remix artists that created the remix.
composers - The original artists that created the original song.
youtubeUrl - Link to the remix on Youtube.
writeupUrl - Official OCRemix link with remix information and a download link.
The actual project was implemented using Scala. The main libraries used were
Dispatch for OAuth and talking with Twitter’s API, and Akka for the periodic
polling of the RSS feed.
Fitting Info into a Tweet
Trying to detect whether or not a Tweet will fit in 140 characters is tricky, especially
if there are URLs in the Tweet. Twitter automatically shortens URLs for you — the caveat
is that the length of the shortened URL, not the length of the original URL,
is what counts toward the 140 character limit.
So how do you figure out what the shortened URL size will be?
Twitter happens to have a configuration API that reports the current length
of shortened URLs (https links are one character longer than http ones). Since this number can
change, it can’t be hardcoded into the system — but it’s also not likely
to change often.
To take this varying URL length into consideration, the previously known shortened URL length is cached in memory for quick usage. I poll the configuration API
every 24 hours to check for and update the URL length cache if necessary.
This polling is done using Akka’s scheduler:
```scala
val twitterConfigUpdaterSchedule = MySystem().scheduler.schedule(
  initialDelay = 0 seconds,
  frequency = 1 day,
  receiver = configUpdater,
  // If I was writing this again, I probably wouldn't use a "doit" message
  // for initiating an update.
  message = "doit"
)
```
Sometimes there’s just too much information to include in 140 characters. Going back
to my design goal of making it easy to hear new songs, the Youtube link is the
most important thing to include in the Tweet.
Here’s a shortened version of the code that figures out what to put into a tweet:
```scala
// The youtube URL is in every possible one of these.
lazy val v0 = Tweet(songId, title, remixers, composers, youtubeUrl, writeupUrl)
lazy val v1 = Tweet(songId, title, remixers, youtubeUrl, writeupUrl)
lazy val v2 = Tweet(songId, title, youtubeUrl, writeupUrl)
lazy val v3 = Tweet(songId, title, youtubeUrl)
lazy val v4 = Tweet(songId, youtubeUrl)

// Start with the longest possible, then slowly go to the shortest.
if (v0.isTweetable) v0
else if (v1.isTweetable) v1
else if (v2.isTweetable) v2
else if (v3.isTweetable) v3
else v4
```
Unless Twitter plans on upping the shortened URL length to ~135 characters,
or the ids start jumping by several orders of magnitude per new remix,
the smallest tweet content (songId, youtubeUrl) is good enough.
There’s a reason songId is included in all of the possibilities as well, more on that later.
Receiving Error Notifications
In order to make sure I receive a notification whenever something breaks, every single
method that could cause an error returns an Either[String, _], where
the Right case stores the successful result, and the Left case stores a string explaining what went wrong. Whenever a method completes
with the Left case, a direct message with information about the failure
is sent to my personal Twitter handle.
Here’s what error messages look like when they reach my Twitter inbox:
Some examples of methods that might fail, and that I want to be
alerted about if they do:
```scala
// Sends a private message to someone on Twitter
def directMessage(user: TwitterHandle, message: String): Future[Either[String, String]]

// Tweets a new post
def statusesUpdate(tweet: Tweetable): Future[Either[String, String]]

// Retrieves the last few Tweets from a user's timeline
def userTimeline(userId: String, screenName: String, count: Int): Future[Either[String, String]]

// Retrieves configuration information about Twitter
def helpConfiguration: Future[Either[String, TwitterConfiguration]]
```
(In retrospect, just using Future[String] would have been enough, since a Future
stores the exception information in case of failure… but keep in mind I wrote this code as a novice, just as I started learning Scala.)
Polling and Parsing the RSS Feed
There’s nothing particularly fancy about this part — it mostly consists of
periodically polling OCRemix’s RSS and
parsing out new songs to post on Twitter.
Here’s the code that schedules an RSS polling check every half hour:
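It follows the same pattern as the config updater shown earlier — sketched here, with the rssChecker receiver name being an assumption on my part:

```scala
// Sketch following the config updater pattern; `rssChecker` is an assumed
// name for the actor that fetches the RSS feed and diffs it for new songs.
val rssPollSchedule = MySystem().scheduler.schedule(
  initialDelay = 0 seconds,
  frequency = 30 minutes,
  receiver = rssChecker,
  message = "doit"
)
```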
In order to figure out whether or not a song is new, I send a request to Twitter’s API to retrieve the @newOCRemixes timeline, check the last Tweet’s songId, and make sure any new song I post
has a songId greater than the one on the timeline.
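Sketched out, with hypothetical fetchRssSongs and latestTweetedSongId helpers standing in for the RSS parsing and the timeline lookup:

```scala
// Sketch; `fetchRssSongs` and `latestTweetedSongId` are hypothetical helpers.
val newSongs = fetchRssSongs().filter(_.songId > latestTweetedSongId())
// Post oldest first, so the timeline's last Tweet always has the highest songId.
newSongs.sortBy(_.songId).foreach(postTweet)
```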
Things to Improve
All of this mostly works. A few problems remain that would have taken more time
to solve than I wanted to spend on this project.
One of these problems is that the RSS feed
only stores the 10 latest songs — if more than 10 songs are ever posted
within a 30 minute interval, any songs before the last 10 will never get posted.
This has only happened once since I launched this project, and there’s not
much that can be done about it, short of asking the OCRemix owners not to post
so many songs at once, asking them to up the item count on their RSS feed… or polling the website
itself and parsing the song URLs from raw HTML.
Another problem (one that hasn’t actually caused much trouble so far) is that the error
reporting depends on Twitter’s API working. If there’s a problem with
the directMessage call that reports errors to me, I won’t be notified. I could have solved this by setting up a secondary notification channel such as email, but errors happen so rarely that it wasn’t worth the trouble.
The third and last problem that I’ve run into is actually not one I can do much about. The Twitter card feature for Youtube videos sometimes won’t work — a few Tweets just won’t embed a Youtube video into the post, even if there’s a valid Youtube URL in the message.
Having said that, I’m pretty happy with the results — it’s been working without a hitch in the year since it was deployed.
Pretty cool, right? Unfortunately, there haven’t been many updates to the official version of this super awesome project lately,
so I decided to fork the project and start improving it myself. Since I use Scala quite often, I’ve got some pretty strong motivation to work on it.
Here’s a quick summary of what I’ve added to Scalariform so far:
```scala
// Parameter names, types, and defaults are aligned into three separate columns
def showInput[A](
    parent:      Component     = null,
    message:     Any,
    title:       String        = uiString("OptionPane.inputDialogTitle"),
    messageType: Message.Value = Message.Question,
    icon:        Icon          = EmptyIcon,
    entries:     Seq[A]        = Nil,
    initial:     A): Option[A]

// Two newlines will result in separate alignment groups
case class Cake(
    icingFlavor: Flavor = Vanilla,
    cakeFlavor:  Flavor = Chocolate,

    candles:  Int     = 1,
    layers:   Int     = 3,
    iceCream: Boolean = false)

// Same feature working with method calls
o.manyArguments(
  abc      = 0,
  abcOne   = 1,
  abcTwo,
  abcThree = 3,
  abcFour  = 4,
  abcFive  = 3)
```
And here’s how to use my version:
Scalariform Formatting with My Changes
```scala
// Add this to .../project/plugins.sbt
resolvers += "Sonatype OSS Snapshots" at
  "https://oss.sonatype.org/content/repositories/snapshots"

addSbtPlugin("com.danieltrinh" % "sbt-scalariform" % "1.3.0-SNAPSHOT")
```