Tuesday, August 14, 2012

Media Server - Paging Queries

One useful type of URL query is the paging query: it limits the returned data to a specified range, and since fewer items are displayed at once, interface performance improves.

Since we are implementing an XMPP media server, it would be great to have something similar to RSM (the XMPP way to do pagination), or at least similar to its syntax. So the paging implemented for the media server looks like this:

 GET /channel@domain.com/media?max=10 -> returns at most 10 media metadata entries
 GET /channel@domain.com/media?max=10&after=foo -> returns at most 10 media metadata entries after the media whose id equals foo
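
On the server side, reading these parameters with Restlet could look like the minimal sketch below (the default page size of 20 and the listAfterAsJson lookup are assumptions, not the actual implementation):

import org.restlet.resource.Get;
import org.restlet.resource.ServerResource;

public class MediaCollectionResource extends ServerResource {

    @Get("json")
    public String listMedia() {
        // "max" caps the page size; "after" is the media id to page after.
        String maxParam = getQueryValue("max");
        String after = getQueryValue("after"); // may be null: start from the top
        int max = (maxParam == null) ? 20 : Integer.parseInt(maxParam);

        return listAfterAsJson(after, max);
    }

    // Hypothetical stand-in for the real metadata lookup: up to 'max'
    // entries, ordered by last modified date, starting after 'after'.
    private String listAfterAsJson(String after, int max) {
        return "[]";
    }
}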

There are no new features left to implement, so the effort in this final week goes into polishing the code and improving the documentation.


Friday, August 10, 2012

GSoC 2012 - Last sprint

What am I doing right now? Finishing some details, like documentation and test improvements. Also, one more feature is planned: an RSM-like URL query. This means that

 GET /channel@domain.com/media?max=10&after=foo

will return the first 10 metadata entries (ordered by last modified date) after the media whose id equals foo.

Next Monday (the 13th) is the suggested "pencils down" date for GSoC 2012. It was a pleasure to work with the buddycloud team, and I've learned A LOT during this period.

Friday, July 27, 2012

Media Server is deployed!

With the XEP-0070 implementation finished, the Media Server was ready to be deployed, and yesterday the first tests with Denis' HTTP API + Media Server were excellent! Some bugs, of course, but they are already fixed, and we are very excited about the result of our efforts.

Still, there are some improvements needed in the Media Server over the next couple of weeks:
- Handle video media previews;
- Handle clients with bare JIDs: the XEP-0070 negotiation with those clients needs to be done via XMPP messages;
- Handle audio media: set length property;
- Documentation.

Wednesday, July 25, 2012

Media Server user authentication - XEP-0070

XEP-0070 is a well-known specification for verifying HTTP requests via XMPP. It basically has 8 steps.

In the Media Server, when an HTTP request arrives, the HTTP side forwards the request to an AuthVerifier class. This class has control over an XMPP component and can send and receive packets synchronously, via a SyncReplySend utility class. Once the AuthVerifier class receives the request, it "asks" the client whether it sent the request: if yes, the request is authorized; if not, the HTTP side returns a 403 error.
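
A minimal sketch of that flow in Java (ConfirmIQ and SyncReplySend below are simplified, hypothetical stand-ins for the real packet class and synchronous utility):

public class AuthVerifier {

    // Hypothetical stand-in for the synchronous send-and-wait utility.
    interface SyncReplySend {
        ConfirmIQ sendAndWait(ConfirmIQ packet);
    }

    // Hypothetical stand-in for an IQ carrying XEP-0070's <confirm/> element.
    static class ConfirmIQ {
        final String toJid, transactionId, method, url;
        boolean confirmed; // true when the client answered with an IQ result

        ConfirmIQ(String toJid, String transactionId, String method, String url) {
            this.toJid = toJid;
            this.transactionId = transactionId;
            this.method = method;
            this.url = url;
        }
    }

    private final SyncReplySend syncSender;

    public AuthVerifier(SyncReplySend syncSender) {
        this.syncSender = syncSender;
    }

    // Asks the client, over XMPP, whether it really made this HTTP request.
    public boolean verifyRequest(String clientJid, String transactionId,
            String method, String url) {
        ConfirmIQ reply = syncSender.sendAndWait(
                new ConfirmIQ(clientJid, transactionId, method, url));
        // Per XEP-0070, an IQ result authorizes the request; an error
        // (or no reply) means the HTTP side must return 403.
        return reply != null && reply.confirmed;
    }
}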

Here is the sequence diagram:

To send its credentials, the client has two options:
  • Via HTTP auth: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ== 
  • Via URL: /media/test@topics.buddycloud.org?auth=QWxhZGRpbjpvcGVuIHNlc2FtZQ==
In both cases, the client's JID and transaction id are separated by a ';' and base64 encoded.
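
Decoding those credentials is straightforward; here is a minimal sketch, assuming the ';'-separated layout just described (javax.xml.bind.DatatypeConverter is only one of several base64 options):

import javax.xml.bind.DatatypeConverter;

public class CredentialsDecoder {

    // Decodes "<jid>;<transactionId>" from its base64 form, as sent either
    // in the Authorization header or in the ?auth= query parameter.
    public static String[] decode(String base64Credentials) {
        String decoded = new String(
                DatatypeConverter.parseBase64Binary(base64Credentials));
        int separator = decoded.indexOf(';');
        String jid = decoded.substring(0, separator);
        String transactionId = decoded.substring(separator + 1);
        return new String[] { jid, transactionId };
    }
}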

This week, we hope to do the first deploy, to finally see the Media Server running in a production environment!

Thursday, July 19, 2012

Media Server Pub-Sub based authentication

As already mentioned in the GSoC proposal, the Media Server uses pubsub capabilities to grant or deny access to a channel's media.

Although the concept is quite simple, as are the packets needed to exchange such information between a client and a pubsub server, the biggest challenge was adapting the Smack API to the particularities of buddycloud's XEP. The work was made a lot easier because in last year's GSoC another buddycloud student (Abmar Barros) had to implement similar functionality in his project, also using Smack.

So, to verify whether a user like "rodrigo@buddycloud.org" can POST media to "private@topics.buddycloud.org", all the Media Server does is retrieve the channel's affiliations and check whether "rodrigo@buddycloud.org" has posting rights:

<iq from="mediaserver@buddycloud.org/Client" id="Eq5j2-2" to="channels.buddycloud.org" type="get"> 
  <pubsub xmlns="http://jabber.org/protocol/pubsub#owner"> 
    <affiliations node="/user/private@buddycloud.org/posts"> 
  </affiliations></pubsub> 
</iq>

RESPONSE:

<iq from="channels.buddycloud.org" id="Eq5j2-2" to="mediaserver@buddycloud.org/Client" type="result">
  <pubsub xmlns="http://jabber.org/protocol/pubsub#owner">
    <affiliations xmlns="http://jabber.org/protocol/pubsub#owner">
      <...>
      <affiliation jid="rodrigo@buddycloud.org" affiliation="publisher"/>
    </affiliations>
    <set xmlns="http://jabber.org/protocol/rsm">
      <first index="0">
        user1@buddycloud.org
      </first>
      <last>
        user5@buddycloud.org
      </last>
      <count>
        5
      </count>
    </set>
  </pubsub>
</iq>
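
Once the response is parsed, the permission check itself is small. A minimal sketch in Java (the Affiliation type is a hypothetical stand-in for the adapted Smack packet classes, and the owner/moderator/publisher set is my assumption about which roles grant posting):

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AffiliationChecker {

    // Hypothetical stand-in for a parsed <affiliation/> element.
    public static class Affiliation {
        public final String jid;
        public final String type;

        public Affiliation(String jid, String type) {
            this.jid = jid;
            this.type = type;
        }
    }

    // Assumed set of affiliations that grant posting rights.
    private static final Set<String> CAN_POST =
            new HashSet<String>(Arrays.asList("owner", "moderator", "publisher"));

    // Returns true if userJid appears in the channel's affiliations with a
    // type that grants posting rights.
    public static boolean canPost(String userJid, List<Affiliation> affiliations) {
        for (Affiliation affiliation : affiliations) {
            if (affiliation.jid.equals(userJid)) {
                return CAN_POST.contains(affiliation.type);
            }
        }
        return false; // not affiliated: no posting rights
    }
}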

What are the next steps?
- First, I'd like to deploy a toy client to demonstrate some of the Media Server functionality;
- After that, the next step is the implementation of XEP-0070, plus documentation.

Friday, June 29, 2012

Media Server XMPP component

The main module of the Media Server is its XMPP component; it has several responsibilities, including the authentication process.

The authentication process that will be used is based on XEP-0070. To illustrate how it works, here is a sequence diagram:



As described in XEP-0070, the Media Server will provide two types of HTTP authentication, Basic and Digest. Both request the user's login and password, so the Media Server must have a mechanism to verify the supplied password.

The authentication response and authorization request are already implemented; the next steps are implementing the password verification mechanism and the request confirmation process.

edit: changed the sequence diagram after discussing some points with Kev.

Thursday, June 14, 2012

GSoC 2012 - Everything seems to be flowing well

Last week I faced some problems due to URL confusion, but it was easily fixed by slightly changing the approach I was taking in the resources implementation.

In summary, this is what was implemented this week:
  • All Media Server API methods, without any kind of authentication. This way, all defined actions (upload, update, download and delete) are working (in a very naive way, but still...);
  • Simple tests just to verify that everything works under minimal conditions;
  • Documentation improvements on buddycloud's wiki;
  • A test suite, where we can run all test cases developed so far.
I also started the implementation of the other main Media Server feature, the XMPP component. It will be responsible for communicating with the buddycloud server and asking about user permissions based on pub-sub. I'm expecting a lot of tricky coding challenges in this phase :-D

Another interesting piece of information is the authentication process that I'm planning to use for the Media Server: http://xmpp.org/extensions/xep-0070.html
It is a well-defined XEP on how to authenticate HTTP requests via XMPP. Denis, buddycloud's other GSoC student, has started a very interesting discussion about how the HTTP Pub-Sub API and the Media Server would work together using the XEP cited above: https://groups.google.com/forum/?fromgroups#!topic/buddycloud-dev/5KOmqMKXEko

I'm planning to finish the XMPP component in the next couple of weeks, with the functionality of handling media for public buddycloud channels; this will allow a first 0.0.1 version deployment and further real tests. As usual, I expect to run into a lot of bugs and crashes, but hopefully not in everything...

Saturday, June 9, 2012

GSoC 2012 - Some specification changes

During the last two weeks, we decided to make some changes to the URL definitions and to the metadata stored for each media item.

Now, the URLs are:

POST /media/<entityId>


PUT, DELETE, GET /media/<entityId>/<mediaId>


POST, PUT, DELETE, GET /media/<entityId>/avatar

We also defined two query parameters, maxheight and maxwidth. This way, if the client sends a request like

GET /media/lounge@topics.buddycloud.org/aksdh10201d?maxheight=100&maxwidth=100

The returned media is a preview of "aksdh10201d" with dimensions that do not exceed 100x100. It is assumed that the client will only ask for previews of "thumbnailable" media, like images and videos.
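
For illustration, the dimension math behind such previews can be as small as the sketch below: scale the original so neither side exceeds the requested maximum, keeping the aspect ratio and never upscaling (the fit helper is hypothetical, not the actual server code):

public class PreviewSize {

    // Scales (width, height) so neither side exceeds the requested maximum,
    // preserving the aspect ratio and never upscaling.
    public static int[] fit(int width, int height, int maxWidth, int maxHeight) {
        double scale = Math.min(1.0, Math.min(
                (double) maxWidth / width,
                (double) maxHeight / height));
        return new int[] {
                (int) Math.round(width * scale),
                (int) Math.round(height * scale) };
    }

    public static void main(String[] args) {
        // A 640x480 image requested with maxheight=100&maxwidth=100
        // becomes a 100x75 preview.
        int[] size = fit(640, 480, 100, 100);
        System.out.println(size[0] + "x" + size[1]);
    }
}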

Since /media/<entityId>/<mediaId> and /media/<entityId>/avatar can be confused by Restlet's router, I'm facing some problems because of that; I'm trying to figure out a solution that doesn't involve changing those URL templates (one idea is sketched below).
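
One direction I'm considering is sketched below (a guess, not the final fix): attach both templates and rely on Restlet's "best match" routing, which should prefer the literal avatar segment over the {mediaId} variable. The resource classes here are hypothetical placeholders:

import org.restlet.Application;
import org.restlet.Restlet;
import org.restlet.resource.Get;
import org.restlet.resource.ServerResource;
import org.restlet.routing.Router;

public class MediaServerApplication extends Application {

    // Hypothetical placeholder resources.
    public static class AvatarResource extends ServerResource {
        @Get
        public String represent() {
            return "avatar";
        }
    }

    public static class MediaResource extends ServerResource {
        @Get
        public String represent() {
            return "media " + getRequestAttributes().get("mediaId");
        }
    }

    @Override
    public Restlet createInboundRoot() {
        Router router = new Router(getContext());
        // The literal "avatar" template is attached alongside the generic
        // one; with best-match routing the more specific template should
        // win for /media/<entityId>/avatar.
        router.attach("/media/{entityId}/avatar", AvatarResource.class);
        router.attach("/media/{entityId}/{mediaId}", MediaResource.class);
        return router;
    }
}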

Another significant change occurred in the POST media request: the client should now provide multipart/form-data with 6 parameters (a client-side sketch follows the list):

  • uploadtoken: related to authentication process;
  • filename: the real name of the file (including its extension);
  • title: a title for the media (optional);
  • description: media's description;
  • author: the user that is uploading this file;
  • binaryFile: the file content.
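
As a client-side illustration, here is a minimal sketch using Apache HttpClient 4.x's multipart support (the endpoint URL and the token value are placeholders, not real ones):

import java.io.File;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.FileBody;
import org.apache.http.entity.mime.content.StringBody;
import org.apache.http.impl.client.DefaultHttpClient;

public class UploadExample {

    public static void main(String[] args) throws Exception {
        HttpClient client = new DefaultHttpClient();
        HttpPost post = new HttpPost(
                "http://localhost:8080/media/channel@topics.domain.com"); // placeholder

        MultipartEntity entity = new MultipartEntity();
        entity.addPart("uploadtoken", new StringBody("some-token")); // placeholder
        entity.addPart("filename", new StringBody("testimage.jpg"));
        entity.addPart("title", new StringBody("Test Image")); // optional
        entity.addPart("description", new StringBody("My Test Image"));
        entity.addPart("author", new StringBody("user@domain.com"));
        entity.addPart("binaryFile",
                new FileBody(new File("testimage.jpg"), "image/jpeg"));
        post.setEntity(entity);

        // The response body is the JSON metadata representation shown below.
        HttpResponse response = client.execute(post);
        System.out.println(response.getStatusLine());
    }
}
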
The server response is a JSON representation of the uploaded media metadata:

{
   "id":"lETuJi8rPE4IfQrygN6rVtGx3",
   "fileName":"testimage.jpg",
   "author":"user@domain.com",
   "title":"Test Image",
   "mimeType":"application/octet-stream",
   "description":"My Test Image",
   "fileExtension":"jpg",
   "shaChecksum":"bc46e5fac2f1cbb607c8b253a5af33181f161562",
   "fileSize":60892,
   "height":312,
   "width":312,
   "entityId":"channel@topics.domain.com"
}

Next week, I'm planning to fix the URL confusion issue and develop the first XMPP functionality, like uploading/downloading media from public channels.

Friday, May 25, 2012

GSoC 2012 - Coding has started!

Since May 21st, Google Summer of Code 2012 has been in its coding period. After a lot of reading and documentation writing, we are finally doing the fun part =)

In this first week, I've implemented two methods for the Media Server HTTP API (with tests):

GET /channel/{channelId}/media/{mediaId}
POST /channel/{channelId}/media

Since the main idea right now is prototyping, neither method has authentication yet, but it is already possible to publish the media server on a web server and get simple upload/download working.

Also, a JSON representation of the Media was defined:
{
  "id": string,
  "uploader": string,
  "title": string,
  "mimeType": string,
  "description": string,
  "uploadedDate": datetime,
  "lastViewedDate": datetime,
  "downloadUrl": string,
  "fileExtension": string,
  "md5Checksum": string,
  "fileSize": long,
  "length": long, // for videos
  "height": int, // for videos and images
  "width": int  // for videos and images
}

This way, when a client tries to upload media, it has to provide two fields in the multipart/form-data (more fields can be added later, for authentication for example): the body field, which holds the media JSON representation, and the binaryFile field, which holds the file content:

-----Aa1Bb2Cc3---
Content-Disposition: form-data; name="body"; filename=""
Content-Type: application/json; charset=UTF-8
{"fileExtension":"jpg","md5Checksum":"cd1b78e4686ae1a3cfb461a1085545f0","width":312,"uploader":"user@domain.com",
"downloadUrl":null,"id":"testFileId","fileSize":60892,"title":"testimage.jpg","height":312,"description":"a
description","length":null,"mimeType":"application/octet-stream","lastViewedDate":null,"uploadedDate":null}
-----Aa1Bb2Cc3---
Content-Disposition: form-data; name="binaryFile"; filename="testimage.jpg"
Content-Type: image/jpeg

...file content...

-----Aa1Bb2Cc3-----

When the server receives the file, it performs some verifications, such as MD5 checksum matching, file size matching, etc.
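
The MD5 check, for example, can be done by streaming the upload through a digest; a minimal sketch using the standard java.security API (the hex helper is just for illustration):

import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class ChecksumVerifier {

    // Streams the uploaded content through an MD5 digest and compares the
    // result with the checksum declared in the metadata.
    public static boolean md5Matches(InputStream upload, String declaredMd5)
            throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        DigestInputStream in = new DigestInputStream(upload, md5);
        byte[] buffer = new byte[8192];
        while (in.read(buffer) != -1) {
            // reading drives the digest; the bytes themselves go elsewhere
        }
        return toHex(md5.digest()).equalsIgnoreCase(declaredMd5);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}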

On the other side, we have the download method; it simply returns the file whose {mediaId} was passed as an argument.

The next steps are:
- Implement methods to upload/download channel's and user's avatars;
- Implement the XMPP component;
- Implement full upload/download functionality for public channels.

Sunday, April 29, 2012

Buddycloud Media Server Architecture

After some conversations with my mentor (dodo), we finally have a bigger picture of how the media server communication flow should work. Here is a short description of the media "posting" functionality:

  1. Client discovers the channel's media server;
    • Since it is an XMPP component, it should be "discoverable";
  2. Negotiate the media URL;
  3. Client receives a link for public channels and a cookie for private ones;
  4. Upload the data via a PUT or POST request to the media server's static HTTP file server.

Here is an architecture overview diagram:

Where:
  • Media Server:
    • HTTP File Server: static file server, where clients will be able to GET and PUT (or POST) files;
    • XMPP Component: component that will evaluate users' permissions and also generate valid URLs so the client can download or upload files through the HTTP File Server.
  • Storage:
    • Metadata: stores file metadata, like checksum, upload date, etc;
    • Binary Files: stores the files themselves (and also their previews).
That means both the Media Server and the client must speak two "languages", HTTP and XMPP. But this may change (or not) after Denis, buddycloud's other GSoC student, finishes his project.

Wednesday, April 25, 2012

Google Summer of Code 2012

This year I'm participating in Google Summer of Code for the first time; I'll be working with an exciting XMPP project called buddycloud.

The project title is An XMPP Media Server; it basically consists of developing an XMPP-based media server with an HTTP face, to make it easier to use from different types of clients.

The media server will be responsible for getting, adding and deleting files (images, videos and documents), handling the authentication process properly, and also providing previews when the client doesn't want to download the whole stored file. That's why this project should bring even more developers to the XMPP world.

I will be reporting the project's evolution, problems, challenges and ideas on this blog =)

Friday, April 6, 2012

Hadoop + Amazon EC2 - Using Apache Whirr

Apache Whirr is a set of libraries for running cloud services. You can use it to create EC2 instances running Hadoop, just like in the previous tutorial. But Whirr also provides a lot of other interesting functionality that is worth a try.

Another advantage is the possibility of running on different cloud providers without worrying about each one's configuration details. Of course, you need an active account.

You can find a simple tutorial at Whirr's homepage: Getting Started with Whirr.

Wednesday, March 14, 2012

Hadoop + Amazon EC2 - An updated tutorial

There is an old tutorial on Hadoop's wiki page: http://wiki.apache.org/hadoop/AmazonEC2, but I recently had to follow it and noticed that it doesn't cover some newer Amazon functionality.

To follow this tutorial, it is recommended that you already be familiar with the basics of Hadoop; a very useful getting-started tutorial can be found on Hadoop's homepage: http://hadoop.apache.org/. You also have to be familiar with at least Amazon EC2 internals and instance definitions.

When you register an account at Amazon AWS, you receive 750 free hours to run t1.micro instances, but unfortunately you can't successfully run Hadoop on such machines.

In the following steps, a command starting with $ should be executed on the local machine, and one starting with # on the EC2 instance.

Create an X.509 Certificate


Since we are going to use ec2-tools, our AWS account needs a valid X.509 certificate:
  • Create .ec2 folder:
  • $ mkdir ~/.ec2
  • Login in at AWS
    • Select “Security Credentials” and at "Access Credentials" click on "X.509 Certificates";
    • You have two options:
      • Create certificate using command line:
      • $ cd ~/.ec2; openssl genrsa -des3 -out my-pk.pem 2048
        $ openssl rsa -in my-pk.pem -out my-pk-unencrypt.pem
        $ openssl req -new -x509 -key my-pk.pem -out my-cert.pem -days 1095
        • This only works if your machine's date is correct.
      • Create the certificate using the site and download the private-key (remember to put it at ~/.ec2).

Setting up Amazon EC2-Tools

  • Download and unpack ec2-tools;
  • Edit your ~/.profile to export all the variables needed by ec2-tools, so you don't have to do it every time you open a prompt:
    • Here is an example of what should be appended to the ~/.profile file:
      • export JAVA_HOME=/usr/lib/jvm/java-6-sun
      • export EC2_HOME=~/ec2-api-tools-*
      • export PATH=$PATH:$EC2_HOME/bin
      • export EC2_CERT=~/.ec2/my-cert.pem
    • To access an instance, you need to be authenticated (for obvious security reasons), so you have to create a Key Pair (public and private keys):
      • At https://console.aws.amazon.com/ec2/home, click on "Key Pairs", or
      • You can run the following commands:
      • $ ec2-add-keypair my-keypair | grep -v KEYPAIR > ~/.ec2/id_rsa-keypair
        $ chmod 600 ~/.ec2/id_rsa-keypair

Setting up Hadoop


After downloading and unpacking Hadoop, you have to edit the EC2 configuration script at src/contrib/ec2/bin/hadoop-ec2-env.sh.
  • AWS variables
    • These variables are related to your AWS account (AWS_ACCOUNT_ID, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY); they can be found by logging into your account, under Security Credentials;
      • The AWS_ACCOUNT_ID is your 12 digit account number.
  • Security variables
    • The security variables (EC2_KEYDIR, KEY_NAME, PRIVATE_KEY_PATH) are the ones related to launching and accessing an EC2 instance;
    • You have to save the private key into your EC2_KEYDIR path.
  • Select an AMI
    • Depending on the Hadoop version you want to run (HADOOP_VERSION) and the instance type (INSTANCE_TYPE), you should use a proper image to deploy your instance:
    • There are many public AMI images you can use (they should suit the needs of most users); to list them, type
    • $ ec2-describe-images -x all | grep hadoop
    • Or you can build your own image, and upload it to an Amazon S3 bucket;
    • After selecting the AMI you will use, there are basically three variables to edit at hadoop-ec2-env.sh:
      • S3_BUCKET: the bucket where the image you will use is placed, for example hadoop-images;
      • ARCH: the architecture of the AMI image you have chosen (i386 or x86_64); and
      • BASE_AMI_IMAGE: the unique code that identifies an AMI image, for example ami-2b5fba42.
    • Another configurable variable is JAVA_VERSION, which defines which Java version will be installed on the instance:
      • You can also provide a link to where the binary is located (JAVA_BINARY_URL); for instance, if you have JAVA_VERSION=1.6.0_29, an option is JAVA_BINARY_URL=http://download.oracle.com/otn-pub/java/jdk/6u29-b11/jdk-6u29-linux-i586.bin.

Running!

  • You can add the content of src/contrib/ec2/bin to your PATH variable, so you will be able to run the commands independently of where the prompt is open;
  • To launch an EC2 cluster and start Hadoop, use the following command. The arguments are the cluster name (hadoop-test) and the number of slaves (2). When the cluster boots, the public DNS name will be printed to the console.
  • $ hadoop-ec2 launch-cluster hadoop-test 2
  • To log in to the master node of your cluster, type:
  • $ hadoop-ec2 login hadoop-test
  • Once you are logged into the master node you will be able to start the job:
    • For example, to test your cluster, you can run a pi calculation that is already provided by the hadoop*-examples.jar:
    • # cd /usr/local/hadoop-*
      # bin/hadoop jar hadoop-*-examples.jar pi 10 10000000
  • You can check your job's progress at http://MASTER_HOST:50030/, where MASTER_HOST is the host name returned when the cluster started.
  • After your job has finished, the cluster remains alive. To shut it down, use the following command:
  • $ hadoop-ec2 terminate-cluster hadoop-test
  • Remember that Amazon EC2 instances are charged per hour, so if you only wanted to run tests, you can play with the cluster for a few more minutes at no extra cost.