New node.js microformats parser – microformat-node

I have built a node.js microformats parser based on my previous JavaScript parsing code. It has been packaged up so you can easily add it to your projects using npm.

Source code: https://github.com/glennjones/microformat-node
Test server : http://microformat-node.jit.su

Install


npm install microformat-node

or


git clone http://github.com/glennjones/microformat-node.git
cd microformat-node
npm link

Use

with URL


var shiv = require("microformat-node");

shiv.parseUrl('http://glennjones.net/about', {}, function(data){
    // do something with data
});

or with raw html


var shiv = require('microformat-node');

var html = '<p class="vcard"><a class="fn url" href="http://glennjones.net">Glenn Jones</a></p>';
shiv.parseHtml(html, {}, function(data){
    // do something with data
});

with URL for a single format


var shiv = require("microformat-node");

shiv.parseUrl('http://glennjones.net/about', {'format': 'XFN'}, function(data){
    // do something with data
});

Supported formats

Currently microformat-node supports the following formats:
hCard, XFN, hReview, hCalendar, hAtom, hResume, geo, adr and tag. It's important to use the right case when specifying the format query string parameter.
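
Because the format name is case sensitive, it can help to normalise whatever a caller passes in before handing it to the parser. The following is only a minimal sketch: `canonicalFormat` and `SUPPORTED_FORMATS` are hypothetical names and not part of microformat-node, and the list is simply copied from the formats named above.

```javascript
// Hypothetical helper, NOT part of microformat-node.
// The names are copied from the supported formats list in this post.
var SUPPORTED_FORMATS = ['hCard', 'XFN', 'hReview', 'hCalendar', 'hAtom', 'hResume', 'geo', 'adr', 'tag'];

// Returns the correctly cased format name, or null if it is not in the list.
function canonicalFormat(name) {
    var wanted = String(name).trim().toLowerCase();
    for (var i = 0; i < SUPPORTED_FORMATS.length; i++) {
        if (SUPPORTED_FORMATS[i].toLowerCase() === wanted) {
            return SUPPORTED_FORMATS[i];
        }
    }
    return null;
}
```

The result could then be passed as the `format` option, after checking for `null` so that an unknown name is never sent to the parser.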

Response

This will return JSON. This is an example of two geo microformats found in a page.


{
    "microformats": {
        "geo": [{
            "latitude": 37.77,
            "longitude": -122.41
        }, {
            "latitude": 37.77,
            "longitude": -122.41
        }]
    },
    "parser-information": {
        "name": "Microformat Shiv",
        "version": "0.2.4",
        "page-title": "geo 1 - extracting singular and paired values test",
        "time": "-140ms",
        "page-http-status": 200,
        "page-url": "http://ufxtract.com/testsuite/geo/geo1.htm"
    }
}
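
Reading values out of a response like the one above can be sketched as follows. This assumes the `data` argument passed to the callback is already a parsed object with the shape shown; if it arrives as a string it would need `JSON.parse` first. `listGeo` is a hypothetical helper name, not part of the module.

```javascript
// Hypothetical helper, NOT part of microformat-node.
// Assumes `data` has the shape of the example response above.
function listGeo(data) {
    // Fall back to an empty list when no geo microformats were found.
    var found = (data && data.microformats && data.microformats.geo) || [];
    return found.map(function (geo) {
        return geo.latitude + ',' + geo.longitude;
    });
}
```

Guarding each level of the object means a page with no geo data gives an empty array rather than throwing an error.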

Querying demo server

Start the server binary:


$ bin/microformat-node

Then visit the server URL


http://localhost:8888/

Using the server API

You need to provide the URL of the web page and the format(s) you wish to parse, as a single value or a comma-delimited list:


GET http://localhost:8888/?url=http%3A%2F%2Fufxtract.com%2Ftestsuite%2Fhcard%2Fhcard1.htm&format=hCard

You can also use the hash (#) fragment of a URL to target only part of an HTML page. The hash is used to target the HTML element with the same id.
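
Building the request URL by hand is easy to get wrong, because the page URL (including any hash fragment) has to be percent-encoded. A minimal sketch of a helper is below; `buildQueryUrl` is a hypothetical name, and it assumes the server accepts a literal comma between format names.

```javascript
// Hypothetical helper, NOT part of microformat-node.
// serverUrl: e.g. 'http://localhost:8888/'
// pageUrl:   the page to parse, optionally ending in a #fragment
// formats:   array of correctly cased format names, e.g. ['hCard', 'geo']
function buildQueryUrl(serverUrl, pageUrl, formats) {
    // encodeURIComponent also encodes '#', so the fragment stays part of the url parameter.
    var query = '?url=' + encodeURIComponent(pageUrl);
    if (formats && formats.length) {
        // Assumption: a literal comma is accepted as the list delimiter.
        query += '&format=' + formats.join(',');
    }
    return serverUrl + query;
}
```

Called with `'http://localhost:8888/'`, the hcard1.htm test page and `['hCard']`, this produces the same request URL as the GET example above.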

Viewing the unit tests

The module includes a page which runs the ufxtract microformats unit test suite.


http://localhost:8888/unit-tests/

Notes for Windows install

microformat-node uses a module called ‘jsdom’, which in turn uses ‘contextify’, which requires a native code build.

There are a couple of things you normally need to do to compile node code on Windows.

  1. Install python 2.6 or 2.7, as the build scripts use it
  2. Run npm inside a Visual Studio shell, so for me, Start->Programs->Microsoft Visual Studio 2010->Visual Studio Tools->Visual Studio Command Prompt

If you have the standard release of node it will probably be x86 rather than x64. For x64 there is a different Visual Studio shell, but it is usually in the same place.

Looking for new things to do

From this Friday I am looking for new things to do. I have pulled out of working full time at the company I co-founded. I will remain a director and major shareholder. And no, Madgex is not in trouble. In fact the opposite is true: it has just had its most profitable quarter in its history. After helping restructure it over the last year I have the opportunity to do other things.

This has left me in the lucky position of being able to follow my passions. I want to try to take the product research and design I have done commercially for years and mix it with my interests in open web and standards development.

In the next few months I want to research and build projects in a number of areas:

  • Web Intents/Activities
  • Personal data stores using services like Dropbox
  • Possibility of semantic data reuse
  • Mobile web apps – the right way

I am sure within a few months I will want to work with teams again, as I always want to design and build products that impact people’s lives. That usually means lots of collaborations, but at first I want some time to open my mind to new people and ideas.

No sitting on beaches for me. Watch this space, I am about to turn up the volume…

DevUp 2012 – Barcelona

Last Friday I went to DevUp 2012 in Barcelona. The event focused on HTML5, but interestingly for me it mixed two different development communities. As well as web developers there is a whole world of games developers who are embracing HTML5 or more precisely Canvas. Darius Kazemi from bocoup did a talk which made a side-by-side comparison of the web and games development culture. As a games developer in a JavaScript company he had some nice insights.

I missed Javier Usobiaga’s talk on Responsive Web Design, which is a shame, but managed to go to the session by Ibon Tolosana on CocoonJS. It takes HTML5 canvas based games and boosts their performance by using an OpenGL ES execution environment. Through the day a recurring theme seemed to be that the current performance of Canvas on phones is just under what the gaming community are looking for. Although, like petrol-heads, I feel they will always want just a few extra frames per second. The best example of this was Miguel Ángel Pastor’s talk on cross compiling JavaScript from C++. Not an approach I would take, but interesting.

I hope people found my talk “Beyond the page” on APIs of interest. The Web Intents demo did not work, although I am in good company as Paul Kinlan had the same trouble when he visited Barcelona. Maybe adding that one last demo at 1am the night before was not a good idea! Next time I am going to have backup screencasts.

The PeopleStore HTML5 app I showed on the day is unfortunately not online, but the codebits area of my site does have demos/code of some of the APIs it uses.

 

Sorry, the video does not show the demos. There is an earlier screencast of some of these demos in a previous blog post.

I would like to thank Ideateca for inviting me to speak and putting on such a good event. The live translation services and the high quality video, which was streamed live, added to the event’s success.

Google are about to murder a good friend of mine

Let me start by saying the good friend is an API. Google have decided to close down the Social Graph API (SGAPI) on the 20th April 2012. I have spent the last couple of months thinking of a measured response, although I do somewhat agree with Jeremy Keith’s sentiment.

 

Jeremy Keith Tweet on SGAPI
“Dear Google, Fuck you. Signed, the people who actually use your APIs”

 

The API provides two main features. The first lists our distributed identities across the web, so if I give it the URL of my Twitter profile it returns a list of profiles I have on other sites. I wrote an in-depth List Apart article about how this feature works. The second feature tries to find links to profiles of people you have listed as friends/followers on social media sites.

Let’s be pragmatic about its true value

Brad Fitzpatrick built this API as a Google 20% project and it has never really lost its experimental roots. From the outset I was not a fan of the social graph friends listing. It’s too problematic: a lot of friends’ data is private and it’s too complex to mark up and extract from web pages well. I personally wrote off using the social graph element of the API from the beginning. Evan Prodromou also made a good point that developers want to get authentication and social graph data together. I think lanyrd.com is a good example of this approach.

The identity aggregation element of the API was impressive, if a little too raw to be used on commercial sites. The results needed a degree of post processing to increase the quality. Although I would love to say increasing the quality of the results could be completely done by parsing open standards like hCard or FOAF, you do need to connect to some sites’ proprietary APIs to get profile data.

Google never tackled the quality issue or put the API on a commercial footing, both of which helped stop most people using it beyond experimental hacks.

Panda, schema.org and identity based authority in search

In the last couple of years I have lost my faith a little in the ideas which gave birth to the SGAPI. Development of the semantic web and the distributed open web seems to have drifted with the growth of monolithic services like Facebook, but things are changing.

Google’s recent changes to search have breathed new life into the semantic web concept. As Google tries to increase its search quality it is moving towards identifying entities (the blocks of structured information within a page). The SEO industry is now adding vast amounts of semantic mark-up to the web. This is being done not because it is the right thing to do, but because enhanced click-through rates provide the right commercial motivation.

More importantly, part of Panda’s new ethos is the promotion of identity authority. We can see this both in Google’s search listings that display recommendations from your friends and in authorship profiles. They are attempting to link people to other entities such as articles by using mark-up like rel=me and rel=author. Profiles and how they are interlinked are a small part of the Panda concept, but still important.

Identity based listing

 

The future – Getting the food chain right

Today’s web apps are often more about building ecosystems of service relationships than technology. These relationships are often chained together and always need each actor to be rewarded for their part.

Web authors, or at least the SEO wing of our industry, are now seeing a real return for adding semantic structures for entities such as products and reviews. The mark-up of people and organisation entities still has rogue claim issues and still may not become a strong part of Google’s search listing. Let’s hope Google continues to support rel=me for identity authority and it drives them to resolve these issues. At the moment they seem to be moving towards a walled garden approach.

Unfortunately, although Google may now have the ability to build a much better API based on its latest developments in parsing semantic information for search, it is unlikely it will be created. Google is now bringing together its services into a coherent whole and focusing on building its own monolithic social network, Google+. It just does not make commercial sense for it to support the open web without financial return.

Other companies have started to provide successful services in this area. Qwerly.com by Max Niederhofer was one of the most impressive identity aggregation APIs I have seen; it’s now part of fliptop.com. Products like rapportive.com and thesocialcv.com use the same technique under the hood. These companies are providing the next generation of pay-as-you-go APIs, blending together the semantic web and snowflake APIs.

Let’s hope we see on-going development of this new generation of APIs.

So on 20th April I will have a drink and say goodbye to SGAPI

I would like to thank Brad for giving us the SGAPI and everyone else who has worked on it. Although I can understand the commercial rationale that has driven Google to murder my friend, I am not sure I can forgive them for it.

Web Intents Design Push

After giving a talk at UXCamp Brighton about Web Intents, I ended up talking to a few designers about it and how they would love to help with its development and adoption. In fact we all felt that the design community was sometimes left out of the development process, when it has a lot to offer.

So, for the past couple of months I have been working with two of Brighton’s leading UX people, Danny Hope and Andy Dennis, to organise a UX design event. We are trying a slightly new format that we’ve coined a “design push”. The idea is to take a current technology or topic and focus a group of UX designers on a day of open collaboration, with the aim of positively adding to its traction in the wider community.

Web Intents Design Push, 25 February 2012 – Brighton, UK

The first “design push” will be on Web Intents, an idea about how to standardise buttons such as Tweet, Like, Share, etc. on the web. My last few posts will give you some background. I have also created a brief screencast introduction to Web Intents.

I had some fun creating the screencast, as I built a presentation using HTML for the first time. It’s worth a play as I built the demos of Web Intents directly into the presentation. Best use Chrome if you want to play with the demos.