
BlogForever: Thoughts about blog data and metadata


From the BlogForever blog.

During the ArchivePress project at ULCC, we briefly considered the data and metadata generally made available with blogs and blog posts. As ArchivePress focused on the representations of blogs in newsfeeds, we examined the metadata that is commonly generated and exposed in the newsfeeds of three of the most common blog platforms: WordPress, Blogger and Typepad. Blogger and Typepad prefer the Atom newsfeed format; WordPress (particularly WordPress.com) prefers RSS (though it can be made to publish Atom feeds too). This analysis was done about a year ago and things may have changed, but here is a summary of what we found.

For each Blog, the following core information is available in the feeds:

| Field | WordPress (RSS) | Blogger (Atom) | Typepad (Atom) |
|---|---|---|---|
| Feed Unique ID | NA | feed/id | feed/id |
| Blog URL | rss/channel/link | feed/link@rel="alternate" | feed/link@rel="alternate" |
| Blog Title | rss/channel/title | feed/title | feed/title |
| Blog Description | rss/channel/description | feed/subtitle | feed/subtitle |
| Date of last update | rss/channel/lastBuildDate | feed/updated | feed/updated |
| Generating software | rss/channel/generator | feed/generator | feed/generator |
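
As a rough illustration of the blog-level mapping in the table above, here is a minimal Python sketch that reads the same six fields from either feed format using only the standard library. The sample feeds, the `blog_metadata` function and the dictionary keys are made up for this example; they are not taken from ArchivePress.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree notation

# Made-up sample feeds, trimmed to the blog-level fields in the table.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.wordpress.com</link>
    <description>Just another blog</description>
    <lastBuildDate>Mon, 02 May 2011 10:00:00 +0000</lastBuildDate>
    <generator>http://wordpress.com/</generator>
  </channel>
</rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:blogger.com,1999:blog-123</id>
  <link rel="self" href="http://example.blogspot.com/feeds/posts/default"/>
  <link rel="alternate" href="http://example.blogspot.com/"/>
  <title>Example Blog</title>
  <subtitle>Just another blog</subtitle>
  <updated>2011-05-02T10:00:00Z</updated>
  <generator>Blogger</generator>
</feed>"""


def blog_metadata(xml_text):
    """Return the blog-level fields from an RSS 2.0 or Atom feed as a dict."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        channel = root.find("channel")
        return {
            "id": None,  # the table lists no feed-level unique ID for RSS
            "url": channel.findtext("link"),
            "title": channel.findtext("title"),
            "description": channel.findtext("description"),
            "updated": channel.findtext("lastBuildDate"),
            "generator": channel.findtext("generator"),
        }
    if root.tag == ATOM + "feed":
        # The blog URL is the href of the link whose rel is "alternate".
        alt = root.find(ATOM + "link[@rel='alternate']")
        return {
            "id": root.findtext(ATOM + "id"),
            "url": alt.get("href") if alt is not None else None,
            "title": root.findtext(ATOM + "title"),
            "description": root.findtext(ATOM + "subtitle"),
            "updated": root.findtext(ATOM + "updated"),
            "generator": root.findtext(ATOM + "generator"),
        }
    raise ValueError("not an RSS 2.0 or Atom feed")


rss_blog = blog_metadata(RSS_SAMPLE)
atom_blog = blog_metadata(ATOM_SAMPLE)
```

Returning the same keys for both formats is only one possible design; it makes the gaps (such as the missing feed ID in RSS) visible as `None` values rather than hiding them.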

For each Post, we established that the following core information is available in the newsfeeds:

| Field | WordPress (RSS) | Blogger (Atom) | Typepad (Atom) |
|---|---|---|---|
| Post Unique ID | rss/channel/item/guid@isPermaLink | feed/entry/id | feed/entry/id |
| Post Title | rss/channel/item/title | feed/entry/title | feed/entry/title |
| Post Summary | rss/channel/item/description | NA | feed/entry/summary |
| Post URL | rss/channel/item/link | feed/entry/link@rel="alternate" | feed/entry/link@rel="alternate" |
| Date of publication | rss/channel/item/pubDate | feed/entry/published | feed/entry/published |
| Date of last update | NA | feed/entry/updated | feed/entry/updated |
| Post Author | rss/channel/item/dc:creator (with rss/xmlns:dc) | feed/entry/author/name | feed/entry/author/name |
| Post Category | rss/channel/item/category | feed/entry/category@term | feed/entry/category@term |
| Post Content | rss/channel/item/content:encoded (with rss/xmlns:content) | feed/entry/content | feed/entry/content |
| Post Comments | rss/channel/item/comments | feed/entry/link@rel="replies" | feed/entry/link@rel="replies" |
| Post Comments Feed | rss/channel/item/wfw:commentRss | NA | NA |
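
The post-level mapping can be sketched in the same way. The example below is an illustration only, with made-up sample feeds and hypothetical function names (`rss_posts`, `atom_posts`). It shows how the `dc` and `content` namespaces have to be declared to reach the author and full content in a WordPress-style RSS item, while an Atom entry carries the same information in the Atom namespace.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for namespaced lookups below.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Made-up single-post feeds covering the post-level fields in the table.
RSS_ITEM_FEED = """<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <guid isPermaLink="false">http://example.wordpress.com/?p=1</guid>
      <title>Hello world</title>
      <link>http://example.wordpress.com/2011/05/02/hello-world/</link>
      <pubDate>Mon, 02 May 2011 10:00:00 +0000</pubDate>
      <dc:creator>alice</dc:creator>
      <category>News</category>
      <category>Archiving</category>
      <description>Short summary</description>
      <content:encoded><![CDATA[<p>Full text</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

ATOM_ENTRY_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:blogger.com,1999:blog-123.post-1</id>
    <title>Hello world</title>
    <link rel="alternate" href="http://example.blogspot.com/2011/05/hello-world.html"/>
    <published>2011-05-02T10:00:00Z</published>
    <updated>2011-05-03T09:00:00Z</updated>
    <author><name>alice</name></author>
    <category term="News"/>
    <category term="Archiving"/>
    <content type="html">&lt;p&gt;Full text&lt;/p&gt;</content>
  </entry>
</feed>"""


def rss_posts(xml_text):
    """Yield one dict per rss/channel/item."""
    root = ET.fromstring(xml_text)
    for item in root.iterfind("channel/item"):
        yield {
            "id": item.findtext("guid"),
            "title": item.findtext("title"),
            "summary": item.findtext("description"),
            "url": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "updated": None,  # listed as NA for RSS in the table
            "author": item.findtext("dc:creator", namespaces=NS),
            "categories": [c.text for c in item.findall("category")],
            "content": item.findtext("content:encoded", namespaces=NS),
        }


def atom_posts(xml_text):
    """Yield one dict per feed/entry."""
    root = ET.fromstring(xml_text)
    for entry in root.iterfind("atom:entry", NS):
        alt = entry.find("atom:link[@rel='alternate']", NS)
        yield {
            "id": entry.findtext("atom:id", namespaces=NS),
            "title": entry.findtext("atom:title", namespaces=NS),
            "summary": entry.findtext("atom:summary", namespaces=NS),
            "url": alt.get("href") if alt is not None else None,
            "published": entry.findtext("atom:published", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
            "author": entry.findtext("atom:author/atom:name", namespaces=NS),
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "content": entry.findtext("atom:content", namespaces=NS),
        }


rss_post = next(rss_posts(RSS_ITEM_FEED))
atom_post = next(atom_posts(ATOM_ENTRY_FEED))
```

Note that categories are element text in RSS but a `term` attribute in Atom, which is one of the small differences a harvester has to normalise.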

One interesting point we noted was that neither Blogger nor Typepad published a link to a Comments Feed for each post. This made our work on ArchivePress more difficult, since it was predicated on being able to easily identify the Comments Feed for each post and harvest new Comments as they were published. For blogs generated by platforms other than WordPress, this was clearly not going to be so easy. (Our ace developer Emanuele found some workarounds, but that’s another story.)
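
To make the difference concrete, here is a small Python sketch of what a harvester can and cannot read from a single post. The sample XML and the `comment_links` helper are hypothetical, and the sketch does not attempt to reproduce the ArchivePress workarounds. It returns a Comments Feed URL only where `wfw:commentRss` is present, and falls back to the `rel="replies"` link for the comments page otherwise.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "wfw": "http://wellformedweb.org/CommentAPI/",
}

# Made-up fragments: one WordPress-style RSS item, one Atom entry.
RSS_ITEM = """<item xmlns:wfw="http://wellformedweb.org/CommentAPI/">
  <link>http://example.wordpress.com/2011/05/02/hello-world/</link>
  <comments>http://example.wordpress.com/2011/05/02/hello-world/#comments</comments>
  <wfw:commentRss>http://example.wordpress.com/2011/05/02/hello-world/feed/</wfw:commentRss>
</item>"""

ATOM_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" href="http://example.blogspot.com/2011/05/hello-world.html"/>
  <link rel="replies" type="text/html"
        href="http://example.blogspot.com/2011/05/hello-world.html#comments"/>
</entry>"""


def comment_links(element):
    """Return (comments_page_url, comments_feed_url) for an RSS item or Atom entry."""
    if element.tag == "item":
        return (
            element.findtext("comments"),
            element.findtext("wfw:commentRss", namespaces=NS),
        )
    replies = element.find("atom:link[@rel='replies']", NS)
    page = replies.get("href") if replies is not None else None
    # Per the table, no per-post Comments Feed is exposed in the Atom feeds.
    return (page, None)


rss_links = comment_links(ET.fromstring(RSS_ITEM))
atom_links = comment_links(ET.fromstring(ATOM_ENTRY))
```

With these samples, only the RSS item yields a feed URL that could be polled for new comments; the Atom entry yields just the comments page.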

I think this offers us an interesting overview of the core of standard, structured blog data and metadata, in three of the leading blog platforms. This is the data structure and metadata profile that is maintained in blog databases, in one of its native forms, and I’d expect it to be present in all blog platforms, since it arguably represents the essence of blogs. I hope this will be useful background when considering the core models for data and metadata handling that will be developed for BlogForever.

