Ontologies Expanded


Conceptual awareness helps people classify and distribute information.
_Despite the advances being made in machine learning, human engagement is still key in determining the importance of data and its nature._

When you classify according to your system, you help machines "learn" classification and cross-referencing.

Originally shared by David Amerland

Ontologies Being Built Everywhere

Anyone using Google+ Collections intuitively understands the concept of an Ontology: a grouping of posts and subjects that are unified by a theme, or a collection that is grouped under a specific attribute. The grouping helps create and then identify the Ontology, and the Ontology itself adds an additional layer of metadata onto the grouping.

You can see why the word "Ontology" has its roots deep in philosophy, where it names the subject that deals with the very nature of existence.

So, essentially, by grouping things we classify them, by classifying them we create order and impart structure upon them. Our activity (and thoughts) create structured data out of the chaotic data of the digital universe.

The StumbleUpon share tool recently underwent a change which enables users who share articles to classify them so that they are tagged and begin to fall into specific Ontologies within its index.

There are several educational takeaways here. First, despite the advances being made in machine learning, human engagement is still key in determining the importance of data and its nature.

Trusting Us to Tell the Truth
StumbleUpon (like Google's automated indexing of data) has a handy feature which allows us to declare the subject of the post being shared and then classify it into specific categories. The weakness of using human input is twofold: (a) it is fallible, as human error can creep in if we choose the wrong category by mistake, and (b) it is subject to gaming as human economic behavior kicks in.

The way this is addressed is very similar to the way Google's automated indexing of posts addresses it: by having more than one category to choose from, each of which is cross-referenced, a certain judgement can be made as to the integrity of the poster and the truthfulness of the categorisations being made.

Suppose, for instance, we shared something that talks about "search". The terms SEO, Marketing, Optimization, Branding, Online Traffic and Website Optimization could easily be chosen as part of its domain. A cross-referencing of definitions and the content of the post would reveal whether this is the case or not.
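
To make the cross-referencing idea concrete, here is a minimal sketch in Python. It is not how StumbleUpon or Google actually do it; the keyword lists, the sample post and the function names are all assumptions made up for illustration. Each category gets a small set of defining terms, and the score is simply the fraction of those terms that show up in the post.

```python
import re

# Hypothetical category definitions; a real index would use far richer ones.
CATEGORY_TERMS = {
    "SEO": {"search", "ranking", "keywords", "index"},
    "Marketing": {"brand", "audience", "campaign", "search"},
    "Parenting": {"children", "toddler", "school", "family"},
}

def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def category_support(post_text, category):
    """Return the fraction of a category's terms that appear in the post."""
    terms = CATEGORY_TERMS[category]
    return len(terms & tokenize(post_text)) / len(terms)

# Illustrative post text, not taken from any real share.
post = "How search ranking works: keywords, the index and your audience."
scores = {name: category_support(post, name) for name in CATEGORY_TERMS}
```

With this toy data the post is fully supported as SEO, partly supported as Marketing and not supported at all as Parenting, which is the kind of contrast the cross-referencing is meant to surface.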

Should we be tempted, in this case, to broaden the appeal of the post being shared through a 'bait and switch' approach where we click on another, popular category, StumbleUpon makes sure that the choices being presented are subject-relevant, even if tangentially so. The temptation then is reduced from outright lying to the more acceptable fibbing, where we can 'broaden' the appeal by picking additional, tangentially relevant categories (or even clicking on all presented categories in the mistaken belief that this will increase the exposure of the article we are sharing).

That action would raise red flags, because it muddies the signal of the importance and relevance of the piece being shared. The piece would then actually become less visible, as StumbleUpon would not be certain which category it should really put it in or what real relevance it has to the ones picked.
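
One way to picture why picking everything backfires is the rough sketch below. It assumes, purely for illustration, that each category already has a content-support score between 0 and 1 and that "signal clarity" is the average support across whatever categories were picked. The numbers and the `signal_clarity` function are hypothetical, not anything StumbleUpon has published.

```python
def signal_clarity(support_by_category, picked):
    """Average content support across the picked categories (0.0 if none)."""
    if not picked:
        return 0.0
    total = sum(support_by_category.get(name, 0.0) for name in picked)
    return total / len(picked)

# Hypothetical support scores for a post that is really about SEO.
support = {"SEO": 1.0, "Marketing": 0.5, "Branding": 0.25, "Online Traffic": 0.25}

focused = signal_clarity(support, ["SEO"])        # one well-matched category
everything = signal_clarity(support, list(support))  # every category on offer
```

Under these assumptions the single honest pick scores 1.0 while ticking every box drags the average down to 0.5: the weakly related categories dilute the strong one.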

We could get round that by being 'clever' and picking, say, only two or three categories, the most popular ones, thinking that this way we go for the 'money shot' and get all the high-visibility ones. But then we would knowingly be skipping truly relevant categories in order not to highlight our gaming, and that usually goes against the grain as we would miss out on finding a truly relevant audience.

Successful gaming behavior requires that we economically broaden the appeal of what we do and find a bigger audience that can be filtered for relevancy, rather than lose relevant audience and hope that the audience we do get is sufficiently large for its conversion percentage to make up for the loss.

Seeing how we cannot bend the rules and get away with it, the only other option left to us, should we wish to game the system, is to outright lie and simply start off with a category (and options) that has nothing to do with the true nature of the article we share. So, a piece on SEO could be shared in the ever-popular parenting category on the assumption that some parents at least will be interested in SEO.

That, however, raises other red flags, as the content of the piece (which machines now can read) would clash visibly with the category we chose to share it in, and that would immediately mark our share as untrustworthy. That would then affect our reputation in the social network and the fate of future shares.
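
The knock-on effect on reputation can be sketched in a few lines. Everything here is an assumption for the sake of illustration: the threshold, the penalty, the reward and the `update_reputation` function are invented, and no real network is known to use these values.

```python
def update_reputation(reputation, support, threshold=0.2, penalty=0.5, reward=0.05):
    """Halve reputation when a share clashes with its category; otherwise nudge it up, capped at 1.0."""
    if support < threshold:
        return reputation * penalty
    return min(1.0, reputation + reward)

start = 0.8
# An SEO piece filed under parenting: the content does not support the category.
after_lie = update_reputation(start, support=0.0)
# A later, honestly tagged share only recovers a little of the lost ground.
after_honest = update_reputation(after_lie, support=1.0)
```

The asymmetry is the point of the sketch: one clashing share costs far more than one honest share wins back, so lying about the category is a poor trade.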

The only viable option we have then is to actually spend some time creating truly compelling content and then think really carefully about how we tag it when we are asked to do so.

Google's semantic search does not work quite so transparently in terms of asking us to tag content, but the principle behind it remains largely the same. In a web where data and people are connected via their lines of engagement, the veracity of the former and the trustworthiness of the latter are calculable and can lead to significant impact.

The web of trust and Google's "Truth Engine", much like StumbleUpon's sharing of content and the request to tag it into Ontologies, have simply made gaming behavior as costly as the real thing. Bereft of a shortcut, the real thing becomes more attractive as well as easier.

A connected web transparent to machine classification makes us more honest, automatically.
