Semantic Web, Can it Happen?

Not if but when

What is the Semantic Web? It’s the Internet in its current information-pipeline form, but with the addition of machine-understandable language describing web-based objects (pictures, text, audio/video). The term semantic refers to the essential meaning of words; in the case of text on the Internet, that meaning is usually represented by tags or keywords. This kind of tagging is already being implemented in products made by Zemanta and Open Calais.

There are still many who believe a semantic web is beyond our technical ability to craft, and that algorithms and definitions will never agree on identifying tags. The argument has some validity with regard to individuals: we often have our own view of the meaning of words and concepts. But the advantage of web-based learning algorithms is a common set of definitions. Many semantic processing teams are using Wikipedia as an authority for a fair set of definitions. If Wikipedia links to a site, it’s considered relevant to the page topic. While this can be gamed temporarily by malicious editors, overall the system has proven to be highly effective at sharing information.

I have a great interest in this area for a couple of reasons. First and foremost, I think this is the beginning of more intelligent software. Second, it feeds into a project I started working on about a month ago that will match tags derived from social commentary to other conversations happening in real time, as well as to relevant advertising. There are several businesses emerging in this market (Lazyfeed, my6sense, and others), and APML is working on a standard.

Why I make videos

I’m wrapping up with a quick description of why I make these bobbly head videos while out walking.

  1. I walk a lot. Exercise gets my brain going, and I’m a big fan of multitasking whenever I can get away with it
  2. It’s a memory assistant: I can document ideas when they’re fresh in my mind and as I’m reading material
  3. It’s a tool of introspection. I can listen for the concepts I most often refer to, for keys into my deeper interests, and then leverage that interest to get some good work done either by blogging or coding

Introducing Snap Shots from Snap.com

I just installed a nice little tool on this site called Snap Shots that enhances links with visual previews of the destination site, interactive excerpts of Wikipedia articles, MySpace profiles, IMDb profiles and Amazon products, inline videos, RSS, MP3s, photos, stock charts and more.

Sometimes Snap Shots brings you the information you need without your having to leave the site, while other times it lets you “look ahead” before deciding whether you want to follow a link.

Should you decide this is not for you, just click the Options icon in the upper right corner of the Snap Shot and opt-out.

Please let me know what you think in the comment section.


  • phomer

    I've always figured the problem with the semantic web wasn't whether or not it was technically possible, but whether or not the payback from all of the extra work was really worth it for most people. Realistically, just throwing up a simple basic web site is enough to utilize the web's presence. Interconnecting that web site with tonnes of other stuff is neat, but does it really draw in that many more viewers? Are they there for the data, or just because the site is interesting? How long will that last?

    Paul.

  • http://www.victusspiritus.com/ Mark Essel

    The semantic web will potentially benefit us by allowing search and more intelligent use of all the great info available on the web. Here's the Wikipedia article on the Semantic Web. If you scroll down a bit you can scan through all the potential applications.

    My own preferred tools focus on search and relevant ads that are related to the intent of a user's social interactions online. Over time, simple software systems can understand our favored topics and simple relationships between those topics. This will happen with user assistance and training. There are many technologies that are coming together.
    They'll have to before I have a functional virtual assistant that can scan the web for information relevant to my discussion 3 months ago and have it waiting in my input stream. I won't see ads that I have no probability of clicking or acting on.

  • phomer

    Sure it's fine for the researchers wanting to investigate the relationships between people and pages, or even the marketers wanting to target their efforts down to specific people, but how does it help the ordinary user or data provider?

    If you look at it from the data provider's view, the people who put up web sites and maintain them aren't really interested in spending huge parts of their lives adding meta-data to everything if it isn't going to buy them something “real”. It's even worse now, with people shifting away from free stuff, than it was when Tim first started pushing these ideas. Ultimately people need some type of payback for their efforts; altruistic motives alone don't last long enough.

    Another point is that computers aren't intelligent, thus any sort of virtual assistant is only as smart as the logic that was coded into it by the programmers. That type of logic, generally rigid, doesn't get very far before people become frustrated with how un-intelligent it really is. Tools shouldn't try to act intelligently; they should try to do bigger things with less effort. In a real sense, we should leave the “intelligence” to the user, while trying to give them more powerful tools that make acting on their desires as nearly trivial as possible. I don't want a virtual assistant, I want an interface to easily pull up my conversation history “correctly” (all of it, and nothing from anyone else).

    Paul.

  • http://www.victusspiritus.com/ Mark Essel

    First, thanks for taking the time to leave a well thought out description of your take on this. I much appreciate the feedback, even if we don't see eye to eye on this topic.

    I certainly wouldn't want to be required to tag everything I did so I understand your view. The semantic meta-tagging of data should be primarily done by automated systems with an authority system and aided through voluntary learning.

    The intelligence I'm referring to is searching for information relevant to me (so it could easily be characterized by rigid logic).

    But if you don't believe in machine intelligence in search, then I'd argue you should take a look at Google. The tool can detect spelling errors and correlate words to narrow down information fairly quickly. The authority system in place allows for massive rejection of clutter (keyword spam, link spam, etc.).

    The intelligence I hope for in semantic data is for the tags I don't know as a user. The machine (software on a computer) knows article A and B are related by a few meta data tags. It can “discover” them in near real time while I'm sleeping and present findings to me when I have free time.
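    One rough way to picture "article A and B are related by a few meta data tags" is a tag-overlap score. The sketch below is only an illustration under assumptions: the article names, tags, and the 0.25 threshold are hypothetical, and Jaccard overlap is a stand-in technique, not a description of what Zemanta or any other service actually computes.

```python
def jaccard(tags_a, tags_b):
    """Fraction of tags two articles share, from 0.0 (none) to 1.0 (identical)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_articles(target, articles, threshold=0.25):
    """Return (title, score) pairs for articles whose tags overlap the target's."""
    scores = []
    for title, tags in articles.items():
        if title == target:
            continue  # an article is trivially related to itself
        score = jaccard(articles[target], tags)
        if score >= threshold:
            scores.append((title, score))
    # Strongest overlap first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical articles and tags, for illustration only.
articles = {
    "A": {"semantic web", "tagging", "search"},
    "B": {"semantic web", "tagging", "advertising"},
    "C": {"chess", "tactics"},
}

print(related_articles("A", articles))  # [('B', 0.5)]
```

    A job like this could run unattended over newly tagged articles and queue up the matches for later reading, which is all the "discovery" being hoped for here amounts to.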

    Spelling isn't proof of intelligence, but it's a great example of partial matching in action: the machine has a library of words (it has knowledge of them). It is capable of correct spelling even when a decision-making individual like myself is wrong.
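    Partial matching against a library of words can be sketched in a few lines with Python's standard `difflib` module. The word list and the 0.8 similarity cutoff are assumptions made for this example; a real spell checker would use a full dictionary and likely a different matching method.

```python
import difflib

# Hypothetical word list; a real checker would load a full dictionary.
VOCABULARY = ["semantic", "ontology", "algorithm", "intelligence"]


def suggest(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest known word, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("semantc"))   # semantic
print(suggest("algorthm"))  # algorithm
print(suggest("zzzz"))      # None
```

    Note that the function only proposes a candidate. Whether the suggestion matches what the writer meant is still left to the person reading it.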

    Chess programs that cost $40 can easily decimate any simple tactics I can muster by not only using look ahead, but by knowing (memory) the value of tactical positions.

    The intelligence I think you are doubtful of is the ability to reason or make analogies. I agree that machine intelligence in this area is still some time off.

    Consider the simple sentence: “Ice and snow are cold to the touch”

    This returns from Zemanta's semantic extraction algorithm with the following tags:
    Snow, Vineyard, Mendoza, Valle de Uco, Winter, School Time, Kids and Teens, The Earth

    While there are bogus correlations, even Mendoza, Valle de Uco and Vineyard are somewhat related; see this picture that came up while Googling Valle de Uco.

    Humans learn from humans, why can't complicated correlative software learn from humans?

  • http://twitter.com/robertmale robertmale

    I've been watching the information roll by and waiting on bits and pieces of the semantic web to fall in place for a couple of years now. I think I have a handle on the processes that will be involved, and have a good idea of what should be expected from the web of things, cyber-agents, and all and sundry entailed in the dream of the semantic web.

    I have looked at specifications and jargon riddled documents regarding the mechanics, the coding, involved in creating ontologies and even a little bit of the works of how the tags and meaning will be handled. I think I fairly understand what goes into and what comes out of a database as a low level, already organised, information processing unit.

    What I have thus far failed to grasp is how it will help me with the things I want to do, with what I want to make smarter. I have three blogs, two annotated link sites (I post a link and my notes/impressions about the topic), and then I have copious sets of notes on myriad topics. I want to be able to leverage this knowledge. I want it to work for me so that I pick a topic and I have it all at my fingertips.

    To that end I have been tagging everything manually, making connections between topics and ideas, and generally spending a lot of time putting together an index that can answer questions I've already asked, perhaps figuratively direct future questions, and generally be of help or interest to me in generating new content built upon what I already have, and to be of help to the people who read my sites.

    The blog tagging is nearing completion (for most of my archives) and the difficult part is making the connections between tags that are related but not visibly so. For example marketing and stock markets seem similar but are not too similar. Looking for something about art might send you looking at tags for painting, sculpting, and photography. Then there are the less obvious connections like process or method, and approach as tags.

    This is all with a small subset of data, never mind what it will take to expand out elsewhere and forward in time as more information is published. Admittedly, my disconnect in understanding may be because I do not grasp enough of the concept. Having researched and watched the progress of semantic web blogs, articles, discussion videos, analysis that people are doing and expecting to do in this space I am uncertain if anyone has the answers I seek yet.

    So, I guess I do not know if it can happen to an important enough level of sophistication, least of all working outside of rigid and extensive pre-programmed connections and tonnes of human-done work in the background.

  • http://www.victusspiritus.com/ Mark Essel

    Robert, your needs are precisely the type of problem space the semantic web promises to aid. What you require is data and information aggregation and connection. But through your creative process you've generated much content, and the tags associated with that content aren't always clear.

    How would a systematic and automated tag assignment function potentially help you create a better knowledge database out of your content? This is the type of tool I'm currently using with Zemanta's API. There are spurious tags, but the automation of tagging allows connectivity of previously disparate data. There is also a weight associated with how often themes recur in your work. My hypothesis is that the tags and weights help position your knowledge within a larger cluster of web information. Now we can begin seeking connections between your blog tags, weights and others who share your interests and passions in their expressed ideas.
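    To make the hypothesis about tags and weights a little more concrete, here is a minimal sketch that compares two blogs by their weighted tags. Everything in it is an assumption for illustration: the blog profiles are invented, the weights are taken to be simple counts of how often a tag recurs, and cosine similarity is just one plausible way to measure closeness. It does not use or represent Zemanta's API.

```python
import math


def cosine_similarity(weights_a, weights_b):
    """Compare two {tag: weight} profiles; 0.0 means no shared tags, 1.0 means same direction."""
    shared = set(weights_a) & set(weights_b)
    dot = sum(weights_a[tag] * weights_b[tag] for tag in shared)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical tag profiles: weight = how often the tag recurs in the blog.
blog_a = {"horror fiction": 5, "semantic web": 2}
blog_b = {"horror fiction": 3, "photography": 4}
blog_c = {"chess": 1}

print(round(cosine_similarity(blog_a, blog_b), 3))
print(cosine_similarity(blog_a, blog_c))
```

    Blogs with a high score against yours would be candidates for "others who share your interests", and spurious tags matter less when their weights are low.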

    The state of modern semantic tools can be used as a form of knowledge and information. Not just factual or opinionated work but fiction and fantasy as well; as I recall, you have a great interest in dark horror writing. I foresee human voting mechanisms helping to improve the connectivity introduced by automated extraction algorithms. The definitions or authority functions used are limited.

    Fantastic comment. I feel obligated to research the matter further and try to discover how a semantic web may best support Internet content generators. Do you have any related posts on the subject? I'd be happy to link to them.

  • phomer

    If we saw eye to eye, then it wouldn't be an interesting discussion :-)

    Google search and spelling are both great examples. In both cases a computer can find some issues with the data, but ultimately it takes a person to validate it, and often they have to ignore the incorrect associations. So a pattern matching tool can go out there and say that 200 lines match pattern X, but if you're only looking for a rather hazy subset of that, say 20 things, it's a problem. It is not that the computer found the correct 20 lines that is important, it's that it embedded them along with the incorrect 180 that matters. And more importantly that it cannot distinguish between the good and the bad.
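    The numbers in that example can be put as a precision figure, meaning the share of returned matches that are actually wanted. This is just the arithmetic from the paragraph above, with the helper name `precision` introduced here for illustration.

```python
def precision(relevant_found, total_returned):
    """Share of returned results that the user actually wanted."""
    if total_returned == 0:
        return 0.0
    return relevant_found / total_returned


matched = 200  # lines matching pattern X
wanted = 20    # the hazy subset the person is really after

print(f"{precision(wanted, matched):.0%} of matches are wanted")
print(f"{matched - wanted} matches still have to be rejected by a person")
```

    So even when all 20 wanted lines are found, nine out of every ten results still need a human to throw them away.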

    Thus, the computer highlights all of my spelling mistakes, but I still need to approve each and every one (and often the computer is wrong about my intent). There is no intelligence in spelling, it is a straight-forward (although complex) algorithm whose results are calculated then put forward for an intelligent being to validate.

    To answer your final question, I'm not really sure whether or not AI can exist (according to some, like Roger Penrose, it can't happen, but I've never liked those explanations). But I am sure that right now, with our current state of technology, the only way intelligence gets into code is by a programmer spending lots and lots of time to put it there. Ultimately, before the code ever gets interesting enough to do something cool, the overall complexity exceeds the programmer's abilities and the whole thing ceases to work as expected. We are still working with code at such a low level that it restricts what we can do with it.

    For software to learn from humans, it would have to have some dynamic capabilities that allow it to add in information outside of its original programming. It's as if the code would have to have the ability to rewrite itself to automatically add in more and more previously unknown data-structures. People have done this with code, in very limited ways, but even then, the overall dynamism was entirely restricted by how much the programmers themselves entered into the code. Until you can write an accounting program that will eventually rewrite itself to be an editor or a video game, then the code itself is still too limited to actually learn anything. Of course, once you have one program that can do this, you have all of them :-)

  • http://www.victusspiritus.com/ Mark Essel

    Now you're getting to the really fascinating stuff, programs that edit their own source code to “understand” new data structures. I get a little wide eyed when conversations go towards AI.

    I'm drinking the Kool-Aid right now and going out on a limb by believing that software like this is even feasible. It would first have to understand how data is described before it could update its ability to use the new data. But that's part of the promise of a semantic web, that information will be machine readable and associable.

    Your comments have added a great deal of value to my post, and I'm very grateful for it. Thanks again for your time and interest. I'll continue to read up on the topic and look forward to further conversations phomer. Feel free to say howdy on friendfeed (messel) or twitter (victusfate) anytime.

  • http://twitter.com/robertmale robertmale

    I'm glad to hear that this will be the level approached by the semantic web movement. A lot of the discussion is geared toward understanding names and locations or times, and tying them together. The low-hanging fruit, so to speak, of organising schedules and calendars, planning trips, and managing unified communications, while useful and a fine first step, should only be the beginning. Likewise, having someone's works, whether they are an author or a scientist, compiled and annotated at your fingertips is also just the tip of the iceberg.

    Content generation is a topic I have come across. There have been a couple of patents put forward for content generators. One at least involved a partial programming schema. There is a video about a Patent on “Long Tail” for automated content authorship that I noted at http://tinyurl.com/l2y4zv complete with link.

    This automation project seems ambitious and quite forward-looking. Whether it will come to fruition in short order or not is another matter. Beyond the ability to generate content (or, as I am looking for, simply to point out directions to research and tangents to become involved with), there is also the legal aspect, depending upon the type of information required. This is the silo issue, where different companies and parties have their own proprietary masses of data and little wish to share. The web part of the semantic web eases this restriction partially because it looks only at freely available information. Changing the dynamic of data restriction and opening up the walled gardens will benefit everyone. We are seeing some of this happen now in the social networks, for example.

    More information that I have gathered and annotated is available on my TechStop: Tag Page – Semantic Web at http://tinyurl.com/kuqpgo

  • http://www.victusspiritus.com/ Mark Essel

    Openness is a big part of the attraction for algorithms that will help automate web data integration. I have some homework ahead of me; thanks much for the share, Robert. I'll poke around and see what I can learn when it comes time to design a tag database cloud or a list of clouds (sorted by topic) per user.
