Saturday, September 26, 2009

Intelligent Social Networks. Part 2

See the beginning of the article here.

D. Among the numerous discussions in various groups there may be a lot of information that is interesting to me, and I would prefer that the network helped me find it. But without an analysis of my profile, where I describe my interests and preferences (what I read and like, what movies I watch, and so on), this is again difficult to achieve. The problems with my profile are that: a) it is incomplete, i.e. it is far from a full representation of me; b) to describe myself, I may not choose standard semantics, or even generally accepted terminology and ontology (see the problems that led to the paradigm of the Semantic Web); and c) in my description I am subjective, or may even try to pass off my wishful self as my real self. For example, in my profile I can describe myself as an expert in some area, though I have only read a few books on the subject and persuaded my friends, who have no experience with this topic, that I am in fact an expert and that they should give me recommendations "confirming" my "expertise". Does this mean that the system should "trust" my profile and recommend me to someone seeking advice or help in this area? One approach to this problem is to determine the degree of credibility of my profile (and consequently of my posts), based on trust analysis inside and outside the social network (e.g. ratings of scientific publications). Another is the auto-generation of a synthesized profile, computed from my contacts, posts and discussions, much like Google's PageRank, which pulls "semantics" from the use of websites. Such a profile, different from the one I would write, would be more normalized, "semantically" clear to the network, and comparable with others; it would allow the network to find similarities and differences, classify it, and deduce common interests, yielding more accurate recommendations. Furthermore, such a profile would make online advertising more targeted and effective.
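
As a hedged illustration of such a synthesized profile (the data, field names and the trivial term-frequency "NLP" here are all assumptions, not a description of any existing network), one could derive a profile vector from what a user actually writes and compare users by it:

```python
# A minimal sketch: build a "synthesized" profile from a user's posts rather
# than from the profile they declare, and compare two users by that profile.
import math
from collections import Counter

def synthesized_profile(posts):
    """Term-frequency vector over a user's posts (a stand-in for real NLP:
    stemming, entity extraction, topic models, etc.)."""
    words = [w.lower().strip(".,!?") for p in posts for w in p.split()]
    return Counter(w for w in words if len(w) > 3)   # crude stop-word filter

def cosine(p, q):
    """Cosine similarity between two term-frequency vectors."""
    common = set(p) & set(q)
    num = sum(p[w] * q[w] for w in common)
    den = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

alice = synthesized_profile(["Tuning PageRank-style trust propagation for review sites",
                             "Notes on ranking scientific publications by citations"])
bob = synthesized_profile(["How citations and trust propagation shape publication ranking"])
print(f"profile similarity: {cosine(alice, bob):.2f}")
```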

E. The network should be able to analyze my correspondence (closer to human-style analysis) to make my synthesized profile more accurate. To do this, it may use my discussions in various groups and feedback from other users, as well as forward to me, as a person competent in some areas, inquiries or requests for assistance from other users. My responses, and the feedback they receive, may then be used to deduce my competency, and I may be further tested by having the system forward me requests intended for someone else. "Knowing" so much about me, the network should at this point be able to help me find the right contacts, opportunities and resources, for example, for completing a particular project. Or it may find experts who could answer my questions, give me appropriate links, or even solve my problem while keeping my budget in mind, as well as give me the opportunity to find projects that match my interests and allow me to participate, much like InnoCentive.
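
A toy sketch of how feedback on forwarded requests could be turned into a per-topic competency estimate; the data model and the Laplace-smoothed score are my own assumptions, not a description of any existing system:

```python
# Track helpfulness feedback per topic; smoothing keeps a single lucky
# answer from immediately marking someone an "expert".
from collections import defaultdict

class CompetencyTracker:
    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])   # topic -> [helpful, total]

    def record(self, topic, helpful):
        h, t = self.stats[topic]
        self.stats[topic] = [h + (1 if helpful else 0), t + 1]

    def score(self, topic):
        h, t = self.stats[topic]
        return (h + 1) / (t + 2)                   # Laplace-smoothed helpfulness

tracker = CompetencyTracker()
for outcome in (True, True, False, True):          # feedback on four answered requests
    tracker.record("machine-learning", outcome)
print(round(tracker.score("machine-learning"), 2))  # 0.67
```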

But where do we get the necessary computing resources? Today social networks are implemented as centralized systems, using the modern concept of virtualization - Cloud Computing. I think that we should move to a hybrid architecture, combining centralized services with intelligent software agents that use the computing power of individual users. This way the agents and resources needed to shape and polish the synthesized profiles of their owners, as well as to personalize services and make them more intelligent, scale with the number of active users. By the way, the social network's developer can generate profit by offering agent templates of varying degrees of intelligence. And the most advanced agents can cooperate with each other to solve problems whose complexity exceeds the capabilities of any individual agent, for example, the search described earlier for a person based on a rough description.

But if such an agent is already up and running on one social network, we can agree on a standard like OpenSocial and give it the ability to work across boundaries, on all social networks (an interoperability problem). Or, at worst, we can create an agent proxy, a "personality", for each social network environment.

Finally a question arises: is it possible to implement this model in the context of Semantic Social Networks? I am convinced that it is not, and this is why it is necessary to supplement the capabilities of the Semantic Web (Web 3.0) with natural language processing, which, along with the concept of intelligent software agents, is a paradigm better suited to the Intelligent Web (Web 4.0); and if we require from these agents greater autonomy, adaptability to the surrounding social environment, and cooperation with other agents, then we step into the paradigm of the Adaptive Web (Web 5.0).

P.S. What other ways are there to make money in social networks? I see another: one can conduct intelligent marketing and analyze how the network forms opinions around certain brands, how these opinions can be influenced and predicted, how one can choose the best marketing strategy, and which brands the network is favorable to and vice versa (see the article "Identifying influential spreaders in complex networks").

Intelligent Social Networks. Part 1

This article is written in collaboration with Maxim Gorelkin.

Collecting users in one's social network is tricky business. Features that let people "manually" solve their problems, such as finding contacts and information, are tedious and cumbersome. Many such networks measure their success by the number of profiles, even if they are inactive, fake, duplicate, or created once simply to "check out" the site and never used again. Such systems should be analyzed consistently to measure their complexity and growth rate accurately, but the key measure should be how dynamic they are, since it is the activity on the network and its intensity that determine its current popularity. And one major way to make a network popular is to constantly evolve its intelligence. It should offer its users the ability to solve their social and professional problems through a combination of their intelligence with the artificial kind, within the realm of Collective Intelligence, as demonstrated by digg.com (lots of people liked this story, so you might too), last.fm (people who like Madonna also like this artist), and others (see my first article, "Adaptive Web Sites").

Here I will describe some properties of Intelligent Social Networks that I would be willing to pay to use.

A. A search for a person by name does not always work: the name may have changed, or I may have forgotten it or remembered it incorrectly. However, I can describe certain facts about this individual, such as when, where, as whom and with whom he worked, each of which is insufficient on its own to identify the exact person I'm seeking. On the other hand, even if the combination of facts is not unique, it may narrow the number of similar profiles enough for a quick browse. Or perhaps there is a set of individuals who can add details about this person and relay my request down the line until the sought contact information is found and returned, or someone is able to pass my information directly to the person I'm seeking.
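
A rough sketch of such a fact-based search, with entirely hypothetical profile fields and candidates: each fact is weak on its own, but ranking by the number of matched facts narrows the list to something browsable:

```python
# Rank candidate profiles by how many of the searcher's partial facts they match.
profiles = [
    {"name": "A. Ivanov", "employers": {"Acme"}, "years": {2003, 2004}, "role": "engineer"},
    {"name": "A. Ivanova", "employers": {"Acme", "Globex"}, "years": {2004}, "role": "manager"},
    {"name": "B. Petrov", "employers": {"Globex"}, "years": {2010}, "role": "engineer"},
]

def match_score(profile, facts):
    score = 0
    if facts.get("employer") in profile["employers"]:
        score += 1
    if facts.get("year") in profile["years"]:
        score += 1
    if facts.get("role") == profile["role"]:
        score += 1
    return score

facts = {"employer": "Acme", "year": 2004, "role": "engineer"}   # what I remember
for p in sorted(profiles, key=lambda p: match_score(p, facts), reverse=True):
    print(match_score(p, facts), p["name"])
```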

B. The networks often use names as identifiers, and as a result feature dozens of duplicate entities that denote the same physical instance, complicating the search process. In one network, for example, my profile listed four (!) universities, all referring to a single one by different names. Standard classification does not usually work for larger networks, but there is a simple decentralized solution to this problem: if a sufficient number of people who use different names in their profiles indicate that they denote the same entity, those names should be joined by a common identifier and depicted as different values of its "names" attribute. If any uncertainty remains, this assumption can be formulated as a hypothesis and tested on a sample of users with these names.
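
A minimal sketch of this decentralized merge, assuming a confirmation threshold and toy name data: once enough users assert that two names denote the same entity, union-find joins them under a common identifier, and both names become values of its "names" attribute:

```python
# Merge entity names once enough independent users confirm they are the same.
from collections import Counter, defaultdict

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

votes = Counter()
THRESHOLD = 3                            # assumed: confirmations required before merging
claims = [("MIT", "Mass. Inst. of Technology")] * 3 + [("MIT", "Michigan Tech")]
for a, b in claims:                      # the second pairing stays below the threshold
    votes[frozenset((a, b))] += 1
    if votes[frozenset((a, b))] == THRESHOLD:
        union(a, b)

groups = defaultdict(set)
for name in parent:
    groups[find(name)].add(name)
print([sorted(g) for g in groups.values()])
```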

C. Most of the emails I receive every day from my groups have no relation to the interests I described in my profile. This stream is closer to noise than to information; I rarely come across anything interesting in it, so more often than not, rather than even skimming, I simply delete all the messages. I would prefer that the social network took on the task of filtering and re-categorizing my email, possibly with an importance indicator. Of course, this would require natural language processing, but not necessarily in real time. Moreover, if I found something interesting in these lists, I would like the network to suggest other relevant discussions, similar in content, as well as other groups in which such discussions occur. By the way, the search for groups is another difficult problem that cannot be solved by name and keyword search alone. For example, one group may match my interests perfectly yet have had no activity in the last six months, while another group, with a name that means nothing to me, may be extremely active, with people discussing subjects I would find fascinating. Hence the problem of groups is a semantic problem. And of course, I would prefer to get not only the information relevant to my interests, but also that which only MAY interest me and that I may not be aware of.
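
As a hedged sketch (simple keyword overlap standing in for real natural language processing, with made-up interests and messages), the filtering step might look like this:

```python
# Score each group message against the user's interests and flag the
# likely-relevant ones instead of dumping everything into the inbox.
INTERESTS = {"machine", "learning", "recommendation", "clustering", "semantic"}

def relevance(message, interests=INTERESTS):
    words = {w.lower().strip(".,!?") for w in message.split()}
    return len(words & interests) / len(interests)

inbox = [
    "Semantic clustering of discussion threads - call for feedback",
    "Office chairs for sale, barely used",
    "New paper on recommendation quality in machine learning pipelines",
]
for msg in inbox:
    tag = "KEEP" if relevance(msg) >= 0.2 else "skip"    # assumed importance threshold
    print(f"{tag}  {relevance(msg):.2f}  {msg}")
```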

News: On September 21st, the international team "BellKor's Pragmatic Chaos" (Bob Bell, Martin Chabbert, Michael Jahrer, Yehuda Koren, Martin Piotte, Andreas Töscher and Chris Volinsky) received the $1M Grand Prize for winning the Netflix Prize contest by improving its recommendation algorithm, Cinematch.

See the final part here.

Thursday, September 3, 2009

Website Engineering

This article is written in collaboration with Maxim Gorelkin.

There are two basic ways to build bridges: from the bottom up and from the top down. The former is the one more familiar to us and, unfortunately, still employed: build a bridge and then see what happens; if it withstands all loads (at least for some time), all the better. The latter is a much more difficult approach from a different discipline, engineering: we define the requirements for the bridge and the metric for its evaluation - the load-carrying capacity - and using the strength of materials we design, build, and test a model and... Only after we are satisfied with the result do we decorate the built bridge with, for example, statues of lions lying on their front paws. My preference for the second way yields my preference for the term “website engineering” over the traditional “website development”.

We shall start with the definition of the key metric of the website’s effectiveness. Typically, this is the conversion rate - the percentage of visitors who take a desired action (e.g. buy something from your online store or spend a specified dollar amount). Thus the essence of the engineering approach: when building websites, we must guarantee specified levels of their effectiveness. Adaptive websites, described in my previous article, define the model that should solve this problem.

The issue with today's web development is the lack of an engineering approach and of such models. Yes, you can construct several alternative landing pages for A/B split or multivariate testing and collect statistics for several months in order to find the best solution. However, as demonstrated by Tim Ash, your result may depend on the chosen testing method and data analysis techniques. Or there may be no statistically significant differences between the alternatives and, consequently, you may be unable to choose the best page. Suppose you get lucky, and after months of testing you optimize your website, only to discover that its web traffic has changed and you must start the process from scratch. The same applies to web analytics: yes, you have found that, for example, some number of users visited certain pages of your site and made certain clicks, but how do you interpret this? What motivations led them to do it? And what actions does such “knowledge” suggest you take to improve your site? And what if you find completely chaotic user behavior on your site - what do you do then?
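
To make the significance problem concrete, here is a small sketch (with invented visit and conversion counts) of a two-proportion z-test comparing two landing pages; with numbers of this size the observed difference is not statistically significant:

```python
# Two-proportion z-test for comparing conversion rates of two landing pages.
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)                  # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))    # pooled standard error
    return (p_a - p_b) / se

z = two_proportion_z(conv_a=48, n_a=1000, conv_b=62, n_b=1000)
print(f"z = {z:.2f}  (|z| < 1.96 means no significance at the 5% level)")
```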

Web testing (preferably adaptive, for example adaptive multivariate testing), web analytics and web usage mining (discovering patterns of user behavior on your site) should become part of your website; put another way, your website must be self-testing, self-analyzing, and “intelligent” enough to extract practical knowledge of user behavior from these tests and analyses and use it for its adaptivity. By the way, for the mentioned patterns to become knowledge about user behavior on your website, they must be formulated as statistical hypotheses and constantly verified for accuracy.
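
One hedged sketch of what "adaptive" testing can mean in practice: a multi-armed bandit (Thompson sampling, with assumed true conversion rates) that shifts traffic toward better-performing variants as evidence accumulates, instead of splitting it evenly for months:

```python
# Thompson sampling over page variants: sample a plausible conversion rate
# for each variant from its Beta posterior and show the best-looking one.
import random

TRUE_RATES = {"A": 0.04, "B": 0.06, "C": 0.05}     # assumed, unknown to the tester
wins = {v: 1 for v in TRUE_RATES}                  # Beta(1, 1) priors
losses = {v: 1 for v in TRUE_RATES}

for _ in range(20000):
    variant = max(TRUE_RATES, key=lambda v: random.betavariate(wins[v], losses[v]))
    if random.random() < TRUE_RATES[variant]:      # simulated visitor outcome
        wins[variant] += 1
    else:
        losses[variant] += 1

for v in TRUE_RATES:
    shown = wins[v] + losses[v] - 2
    observed = (wins[v] - 1) / (shown or 1)
    print(f"{v}: shown {shown} times, observed rate {observed:.3f}")
```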

Next, let's assume you have defined the metrics for your website's effectiveness and measure them regularly. The problem, however, is more difficult still: learning how to manage these metrics to achieve sustainable improvement. How can this be done? One area of quality control, statistical process control, developed a technique for stabilizing a process before taking it under control and improving the quality of production. It seems to me that there is a direct analogy here with web traffic and with controlling it to improve the website’s effectiveness.
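
To illustrate the analogy, here is a small sketch (with invented daily traffic figures) of a p-chart from statistical process control applied to the daily conversion rate; days falling outside the 3-sigma control limits signal that the "process" is not yet stable:

```python
# p-chart: control limits around the average daily conversion rate.
import math

daily_visits = [1200, 1150, 1300, 1250, 1180, 1220, 1270]
daily_conversions = [50, 44, 57, 51, 21, 49, 55]            # day 5 looks suspicious

p_bar = sum(daily_conversions) / sum(daily_visits)          # average conversion rate
for n, c in zip(daily_visits, daily_conversions):
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lcl, ucl = p_bar - 3 * sigma, p_bar + 3 * sigma         # 3-sigma control limits
    p = c / n
    flag = "out of control" if not (lcl <= p <= ucl) else "ok"
    print(f"n={n}  p={p:.3f}  limits=({lcl:.3f}, {ucl:.3f})  {flag}")
```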

Summing up: website engineering is about the computability of the website’s effectiveness from the characteristics of its web traffic, and of its web traffic from the website’s elements: its content, navigation, and so on.

P.S. Another example comes from the field of algorithmic trading: this type of trading has become a money-making machine - a lot of money - without direct human intervention. And this ability stems from the fact that these machines are becoming more intelligent and adaptive. Today, their development draws on such disciplines as complexity theory, chaos theory, mathematical game theory, cybernetics, models of quantum dynamics, and so forth. If e-commerce is to set similarly ambitious goals, it must attain a similar level of sophistication, and the application of artificial intelligence algorithms is a modest start at best. But here we are treading on the territory of advanced engineering, based on modern science.

Monday, June 29, 2009

Adaptive Web Sites

This article is written in collaboration with Maxim Gorelkin.

If you use Web Analytics to determine the effectiveness of your website, you may have noticed that around 99% of your visitors leave in the first three seconds without having done anything. And this is after you’ve spent a good fortune on both search engine optimization, which maximized traffic from Google, and landing page optimization, which increased the attractiveness of your site. Why, then, are these efforts and investments not yielding the expected results?

You have to deal with a very diverse and demanding web audience. Traditional static web sites are trying to satisfy this group by approximating the “typical” user. But the problem is that every visitor is unique and is looking for something very specific. And since the Internet arms him with nearly unlimited access to resources that meet his demand, he does not wish to fit any more user templates or make compromises. Not to mention that the Internet is ever-evolving to more accurately determine his needs and more quickly meet his requirements. This is what shapes the conditions under which your site must exist, and dictates the rules of the game.

Another trend is to populate your site with all relevant information in order to meet the diverse requirements of all users. The problem with this approach is that the majority of web surfers cannot cope with this “abundance” or decide whether they find anything interesting, and thus quickly leave the site for a more practical “solution”. The use of Web Analytics and Web Mining obliterates many of our illusions about the usefulness of our websites.

As demonstrated by Peter Norvig, Director of Google Research, as far back as 2001 in his web manifesto "Adaptive Software", in these circumstances you must change the paradigm and begin applying intelligent and adaptive systems. And this is exactly the trend we are now seeing on the Internet. In our case, this is the development of adaptive web sites, which adjust their content and interfaces to better meet the needs and preferences of every individual user. Do not ask anything impossible of your visitor: he will never search your site for anything. Today it is your problem to identify what each user is looking for and to present it to him in the most appealing fashion. Remember, in your Internet business, you compete with… Google!

Now let’s describe the basic properties of adaptive web sites. First, we know from Web Analytics that the bulk of web traffic (around 85-90%) comes from search engines such as Google, Bing, Yahoo, Ask, AOL, and so forth, with the majority from Google (around 75-80%), which in response to any query still throws out on the order of a million results, in the midst of which the user must find his “answer” as quickly as possible. And if a search engine brought a visitor looking for “wedding shoes” to my online footwear shop, whose homepage is populated with the latest-fashion sneakers, then I will most likely lose this potential customer, even if the page has the exact link he’s seeking at the very bottom. This is the New Internet, where the site must “guesstimate” (be sufficiently “smart” for this) the needs of the user, know how he “arrived” there, and in “real time” reconstruct itself with potential matches specific to this visitor. Is this possible? Yes and no. Under the old static-site paradigm: no. Under the new paradigm of adaptive web sites: very often, yes! As for an always-yes, forget it; even a “yes” that accounts for a 20-30% conversion rate would make Amazon and Flowers.com envious, even during the Christmas shopping season. The secret lies in the information the search engine brings with every referred customer: where he came from (URL), which browser and operating system he’s using, but most importantly, what he’s searching for. More precisely, the search query he submitted, which the site must translate into what he intends to find. (*)
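
A hedged sketch of the "guesstimate" step, with hypothetical categories and keywords: parse the search query out of the referrer URL and pick the landing content that best matches it:

```python
# Choose landing content from the search query embedded in the referrer URL.
from urllib.parse import urlparse, parse_qs

CATEGORIES = {
    "wedding shoes": {"wedding", "bridal", "heels"},
    "sneakers":      {"sneakers", "running", "trainers"},
    "boots":         {"boots", "winter", "hiking"},
}

def landing_category(referrer, default="sneakers"):
    query = parse_qs(urlparse(referrer).query).get("q", [""])[0]
    terms = set(query.lower().split())
    best = max(CATEGORIES, key=lambda c: len(terms & CATEGORIES[c]))
    return best if terms & CATEGORIES[best] else default    # fall back if nothing matches

print(landing_category("https://www.google.com/search?q=bridal+wedding+heels"))
```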

Second, the site should analyze in real-time what the user is doing: what he’s reading, what he’s clicking on, and the links he’s selecting. It should “formulate” and test its hypotheses in regards to the user’s objectives and generate, for example, new advertisements or recommendations, “watching” his reactions and using this information to offer better suited content (and improve itself at the same time). Even if the site doesn’t have what the user is searching for, it can find it on the Internet and offer links.

Third, one doesn’t search for information on adaptive sites since the pages offer it up-front, similar to Amazon with its “customers who bought this also bought”, Netflix with its Cinematch, Pandora and so forth. They recommend content that their visitors may not have even known about. (**)

In summary, while the idea of adaptive websites isn’t novel, having been formulated around ten years ago, it can now be implemented with a combination of Machine Learning and Rich Internet Application technologies, such as AJAX and MS Silverlight.

(*) To do this I analyze my web log for stable, static patterns of visitor behavior based on the search keywords submitted to the search engines. I then find combinations of words that “work” on the site, and those that do not. If there is a significant amount of the latter that cannot be ignored, I build specialized content pages for the most common combinations. The problem lies in the fact that my web traffic is complex, dynamic and sporadic, so many patterns have a dynamic character; thus to discover and apply them, I rely on machine learning algorithms. And in order to get more accurate results, I must complicate the matter further and enrich the search queries with demographic data about the visitors, obtained from the same search engine.
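
A toy sketch of the bookkeeping behind this analysis (the log rows are invented): group visits by the search phrase that brought them in and see which phrases "work" and which do not; the dynamic patterns mentioned above would, of course, need the machine-learning treatment rather than a static table:

```python
# Conversion rate per incoming search phrase, from a simplified web log.
from collections import defaultdict

visits = [                                   # (search phrase, converted?) - hypothetical rows
    ("wedding shoes", True), ("wedding shoes", False), ("wedding shoes", True),
    ("cheap sneakers", False), ("cheap sneakers", False),
    ("leather boots", True),
]

stats = defaultdict(lambda: [0, 0])          # phrase -> [conversions, visits]
for phrase, converted in visits:
    stats[phrase][0] += int(converted)
    stats[phrase][1] += 1

for phrase, (conv, total) in sorted(stats.items(), key=lambda kv: -kv[1][0] / kv[1][1]):
    print(f"{phrase:15s}  {conv}/{total}  rate={conv / total:.2f}")
```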

(**) Analyzing every individual visitor is extremely complicated; that’s why adaptive web sites cluster similar users into groups, build a model for each group, and then in real time try to classify each visitor into one of them, and, allowing for unavoidable occasional errors, continue to improve both the set of models and the matching mechanism.
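
A minimal sketch of this clustering-and-classification loop, with made-up behavioral features: k-means groups past visitors, and a new visitor is assigned to the nearest centroid:

```python
# Cluster visitors by simple behavioral features, then classify a new visitor
# by the nearest cluster centroid.
import math, random

def kmeans(points, k, iters=20):
    random.seed(0)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        centroids = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids

# features: (pages viewed, seconds on site) - hypothetical
visitors = [(1, 3), (2, 5), (1, 4), (12, 300), (15, 420), (10, 280)]
centroids = kmeans(visitors, k=2)

new_visitor = (11, 310)
group = min(range(len(centroids)), key=lambda i: math.dist(new_visitor, centroids[i]))
print(f"new visitor assigned to cluster {group}, centroid {centroids[group]}")
```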

References:

Carolyn Wei "Adaptive Web Sites: An Introduction" (2001)
Paul De Bra, Lora Aroyo, and Vadim Chepegin "The Next Big Thing: Adaptive Web-Based Systems" (2004)
Abhijit Nadgouda "Adaptive Websites – The Future of Web" (2006)
Howard Yeend "Adaptive Web Sites" (2009)
Jason Burby "Using Segmentation to Improve Site Conversions" (2009)

Business Case Studies:

BT Group: Website Morphing can increase your sales by 20%
Autonomy Interwoven Merchandising and Recommendation
Vignette Recommendations
FatWire Analytics
Sitecore Visitor Experience Analytics & Real-Time Site Personalization
ChoiceStream RealRelevance Recommendations
Leiki Focus