Categories
3D Information Innovation Research & Development True Stories

What will your next gesture invoke?

When Alvaro Cassinelli, the winner of the 2011 grand prize at Laval Virtual, the largest annual Virtual Reality conference, was asked by the Guardian what motivated him to develop a platform using Augmented Reality and everyday objects to represent a user's request, his reply revealed something to which we should all pay attention.

Cassinelli said "non-verbal communication was (and still is) the most reliable device I have when I want to avoid ambiguity in everyday situations." He was referring to the fact that, as a South American living in Japan, he frequently encounters situations in which verbal communication is unclear.

One doesn't have to live with cultural and linguistic barriers to need gestures. I learned the value of technology-assisted non-verbal communication 20 years ago. During one of my first sessions using a personal videoconferencing system in my home office with a client who was then working at Apple, his words and his gestures did not align! He said "maybe" in response to a recommendation I made, but the posture of his head and the position of his hands said "no way." This was an "aha" moment that convinced me of how valuable technology could be for sharing non-verbal communication when meetings involve a remote participant.

In 2004, when I started working with the partners of the EU-funded (FP6) Augmented Multiparty Interaction project, one of the objectives of using computer vision was to analyze the non-verbal communication in gestures and to compare it with the spoken words during a business meeting. One of the conclusions of the project's research was that computers can detect when there is a discrepancy between verbal and non-verbal signals, but they cannot determine which of the two messages is the one the user intended to communicate.

If and when gestures become a way of communicating our instructions to a digital assistant, will we all need to learn to use the same gestures? Or will we individually train the systems watching us to recognize the gestures we designate? These are a few of the questions I raised in the position paper AR Human Interfaces: The Case of Gestures. I don't have the answers to these questions, but I'm certain that it will take multiple disciplines working together over many iterations to get us to an optimal balance of standard and personal gestures, just as we have in other forms of communication.

Cassinelli won a prize for innovation and I'm confident that he's on to something, but it will be several more years before gestures are reliable for man-machine interfaces.

Categories
Internet of Things Research & Development

Useful IoT Comparisons

Many before us have struggled to develop the "final" definition of the IoT, and I've suggested that one day it will be unnecessary to argue over the precise terms because the IoT will be ubiquitous. Between now and then, metaphors are very powerful.

Along this line of thought, I found this post by Dale Calder of Axeda Corp on the Forbes web site comparing the Internet of Things to Facebook insightful. He points out that "while Facebook is attempting to digitize and “platformize” every Internet user in the world (currently more than 2 billion people), the Machine-to-Machine market is doing exactly the same with all of the world’s machines, devices, and real-time information sources – over 7 trillion potential targets and counting." Every machine or device will have the potential to be at the hub of its own graph.

The connection from inanimate objects to social ones has been made before in developing the concept of the IoT. For example, WideTag connects objects and people. The graph, however, does not need to be social to be valuable. It just needs to portray active (or dormant) connections in a logical hierarchy, or perhaps without any hierarchy at all.

Comparing the Internet of Things with very large distributed systems is helpful for architecting suitable infrastructure. Could we compare the sensors deployed in a city with those in a nuclear power plant?

One thing to keep in mind is the potential for information overload. This reminds me of the problems facing physicists and IT experts when building the Large Hadron Collider. According to this article on ReadWriteWeb, "NoSQL solutions are designed to handle large numbers of transactions. CouchDB, for instance, has been used to power the web-based IM client Meebo, proving it can handle a rapid influx of data. CouchDB is also specifically designed for distributed environments."
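
To make that concrete, here is a minimal sketch of how a single sensor reading might be stored in CouchDB over its plain HTTP document API. The host, database name and document fields are assumptions invented for this example, and the database would need to exist first.

```python
import json
import urllib.request

# Illustrative sketch: store one sensor reading in CouchDB over its HTTP API.
# The host, database name and document fields are assumptions for this example.
COUCH_URL = "http://localhost:5984/sensor_readings"

reading = {
    "sensor_id": "bridge-strain-042",    # hypothetical device identifier
    "timestamp": "2011-08-01T14:32:05Z",
    "value": 17.3,
    "unit": "microstrain",
}

request = urllib.request.Request(
    COUCH_URL,
    data=json.dumps(reading).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",  # CouchDB assigns a document _id when POSTing to a database
)

with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # e.g. {"ok":true,"id":"...","rev":"..."}
```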

Another thing to consider is bandwidth. Calder points out in the Forbes piece that the traffic from all these sensors could become the single largest user of cellular airtime. Clearly, the network equipment providers and network operators are developing new protocols and procedures in order to offer appropriate products and services.

Categories
Augmented Reality Events Research & Development Social and Societal

Algorithmic City and Pattern Language

The Mobile City web portal and the blog of Martijn de Waal are inspirational to me. He introduces many concepts that, although not expressed in precisely the words I would use, mirror what I've been thinking and seeing. One of the posts on The Mobile City from March 2011 is a review by guest contributor Michiel de Lange of a compendium of articles about technology in cities edited by Alessandro Aurigi and Fiorella De Cindio.

Augmented Urban Spaces (2008) would be the basis for a great conference on "smarter cities" and the Internet of Things.

Another post that I found stimulating is Martijn's report on the Cognitive Cities Salon he attended in Amsterdam. He highlighted a talk given by Edwin Gardner entitled "The Algorithmic City," which I am sorry to have missed and, unfortunately, I have not found the slides on-line (yet). From what Martijn writes, the subject of Algorithmic Cities is currently theoretical, but one can imagine a day when it will become commonplace.

The Algorithmic City is the result of urban planners using algorithms as part of their process(es). Quoting from the Mobile City blog post published on July 3, 2011:

"So far algorhithms have shown up in ‘parametric design’ where all kinds of parameters can be tweeked that the computer will then turn into a design for a building or even a complete city. Gardner is not so much interested in this approach. The problem is that there is no relation between the paramaters, the shapes generated and the society that is going to make use of these shapes. Social or economic data are hardly used as parameters and the result is ‘a fetishism of easthetics’, at best beautiful to look at, but completely meaningless.

Instead, Gardner takes inspiration from Christopher Alexander's book A Pattern Language."

I'm not sure if there is a connection between the theoretical work of Gardner and CityEngine, a software solution from Procedural (a startup based in Zurich purchased by ESRI on July 11, 2011) whose next version I have seen demonstrated in a video, but somehow the two come together in my mind. Using CityEngine, design algorithms are manipulated using the simplest of gestures, such as dragging. It's not a software platform I'm likely to need in the near future, but I hope that urban planners will soon have opportunities to experiment with it, to explore the Algorithmic City concept, and that I will witness the results. Maybe someone will build an AR experience to "see" the Algorithmic City using smartphones.
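
For readers unfamiliar with what "parametric" means here, the sketch below is a toy illustration of the idea: a handful of numeric parameters turned into a grid of blocks and building heights. It is not CityEngine's API, and all of the parameter names are invented; Gardner's critique is precisely that parameters like these are rarely tied to social or economic data.

```python
import random

# Toy illustration of parametric city generation: a few numeric parameters are
# turned into a grid of blocks with randomized building heights. Not CityEngine's
# API; every name here is invented for the sketch.

def generate_city(blocks_x=4, blocks_y=3, block_size=80.0,
                  street_width=12.0, max_floors=20, seed=42):
    random.seed(seed)
    buildings = []
    for ix in range(blocks_x):
        for iy in range(blocks_y):
            # Origin of this block, offset by the streets between blocks.
            x = ix * (block_size + street_width)
            y = iy * (block_size + street_width)
            floors = random.randint(1, max_floors)
            buildings.append({
                "footprint": (x, y, block_size, block_size),  # x, y, width, depth
                "height_m": floors * 3.0,                      # roughly 3 m per floor
            })
    return buildings

if __name__ == "__main__":
    for building in generate_city(blocks_x=2, blocks_y=2):
        print(building)
```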

Categories
Augmented Reality Research & Development

ElipseAR: Cloud Image Recognition

There has long been a debate among computer vision experts between those who envisage feature extraction and matching in the network and those who implement it on the device. There are many factors to consider and trade-offs to be made, but in the end everything boils down to cost: what do you gain by where you put the different tasks if the tasks must be done in real time? For many applications, feature extraction is a bottleneck due to lack of computational power. Qualcomm's AR SDK is an example of device-based recognition.
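
As a rough illustration of the on-device side of that trade-off, here is a minimal sketch of feature extraction and matching using OpenCV's ORB detector. The file names are placeholders; in a live AR pipeline the second image would be a camera frame that must be processed within each frame's time budget.

```python
import cv2

# Sketch of on-device feature extraction and matching with OpenCV's ORB detector.
# "reference.jpg" and "camera_frame.jpg" are placeholder file names.
orb = cv2.ORB_create(nfeatures=500)

reference = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)

kp_ref, des_ref = orb.detectAndCompute(reference, None)
kp_frame, des_frame = orb.detectAndCompute(frame, None)

# Brute-force Hamming matching suits ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_ref, des_frame), key=lambda m: m.distance)

print(f"{len(matches)} putative matches between the reference image and the frame")
```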

ElipseAR, a startup based in Madrid, Spain, falls heavily on the network side of the debate. The company is planning to release a set of tools for markerless (feature-based) Augmented Reality and computer vision development that will make image matching and tracking, 3D animation rendering, geolocation using the camera view, and face recognition easier to integrate into AR applications. Existing AR apps? Future AR apps?

The company's web site clearly makes a distinction between image recognition, image tracking and matching. What's not clear at the moment, because their position differs depending on which page you are reading, is how much of the ElipseAR processing is to happen on the device and how much will be in the network. They also may be confusing "image" recognition with real-time video recognition.

At the moment the company says it will offer its tools for commercial use at no charge. The beta program started in early July and is expected to run until the end of 2011.

Tests must be conducted in real-world circumstances to measure the merits of the new algorithms and architecture. They will be compared not only against other network-based image recognizers such as kooaba, but also against SDKs that have been out for much longer, such as the Qualcomm AR SDK, Qconcept and others. It's difficult to imagine, for example, ElipseAR getting out ahead of String Labs, which released its code on June 16, 2011.

Even if the reliability of the ElipseAR algorithms and architecture proves to be up to industry benchmarks, there will continue to be latency in the cellular networks that is out of the control of the developer or user. There have been rumors that this network "effect" can be overcome, but it will never be a universally reliable solution because the coverage of mobile networks is not, and may never be, 100%.
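
A rough, back-of-the-envelope illustration of why this matters: a camera running at 30 frames per second leaves about 33 ms of processing time per frame, while a single round trip over a typical 2011-era cellular connection commonly takes on the order of 100 ms or more. Any recognition step that has to cross the network therefore exceeds the per-frame budget before any matching work even begins, which is why network-side recognition tends to make more sense for initial detection than for frame-by-frame tracking.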

Categories
3D Information Business Strategy News Research & Development

London’s Imperial College

On July 27, 2011 the UK Research Council awarded a £5.9m grant to the “digital city exchange” programme of Imperial College. According to the press release, the funds are to be used to establish a new research center focusing on “smart cities” technologies.

A multidisciplinary team, involving businesses, public administrations and academia, is being put in place to use the city of London as a test bed for emerging smart cities hardware, software and processes. The article in the Financial Times very perceptively puts the focus on the following statements by David Gann, head of the innovation and entrepreneurship group at Imperial College.

"New sensors and real-world monitoring can be combined with “cloud computing” to bring greater efficiency and new services to cities. For instance, data about peak traffic periods and local sensors could be used to avoid congestion for supermarket deliveries.

“London, with all its economic and social diversity, will be a very good place to launch some of these capabilities into new cities around the world and create new jobs and growth. The act of invention is at the point of consumption.”

Another article about the grant emphasizes, as did the press release, more of an urban planning angle.

It's very exciting to see this center being established, although the size of the grant does not seem in line with the ambitions and objectives as they are described, and I hope there will soon be others of its kind connecting to it.

Categories
Augmented Reality Research & Development

Reduced Reality

In the physical world, visually noisy environments are common. Some cultures enjoy, or at least live in, a stimulating visual landscape, be it on their screens or in the real world. I recall there being more visual noise in Asian urban landscapes than I am accustomed to. I prefer the work of designers who hide or disguise the clutter of everyday life. Take, for example, power and telephone lines. For a variety of reasons these are above ground in some parts of the world and below in others.

I prefer a visually "simple" world. Blocks of uniform or lightly textured surfaces: the sky, the water of Lake Geneva, even skyscrapers.

Why would the same algorithms and systems used to attach additional information to the real world not also be useful to reduce information? Power and telephone lines could "disappear" from view, as would graffiti and trash.

There was a poster at ISMAR 2010 that demonstrated the "reduction" of reality using a mobile device to cover/camouflage a QR code in a photograph. By sampling from the background in the immediate proximity and tiling the same pixel colors and textures over the marker, there was a sense of continuity: the marker disappeared. Unfortunately, the specific project and references to it are difficult to find, but I hope to see more of this at the next ISMAR event in Basel.
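
The poster's exact method isn't available to reproduce, but the effect can be roughly approximated with off-the-shelf inpainting, which fills a masked region from its surroundings. A minimal sketch, assuming a hand-picked rectangle over the marker and placeholder file names:

```python
import cv2
import numpy as np

# Rough approximation of "reduced reality": erase a marked region by filling it
# in from the surrounding pixels with OpenCV inpainting. This is not the ISMAR
# poster's own method (tiling sampled background texture); the file names and
# the marker rectangle are placeholders.
image = cv2.imread("scene_with_marker.jpg")

# Mask covering the region to erase (a hand-picked rectangle around the marker).
mask = np.zeros(image.shape[:2], dtype=np.uint8)
x, y, w, h = 220, 140, 120, 120  # hypothetical marker location in pixels
mask[y:y + h, x:x + w] = 255

# Telea inpainting propagates color and texture from the mask border inward.
cleaned = cv2.inpaint(image, mask, 5, cv2.INPAINT_TELEA)
cv2.imwrite("scene_reduced.jpg", cleaned)
```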

Categories
3D Information Augmented Reality Research & Development

AR for Blacksburg

The AR-4-Basel project is a pilot for what could become a widespread trend: a municipality (or an area of any size) can make the data sets it owns and maintains for its citizens available to AR developers, who can then prepare AR experiences for visitors and inhabitants.

Ever since starting the AR-4-Basel project in May, I have been planning how to expand and apply the lessons learned to other cities. The first to follow is definitely Barcelona, Spain. The AR-4-Barcelona project is already ramping up. Then, Berlin is my next target. I’d like to explore the possibility of getting something started in Beijing as well, if there is going to be an AR in China conference in 2012.

Another “B” city which has all the earmarks of a future haven for AR experiences is Blacksburg, Virginia!

“The 3D Blacksburg Collaborative is a consortium of multi-disciplinary researchers, experts and students from various universities and governments, who are creating a data and delivery infrastructure for an interactive virtual 3D city model.”

Which “B” city would you nominate for a future AR project?

Categories
Innovation Research & Development

Innovation Research

INSEAD has published its Global Innovation Index for 2011.

"The overall GII scores provide a composite picture of the state of each country’s innovation performance. The Report stresses leaders by index, by income group and by region.

"Switzerland comes in at top place in the overall GII 2011 rankings (up from position 4th last year) on the basis of its strong position in both the Input and Output Sub- Indices (3rd and 2nd, respectively). Although the country does not top any individual pillar, it places within the top 5 in three Input pillars (Institutions, Market and Business sophistication) and both Output pillars (Scientific outputs and Creative outputs)."

Source: INSEAD (2011) Global Innovation Index

Another interesting point to examine is where China is positioned on the scales of R&D users and R&D importers. This comes from a post by Marc Laperrouza of the LiftLab on his "Time to look east" blog. Marc pulled out the relevant chart and made the comment below.

"As with many synthetic indexes, it is always worthwhile to dig further into the data. It turns out that China has a number of strengths and weaknesses. Among the former, the report lists patent applications, gross capital formation, high-tech imports and exports (a large majority are MNC-driven).

"Among the latter, one can find regulatory quality, press freedom and time to start a business. True enough, both business and market sophistication have notably increased over the years and so has scientific output. If China aims to reach the top 20 or higher it will have to work hard (and fast) on its institutions."

Categories
Augmented Reality News Research & Development

Pittsburgh Pattern Recognition

On July 22, 2011, Google acquired PittPatt, the Pittsburgh Pattern Recognition team, a privately held spin-off of CMU's Robotics Institute.

Three questions jumped out when I learned of this acquisition.

  • Why? Doesn't Google already have face recognition technology?
    Unfortunately, based on the publicly available information, it's not clear what is new or different about PittPatt's technology. Okay, so they have an SDK. There are several possible explanations for this acquisition. Maybe the facial recognition technology Google acquired with Neven Vision in August 2006 and then released as part of Picasa in the third quarter of 2008 (it appeared in Picasa as early as May 2007) was insufficient. Insufficient could mean inaccurate too often, too difficult to implement on mobile, or not scalable. That doesn't seem likely.
    Maybe the difference is that the PittPatt technology works on video as well as still images. YouTube already has a face recognition algorithm, but it is not real time. For AR, it would be valuable if face recognition and tracking performed reliably in real time.
    Another possible explanation has to do with IP. Given the people who founded PittPatt, perhaps there is some intellectual property that Google wants for itself, or to which it wants to prevent a competitor from having access.
  • What are the hot "nearby" properties that will get a boost in their valuation as a result of Google's purchase?
    Faces are the most important attribute we have as individuals, and the human brain is hard-wired to search for and identify faces. Simulating what our brains do with and for faces is a fundamental computer vision challenge. Since this is not trivial, and so many applications could be powered by face recognition (and when algorithms can recognize faces, other 3D objects will not be far behind), there are always a lot of resources going into developing robust, accurate algorithms.

    Many, perhaps dozens, of commercial and academic groups continually work on facial recognition and tracking technology. Someone has certainly done the landscape analysis on this topic. One of the face recognition research groups with which I've had contact is at Idiap in Martigny, Switzerland. Led by Sebastien Marcel, this research team is focusing on facial recognition accurate enough to serve as the basis for granting access. KeyLemon is an Idiap spin-off using the Idiap technology for biometric authentication on personal computers. And there is (almost certainly) a sizable group already within Google dedicated to this topic.
  • What value-added services or features can emerge that are not in conflict with Google's privacy policy and haven't already been thought of or implemented by Google and others?
    This is an important question that probably has a very long, complex, multi-part answer. I suspect it has a lot to do with 3D objects. What's great about studying faces is that there are so many different ones to work with and they are plastic (they distort easily). When algorithms for detecting, recognizing and tracking faces in video are available on mobile devices, we can imagine that other naturally occurring, plastic objects will not be too far behind. (A minimal face-detection sketch, for illustration, follows this list.)
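
To give a sense of what even the entry-level version of the problem looks like in code, here is a minimal detection-only sketch using OpenCV's bundled Haar cascade. It has nothing to do with PittPatt's own algorithms, handles neither recognition nor real-time tracking, and the file name is a placeholder.

```python
import cv2

# Minimal face-detection sketch using OpenCV's bundled Haar cascade.
# Detection only: no recognition, no tracking. "group_photo.jpg" is a placeholder.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("group_photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                  minSize=(40, 40))

# Draw a rectangle around each detected face and save the annotated image.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

print(f"Detected {len(faces)} face(s)")
cv2.imwrite("group_photo_faces.jpg", image)
```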

I hope Eric Schmidt is proven wrong about there being no facial recognition in the future of Google Goggles and similar applications, and that we get to see what is behind the curtain of the PittPatt acquisition!

Categories
Innovation Research & Development

3D City Models and AR

Google Street View was certainly a trail-blazing concept and it has entered the mainstream. But it was not the first such service, and Google isn't the first company to have had the idea of collecting data about the physical world by driving a specially equipped vehicle (with one or more cameras, high-performance GPS and other sensors) through space. Decades earlier, the Jet Propulsion Laboratory worked on this concept in order to permit vehicles landing on the moon (or other celestial bodies) to record their immediate environment. Earthmine is a pioneer not only in capturing the real world (using designs developed by the JPL) but also in exploring business models based on these data sets. What do these have in common? They proved that the ambitious goal of digitally "capturing" the real world, in a form that supports navigation through the data afterwards, was possible.

As the technologies developed in these projects have evolved and become more powerful–in every dimension–and competitors have emerged based on other maturing technologies, systems are detecting the physical world at higher and higher resolutions, and the data gathered produce increasingly more accurate models at lower costs.

Instead of “manually” building up a 3D model from a 2D map and/or analog data, urban environments are being scanned, measured and modeled at an amazing speed, and at lower cost than ever before. Fascinating, but to what end?

In the AR-4-Basel project, we seek to make accurate 3D models available to AR developers so that the digital representation of the real world can serve as the basis for higher-performance AR experiences. The concept is that if a developer were able to use the model when designing experiences, or when placing content, they would have a virtual reality in which to experiment. Then, in the real world, the user's camera-equipped device would automatically extract features, such as the edges of buildings, roofs and other stationary attributes of the world, and match them with the features "seen" earlier in the digital model. The digital data would be aligned more accurately and the process of augmenting the world with the desired content would be faster.
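
A minimal sketch of that matching and alignment step, assuming a rendered view of the city model, a camera frame saved to disk, and the simplification that the matched surface (say, a single facade) is planar. The file names are placeholders; this is not the AR-4-Basel pipeline itself.

```python
import cv2
import numpy as np

# Match features between a rendered view of the 3D city model and a camera frame,
# then estimate a homography to align overlay content. Treating the matched
# surface as planar is a simplifying assumption; the file names are placeholders.
orb = cv2.ORB_create(nfeatures=1000)
model_view = cv2.imread("rendered_model_view.png", cv2.IMREAD_GRAYSCALE)
camera_frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

kp_model, des_model = orb.detectAndCompute(model_view, None)
kp_cam, des_cam = orb.detectAndCompute(camera_frame, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_model, des_cam)
matches = sorted(matches, key=lambda m: m.distance)[:100]

src = np.float32([kp_model[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_cam[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC discards mismatches; H maps model-view pixels onto the camera frame.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("Estimated homography:\n", H)
```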

In order to determine if this is more than just a concept, I need to find and receive the assistance of 3D city model experts. Here are a few of the sites to which I've turned in search of such knowledge:

This process is proving to be time-consuming, but it might yield some results before another solution to improve AR experience quality emerges!