If you’ve been to one city you’ve been to them all, right? Of course not. Urban life is a cacophony of sights, sounds, architecture and smells all grafted onto one another over the course of time. Paris is nothing like Los Angeles, which is nothing like Melbourne, which is nothing like St. Petersburg. Cities are urban snowflakes dotting Earth’s landscape.
Given a large repository of geotagged imagery, we seek to automatically find the visual elements, e.g., windows, balconies, and street signs, that are most distinctive for a particular geo-spatial area, such as the city of Paris. This is a tremendously difficult task, as the visual features distinguishing architectural elements of different places can be very subtle. We also face a hard search problem: of all possible patches in all images, which are both frequently occurring and geographically informative? To address these issues, we propose a discriminative clustering approach that exploits this weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner.
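The core idea can be sketched with a toy example. Below is a minimal, hypothetical illustration of discriminative element mining: patches from the target city are positives, patches from everywhere else are negatives, and a linear discriminant is refit iteratively to keep only the positives it scores highest. The feature dimensions, data, and the simple mean-difference classifier (standing in for the linear SVMs used in practice) are all assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np

# Hypothetical toy setup: each "patch" is a feature vector (e.g. HOG-like).
# Positives come from the target city (say, Paris), negatives from elsewhere.
rng = np.random.default_rng(0)
d = 16
paris_patches = rng.normal(loc=0.5, scale=1.0, size=(200, d))
other_patches = rng.normal(loc=-0.5, scale=1.0, size=(800, d))

def mine_discriminative_patches(pos, neg, top_k=10, iters=5):
    """Iteratively fit a linear direction separating pos from neg, then
    keep the positive patches the classifier scores highest -- these are
    the 'frequently occurring and geographically informative' candidates."""
    keep = pos
    for _ in range(iters):
        # Simple linear discriminant: difference of class means,
        # a stand-in for training a linear SVM per cluster.
        w = keep.mean(axis=0) - neg.mean(axis=0)
        scores = pos @ w
        keep = pos[np.argsort(scores)[-top_k:]]
    return keep

elements = mine_discriminative_patches(paris_patches, other_patches)
print(elements.shape)  # (10, 16): the top-scoring candidate elements
```

The iterative refitting matters: each round re-estimates the discriminant from the current best patches, so the selection converges toward elements that are consistently separable from the rest of the world rather than merely common.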
By analyzing 250 million visual elements from 40,000 Street View images of Paris, London, New York, Barcelona and eight other cities, the system learned to identify which of the featured cities a single image comes from.