GIANT: Geo-Informative Attributes for Location Recognition and Exploration

Summary

This paper considers the problem of automatically discovering geo-informative attributes for location recognition and exploration. An attribute is expected to be both discriminative and representative: it corresponds to a distinctive visual pattern and is associated with a semantic interpretation. To this end, we analyze attributes at the region level. Each segmented region in the training set is assigned a binary latent variable indicating its discriminative capability. A latent learning framework is proposed for discriminative region detection and geo-informative attribute discovery. Moreover, we use user-generated content to obtain semantic interpretations for the discovered visual attributes. The proposed approach is evaluated on a challenging dataset comprising GoogleStreetView and Flickr photos. Experimental results show that (1) geo-informative attributes are discriminative and useful for location recognition; (2) the discovered semantic interpretations are meaningful and can be exploited for further exploration.

Framework

The proposed framework has two stages: geo-informative attribute discovery and geo-informative attribute interpretation.

(1) For geo-informative attribute discovery, discriminative analysis is first conducted at the photo level, where non-discriminative photos are filtered out so that the number of candidate regions is significantly reduced. We then propose a region-based latent Support Vector Machine model (RLSVM) to detect the discriminative regions. Candidate regions are generated by hierarchically segmenting the remaining photos, and each region is assigned a binary latent variable that indicates whether it contributes to recognizing the location. RLSVM scores a photo by considering all of its region latent variables and infers the configuration that best matches the location label. Regions activated in the inferred configuration are considered discriminative. For each location, the geo-informative attributes are obtained by clustering the detected discriminative regions.

(2) For geo-informative attribute interpretation, we use the user tags associated with Flickr photos and present a novel discriminative learning algorithm. We first learn a set of discriminative SVM classifiers to measure the relatedness between tags and photo regions. These classifiers are then used to score each attribute's region set and to generate its interpretation as a compact set of semantic tags.

Location recognition can be performed in two ways: directly with the proposed RLSVM model, which simultaneously infers the discriminative regions and estimates the location label, or by using the discovered attributes as a geographical vocabulary, which can be combined with any supervised method. Moreover, the associated semantics make the recognition results interpretable and open the way to high-level location exploration applications.
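The joint inference of discriminative regions and location label can be illustrated with a minimal Python sketch. It assumes a simplified scoring function that is not stated on this page: a photo's score for a location is the sum of linear responses of its activated regions, so the best configuration activates exactly the regions with a positive response. The function names, the toy weights, and the feature vectors are hypothetical, and the RLSVM formulation in the paper may include additional terms.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def infer_regions(w, regions):
    """Infer binary latent variables h for one location (assumed model).

    With an additive score sum_r h_r * (w . x_r), the maximizing
    configuration sets h_r = 1 exactly when the response is positive.
    """
    responses = [dot(w, x) for x in regions]
    h = [1 if r > 0 else 0 for r in responses]
    score = sum(r for r, hr in zip(responses, h) if hr)
    return score, h


def predict_location(weights, regions):
    """Return (label, score, h) for the best-scoring location."""
    best = None
    for label, w in weights.items():
        score, h = infer_regions(w, regions)
        if best is None or score > best[1]:
            best = (label, score, h)
    return best


# Toy example: two locations, one photo with three region descriptors.
weights = {"Paris": [1.0, -1.0], "Tokyo": [-1.0, 1.0]}
photo_regions = [[2.0, 0.5], [0.1, 0.3], [1.0, 0.0]]
label, score, h = predict_location(weights, photo_regions)
print(label, h)
```

In this sketch the regions with `h[r] == 1` play the role of the discriminative regions that would subsequently be clustered into attributes.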

Experimental Results

We conduct experiments on a dataset collected from GoogleStreetView and Flickr. For the GoogleStreetView dataset, we select 12 well-known cities: Barcelona, London, Paris, Chicago, Hong Kong, New York City, San Francisco, São Paulo, Singapore, Sydney, Taipei and Tokyo. For each panorama, we extract two perspective photos, one on each side of the capturing vehicle. This results in approximately 10,000 photos per city. For the Flickr dataset, we downloaded data for 7 cities: Barcelona, London, Paris, Beijing, Berlin, Cairo and Istanbul. The number of photos per city in the final dataset ranges from 2,000 to 3,000.

Geo-informative Attributes Discovery

In the figure above, we visualize some of the discovered visual attributes for different cities (each row corresponds to one cluster, i.e., one attribute). The discovered attributes are geo-informative in two respects: (1) they are discriminative, distinguishing a city from the others, e.g., the Mediterranean coastal views and Gaudi's modernist buildings of Barcelona set it apart from the inland, classical counterparts of London; (2) they are representative, describing characteristic aspects of the city. The attributes discovered from the GoogleStreetView dataset provide an intuitive description of each city. Stylistic elements such as windows, building facades, and street signs are highly indicative of a city, e.g., Singapore with its busy harbor, renowned business district and mixed East-West architectural style.

Location Recognition Performance

(1) Location recognition mAP results for the examined approaches on the GoogleStreetView dataset.

(2) Location recognition mAP results for the examined approaches on the Flickr dataset.

The location recognition results of the compared approaches are shown above. Our proposed method achieves the best performance among all the examined approaches. This demonstrates that the discovered attributes, serving as mid-level features, are of great value for location recognition and are therefore geo-informative.
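For readers unfamiliar with the metric, mean average precision (mAP) can be computed as in the following Python sketch. It uses the common ranking-based definition: for each location, test photos are ranked by classifier score, precision is averaged over the ranks of the relevant photos, and the per-location values are then averaged. The exact evaluation protocol used in the paper is not described on this page, so the function names and the toy data here are assumptions for illustration only.

```python
def average_precision(scores, labels):
    """AP for one class: mean precision at the rank of each relevant item."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    # A class with no relevant items contributes 0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """per_class maps a location name to (scores, binary labels)."""
    aps = [average_precision(s, l) for s, l in per_class.values()]
    return sum(aps) / len(aps)


# Toy example with two locations (made-up scores and labels).
results = {
    "Paris": ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    "Tokyo": ([0.2, 0.9, 0.5], [0, 1, 0]),
}
print(round(mean_average_precision(results), 3))
```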

Geo-informative Attribute Interpretation

In the figure above, we visualize the tag-based interpretations of the attributes discovered in the Flickr dataset. The selected tags succeed in describing the visual attributes while capturing meaningful semantics. They give people a way to better understand the discovered attributes and to explore cities.
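As a rough illustration of how tag classifiers could turn an attribute into a compact tag set, the Python sketch below scores every region of one attribute cluster with a linear classifier per tag, averages the scores, and keeps the top-ranked tags with a positive mean. The averaging rule, the positivity threshold, the parameter `k`, and all names and numbers are assumptions made for this sketch; the learning algorithm in the paper may aggregate and select tags differently.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def interpret_attribute(tag_classifiers, regions, k=3):
    """Return up to k tags ranked by mean classifier score over the regions.

    tag_classifiers maps a tag to (w, b), a hypothetical linear SVM
    whose decision value for a region descriptor x is w . x + b.
    """
    scored = []
    for tag, (w, b) in tag_classifiers.items():
        mean_score = sum(dot(w, x) + b for x in regions) / len(regions)
        scored.append((mean_score, tag))
    scored.sort(reverse=True)
    # Keep only tags that the classifiers consider related on average.
    return [tag for mean_score, tag in scored[:k] if mean_score > 0]


# Toy example: three tag classifiers and one attribute with two regions.
classifiers = {
    "gaudi": ([1.0, 0.0], 0.0),
    "beach": ([0.0, 1.0], 0.0),
    "bus": ([-1.0, -1.0], 0.0),
}
attribute_regions = [[1.0, 0.2], [0.8, 0.4]]
print(interpret_attribute(classifiers, attribute_regions, k=2))
```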

Publication

GIANT: Geo-Informative Attributes for Location Recognition and Exploration. [pdf] [slides] [supp] [code] [data]

Quan Fang, Jitao Sang and Changsheng Xu
In ACM Multimedia (MM), Barcelona, Spain, Oct. 2013, pp.13-22.