Academic Activities in 2011 and Earlier

2011

Professor André Salem of France Visits the National Laboratory of Pattern Recognition

On the morning of October 18, 2011, Professor André Salem of the University of Paris III, France, visited the National Laboratory of Pattern Recognition, gave a talk entitled "Methods and corpora", and held discussions with the laboratory's faculty and students.

Abstract: Researchers in a number of disciplines deal with large text sets requiring both text management and text analysis. Faced with a large amount of textual data collected in marketing surveys, literary investigations, historical archives and documentary databases, these researchers require assistance with organizing, describing and comparing texts. Textometry combines a set of multivariate statistical methods such as correspondence analysis and cluster analysis. It can be used to investigate, assimilate and evaluate textual data. A succession of textometric applications to text corpora, written in various languages (English, French, Chinese, etc.), will allow us to appreciate the variety of actual and potential applications and the complementary processing methods.
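Biography:

Our Institute's Machine Translation Research Achieves Strong Results Again

The 7th China Workshop on Machine Translation (CWMT) was held in Xiamen on September 23-24, 2011. The multilingual machine translation system developed by our institute's National Laboratory of Pattern Recognition again performed strongly in this year's machine translation evaluation, taking five first places and four second places across the 10 evaluation tasks it entered.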
Dr. Taro Watanabe of Japan Visits the National Laboratory of Pattern Recognition

On the morning of August 8, 2011, Dr. Taro Watanabe of the National Institute of Information and Communications Technology, Japan, visited the National Laboratory of Pattern Recognition, gave a talk entitled "Third-order Variational Reranking on Packed-Shared Dependency Forests", and held discussions with the laboratory's faculty and students.

Abstract: We propose a novel forest reranking algorithm for discriminative dependency parsing based on a variant of Eisner's generative model. In our framework, we define two kinds of generative model for reranking. One is learned from training data offline and the other from a forest generated by a baseline parser on the fly. The final prediction in the reranking stage is performed using linear interpolation of these models and a discriminative model. In order to efficiently train the model from and decode on a hypergraph data structure representing a forest, we apply extended inside/outside and Viterbi algorithms. Experimental results show that our proposed forest reranking algorithm achieves significant improvement when compared with conventional approaches.
Biography: He received the B.E. and M.E. degrees in information science from Kyoto Univ., Kyoto, Japan in 1994 and 1997, respectively, and obtained the Master of Science degree in language and information technologies from the School of Computer Science, Carnegie Mellon University in 2000. In 2004, he received the Ph.D. in informatics from Kyoto Univ., Kyoto, Japan. Dr. Watanabe is a senior researcher at the National Institute of Information and Communications Technology. His research interests include natural language processing, machine learning and statistical machine translation.
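Dr. Kim Gerdes of the University of Paris III Visits the National Laboratory of Pattern Recognition

On June 23, 2011, Dr. Kim Gerdes of the University of Paris III visited the National Laboratory of Pattern Recognition, gave a talk entitled "Poverty driven bilingual alignment", and held discussions with the laboratory's faculty and students.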
2010

Dr. Mark Johnson of Macquarie University, Sydney, Visits the National Laboratory of Pattern Recognition

On December 2, 2010, Dr. Mark Johnson of Macquarie University, Sydney, visited the National Laboratory of Pattern Recognition and gave a talk entitled "Bayesian models of language acquisition or Where do the rules come from?".

Abstract: Each human language contains an astronomically large (if not unbounded) number of different sentences. How can something so large and complex possibly be learnt? Over the past decade and a half we've figured out how to define probability distributions over grammars and the linguistic structures they generate, opening up the possibility of Bayesian models of language acquisition. Bayesian approaches are particularly attractive because they can exploit "prior" (e.g., innate) knowledge as well as statistical generalizations from the input. This opens the possibility of an empirical evaluation of the utility of various kinds of innate knowledge. Structured statistical learners have two major advantages over other approaches. First, because the generalizations they learn and the prior knowledge they utilize are both expressed in terms of explicit linguistic representations, it is clear what is learnt and what information is exploited during learning. Second, because of the "curse of dimensionality", learners that identify and exploit structural properties of their input seem to be the only ones that have a chance of "scaling up" to learn real languages. This talk describes Bayesian methods for learning Context-Free Grammars and a generalization of them that we call Adaptor Grammars, and applies them to problems of morphological acquisition and word segmentation.
Biography:
Mark Johnson is a Professor of Language Science (CORE)
in the Department of Computing at Macquarie University in Sydney, Australia.
He has worked on a wide range of topics in computational linguistics, but
his main research area is parsing and its applications to text and speech
processing, and more recently on Bayesian methods for grammatical inference.
He was President of the Association for Computational Linguistics in 2003,
and was a professor from 1989 until 2009 in the Departments of Cognitive and
Linguistic Sciences and Computer Science at Brown University.

Dr. Jia Xu (徐佳) of DFKI, Germany, Visits the National Laboratory of Pattern Recognition

On November 3, 2010, Dr. Jia Xu of DFKI, Germany, visited the National Laboratory of Pattern Recognition and gave a talk entitled "Sequence Segmentation for Statistical Machine Translation".

Abstract: In the last decade, while statistical machine translation has advanced significantly, there is still much room for further improvement relating to many natural language processing tasks such as word segmentation, word alignment and parsing. Human language is composed of sequences of meaningful units. These sequences can be words, phrases, sentences or even articles serving as basic elements in communication and components for computational modeling. However, in monolingual text some sequences are not naturally separated by delimiters, and in bilingual text both sequence boundaries and their corresponding translations can be unlabeled. This work addresses solutions of sequence segmentation and alignment for statistical machine translation, including the following topics: Chinese word segmentation, phrase training, parallel sentence exploitation, and domain adaptation. Experimental results on state-of-the-art, large-scale Chinese-English tasks show that the training speed can be increased by a factor of four and that each of the above-mentioned methods improves translation quality by up to 6% relative.
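Biography:

Dr. Andi Wu (吴安迪) and Dr. Min Zhang (张民) Visit the National Laboratory of Pattern Recognition

On August 29, 2010, Dr. Andi Wu from the USA and Dr. Min Zhang from Singapore visited the National Laboratory of Pattern Recognition and each gave a talk. The first was entitled "Applications of Computational Linguistics Technology in Biblical Studies".

Abstract 1: The Bible is the most studied, searched and translated of all texts, and analyzing and searching it with high accuracy is a challenge for computational linguistics. To this end, we first carried out detailed morphological and syntactic analysis of the original texts and the translations, building an Old Testament Hebrew treebank, a New Testament Greek treebank, and syntactic treebanks for the various Chinese translations, and then linked and aligned the source-text treebanks with the translation treebanks to form bilingual parallel treebanks. On this basis we developed a series of new techniques, including: (1) automatic identification of synonyms and flexible search; (2) high-accuracy automatic search for semantically related phrases, sentences and verses; (3) automatic identification of the senses of polysemous words; (4) statistics-based automatic evaluation of translation quality. We hope these techniques can also be applied to texts in other domains.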
Biography 1:

Abstract 2: The talk covers two recent works conducted at the SMT research group of the Institute for Infocomm Research, Singapore.

In the first part, I will introduce a newly proposed convolution forest kernel, which is designed to effectively explore the rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, and is thus able to explore very large object spaces and many more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of the convolution forest kernel and also describes an algorithm to compute it efficiently. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel.

In the second part, I will talk about two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypotheses using a tree-to-string model and evaluate their syntactic goodness using a tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experimental results on the benchmark data show that our two proposed solutions are very effective, achieving significant performance improvement over baselines when applied to different translation models.

Biography 2:
Dr. ZHANG Min received his Ph.D. degree in
computer science from Harbin Institute of Technology (HIT,
He joined the Institute for Infocomm Research (I2R) of

Dr. Mingbo Dai (戴明博) of the University of Edinburgh, UK, Visits the National Laboratory of Pattern Recognition

On July 8, 2010, Dr. Mingbo Dai of the University of Edinburgh, UK, visited the National Laboratory of Pattern Recognition and gave a talk entitled "The group nested Dirichlet process and its application to entity resolution".

Abstract: The Dirichlet process is the most widely used nonparametric model for distributions in Bayesian models. Dirichlet processes are frequently used when the number of latent features or latent components in a mixture model is unknown. This has allowed them to be used in a number of problems including image segmentation, time-series modelling, topic modelling and haplotype inference. I will present a variant of this model, the group nested DP, where classes are drawn from latent groups. For example, where the words in a corpus are separated into documents, it can then be desirable to cluster together the words into topics and cluster the topics themselves into a latent group structure. A nonparametric Bayesian generative model based on the group nested DP is presented, which models author entities, topics and research groups in a corpus of documents. The research groups couple together the authors by utilising coauthor information and the topics are used to differentiate authors further. I will then describe results obtained on citation databases and an entity disambiguation problem and compare them with other leading models.
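Biography:

Dr. Yujie Zhang (张玉洁) Visits the Laboratory

On June 17, 2010, Dr. Yujie Zhang of the National Institute of Information and Communications Technology (NICT), Japan, visited the National Laboratory of Pattern Recognition, gave a talk entitled "Natural Language Processing at NICT", and held discussions with the laboratory's faculty and students.

Professor Bing Liu (刘兵) of the University of Illinois at Chicago, USA, Visits the National Laboratory of Pattern Recognition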
On May 11, 2010, Professor Bing Liu (刘兵) of the University of Illinois at Chicago, USA, visited the National Laboratory of Pattern Recognition and gave two talks, entitled "Sentiment Analysis: A Multifaceted Problem" and "Learning from Positive and Unlabeled Examples".

Abstract 1: Sentiment analysis or opinion mining is the computational study of people's opinions, appraisals, and emotions toward entities, events and their attributes. In the past few years, it has attracted a great deal of attention from both academia and industry due to many challenging research problems and a wide range of applications. Opinions are important because whenever we need to make a decision we want to hear others' opinions. This is true not only for individuals but also for organizations. However, there was almost no computational study of opinions before the Web because there was little opinionated text available. With the explosive growth of social media content on the Web in the past few years, the world has been transformed. People can now post reviews of products at merchant sites and express their views on almost anything in discussion forums and blogs, and at social network sites. In this talk, I will first give an introduction to the field and present some technical challenges. We will see that sentiment analysis is not a single task, but a multifaceted problem containing many interrelated sub-problems. I will then share some of my thoughts on the past and future of sentiment analysis based on my research in the past few years and a brief experience in industry, which will touch on both the science and applications of sentiment analysis.
Abstract 2:

Biography:

Professor Junichi Tsujii of the University of Tokyo, Japan, Visits the National Laboratory of Pattern Recognition
On April 15, 2010, Professor Junichi Tsujii of the University of Tokyo, Japan, visited the National Laboratory of Pattern Recognition and gave a talk entitled "Deep Parsing and Deep Search – Application for the Bio-Medical Domain".

Abstract: Research on deep parsing, which had been considered impractical because of concerns about its efficiency and robustness, has made significant progress in the past 10 years. Deep parsing is now ready for practical application. After a brief introduction to research at the University of Tokyo on deep parsing, this presentation will focus on how deep parsing can be combined with other technologies for intelligent information/relation extraction, semantic search, question answering and knowledge integration through domain ontologies. The two parsers (Enju and Mogura) successfully processed the whole of MEDLINE (18 million abstracts). Two semantic search systems (MEDIE and PathText) for the bio-medical domain, which use the parsed results for indexing together with other ontology-based indexing, will be demonstrated.

Biography: Professor Jun-ichi Tsujii is Professor of Natural Language Processing in the Department of Computer Science, University of Tokyo. He is also scientific director of the National Centre for Text Mining (NaCTeM) and professor of Text Mining in the School of Computer Science, University of Manchester. He has worked since 1973 in Natural Language Processing, Question Answering, Text Mining and Machine Translation. He is project leader of a Specially Promoted Research project (2006 - 2011) on Advanced NLP and Text Mining, funded by MEXT in Japan, and co-investigator of BBSRC, BB/E004431/1. He gave keynote speeches and invited talks at many conferences such as LREC (2004), IWSL (2004), SMBM (2005), ICSB (2006), BioCreative (2007), IEEE-ASRU (2007), etc. His recent research achievements include (1) deep semantic parsing based on the feature forest model, (2) efficient search algorithms for statistical parsing, (3) improvement of the estimator for the maximum entropy model, and (4) construction of the gold standard corpus (GENIA) for bio text mining.
He was President of ACL (2006) and President of IAMT (2002-2004). He is a permanent member of ICCL (International Committee for Computational Linguistics, 1992-).

Professor Qin Lu (陆勤) of The Hong Kong Polytechnic University Visits the National Laboratory of Pattern Recognition
On March 18, 2010, Professor Qin Lu (陆勤) of The Hong Kong Polytechnic University visited the National Laboratory of Pattern Recognition and gave a talk entitled "Automatic Ontology Construction".

Abstract: An ontology representing a domain specific knowledge space is constructed through domain specific terms. The concepts behind these terms are described by certain attributes, and the relations among the different concepts. The discovery of new terms is most useful only if it can be used to extend the knowledge of a domain. This requires the identification of new terms and finding the relationships of these new terms with existing concepts in the ontology. This talk is composed of two parts. In the first part, a new terminology extraction method will be presented. The algorithm identifies features of the relatively stable and domain independent term delimiters rather than those of the terms. For term verification, a link analysis based method is proposed to calculate the relevance between term candidates and the sentences in the domain specific corpus from which the candidates are extracted. In the second part, a clustering based method is presented for domain relevant relation extraction including both relation type discovery and relation instance extraction. Given two raw corpora, one in the general domain, one in an application domain, domain specific verbs connecting different instances are extracted based on syntactic dependency as well as a small set of domain concept instance seeds. Relation types are then discovered based on verb clustering followed by relation instance extraction.

Biography: Prof. Lu has over 20 years of working experience both in academic research and in industrial applications on open systems design, standardization, and natural language processing. Her earlier work on codeset announcement has been widely adopted in modern operating systems and programming languages to handle different encodings for different language environments. She has successfully helped to make structured encoding of Chinese character components, which led to a much faster encoding process for Chinese characters.
She spearheaded the standardization of the Hong Kong Supplementary Character Set, the first and only commonly adopted character set for

Prof. Lu's research work is mostly focused on using natural language processing methods for information extraction and text mining. She has conducted extensive work on Chinese collocation extraction, terminology extraction, and ontology construction. Her research has received over 2 million in funding from the CERG and over 10 million in funding from the ITF. Her leadership has also led to the completion of a Hong Kong Jockey Club funded project, ASAB98, using text-to-speech technology to help the visually impaired access computers and the internet. Prof. Lu received her B.S. in E.E. from

Professor Hwee Tou Ng of the National University of Singapore Visits the National Laboratory of Pattern Recognition

On the morning of February 5, 2010, Professor Hwee Tou Ng of the National University of Singapore visited the National Laboratory of Pattern Recognition, gave a talk entitled "Word Sense Disambiguation for All Words without Hard Labor", exchanged ideas with the students, and was warmly received.

Abstract: While the most accurate word sense disambiguation systems are built using supervised learning from sense-tagged data, scaling them up to all words of a language has proved elusive, since preparing a sense-tagged corpus for all words of a language is time-consuming and human labor intensive. In this talk, a completely automatic approach to scale up word sense disambiguation to all words of English is proposed and implemented. The approach relies on English-Chinese parallel corpora, English-Chinese bilingual dictionaries, and automatic methods of finding synonyms of Chinese words. No additional human sense annotations or word translations are needed. A large-scale empirical evaluation was conducted on more than 29,000 noun tokens in English texts annotated in OntoNotes 2.0, based on its coarse-grained sense inventory. The evaluation results show that this approach is able to achieve high accuracy, outperforming the first-sense baseline and coming close to a prior reported approach that requires manual human efforts to provide Chinese translations of English senses.
This talk is based on joint work with Zhi Zhong.

Biography: Dr. Hwee Tou NG is an Associate Professor of Computer Science at the National University of Singapore, Program Co-chair (Computer Science Program) of the Singapore-MIT Alliance, and a Senior Faculty Member at the NUS Graduate School for Integrative Sciences and Engineering. He received a PhD in Computer Science from the University of Texas at Austin, USA. His research focuses on natural language processing and information retrieval. He has published papers in premier journals and conferences including Computational Linguistics, ACM TOIS, ACL, EMNLP, SIGIR, AAAI, and IJCAI. He is the Editor-in-Chief of ACM Transactions on Asian Language Information Processing (TALIP), and an editorial board member of Journal of Artificial Intelligence Research (JAIR) and Natural Language Engineering. He has also served as an editorial board member of Computational Linguistics journal (2004 - 2006). He is an elected member of the ACL executive committee (2008 - 2010) and a steering committee member and former secretary of ACL SIGNLL. He was program co-chair of EMNLP-2008, ACL-2005, and CoNLL-2004 conferences, and has served as area chair of ACL, EMNLP, and SIGIR conferences and as session chair and program committee member of many past conferences including ACL, EMNLP, SIGIR, AAAI, and IJCAI.
2009

Professor Ming Zhou (周明) Visits the National Laboratory of Pattern Recognition

On the morning of December 22, 2009, Professor Ming Zhou of Microsoft Research Asia visited the National Laboratory of Pattern Recognition, gave a talk entitled "Generating Chinese Couplets and Poetry with Statistical Approach", exchanged ideas with the students, and was warmly received.

Abstract: We propose a novel statistical approach to automatically generate Chinese couplets and Chinese poetry. For Chinese couplets, the system takes as input the first sentence and generates as output an N-best list of second sentences using a phrase-based SMT model. A comprehensive evaluation using both human judgments and BLEU scores has been conducted and the results demonstrate that this approach is very successful. We then extended this approach to generate classic Chinese poetry using the quatrain as a case study. Given a few keywords describing a user's intention, a statistical model is used to generate the first sentence. Then a phrase-based SMT model is used to generate the other three quatrain sentences one by one. Evaluation using human judgment over individual lines as well as the quality of the generated poem as a whole demonstrates promising results. Besides this topic, at the beginning of my presentation, an overview of the current projects on NLP at MSRA will be given. Through my presentation, I will share my vision and experiences about how to develop useful NLP technologies in the Internet age.

Biography: Ming Zhou received his B.Eng from Chongqing University in 1985 and M.S. and Ph.D. degrees from Harbin Institute of Technology (HIT) in 1988 and 1991. During his study at HIT, he developed China's first Chinese-English machine translation system. During 1991-1999, he worked at Tsinghua University as a postdoctoral researcher and then as associate professor. During 1996-1999, he visited Kodensha Ltd. in Japan and, as team leader, developed the J-Beijing machine translation software product. This product has become the most popular Chinese-Japanese translation product in Japan and was awarded the Makoto Nagao Award in 2008. In 1999, he joined Microsoft Research Asia as a researcher.
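In 2001, he became the group manager of the Natural Language Group. Under his leadership, the NLC group has produced many innovations. He has served as PC chair of the first Asian Information Retrieval Symposium, AIRS (2004), and as area chair of ACL, Coling, IJCAI, EMNLP, MT Summit, AAAI, and other NLP conferences about 15 times. He has published about 80 papers including 20 ACL papers. He is currently a member of the editorial boards of Journal of Machine Translation and Journal of Computational Linguistics, and associate editor of ACM Transactions on Asian Language Information Processing. He was the co-director of the MOE-HIT Key Lab on NLP and Speech in 2001-2008 and is now the co-director of the MOE-MS Key Lab on Media and Network at Tsinghua University.

Professor Guodong Zhou (周国栋) Visits the National Laboratory of Pattern Recognition

On the afternoon of December 18, 2009, Professor Guodong Zhou of Soochow University visited the National Laboratory of Pattern Recognition, gave a talk entitled "Natural Language Understanding: Syntactic Parsing, Semantic Computation and Discourse Understanding", exchanged ideas with the students, and was warmly received.

Abstract: Language phenomena appear simple, yet they reflect the most complex and most essential characteristics of human intelligence. Natural language understanding, an important high-level research direction within language information processing, has long been a core topic in artificial intelligence. It emphasizes the interpretation of meaning and intent in order to achieve a "deep" understanding of language. The defining feature of a natural language understanding system is that it can compute representations of sentence meaning and use those representations in reasoning tasks. This talk discusses natural language understanding at three levels: 1) syntactic parsing, which concerns the internal structure of sentences, that is, how words are arranged to form grammatical sentences, the structural role each word plays in the sentence, and the constituency relations between phrases; 2) semantic computation, which concerns the meaning of sentences, that is, how the meanings of the words in a sentence combine to form the meaning of the sentence; 3) discourse understanding, which concerns how sentences bear on one another, that is, how earlier sentences affect the interpretation of later ones. This information is especially important for resolving pronouns.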
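Biography: He received his Ph.D. from the National University of Singapore in December 1997. From April 1999 to August 2006 he worked at the Institute for Infocomm Research, Singapore, successively as associate scientist, scientist and doctoral supervisor, and associate lead scientist and doctoral supervisor. He joined Soochow University at the end of August 2006. His research interests include natural language understanding, machine translation, information extraction and machine learning. In the past five years he has published more than 80 papers in well-known international journals and conferences, including IPM (3), ACL (6), EMNLP (5), COLING (6) and IJCNLP (5), nearly 20 of which are SCI-indexed. Since returning to China at the end of August 2006, he has served as principal investigator and technical lead on a number of national and provincial or ministerial research projects. He has been a reviewer and committee member for many well-known international journals and conferences, including InS, IPM, CSL, TASLP, NLE, ACL, EMNLP, COLING, IJCNLP and IJCAI. He is currently a member of several international academic organizations, including ACM, the IEEE Computer Society and ACL.

Jiajun Zhang (张家俊) and Chengqing Zong (宗成庆) Win the Best Paper Award at PACLIC 23

At the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC), which concluded on December 5, the paper "A Framework for Effectively Integrating Hard and Soft Syntactic Rules into Phrase Based Translation", co-authored by Jiajun Zhang, a doctoral student at the National Laboratory of Pattern Recognition, and his supervisor, Research Professor Chengqing Zong, won the Best Paper Award.

PACLIC is a long-established and influential international conference in natural language processing. According to a conference-ranking website, it ranks in the top 7% of international conferences in the artificial intelligence and machine learning category. The conference is held annually; this year's edition, the 23rd, took place in Hong Kong on December 3-5. It received 145 submissions, of which 58 were accepted for presentation and 2 were ultimately selected as best papers.

The award-winning paper addresses shortcomings of current phrase reordering methods in statistical machine translation and proposes a method that efficiently uses syntactic knowledge to guide reordering. The basic idea is as follows: before translation, syntactic reordering knowledge is acquired, and hand-written reordering rules and probabilistic reordering rules are represented in a unified format; these are then integrated as a feature into an improved translation model; finally, the syntactic features guide phrase reordering during translation. Experiments show that the method clearly improves the output quality of the translation system. The PACLIC review committee noted that the paper proposes a novel way of integrating syntactic reordering rules: rather than using syntactic rules to reorder the source language, as traditional methods do, it integrates the rules into the translation model as a feature, significantly improving translation performance with almost no increase in translation time, and is suitable for translation between most Asian and European languages.

Dr. Xipeng Shen (慎习鹏) Visits the National Laboratory of Pattern Recognition

On the afternoon of December 8, 2009, Dr. Xipeng Shen of the University of Rochester, USA, visited the National Laboratory of Pattern Recognition, gave a talk entitled "Making Programs Learn as Birds Do ---Pattern Recognition for Program Optimizations", exchanged ideas with the students, and was warmly received.

Abstract: Unlike a bird, which can learn to fly better and better, existing programs are kind of dumb: the one millionth run of a program is typically not a bit better than the first run. The last decade has seen increasing use of statistical learning techniques to inject intelligence into program optimizers to improve the effectiveness of compilers in optimizing programs. The results have been remarkable, especially for problems that have been extremely difficult for traditional compilation technology to solve. This talk will start with a discussion on the relations between Pattern Recognition and Program Optimizations.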
It will then concentrate on input-centric program behavior analysis, a recently developed paradigm of program optimization, as an example to demonstrate the remarkable potential of combining the two disciplines. At the end, it will show how the new paradigm leads to an intelligent programming system, in which programs automatically learn and evolve.

Biography: Xipeng Shen has been an assistant professor at The College of William and Mary since 2006. He received his Ph.D. and Master's degrees in Computer Science from the University of Rochester in 2006 and 2003 respectively. He received the M.S. degree in Pattern Recognition from the Chinese Academy of Sciences in 2001, and the B.S. degree from The North China University of Technology. Xipeng Shen's main research lies in the area of Compiler Technology and Programming Systems, covering Optimizing Compilers, Parallel Computing, GPU Computing, and Program Behavior Analysis. He leads the Compilers and Adaptive Programming Systems research group at The College of William and Mary. The group has been focusing on integrating automatic learning, adaptation, and evolution into different computing layers to form a whole-system synergy.

Professor Hans Uszkoreit Visits the National Laboratory of Pattern Recognition

On the morning of November 2, 2009, Professor Hans Uszkoreit of Saarland University, Germany, visited the National Laboratory of Pattern Recognition, gave a talk entitled "Learning of positive and negative patterns for relation extraction", exchanged ideas with the students, and was warmly received.

Abstract: Minimally supervised machine learning methods based on bootstrapping are an attractive approach to advanced information extraction. Complex patterns signalling relevant semantic relations in free texts can be detected in this way. However, the potential and limitations of such methods are not yet sufficiently understood. We have systematically analyzed a bootstrapping approach. The starting point of the analysis is a pattern-learning graph, which is a subgraph of the bipartite graph representing all connections between linguistic patterns and relation instances exhibited by the data.
It is shown that the performance of such a general learning framework for actual tasks is dependent on certain properties of the data and on the seed construction. However, the greatest improvements can be obtained through the systematic learning of negative patterns.

Biography: Hans Uszkoreit is Professor of Computational Linguistics at Saarland University. At the same time he serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Lab. By cooptation he is also Professor of the Computer Science Department. Uszkoreit is Permanent Member of the International Committee of Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, Member of the Board of the European Language Resources Association (ELRA), and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbruecken, acrolinx gmbh, Berlin and Yocoy Technologies GmbH, Berlin. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge. His current research interests are computer models of natural language understanding and production, advanced applications of language and knowledge technologies such as semantic information systems, translingual technologies, cognitive foundations of language and knowledge, deep linguistic processing of natural language, syntax and semantics of natural language and the grammar of German.

Dr. Min Zhang (张民) Visits the National Laboratory of Pattern Recognition

On the morning of October 26, 2009, Dr. Min Zhang of the Institute for Infocomm Research, Singapore, visited the National Laboratory of Pattern Recognition, gave a talk entitled "Language Technology Research in I2R", exchanged ideas with the students, and was warmly received.

Abstract: Information Technology and Industry is one of the key fundamental resources driving the economic development of Singapore.
As one of the major research agencies in Singapore, I2R conducts a broad range of research on information technology. The presentation will begin with a brief description of I2R/A-Star, and then focus mainly on language technology research in I2R, including three technical parts: (1) The spring of machine translation from 2010: the strong demand from the market and the huge funding support from government and society call for the next technology breakthrough in MT. (2) Structured feature modeling by kernel-based machine learning. (3) Technical writing: how to sell your research results to the research community and the outside market. The talk will conclude with some observations and discussion.

Biography: Dr. ZHANG Min is a research scientist at the Institute for Infocomm Research (I2R) of Singapore. He earned his Ph.D. degree from Harbin Institute of Technology in 1997. From Dec. 1997 to Aug. 1999, he worked as a postdoctoral research fellow at the Korea Advanced Institute of Science and Technology. He began his academic and industrial career as a researcher at Lernout & Hauspie Asia Pacific (Singapore) in Sep. 1999. He joined Infotalk Technology (Singapore) as a researcher in Jan 2001 and became a senior research manager in 2002. Dr. ZHANG joined I2R in Dec. 2003. He built up the statistical machine translation (SMT) team in the institute from scratch, starting in 2007. Currently he oversees the SMT research and development in the institute. He is a recipient of the National Infocomm Awards 2001/02 in Singapore. He has authored or co-authored more than 80 papers in leading international journals and conferences, and co-edited 5 books. He has served as PC chair, co-chair, technical member and reviewer for a number of important international journals and conferences. He is the executive editor of the International Journal of Asian Language Processing.
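Spoken Language Translation System Achieves Strong Results Again

The results of the annual International Workshop on Spoken Language Translation (IWSLT) evaluation were announced recently. The machine translation system developed by the National Laboratory of Pattern Recognition (the NLPR translation system) stood out and again achieved strong results, taking first place on nearly every metric of the IWSLT Chinese-to-English and English-to-Chinese translation tasks.

In recent years, system evaluation has received wide attention in the machine translation field. It has not only driven the rapid development of machine translation technology but also advanced machine translation evaluation techniques themselves. Experts around the world who work on machine translation research and system development regard the annual evaluation as a prime opportunity to test their technology, and they take part to compete against one another. The Chinese-English spoken language translation system developed by the National Laboratory of Pattern Recognition outperformed systems from well-known universities such as CMU and MIT in the USA and RWTH in Germany in the IWSLT evaluations of both 2007 and 2008, ranking first in human evaluation each time, and this year it again ranked first by a wide margin. This shows that our spoken language translation research is now developing steadily and quickly. (September 12, 2009)

Professor Chu-Ren Huang (黄居仁) Visits the Laboratory

On August 24, 2009, Dr. Chu-Ren Huang, Dean of the Faculty of Humanities at The Hong Kong Polytechnic University and Research Fellow at the Institute of Linguistics, Academia Sinica, Taiwan, visited the National Laboratory of Pattern Recognition and held discussions with faculty and students working on natural language processing.

Dr. Dekai Wu (吴德恺) Visits the Laboratory

On April 10, 2009, Dr. Dekai Wu of the Hong Kong University of Science and Technology visited the National Laboratory of Pattern Recognition, gave a talk entitled "Structured Models in Statistical Machine Translation", and held discussions with the laboratory's faculty and students.

Professor Kevin Knight Visits the National Laboratory of Pattern Recognition

On the afternoon of April 1, 2009, Professor Kevin Knight of the University of Southern California, USA, visited the National Laboratory of Pattern Recognition, gave a talk entitled "Small Models for Natural Language Processing", exchanged ideas with the students, and was warmly received.

Abstract: Occam's Razor says that the simplest explanation is the best, all other things being equal. This principle is well known to linguists, who strive for small, elegant models of human language. In natural language processing, minimal models are less often pursued. This talk explores how to explicitly optimize model size for the problems of word alignment and part-of-speech tagging, and we give empirical results.

Biography: Kevin Knight is a Senior Research Scientist and Fellow at the Information Sciences Institute of the University of Southern California. He is a Research Associate Professor in the Computer Science Department at USC, and he is also Chief Scientist at Language Weaver, Inc. Dr. Knight received his PhD from Carnegie Mellon University in 1991 and his BA from Harvard University in 1986. He is co-author, with Elaine Rich, of the textbook Artificial Intelligence. His main research interests are statistical natural language processing, machine translation, natural language generation, and decipherment. Dr. Knight has authored over fifty scientific papers on language translation, and he is active in building and deploying large-scale language translation systems.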
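Previously, he served on the editorial boards of the Computational Linguistics journal, the Journal of Artificial Intelligence Research, and the ACM Transactions on Speech and Language Processing. Dr. Knight was general chair of the conference of the Association for Computational Linguistics (ACL) in 2005, and he was elected to serve as ACL president in 2011.
Prof. Zhu Jingbo and colleagues visit the laboratory
On the afternoon of March 6, 2009, a group of four led by Prof. Zhu Jingbo (朱靖波) visited the laboratory. Prof. Zhu is a professor in the School of Information at Northeastern University (东北大学), deputy director of its Institute of Computer Software and Theory, and director of its Natural Language Processing Laboratory. He gave a talk titled "Sentiment Analysis and Opinion Mining with Applications to Opinion Polling", which was warmly received.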
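2008
Prof. Liu Ting and colleagues visit the laboratory
On the morning of December 12, 2008, Prof. Liu Ting (刘挺), a natural language processing expert from Harbin Institute of Technology, visited the laboratory together with Dr. Che Wanxiang (车万祥) and Dr. Zhao Shiqi (赵世奇). They gave talks on dependency parsing and paraphrasing research, which were warmly received.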
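The 4th China Workshop on Machine Translation held successfully at our institute
To advance machine translation research in China and to promote exchange among natural language processing researchers at home and abroad, the 4th China Workshop on Machine Translation (CWMT'2008), hosted by the Institute of Automation, Chinese Academy of Sciences, was held at our institute on November 27-28, 2008.
In 2005, the Institute of Automation and the Institute of Computing Technology of the Chinese Academy of Sciences, together with Xiamen University, jointly initiated and organized the first statistical machine translation evaluation and workshop, which was held at Xiamen University in July 2005. In 2006 and 2007, the second and third national statistical machine translation workshops, jointly organized by the Institute of Computing Technology, the Institute of Automation, and the Institute of Software of the Chinese Academy of Sciences, Harbin Institute of Technology, and Xiamen University, were held at the Institute of Computing Technology and at Harbin Institute of Technology, respectively. These three meetings did much to strengthen academic exchange among researchers at home and abroad and to advance machine translation in China. To broaden the meeting's influence and scope, starting with this edition it was renamed the China Workshop on Machine Translation (CWMT), while continuing the numbering of the first three.
The opening ceremony began at 9 a.m. on November 27. It was chaired by Prof. Zong Chengqing (宗成庆), program committee chair and deputy director of the National Laboratory of Pattern Recognition, who also described how the workshop was organized. Prof. Cao Youqi (曹右琦), executive vice president of the Chinese Information Processing Society of China, addressed the meeting. Prof. Huang Heyan (黄河燕), director of the Research Center of Computer and Language Information Engineering of the Chinese Academy of Sciences and president of Huajian Group, gave a speech and reported on preparations for the society's technical committee on machine translation. More than 110 participants from over 30 universities, research institutes, and companies in China and abroad attended. The workshop also invited Dr. Philipp Koehn of the University of Edinburgh, a well-known international expert in machine translation, to give a tutorial, and Dr. Yang Jin (杨进) of Systran, the well-known Paris-based machine translation company, described the company's decades of experience in developing and operating practical machine translation systems. The two-day workshop was well received by participants from China and abroad.
The workshop included a dedicated machine translation system evaluation. Fifteen organizations from China and abroad took part and presented the techniques behind their systems at the workshop. Our institute's Chinese-to-English system for the news domain and English-to-Chinese system for the science and technology domain performed very well in this evaluation.
The workshop and the technical results presented there reflect the substantial progress of machine translation research in China, and attendance far above expectations shows that the field is thriving. The two days gave participants a chance to learn about one another's work and laid a good foundation for further research and collaboration. We believe this workshop will have a positive and lasting influence on machine translation research in China.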
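Our institute's machine translation system again achieves top results in an international evaluation
The CASIA Chinese-to-English spoken language translation system developed at our institute again took first place in human evaluation in this year's international spoken language translation evaluation, and in its first entry in the English-to-Chinese evaluation it took second place in human evaluation.
The evaluation is organized by the Consortium for Speech Translation Advanced Research (C-STAR) and held once a year. All participating systems run in the same time window on the same test set, and their output is scored and ranked using the same criteria and metrics. Participants then submit system reports and technical papers and meet at the International Workshop on Spoken Language Translation (IWSLT) to exchange ideas and learn from one another. The annual IWSLT evaluation is therefore a review of the field's technical progress over the past year and a fair, open technical competition whose results are highly credible. This year's IWSLT workshop was held in Hawaii, USA, on October 20-21.
This year's evaluation covered spoken language translation systems for Chinese-to-English, English-to-Chinese, Arabic-to-English, Chinese-to-Spanish, and Chinese-English-Spanish. For the first time it also allowed translation results on text input, on speech recognition output, and on spontaneous speech to be compared side by side, which made the tasks more challenging than in previous years. The evaluation attracted 14 organizations, including well-known research institutions and universities such as CMU and MIT in the United States, RWTH (Aachen University) in Germany, ITC in Italy, Toshiba, and NICT-ATR and NTT in Japan. CASIA entered three tasks this year: Chinese-to-English translation of read speech, Chinese-to-English translation of spontaneous speech, and English-to-Chinese translation. "Read speech" translation means that a person reads a text aloud and the speech recognition output is then translated; "spontaneous speech" translation means that a person speaks freely, with no text as reference, and the speech recognition output is then translated. Our Chinese-to-English system took first place in human evaluation in both the read-speech and the spontaneous-speech conditions, and the CASIA English-to-Chinese system, entering the spontaneous-speech English-to-Chinese evaluation for the first time, took second place in human evaluation. This follows the CASIA system's first place in human evaluation at IWSLT 2007, and it indicates that the institute's spoken language translation system is stable and has remained at an internationally advanced level in recent years.
2008 graduates complete their defenses
On May 30, 2008, master's student Fang Licheng (方李成) defended his master's thesis. His advisor is Prof. Zong Chengqing (宗成庆). The defense committee was chaired by Associate Prof. Sun Le (孙乐), with Prof. Wang Xiaojie (王小捷) and Associate Prof. Zhao Jun (赵军) as members.
On June 1, 2008, doctoral students Chen Yufeng (陈钰枫) and Li Shoushan (李寿山) defended their doctoral dissertations. Their advisor is Prof. Zong Chengqing. The defense committee was chaired by Academician Gao Qingshi (高庆狮), with Prof. Huang Taiyi (黄泰翼), Prof. Feng Zhiwei (冯志伟), Prof. Liu Qun (刘群), and Prof. Sun Maosong (孙茂松) as members.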
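Researcher Zhang Ruiqiang visits the National Laboratory of Pattern Recognition
On April 24, 2008, Zhang Ruiqiang (张瑞强), a researcher at NICT-ATR in Japan, visited the National Laboratory of Pattern Recognition. He surveyed the current state of statistical machine translation research, in particular the latest technical progress in machine translation research in Japan, and described several major projects currently undertaken by NICT-ATR.
Dr. Gao Yuqing and Dr. Zhang Yuqi visit the National Laboratory of Pattern Recognition
On February 25, 2008, Dr. Gao Yuqing (高玉清) of the IBM research center in the United States and Dr. Zhang Yuqi (张昱琪) of RWTH in Germany visited the institute. They gave talks titled "Speech-to-Speech Translation Technologies and Systems" and "Chunk-Level Reordering of Source Language Sentences with Automatically Learned Rules for Statistical Machine Translation", respectively.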
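2007
Prof. Alex Waibel visits the National Laboratory of Pattern Recognition
On December 11, 2007, Prof. Alex Waibel of Carnegie Mellon University attended the 2007 Chinese Conference on Pattern Recognition (CCPR2007) by invitation and gave an invited plenary talk titled "Computer Supported Human-Human Multilingual Communication". CCPR2007 was sponsored by the Chinese Association of Automation and the Institute of Automation, Chinese Academy of Sciences, and was co-organized by the association's technical committee on pattern recognition and machine intelligence and the National Laboratory of Pattern Recognition.
Prof. Christian Boitet visits the National Laboratory of Pattern Recognition
On December 6, 2007, Prof. Christian Boitet of Grenoble 1 University in France visited the natural language processing group of the National Laboratory of Pattern Recognition and gave a talk titled "Linguistic and computational MT architectures are orthogonal, elements for choice".
Dr. Huang Liang visits the National Laboratory of Pattern Recognition
On November 15, 2007, Dr. Huang Liang (黄亮) of the University of Pennsylvania visited the natural language processing group of the National Laboratory of Pattern Recognition and gave a talk titled "K-best Algorithms in Parsing and Machine Translation".
Prof. Wan Jiancheng visits the National Laboratory of Pattern Recognition
On November 8, 2007, Prof. Wan Jiancheng (万建成) of Shandong University visited the natural language processing group of the National Laboratory of Pattern Recognition and gave a talk titled "Natural Language Formalization and its Knowledge Induction with BCG".
Dr. Zhao Shubin visits the National Laboratory of Pattern Recognition
On November 1, 2007, Dr. Zhao Shubin (赵树彬) visited the natural language processing group of the National Laboratory of Pattern Recognition and gave a talk titled "Information Extraction from Multiple Syntactic Sources". Dr. Zhao graduated from New York University in 2006 and now works at Google in the United States on information extraction and retrieval.
Dr. Stephan Vogel visits the National Laboratory of Pattern Recognition
On October 19, 2007, Prof. Stephan Vogel of Carnegie Mellon University visited the National Laboratory of Pattern Recognition by invitation for academic exchange. During his stay he also attended the 2007 international conference on frontiers of automation science and technology, hosted by the Institute of Automation, Chinese Academy of Sciences, and gave an invited plenary talk titled "Multilingual Natural Language Processing - Machine Translation and Beyond". After the conference, as a marathon enthusiast, he ran the 2007 Beijing International Marathon and finished the full course.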
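CASIA spoken language translation system achieves excellent results in an international evaluation
The CASIA Chinese-to-English spoken language translation system developed at our institute took first place in human evaluation in this year's international spoken language translation evaluation.
The evaluation is organized by the Consortium for Speech Translation Advanced Research (C-STAR) and held once a year. All participating systems run in the same time window on the same test set, and their output is scored and ranked using the same criteria and metrics. Participants then submit system reports and technical papers and meet at the International Workshop on Spoken Language Translation (IWSLT) to exchange ideas and learn from one another. The annual IWSLT evaluation is therefore a review of the field's technical progress over the past year and a fair, open technical competition whose results are highly credible.
This year's evaluation covered spoken language translation systems for four language pairs: Chinese-English, Arabic-English, Japanese-English, and Italian-English. The IWSLT workshop was held in Trento, Italy, on October 15-16. Fifteen systems took part in the Chinese-to-English evaluation, including systems from well-known research institutions such as CMU and MIT in the United States, RWTH (Aachen University) in Germany, ITC in Italy, and NICT-ATR and NTT in Japan. In the human evaluation, the CASIA Chinese-to-English spoken language translation system ranked first among the 15 systems. This follows the CASIA system's first place on the automatic metric (BLEU) at IWSLT 2005, and it shows that the institute's spoken language translation research is at an internationally advanced level.
Distinguished scholar Prof. Lin-shan Lee visits our institute
On September 10, 2007, Prof. Lin-shan Lee (李林山), an internationally known scholar in speech and language technology, visited the Institute of Automation. Prof. Lee received his master's and doctoral degrees from Stanford University in 1975 and 1977, respectively, and from 1977 to 1979 he worked on communications technology in industry in California. He became a professor in the Department of Electrical Engineering and Computer Science at National Taiwan University (台湾大学) in 1982, chaired the university's Department of Electrical Engineering from 1982 to 1987, and directed the Institute of Information Science at Academia Sinica, Taiwan, from 1991 to 1997.
Prof. Lee has deep expertise in digital communications, speech recognition and synthesis, and natural language processing, and he was elected an IEEE Fellow in 1992. He visited our institute as an IEEE Distinguished Lecturer and made a point of calling on his old friend Prof. Huang Taiyi (黄泰翼). Prof. Zong Chengqing (宗成庆), deputy director of the National Laboratory of Pattern Recognition, accompanied Prof. Lee on a tour of some of the speech and language technology research results of the laboratory and of the Digital Content Technology Research Center.
Foreign experts visit the National Laboratory of Pattern Recognition
Prof. Nicoletta Calzolari Zamorani, director of the Institute for Computational Linguistics in Italy, and Prof. Robert Dale, director of the Centre for Language Technology at Macquarie University in Australia, visited the natural language processing group of the National Laboratory of Pattern Recognition on September 3 and 4, 2007, respectively.
Wang Haifeng and Wu Hua visit the National Laboratory of Pattern Recognition
On July 12, 2007, Dr. Wang Haifeng (王海峰), chief researcher at the Toshiba (China) R&D Center, and researcher Wu Hua (吴华) visited the National Laboratory of Pattern Recognition and gave talks. Dr. Wang's talk covered: 1) an overview of speech and language technology research at the Toshiba R&D center; 2) progress in statistical machine translation methods; 3) how to write academic papers. Wu Hua received her doctorate in engineering from our laboratory in 2001. On this visit she presented to the laboratory's current students the paper that she and Dr. Wang published at this year's ACL conference (the most important international NLP conference), "Pivot Language Approach for Phrase-Based Statistical Machine Translation", and discussed phrase-based statistical machine translation methods with them.
Dr. Zhang Yujie visits the National Laboratory of Pattern Recognition
On June 15, 2007, Dr. Zhang Yujie (张玉洁) of the National Institute of Information and Communications Technology (NICT) in Japan visited the laboratory for academic exchange. NICT is Japan's point of contact for exchange and cooperation with China in natural language processing and has invested substantially in the field in recent years. Dr. Zhang presented NICT's latest progress and future plans in machine translation research to the students of our speech and language technology group, and she offered guidance and suggestions on the theses of two master's students working on natural language processing, to the students' great benefit.
Distinguished scholar Prof. Keh-Yih Su visits the National Laboratory of Pattern Recognition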
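On May 30, 2007, Prof. Keh-Yih Su (苏克毅), an internationally known scholar in natural language processing, visited the National Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences, by invitation and gave a talk titled "Problems of Current NLP Approaches".
Prof. Su has long worked on both the theory and the applied development of machine translation and is highly regarded in the international machine translation community. He studied in the United States early in his career, was a visiting scholar at AT&T Bell Laboratories, and taught for more than ten years at National Tsing Hua University in Taiwan (台湾清华大学), where he was once the university's youngest professor. In 2002 he founded BDC (Behavior Design Corporation). In the late 1980s, building on the machine translation ideas proposed by W. Weaver, he was among the first to set out the statistical machine translation approach based on the noisy channel model, which made him one of the founders of that approach. He has served as executive chair (执行主席) of the ACL (Association for Computational Linguistics), the most authoritative academic organization in computational linguistics, and on the editorial boards of leading international journals in the field, including the Journal of Computational Linguistics and Machine Translation. Over the years, Prof. Su has maintained good academic exchange and cooperation with the National Laboratory of Pattern Recognition. On this visit he held in-depth discussions with the laboratory's faculty and students on specific problems and methods in natural language processing, set out in detail his thinking on current NLP approaches, and patiently answered students' questions, to everyone's benefit.
2005
Graduate student from Michigan State University completes her study visit at the National Laboratory of Pattern Recognition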
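Ms. Cassandran N. Jackson, a graduate student at Michigan State University, has completed her study visit at the National Laboratory of Pattern Recognition, Institute of Automation. She gave her research talk on August 10, 2005, and will return home shortly.
The China-US Young Scientists Exchange Program (中美青年科技人员交流计划), signed jointly by China's Ministry of Science and Technology and the U.S. National Science Foundation, ran from June 19 to August 23, 2005. The program arranges for U.S. graduate students to spend the summer in laboratories at Chinese universities and research institutes, carrying out joint research with young Chinese researchers. Its aims are to promote exchange and friendship between young scholars of the two countries; to deepen young scientists' understanding of scientific and technological developments in each other's country, publicize China's scientific and technological achievements, and advance cooperation between the two countries; and to train people for future China-US cooperation in science and technology, open new channels of international cooperation for Chinese universities and research institutes, and lay a foundation for further cooperation.
Under this year's program, 29 U.S. graduate students came to China for cooperative research. According to their research interests, they visited laboratories at more than 20 Chinese universities and research institutes. Ms. Jackson was hosted by Prof. Zong Chengqing (宗成庆) of the National Laboratory of Pattern Recognition at our institute, and her research topic was syntactic parsing in natural language processing. During her visit she worked extensively and cooperated well with the students of the laboratory's Chinese information processing group, completed her research plan, and achieved its intended goals, laying a good foundation for further cooperation. On the afternoon of August 10, Ms. Jackson submitted her research report and gave a talk at the laboratory titled "Probabilistic Base-NP Parser". That evening, some students and faculty of the Chinese information processing group held a farewell dinner for her.
The program is an important national initiative for promoting international exchange and cooperation among young scholars and for deepening mutual understanding between young Chinese and American students, and it has been widely welcomed by students of both countries. We believe this visit will open new opportunities for our institute's international cooperation in natural language processing.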
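Institute of Automation and the Italian Center for Scientific and Technological Research formally sign a memorandum of cooperation
On July 4, 2005, the Institute of Automation, Chinese Academy of Sciences (CASIA) and the Center for Scientific and Technological Research (ITC-irst[1]) in Italy formally signed a memorandum of cooperation. The signing ceremony was held at the Italian Embassy in China. Prof. Zong Chengqing (宗成庆) of CASIA and researcher Gianni Lazzari of ITC-irst signed the memorandum on behalf of the two parties. Prof. Zhang Gongqing (张恭清), assistant to the director of CASIA, and Prof. Roberto Coïsson, science and technology counsellor at the Italian Embassy in China, attended the ceremony.
ITC-irst was founded in 1976 and is located in Trento, a city in northern Italy. The center aims to solve practical scientific and technical problems, is driven by the innovation needs of society and industry, and focuses on applied basic research. It has strong research and development capabilities in information technology, microwave systems, and other fields. Its interactive sensory systems laboratory has 15 years of experience in human language technology research and development and is at an internationally advanced level in speech recognition and spoken language translation, with particular influence in Europe. From 2002 to 2004, the center chaired the Consortium for Speech Translation Advanced Research International (C-STAR).
Automatic multilingual translation is one of the main obstacles facing the globalized information society of the 21st century. With China's rapid economic rise, Chinese has become, or is becoming, a major language that draws international attention, and translation among major world languages such as English and Chinese has become a pressing scientific and technical problem. Since many European countries, including Italy, fully opened to tourism from China, automatic translation between Chinese and the 20 official languages of Europe has become an industry that answers a major social need.
With support from the science and technology ministries of both countries, the Chinese Academy of Sciences, and other relevant bodies, the cooperation between CASIA and ITC-irst targets the 2008 Beijing Olympic Games as well as medium- and long-term goals over 5-10 years. It will pursue research on practical technology and applied foundations in human language and communication technology, aiming at key breakthroughs in practical technology for multilingual communication and multilingual translation. The two sides are considering jointly establishing a "Joint Laboratory for Human Language Technology and Communication" when conditions are ripe.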
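China-Japan-Korea workshop on language information technology cooperation held successfully at the Institute of Automation
An international workshop on language information technology cooperation among the Institute of Automation, Chinese Academy of Sciences (CASIA), the Advanced Telecommunications Research Institute International (ATR) of Japan, and the Electronics and Telecommunications Research Institute (ETRI) of Korea was held at the Institute of Automation on June 3, 2005.
The three-party cooperation, also called the CJK (China-Japan-Korea) cooperation, was formally agreed in April 2004. Its main goal is joint work on speech and language technology to meet the major needs of modern communications and the information society, in particular multilingual spoken language translation and communication for the 2008 Beijing Olympic Games. Under this framework the three parties cooperate in concrete ways: jointly developing and sharing spoken language corpora in Chinese, Japanese, and Korean, and exchanging technology modules for speech recognition, speech synthesis, and Chinese-English translation. All three parties are core members of the Consortium for Speech Translation Advanced Research International (C-STAR), come from neighboring Asian countries, share research interests and goals, and have many years of friendly cooperation behind them, so the cooperation has gone very smoothly since the agreement was signed. So far the parties have jointly developed 250,000 sentence-aligned spoken language sentences in Chinese, Japanese, and Korean. This is another, larger international effort to develop and share multilingual spoken parallel corpora, following C-STAR's completion in 2003 of the 160,000-sentence BTEC multilingual spoken corpus (six languages: Chinese, English, Japanese, German, Italian, and Korean). With this, within international cooperation frameworks alone, CASIA now holds usage rights to large multilingual spoken aligned corpora of more than 400,000 sentences each in Chinese, Japanese, and Korean. This is also, to date, the largest spoken parallel corpus internationally and the one covering the most aligned languages. At the workshop, the three parties also exchanged dynamic link libraries for a Chinese speech recognition module, a Chinese speech synthesis module, a Chinese-English translation module, and the corresponding Japanese and Korean modules. They also held productive discussions on the next goals and plans for cooperation.
With the arrival of the global information age, international cooperation has become a necessity. Solving multilingual automatic translation and cross-language communication in particular is unthinkable without joint effort by scientists from many countries. We believe that the sharing of corpus resources and technology modules achieved through the CJK cooperation will lay a good foundation for deeper research on automatic speech translation.
2004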
2003
2002
2000
For questions, please contact zlu@nlpr.ia.ac.cn