1 - 34 of 34 search results for `news corpora` |u:mi.eng.cam.ac.uk
Fully-matching results
-
Bin Jia, Khe Chai Sim et al: CU-HTK RT03 ...
mi.eng.cam.ac.uk/research/projects/EARS/pubs/jia_rt03s.pdf 23 Jun 2003: Language Model. • Sources of data (using LDC character-to-word segmentor) – Acoustic training data (modified Kneser-Ney) – News corpora: TDT[2,3,4], China Radio, People’s Daily, Xinhua (Good-. ... Acoustic 206.6 190.8 AcousticNews Corpora 199.6 179.8 -
Experiments in Broadcast News Transcription
mi.eng.cam.ac.uk/reports/full_html/woodland_icassp98.html/ 1 Mar 2000: Experiments in Broadcast News Transcription. P.C. Woodland, T. Hain, S.E. Johnson, T. ... FX. all other speech (e.g. spontaneous non-native). Table 1: Broadcast news focus conditions. -
References [1] Control-DAG: Constrained decoding for…
mi.eng.cam.ac.uk/~wjb31/PUBS/ 29 Apr 2024: [19] uFACT: Unfaithful alien-corpora training for semantically consistent data-to-text generation. ... We propose uFACT (Un-Faithful Alien Corpora Training), a training corpus construction method for data-to-text (d2t) generation models. -
DEVELOPMENT OF THE CUHTK 2004 MANDARIN CONVERSATIONAL TELEPHONE SPEECH …
mi.eng.cam.ac.uk/~mjfg/gales_ICASSP05.pdf 22 Nov 2006: The two acoustic training data sources, and each of the news corpora, were kept as distinct sources for language model (LM) generation. ... The total contribution from all the news corpora was about 0.12, with the majority from People’s Daily (0.09). -
Cross-Lingual Spoken Language Understanding from Unaligned Data…
mi.eng.cam.ac.uk/~sjy/papers/lemy10.pdf 20 Feb 2018: in-domain utterance pairs, and up to 91.4% when adding the out-of-domain bilingual corpora detailed in Section 2.2. ... [11] J. Tiedemann, “News from OPUS - A collection of multi-lingual parallel corpora with tools and interfaces,” in Recent Advances -
Active Memory Networks for Language Modeling O. Chen, A. ...
mi.eng.cam.ac.uk/~ar527/chen_is2018.pdf 15 Jun 2018: Experiments were conducted on the Penn Tree Bank and BBC Multi-Genre Broadcast News (MGB) corpora, where the proposed approach significantly outperforms standard forms of recurrent models in perplexity. ... PTB consists mainly of text related to finance, -
IMPROVING BROADCAST NEWS TRANSCRIPTION BY LIGHTLY…
mi.eng.cam.ac.uk/reports/svr-ftp/chan_icassp2004.pdf 27 May 2004: The rest of the paper is organised as follows. In Section 2, we describe the English broadcast news corpora that used in this work. Then, our lightly supervised discriminative training approach is presented ... Rich Transcription Workshop, 2003. [4] D. -
References [1] An inner table retriever for robust table ...
mi.eng.cam.ac.uk/~wjb31/bak.PUBS/ 3 Nov 2023: [13] uFACT: Unfaithful alien-corpora training for semantically consistent data-to-text generation. ... We propose uFACT (Un-Faithful Alien Corpora Training), a training corpus construction method for data-to-text (d2t) generation models. -
DEVELOPMENT OF THE CUHTK 2004 RT04F MANDARIN CONVERSATIONAL TELEPHONE…
mi.eng.cam.ac.uk/~mjfg/rt04f_mandarin.pdf 23 Dec 2004: The total contribution from all the news corpora was about 0.12, with the majority from People’s Daily (0.09). ... All experiments use the interpolated language model with the news corpora. Language Model System (S3) CER (%) dev04. -
The 1997 HTK Broadcast News Transcription System
mi.eng.cam.ac.uk/reports/full_html/woodland_darpa98.html/ 1 Mar 2000: 41-48 (Lansdowne, VA, Feb. 1998). The 1997 HTK BROADCAST NEWS TRANSCRIPTION SYSTEM. ... using the broadcast news training texts, the acoustic training data and 1995 Marketplace transcriptions. -
Abstract for evermann_icassp00
mi.eng.cam.ac.uk/reports/abstracts/evermann_icassp00.html 27 Jul 2020: The effectiveness of these techniques is demonstrated on the broadcast news and the conversational telephone speech corpora where improvements both in terms of word error rate and normalised cross entropy were -
sig-004.dvi
mi.eng.cam.ac.uk/~sjy/papers/gayo07.pdf 20 Feb 2018: The review concludes with a case study of LVCSR for Broadcast News and Conversation transcription in order to illustrate the techniques described. ... The N-gram parameters are estimated by counting N-tuples in appropriate text corpora. -
Machine Learning for Speech & Language Processing Mark Gales 28 ...
mi.eng.cam.ac.uk/~mjfg/FCSW_talk.pdf 19 Jul 2006: Text data: used to train the ASR language model: – large news corpora available; – systems built on > 1 billion words of data. • -
STRUCTURAL METADATA RESEARCH IN THE EARS PROGRAM Yang Liu1,5 ...
mi.eng.cam.ac.uk/reports/svr-ftp/tomalin_icassp05.pdf 12 May 2005: 2.3. MDE Corpora. Conversational telephone speech (CTS) and broadcast news (BN) are used for the structural event detection tasks in EARS. ... The MDE effort in the EARS program aims to explore these tasks more extensively, using different corpora and -
Bitext Alignment for Statistical Machine Translation Yonggang Deng A…
mi.eng.cam.ac.uk/~wjb31/ppubs/YDengDissertationDec05.pdf 16 Feb 2008: 72. 5.9 Percentage of Usable Arabic-English Bitext. English tokens for Arabic-English news and UN parallel corpora under different alignment procedures. ... in real data, for example, parallel corpora mined from web pages, automatic bitext. -
Improving Abstractive Summarization and Information Consistency…
mi.eng.cam.ac.uk/~mjfg/thesis_pm574.pdf 9 Jul 2024: This paradigm has shown impressive results on standard summarization tasks such as news summarization [159, 340]. However, there is a challenge in applying a large foundation model to long-document summarization such as -
EXPERIMENTS IN BROADCAST NEWS TRANSCRIPTION P.C. Woodland, T. Hain,…
mi.eng.cam.ac.uk/reports/svr-ftp/woodland_icassp98.pdf 10 Apr 2000: EXPERIMENTS IN BROADCAST NEWS TRANSCRIPTION. P.C. Woodland, T. Hain, S.E. Johnson, T.R. ... Young S.J. (1997) The Development of the 1996 Broadcast News Transcription System. -
Bitext Alignment for Statistical Machine Translation
mi.eng.cam.ac.uk/~wjb31/ppubs/YDengDefenseDec05.pdf 16 Feb 2008: English Arabic-English. Used all parallel corpora available from LDCC-E: 200M En. ... words (news, all UN bitexts). Y. Deng (Johns Hopkins) Bitext Alignment for SMT 39 / 42. -
THE DEVELOPMENT OF THE 1996 HTK BROADCAST NEWS TRANSCRIPTION SYSTEM ...
mi.eng.cam.ac.uk/reports/svr-ftp/woodland_darpa97.pdf 8 Mar 2000: THE DEVELOPMENT OF THE 1996 HTK BROADCAST NEWS TRANSCRIPTION SYSTEM. P.C. Woodland, M.J.F. ... 5. CONCLUSION This paper has described our initial efforts to develop systems for broadcast news transcription. -
paper.dvi
mi.eng.cam.ac.uk/~mjfg/liao_ICASSP07.pdf 15 Aug 2007: Experiments are conducted on the Resource Management and Broadcast News corpora. Index Terms— Speech recognition, Robustness. ... 4. EXPERIMENTS. A simplified Broadcast News system based on the 2003 CU-HTK system [11] was evaluated. -
The 1997 HTK Broadcast News Transcription System P.C. Woodland, T. ...
mi.eng.cam.ac.uk/reports/svr-ftp/woodland_darpa98.pdf 8 Mar 2000: The 1997 HTK Broadcast News Transcription System P.C. Woodland, T. Hain, S.E. ... news development test data and just 15.8% on the 1997 evaluation test set. -
tech.dvi
mi.eng.cam.ac.uk/~mjfg/liao_tr552.pdf 21 Sep 2007: trained on multistyle data sets such as broadcast news or conversational telephone speech. ... large vocabulary Broadcast News corpus of collected broadcast recordings. 1 Introduction. -
RECENT ADVANCES IN BROADCAST NEWS TRANSCRIPTION D.Y. Kim, G. ...
mi.eng.cam.ac.uk/reports/svr-ftp/kim_asru2003.pdf 25 Sep 2003: 12, pp. 75-98. [5] D. Graff (2002). “An Overview of Broadcast News Corpora.” Speech Communication, Vol.37, pp. ... Broadcast News Data. Acoustic training data. Development data. Text corpora. Acoustic model building. -
The Cambridge Multimedia Document Retrieval (MDR) Project : Summary…
mi.eng.cam.ac.uk/reports/full_html/sparckjones_cltr517.html/ 10 Oct 2001: full audio material including non-news items. Figure 1: Details of the TREC data sets. ... 5. Details of the various corpora are given in the tables with the results. -
sig-004.dvi
mi.eng.cam.ac.uk/~mjfg/mjfg_NOW.pdf 19 Mar 2008: The review concludes with a case study of LVCSR for Broadcast News and Conversation transcription in order to illustrate the techniques described. ... The N-gram parameters are estimated by counting N-tuples in appropriate text corpora. -
thesis.dvi
mi.eng.cam.ac.uk/reports/svr-ftp/nock_thesis.pdf 14 Jun 2006: time of writing include the transcription of real radio and television news broadcasts (eg. ... [72] finds differences in part-of-speech distributions found in the conversational Switchboard [54], dictated Wall Street Journal [128] and the mixed speaking -
Article Submitted to Computer Speech and Language Automatic…
mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/kim_csl04.pdf 9 Aug 2005: 3. Corpora and evaluation measures. Two different sets of data, the Broadcast News (BN) text corpus and the 100-hour Hub-4 BN data set, were available as training data for the experiments ... For example: News in “Lisa Stark, A. B. C. News, Washington” -
LARGE VOCABULARY DECODING AND CONFIDENCE ESTIMATION USING WORD…
mi.eng.cam.ac.uk/reports/svr-ftp/evermann_icassp00.pdf 5 May 2000: It is also interesting to note that the improvement is consistent over the various types of data found in broadcast news. ... The effectiveness of these techniques was demonstrated on the broadcast news and the conversational telephone speech corpora where -
Joint Training Methods for Tandem and Hybrid Speech Recognition…
mi.eng.cam.ac.uk/~cz277/doc/Thesis-PhD.pdf 11 Jul 2017: Joint Training Methods for Tandem and Hybrid Speech Recognition Systems using Deep Neural Networks. Chao Zhang. Department of Engineering, University of Cambridge. This dissertation is submitted for the degree of Doctor of Philosophy. Peterhouse July -
PhD Thesis
mi.eng.cam.ac.uk/~mjfg/thesis_kcs23.pdf 16 Nov 2007: 8.1 Summary of various speech training corpora for CTS-E, BN-E and CTS-M 102. ... News (BN) English transcription tasks are used. 2. Hidden Markov Model Speech Recognition. -
/home/blue7/jjjb2/2009-03-02_ZH-EN/results/HMMcomp.f2e.ps
mi.eng.cam.ac.uk/~wjb31/ppubs/jbrunningthesis.pdf 20 Oct 2010: Figure 1.1: Graphical representation of noisy channel models and decoding. 1 http://news.xinhuanet.com/english/2007-08/31/content_6637522.htm. ... The amount of text in electronic parallel corpora that can be used for this purpose is rapidly increasing: -
DESIGN OF FAST LVCSR SYSTEMS G. Evermann & P.C. ...
mi.eng.cam.ac.uk/reports/svr-ftp/evermann_asru2003.pdf 23 Sep 2003: More details on the effectiveness of these techniques on Broadcast News can be found in [8]. ... In Proc. IEEE ASRU Workshop, 1997. [4] D. Graff. An Overview of Broadcast News Corpora. Speech Communication, 37:15–26, 2002. -
"Refinements in Hierarchical Phrase-Based Translation…
mi.eng.cam.ac.uk/~wjb31/ppubs/jpino2015HieroRefinementsThesis.pdf 6 Feb 2015: In order to address this concern and also in order to obtain more data, parallel data can also be extracted automatically from comparable corpora (Smith et al., 2013). ... However, the widespread availability of machine translation and the development of -
TOWARDS IMPROVED LANGUAGE MODEL EVALUATION MEASURES Philip Clarkson…
mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/clarkson_eurospeech99.pdf 9 Aug 2005: Different quantities of the training corpora were used to train each language model, and various cutoffs were applied. ... Christie, and A. Robinson. The Transcription of Broadcast Television and Radio News: The 1996 Abbot System.