Pruning False Unknown Words to Improve Chinese Word Segmentation

Asahara, Masayuki; Matsumoto, Yuji; Goh, Chooi-Ling; 浅原, 正幸; 松本, 裕治


Series

Oral Session

Related URI

http://www.decode.waseda.ac.jp/PACLIC18/

File

oral-11.pdf (490.4 kB)

Item type

Conference paper

Release date

2008-04-28

Title

Pruning False Unknown Words to Improve Chinese Word Segmentation

Language

eng

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Authors

Goh, Chooi-Ling

浅原, 正幸
WEKO 50018
NRID 1000080379528

松本, 裕治
WEKO 50019
NRID 1000010211575

Alternative author names

Asahara, Masayuki

Matsumoto, Yuji

Abstract

During unknown word detection in Chinese word segmentation, many of the detected word candidates are invalid. These false unknown word candidates degrade overall segmentation accuracy because they affect the segmentation accuracy of known words. We therefore propose a pruning process that eliminates as many invalid word candidates as possible. Our experiments show that cutting down the invalid unknown word candidates improves the segmentation accuracy of known words and hence the overall segmentation accuracy.

Bibliographic information

p. 139-150, issued 2005-11-16

Subject (NDC)

801.06

Subject (LCSH)

Computational linguistics--Congresses

Publisher

Logico-Linguistic Society of Japan

Data type

text

HDL URI

http://hdl.handle.net/2065/567
