Acemap Knowledge Graph


As knowledge graphs play an increasingly important role in the era of artificial intelligence, many research groups are organizing the knowledge in their domains into machine-readable knowledge graphs, which store knowledge as triples.

Acemap Knowledge Graph (AceKG), supported by Acemap, is now open to everyone for research and non-commercial use. We hope this knowledge graph will benefit research and development in academic data mining.


AceKG describes 114.30 million academic entities based on a consistent ontology, including 61,704,089 papers, 52,498,428 authors, 50,233 research fields, 19,843 academic institutes, 22,744 journals, 1,278 conferences and 3 special affiliations. In total, AceKG consists of 2.2 billion pieces of relationship information. The schema of AceKG is provided as follows.

Compared with other open academic KGs and datasets, AceKG has the following advantages. First, AceKG offers a heterogeneous academic information network, i.e., one with multiple entity categories and relationship types, which lets researchers and engineers conduct a variety of academic data mining experiments. Second, AceKG is large enough (the dataset is nearly 100 GB) to cover most instances in the academic ontology, which makes experiments based on AceKG more convincing and of greater practical value. Third, AceKG is fully organized as structured RDF triples, which are machine-readable and easy to process.

In the future, we will provide datasets of different sizes so that people can conduct research and develop interactive applications on the Acemap academic system based on SPARQL querying.

Data Description

This section introduces the data schema of AceKG in the table below. All linked data are stored in Turtle format.

Class | Subject | Predicate | Object | Example
Paper | ace:Paper_ID | rdf:type | ace:Paper | ace:7CE9EBAC rdf:type ace:Paper .
Paper | ace:Paper_ID | ace:paper_citation_count | xsd:int | ace:7CE9EBAC ace:paper_citation_count 67 .
Paper | ace:Paper_ID | ace:paper_cs_relevant | xsd:boolean | ace:7CE9EBAC ace:paper_cs_relevant "false"^^xsd:boolean .
Paper | ace:Paper_ID | ace:paper_future_rank | xsd:int | ace:7CE9EBAC ace:paper_future_rank 462498 .
Paper | ace:Paper_ID | ace:paper_keyword | xsd:string | ace:7CE9EBAC ace:paper_keyword "neural networks"^^xsd:string .
Paper | ace:Paper_ID | ace:paper_rank | xsd:int | ace:7CE9EBAC ace:paper_rank 18165 .
Paper | ace:Paper_ID | ace:paper_publish_date | xsd:date | ace:7CE9EBAC ace:paper_publish_date "2014"^^xsd:date .
Paper | ace:Paper_ID | ace:paper_sci_citation | xsd:int | ace:7CE9EBAC ace:paper_sci_citation 11 .
Paper | ace:Paper_ID | ace:paper_title | xsd:string | ace:7CE9EBAC ace:paper_title "Dropout:A simple way to prevent neural networks from overfitting"^^xsd:string .
Paper | ace:Paper_ID | ace:paper_is_written_by | ace:Author_ID | ace:7CE9EBAC ace:paper_is_written_by ace:218FC062 .
Paper | ace:Paper_ID | ace:paper_is_in_field | ace:Field_ID | ace:7CE9EBAC ace:paper_is_in_field ace:0304C748 .
Paper | ace:Paper_ID | ace:paper_publish_on | ace:Venue_ID | ace:7BFA9BE5 ace:paper_publish_on ace:07179FAA .
Paper | ace:Paper_ID | ace:paper_cit_paper | ace:Paper_ID | ace:7CE9EBAC ace:paper_cit_paper ace:7BFA9BE5 .
Author | ace:Author_ID | rdf:type | ace:Author | ace:218FC062 rdf:type ace:Author .
Author | ace:Author_ID | ace:author_name | xsd:string | ace:218FC062 ace:author_name "geoffrey e hinton"^^xsd:string .
Author | ace:Author_ID | ace:author_citation_count | xsd:int | ace:218FC062 ace:author_citation_count 32891 .
Author | ace:Author_ID | ace:author_number_of_paper | xsd:int | ace:218FC062 ace:author_number_of_paper 388 .
Author | ace:Author_ID | ace:author_sci_citation | xsd:int | ace:218FC062 ace:author_sci_citation 7565 .
Author | ace:Author_ID | ace:author_is_in_field | ace:Field_ID | ace:218FC062 ace:author_is_in_field ace:0304C748 .
Author | ace:Author_ID | ace:work_in | ace:Institute_ID | ace:218FC062 ace:work_in ace:0B0ADEB6 .
Institute | ace:Institute_ID | rdf:type | ace:Institute | ace:0B0ADEB6 rdf:type ace:Institute .
Institute | ace:Institute_ID | ace:institute_name | xsd:string | ace:0B0ADEB6 ace:institute_name "university of toronto"^^xsd:string .
Venue | ace:Venue_ID | rdf:type | ace:Conference | ace:094D2874 rdf:type ace:Conference .
Venue | ace:Venue_ID | ace:conference_full_name | xsd:string | ace:094D2874 ace:conference_full_name "KDD 2015:21th ACM SIGKDD Conference or Knowledge Discovery and Data Mining"^^xsd:string .
Venue | ace:Venue_ID | ace:conference_short_name | xsd:string | ace:094D2874 ace:conference_short_name "KDD 2015"^^xsd:string .
Venue | ace:Venue_ID | rdf:type | ace:Journal | ace:07179FAA rdf:type ace:Journal .
Venue | ace:Venue_ID | ace:journal_name | xsd:string | ace:07179FAA ace:journal_name "journal of Machine Learning Research"^^xsd:string .
Field | ace:Field_ID | rdf:type | ace:Field | ace:0304C748 rdf:type ace:Field .
Field | ace:Field_ID | ace:field_name | xsd:string | ace:0304C748 ace:field_name "Artificial neural network"^^xsd:string .
Field | ace:Field_ID | ace:field_level | xsd:string | ace:0304C748 ace:field_level "L2"^^xsd:string .
Field | ace:Field_ID | ace:field_papers_num | xsd:int | ace:0304C748 ace:field_papers_num 330252 .
Field | ace:Field_ID | ace:field_reference_count | xsd:int | ace:0304C748 ace:field_reference_count 1713867 .
Field | ace:Field_ID | ace:field_is_part_of | ace:Field_ID | ace:0304C748 ace:field_is_part_of ace:0724DFBA .
Affiliation | ace:Affiliation_ID | rdf:type | ace:Affiliation | ace:C9 rdf:type ace:Affiliation .
Affiliation | ace:Affiliation_ID | ace:affiliation_name | xsd:string | ace:qs50 ace:affiliation_name "qs50"^^xsd:string .
Affiliation | ace:Institute_ID | ace:is_part_of_affiliation | ace:Affiliation_ID | ace:0B0ADEB6 ace:is_part_of_affiliation ace:qs50 .

Some example files are provided to help you use this knowledge graph.
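
To illustrate how statements shaped like those in the Example column of the schema table can be consumed, here is a minimal Python sketch. It assumes one complete triple per line, written with prefixed names and straight double quotes; it is not a full Turtle parser, and the helper names (parse_line, collect) are hypothetical. For the real dump files, a proper RDF library or Apache Jena is likely the safer choice.

```python
import re

# Sample statements copied from the Example column of the schema table.
SAMPLE = '''\
ace:7CE9EBAC rdf:type ace:Paper .
ace:7CE9EBAC ace:paper_citation_count 67 .
ace:7CE9EBAC ace:paper_title "Dropout:A simple way to prevent neural networks from overfitting"^^xsd:string .
ace:7CE9EBAC ace:paper_is_written_by ace:218FC062 .
ace:218FC062 ace:author_name "geoffrey e hinton"^^xsd:string .
'''

# Subject, predicate, then either a quoted literal with an optional
# ^^datatype suffix, or a bare token (prefixed name or integer).
TRIPLE_RE = re.compile(
    r'^(\S+)\s+(\S+)\s+(?:"((?:[^"\\]|\\.)*)"(?:\^\^(\S+))?|(\S+))\s*\.\s*$'
)


def parse_line(line):
    """Parse one 'subject predicate object .' line; return None if it does not match."""
    match = TRIPLE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, literal, _datatype, token = match.groups()
    if literal is not None:
        obj = literal                      # quoted literal, kept as a string
    elif re.fullmatch(r'-?\d+', token):
        obj = int(token)                   # bare integer such as 67
    else:
        obj = token                        # prefixed name such as ace:218FC062
    return subject, predicate, obj


def collect(lines):
    """Group parsed triples as {subject: {predicate: [objects]}}."""
    entities = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue                       # skip blank or unsupported lines
        subject, predicate, obj = parsed
        entities.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return entities


entities = collect(SAMPLE.splitlines())
paper = entities["ace:7CE9EBAC"]
print(paper["ace:paper_title"][0], paper["ace:paper_citation_count"][0])
```

Grouping by subject keeps all properties of one paper or author together, which matches how the schema table lists predicates per class.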

Data Provision Architecture

Our dataset is hosted with the Apache Jena framework. Jena stores the data in a TDB database and provides access to the linked data through a SPARQL query engine. In addition, it provides the Fuseki HTTP service for Web clients and a complete Java API for querying the linked data.
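
As an illustration of access over HTTP, the sketch below builds a SPARQL query for the papers of one author and wraps it in a GET request following the standard SPARQL protocol. The endpoint URL and the namespace IRI bound to the ace: prefix are placeholders assumed for this example, not values published here; the real IRI should be taken from the schema file. The function names are hypothetical as well. The request is only constructed, not sent, so the snippet runs without a server.

```python
import json
import urllib.parse
import urllib.request

# Placeholder values (assumptions): replace with the actual Fuseki query
# endpoint and the namespace IRI declared for the ace: prefix.
ENDPOINT = "http://localhost:3030/acekg/sparql"
ACE_NS = "http://example.org/ace/"


def build_author_papers_query(author_id, limit=10):
    """Return a SPARQL query listing papers and titles written by one author."""
    return (
        f"PREFIX ace: <{ACE_NS}>\n"
        "SELECT ?paper ?title WHERE {\n"
        f"  ?paper ace:paper_is_written_by ace:{author_id} ;\n"
        "         ace:paper_title ?title .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(endpoint, query):
    """Wrap the query in a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30):
    """Send the request and return the result bindings (needs a live endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]


query = build_author_papers_query("218FC062")
request = build_request(ENDPOINT, query)
print(query)
```

With a running endpoint, run_query(ENDPOINT, query) would return one binding per matching paper, each holding the ?paper and ?title values.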


File | Size | Description
acemap.ttl | 30 KB | schema
affiliation.ttl | 2.2 MB | affiliation information
dump_author.ttl | 38 GB | author property
dump_authorname.ttl | 4.7 GB | author name and author ID
dump_paper.ttl | 16 GB | paper property 1
dump_paperauthoraffiliations.ttl | 11 GB | paper_author triples and author_institute triples
dump_paperrelation.ttl | 14 GB | paper property 2
dump_ref.ttl | 22 GB | paper reference information
dump_title.ttl | 7.5 GB | paper property 3
field.ttl | 21 MB | field information
jc.ttl | 2.8 MB | venue information
league.ttl | 5.7 KB | affiliation information
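
Because several of these files are tens of gigabytes, reading them line by line is usually preferable to loading them whole. The following Python sketch counts how often each predicate occurs in a .ttl file while keeping memory use flat. It assumes one triple per line, as in the schema examples; the layout of the actual dump files is not confirmed here. The function name count_predicates and the demo namespace IRI are hypothetical, and the demo runs on a small temporary file so that it is self-contained.

```python
import os
import tempfile
from collections import Counter


def count_predicates(path):
    """Stream a one-triple-per-line file and count occurrences of each predicate."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:                      # one line in memory at a time
            parts = line.split(None, 2)          # subject, predicate, rest
            if len(parts) < 3 or parts[0].startswith(("@", "#")):
                continue                         # skip blanks, directives, comments
            counts[parts[1]] += 1
    return counts


# Small stand-in for a dump file; the namespace IRI is a placeholder.
demo_lines = [
    "@prefix ace: <http://example.org/ace/> .",
    "ace:7CE9EBAC rdf:type ace:Paper .",
    "ace:7CE9EBAC ace:paper_cit_paper ace:7BFA9BE5 .",
    "ace:7CE9EBAC ace:paper_is_written_by ace:218FC062 .",
    "ace:7BFA9BE5 ace:paper_publish_on ace:07179FAA .",
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".ttl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(demo_lines) + "\n")

demo_counts = count_predicates(tmp.name)
os.remove(tmp.name)
print(demo_counts.most_common())
```

The same loop could be pointed at a file such as dump_ref.ttl to get a quick profile of its contents before loading it into a triple store.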