Talk:Extraction
From AIRWiki
Revision as of 10:37, 30 July 2009 by DavideEynard
Useful material
Documentation
- An early project description from Noustat
- The shared project document (work always in progress)
- A collection of basic tutorials on PCA and Correspondence Analysis
- A collection of papers from Murtagh
- A collection of papers about ontology learning from text
Suggestions:
- the first document is a summary of the paper from Murtagh whose file is called "auto_onto_2.pdf"; it gives you an idea of the whole project
- the second document has some hints about how we decided to proceed with the development of the software
- the third collection of documents is very useful for understanding the basics of what you are dealing with, in particular Correspondence Analysis and PCA
- the rest might be useful for the State of the Art of your thesis: it is not fundamental to understanding the project, but it is very useful for finding related ideas on how to improve Murtagh's work and make something new for your thesis.
Programming libraries used in the prototype
- Colt - Java matrix library
- Jena - A Semantic Web Framework for Java
- Lucene - Text indexing and Search Engine
Source code
- You can find the latest public version of the source code of Extraction here: http://davide.eynard.it/svn/noustat/trunk (login: noustat, pass: t4t5u0n - the project folder is "noustat")
Todo list
- create the project page, following the template you can find in an already "semantified" project page (for instance, http://airwiki.elet.polimi.it/mediawiki/index.php/Wikipedia_Social_Network). More info about how to create a new project page can be found at http://airwiki.elet.polimi.it/mediawiki/index.php/Projects_-_HOWTO
- organize the "useful material" section and publish it in the "Related resources" section on the project page (this can be postponed for now, until we have talked about what can be made public)
- once a timeline with deliverables is defined, publish it here and make each deliverable available here whenever its deadline is met
- when you complete an action from this todo list, delete it :)