Showing posts with label Tools. Show all posts

Generating Java objects from XML files

I found a great tool called XStream that can generate Java objects directly from XML. Previously, I used XMLBeans, but it requires an XML Schema, and its installation process is a bit involved. With XStream, it's pretty easy to create Java objects from XML and to get XML from objects. Here I provide a complete example, which extends the code given on XStream's official site.

Requirements: 
Three files: XstreamTest.java, Person.java, PhoneNumber.java
Library: xstream-1.4.3.jar
JDK: 1.6.x

1. XstreamTest.java 
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.StaxDriver;

public class XstreamTest {
    /**
     * @param args
     */
    public static void main(String[] args) {
        Person joe = new Person("Joe", "Walnes");
        joe.setPhone(new PhoneNumber(123, "1234-456"));
        joe.setFax(new PhoneNumber(123, "9999-999"));
        // Serialize the object graph to XML.
        XStream xstream = new XStream(new StaxDriver());
        String xml = xstream.toXML(joe);

        System.out.println(xml);

        // Deserialize the XML back into a Person object.
        Person p1 = (Person) xstream.fromXML(xml);
        System.out.println("Phone no:"+p1.getPhone().getNumber());
    }

}



2. Person.java
public class Person {
      private String firstname;
      private String lastname;
      private PhoneNumber phone;
      private PhoneNumber fax;
        public Person(String fname,String lname){
            this.firstname=fname;
            this.lastname=lname;
        }
        public String getFirstname() {
            return firstname;
        }
        public void setFirstname(String firstname) {
            this.firstname = firstname;
        }
        public String getLastname() {
            return lastname;
        }
        public void setLastname(String lastname) {
            this.lastname = lastname;
        }
        public PhoneNumber getPhone() {
            return phone;
        }
        public void setPhone(PhoneNumber phone) {
            this.phone = phone;
        }
        public PhoneNumber getFax() {
            return fax;
        }
        public void setFax(PhoneNumber fax) {
            this.fax = fax;
        }  
}

3. PhoneNumber.java
public class PhoneNumber {
    private int code;
    private String number;

    public PhoneNumber(int i, String string) {
        this.code=i;
        this.number=string;
    }
    public int getCode() {
        return code;
    }

    public void setCode(int code) {
        this.code = code;
    }

    public String getNumber() {
        return number;
    }

    public void setNumber(String number) {
        this.number = number;
    }
   
}

OUTPUT: 
<?xml version='1.0' encoding='utf-8'?><Person><firstname>Joe</firstname><lastname>Walnes</lastname><phone><code>123</code><number>1234-456</number></phone><fax><code>123</code><number>9999-999</number></fax></Person>
Phone no:1234-456

Generate Java code using XML Schema

Many tools are available around XMLBeans: XMLBeans Tools 
Here I have copied only the description of scomp, which compiles an XML schema to Java code. I already described how to generate an XSD from an XML file here.

Generate Java code from XSD

If you want to get right to it with your own XML schema and instance, follow these basic steps:

  1. Install XMLBeans.
  2. Compile your schema. Use scomp to compile the schema, generating and jarring the Java types. For example, to create employeeschema.jar from an employeeschema.xsd file:
    scomp -out employeeschema.jar employeeschema.xsd
  3. Write code. With the generated JAR on your classpath, write code to bind an XML instance to the Java types representing your schema. Here's an example that would use types generated from an employees schema:
    File xmlFile = new File("c:\\employees.xml"); 
    
    // Bind the instance to the generated XMLBeans types.
    EmployeesDocument empDoc = 
     EmployeesDocument.Factory.parse(xmlFile); 
    
    // Get and print pieces of the XML instance.
    Employees emps = empDoc.getEmployees(); 
    Employee[] empArray = emps.getEmployeeArray(); 
    for (int i = 0; i < empArray.length; i++) 
    { 
     System.out.println(empArray[i]); 
    }

  4. Read a tutorial. Their tutorial gives a good sense of XMLBeans basics.

***

Writing LaTeX equations online

Recently I needed to write a lot of math equations for a homework assignment in an Inference Theory course. I could have used MS Word or OpenOffice to write the equations, but I also know LaTeX. The problem was that I didn't want to install it. I found the following website, which can be used to edit equations using LaTeX commands.

LaTeX Online Equation Editor:  codecogs.com
Equation Help: Wikipedia

Below is a sample equation I've written using this website.

Generate XSD from XML

Sometimes we need to generate an XML Schema (XSD) from a given XML document. There are many tools, but the one I use is called trang. Here are the simple steps to generate an XSD from an XML file (input.xml):

1. Download and Unzip trang-20030619.zip
2. Go to the trang-20030619 folder and run the following command:
java -jar trang.jar -I xml -O xsd input.xml output.xsd

Searching Assertion using TextRunner

TextRunner searches hundreds of millions of assertions extracted from 500 million high-quality Web pages.
Links:
  1. Paper
  2. Demo

Weka and R Tutorial

The world is amazing because of the web. One can solve problems by looking at others' approaches on the web, and one can teach what one knows by releasing articles, audio, or video lectures. Here I've found a very interesting site, SentimentMining.net, that is very useful for people who are interested in data mining, sentiment mining, and statistical analysis. I particularly liked the Weka video tutorial on the site. It also has tutorials for R, but I haven't explored those much.

Twitter Language

tweets: Messages in Twitter (max 140 characters)
twitter alphabet soup: The Twitter characters with special meaning are @, d, RT and #:
@: Talk publicly to another person
d: Talk privately to another person
RT: Repeat another person's tweet
#: Tag a message with a label

Better Search using Solr and Lucene

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Apache Tomcat.
Apache Solr

Lucene revisited

Lucene is an open-source full-text search library which makes it easy to add search functionality to an application or website. Want to understand Lucene in 5 minutes? Go here. The following slide provides a quick review of Lucene.
Figure: Steps in building applications  using Lucene [Source: IBM ]

Why Lucene? From this DOC.
  • Incremental versus batch indexing
  • Data sources
  • Indexing Control
  • File Format
  • Content Tagging
  • Stop Word Processing
  • Stemming
  • Query Features
  • Concurrency
  • Non-English Support
Go through this document, which presents the fundamental concepts of Lucene, e.g. Index, Document, Field, Term, Segment and Query Term. I recommend it for beginners.

Searching and Indexing 
Lucene is able to achieve fast search responses because, instead of searching the text directly, it searches an index. This is the equivalent of retrieving pages in a book related to a keyword by searching the index at the back of the book, as opposed to searching the words on every page of the book.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).
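The book-index analogy above can be sketched in a few lines of plain Java. This is an illustrative toy, not Lucene's actual implementation: each word maps to the sorted set of page numbers it occurs on, so a lookup is a single map access rather than a scan of every page.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: word -> sorted set of page numbers (illustrative only).
public class InvertedIndex {
    private final Map<String, Set<Integer>> index = new HashMap<String, Set<Integer>>();

    // Index one page: record every word that appears on it.
    public void addPage(int pageNumber, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.length() == 0) continue;
            Set<Integer> pages = index.get(word);
            if (pages == null) {
                pages = new TreeSet<Integer>();
                index.put(word, pages);
            }
            pages.add(pageNumber);
        }
    }

    // Look up all pages containing the given word.
    public Set<Integer> pagesFor(String word) {
        Set<Integer> pages = index.get(word.toLowerCase());
        return pages == null ? Collections.<Integer>emptySet() : pages;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addPage(1, "Lucene is a search library");
        idx.addPage(2, "An index maps words to pages");
        idx.addPage(3, "Search the index, not the pages");
        System.out.println(idx.pagesFor("index"));  // prints [2, 3]
        System.out.println(idx.pagesFor("search")); // prints [1, 3]
    }
}
```

A real Lucene index additionally stores term frequencies and positions, which is what enables relevance ranking and phrase queries.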
Lucene's Drawback and Nutch
Lucene provides a powerful indexing and search library which may be used as a base for online search engines, however on its own the library doesn't include any form of web crawling or HTML parsing abilities. These features are necessary in order to create a fully functional online search engine. Several projects have modified Lucene with the intent of adding this missing functionality. One of the most notable of these efforts is Nutch, a SourceForge.net project.
More Resources:
  1. Lucene QUERY SYNTAX
  2. Lucene QUERY SYNTAX I
  3. Luke - Lucene INDEX TOOLBAR
  4. Lucene BASICS
  5. Lucene ISSUES
  6. Lucene 3.0 API Documentation
  7. Advance Lucene
  8. BEST TUTORIAL@ IBM

TPTP - A Java Profiling Tool

In software engineering, program profiling, software profiling or simply profiling, a form of dynamic program analysis (as opposed to static code analysis), is the investigation of a program's behavior using information gathered as the program executes. The usual purpose of this analysis is to determine which sections of a program to optimize - to increase its overall speed, decrease its memory requirement or sometimes both.

The set of profiling tools provides software developers or testers with the ability to analyze the performance of a Java program or to gain a comprehensive understanding of the overall performance of an application. Eclipse Test and Performance Tools Platform (TPTP) is such a tool used for profiling. A good tutorial is here: Tutorial.

WEKA - Data Mining Software in Java

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

HtmlCleaner - An HTML Parser in Java

I wanted to parse the webpages of a website. At first I looked at the pages' design and guessed that the HTML pages were well-formed. However, a DOM parser couldn't parse them and reported that they were not well-formed. A closer look revealed that some of the tags were not closed.

My next step was to search for tools that facilitate parsing HTML pages in Java. I found that a number of HTML parsers are available. Among them I chose HtmlCleaner, a tool that can CLEAN HTML web pages and give us a DOM document. Since the pages contain Nepali characters, I needed UTF-8 encoding. Fortunately, HtmlCleaner supports that.

The HtmlCleaner website doesn't show a complete sample example. However, a user has posted a sample program (given in this URL) that really helped me get started with HTML parsing.

KompoZer - Dreamweaver like tool

I was looking for an open source tool that works similarly to Dreamweaver. I found KompoZer. Its official website says:
KompoZer is a complete web authoring system that combines web file management and easy-to-use WYSIWYG web page editing. KompoZer is designed to be extremely easy to use, making it ideal for non-technical computer users who want to create an attractive, professional-looking web site without needing to know HTML or web coding.

Though I haven't used it much, my first encounter with this tool was impressive.

Using Multiple Search Engines

I just found a useful website. It is really useful because it saves the user's time by presenting the results of a query from two different search engines. In other words, with the same input effort, one can get results from two search engines. For example, if I search for NEPAL in Google, I would be happy to also see the results for NEPAL from another search engine, e.g. Yahoo, side by side. This is even better if your screen is big enough.





Happy browsing !!

Using TreeTagger

I recently used TreeTagger to get the part-of-speech (POS) tags of English and French texts. As mentioned on its official website [1], TreeTagger is a language-independent tool for annotating text with part-of-speech and lemma information.

Installation is pretty easy. One just needs to follow the instructions given on the website. After the installation, it will tell you something like:

You should add /home/nobal/TreeTagger/cmd and /home/nobal/TreeTagger/bin to the command search path.

And here is how you can add the path (in Ubuntu):

sudo gedit /etc/bash.bashrc
and add the following at the end of the file:

PATH=$PATH:~/TreeTagger/bin:~/TreeTagger/cmd
export PATH


Don't forget to restart the terminal for the changes to take effect. To verify, use this command:

echo $PATH

External Links:
[1]. TreeTagger

Ontology Search Engines

I was looking for an ontology. I found that there exist many ontology search engines. The beauty of these search engines is that, unlike Google and Yahoo, which are general web search engines, they search only ontologies.
 
Why do we need ontology search engines?
They help find suitable ontologies for given user requirements, so that existing knowledge bases can be reused.

Examples
  • SWOOGLE: An ontology search engine.
  • SCARLET: Discovering relations between two concepts.
  • WATSON: Search Ontology and Semantically Related Documents