Nobal Tech: Tutorial


Expected Value of a Function

Definition (from WolframAlpha):
The expectation value of a function $f(x)$ in a variable $x$ is denoted $\langle f(x) \rangle$. For a single discrete variable, it is defined by

$\langle f(x) \rangle = \sum_x f(x) P(x)$

where $P(x)$ is the probability density function.
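
As a rough illustration of the definition above (the sum of f(x) times P(x) over every value x can take), here is a minimal Java sketch. The class name, the method name and the fair-die example are my own assumptions for illustration, not part of the quoted definition.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

class ExpectedValueDemo {
    // Sums f(x) * P(x) over all values of a discrete variable.
    // values[i] is a possible outcome and probs[i] is its probability.
    static double expectedValue(double[] values, double[] probs, DoubleUnaryOperator f) {
        if (values.length != probs.length) {
            throw new IllegalArgumentException("values and probs must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += f.applyAsDouble(values[i]) * probs[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical example: a fair six-sided die, each face with probability 1/6
        double[] faces = {1, 2, 3, 4, 5, 6};
        double[] probs = new double[faces.length];
        Arrays.fill(probs, 1.0 / faces.length);

        System.out.println("E[x]   = " + expectedValue(faces, probs, x -> x));
        System.out.println("E[x^2] = " + expectedValue(faces, probs, x -> x * x));
    }
}
```

For the fair die this gives roughly 3.5 for f(x) = x and roughly 15.17 (that is, 91/6) for f(x) = x^2.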

Example:

Source: econ.lse.ac.uk

Note: I did not write this article. I copied it here because I found the tutorial very useful; it made the Monte Carlo method clear to me. Readers are advised to visit the original website (here) to read the article in its original form.
----------------------------------
How can Monte Carlo be used to calculate the value of Pi?
----------------------------------
We can play a game of darts to calculate the value of Pi. Suppose we have the following dart board:

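If you are a very poor dart player, it is easy to imagine throwing darts randomly at the figure, and it should be apparent that, of the total number of darts that hit within the square, the number that hit the shaded part (the circle quadrant) is proportional to the area of that part. In other words,

$\frac{\text{darts hitting shaded area}}{\text{darts hitting inside square}} = \frac{\text{area of shaded area}}{\text{area of square}}$

If you remember your geometry, it's easy to show that

$\frac{\text{area of shaded area}}{\text{area of square}} = \frac{\frac{1}{4} \pi r^2}{r^2} = \frac{\pi}{4}$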

If each dart thrown lands somewhere inside the square, the ratio of "hits" (in the shaded area) to "throws" will be one-fourth the value of pi. If you actually do this experiment, you'll soon realize that it takes a very large number of throws to get a decent value of pi...well over 1,000. To make things easy on ourselves, we can have computers generate random numbers.

If we say our circle's radius is 1.0, for each throw we can generate two random numbers, an x and a y coordinate, which we can then use to calculate the distance from the origin (0,0) using the Pythagorean theorem. If the distance from the origin is less than or equal to 1.0, it is within the shaded area and counts as a hit. Do this thousands (or millions) of times, and you will wind up with an estimate of the value of pi. How good it is depends on how many iterations (throws) are done, and to a lesser extent on the quality of the random number generator. Simple computer code for a single iteration, or throw, might be:

x = (random#)
y = (random#)
dist = sqrt(x^2 + y^2)
if dist (less.than.or.equal.to) 1.0
  let hits = hits + 1.0
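
The single throw above can be turned into a complete program. The following is a minimal Java sketch, not the original article's code: the class name, the method name and the fixed seed are my own assumptions, and it compares the squared distance with 1.0 instead of taking a square root, which gives the same hit test.

```java
import java.util.Random;

class MonteCarloPi {
    // Throws `throwCount` random darts at the unit square and returns
    // 4 * hits / throws as the estimate of pi.
    static double estimatePi(long throwCount, long seed) {
        Random random = new Random(seed);
        long hits = 0;
        for (long i = 0; i < throwCount; i++) {
            double x = random.nextDouble(); // uniform in [0, 1)
            double y = random.nextDouble();
            // x^2 + y^2 <= 1 is equivalent to dist <= 1.0, without the sqrt
            if (x * x + y * y <= 1.0) {
                hits++;
            }
        }
        return 4.0 * hits / throwCount;
    }

    public static void main(String[] args) {
        // More throws should generally give a better estimate
        for (long n : new long[] {1_000, 100_000, 10_000_000}) {
            System.out.printf("throws=%d  estimate=%.5f%n", n, estimatePi(n, 42L));
        }
    }
}
```

The seed is fixed only so that runs are repeatable; any seed should behave similarly.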

Writing LaTeX equations online

Recently I needed to write a lot of math equations for homework in an Inference Theory course. I could have used MS Word or OpenOffice to write them. I know LaTeX as well, but I didn't want to install it. I found the following website, which can be used to edit equations using LaTeX commands.

LaTeX Online Equation Editor: codecogs.com
Equation Help: Wikipedia

Below is a sample equation I've written using this website.

$f(x) = \begin{cases} 1 & -1 \le x < 0 \\ \frac{1}{2} & x = 0 \\ 1 - x^2 & \text{otherwise} \end{cases}$

Khan Academy: Random Variables and Probability Distribution

People are so nice. They share their knowledge for free. Salman Khan is one such example; he runs an organization called the Khan Academy. The Khan Academy is a not-for-profit educational organization created by Salman Khan. With the stated mission "of providing a high quality education to anyone, anywhere", the Academy supplies a free online collection of over 2,000 videos on mathematics, history, finance, physics, chemistry, astronomy, and economics (Source: Wikipedia).

I came to know this website after watching the video on Random Variables and Probability Distribution, a topic I'm learning now. The lectures are pretty good. I love such people and such activities!

Support Vector Machine (SVM) - A Practical Guide

Support Vector Machine (SVM) is a very popular classification method. The following is a useful document if you are new to SVM. Note that a recent, good document to accompany the following presentation slides is here.

A Practical Guide to Support Vector Classification

Scanner Class in Java

When I started learning Java around 8 years ago, I had a problem. Because I already knew C and C++, I found it very hard to get input from the console using Java code. C and C++ need just a line to get input, whereas Java required a lot more. Now it's much easier using the Scanner class:

Reading From Keyboard :
import java.util.Scanner;

public class ScannerDemo {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        // Read string input for username
        System.out.print("Username: ");
        String username = scanner.nextLine();

        // Read string input for password
        System.out.print("Password: ");
        String password = scanner.nextLine();

        // Read an integer input for another challenge
        System.out.print("What is 2 + 2: ");
        int result = scanner.nextInt();

        if (username.equals("admin") && password.equals("secret") && result == 4) {
            System.out.println("Welcome to Java Application");
        } else {
            System.out.println("Invalid username or password, access denied!");
        }
    }
}

Note: Code is taken from this URL.

Reading From File:
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class HelpFile {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the file once reading is done
        try (Scanner scanner = new Scanner(new File("test.txt"))) {
            while (scanner.hasNextLine()) {
                System.out.println(scanner.nextLine());
            }
        }
    }
}

Reading From Socket:

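// Assumes `socket` is an already connected java.net.Socket
Scanner keyboard = new Scanner(System.in);
Scanner remote = new Scanner(socket.getInputStream());
PrintWriter out = new PrintWriter(socket.getOutputStream(), true); // true = auto-flush on println

// read line from keyboard
String line = keyboard.nextLine();
// send line to remote node
out.println(line);
// wait for a line from remote node
line = remote.nextLine();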
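
The socket fragment above needs a connected socket before it can run. As a self-contained sketch, the program below starts a throwaway echo server on the loopback interface and talks to it with the same Scanner/PrintWriter pair. The class name, the method names and the "echo: " prefix are assumptions of mine for illustration only.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Scanner;

class SocketScannerDemo {
    // Sends one line to a local one-shot echo server and returns its reply.
    static String roundTrip(String message) {
        // Port 0 asks the OS for any free port
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> serveOneLine(server));
            echo.start();

            try (Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
                 Scanner remote = new Scanner(socket.getInputStream());
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
                out.println(message);              // send line to remote node
                String reply = remote.nextLine();  // wait for a line from remote node
                echo.join();
                return reply;
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // Server side: accept one connection, read one line, answer with a prefix.
    private static void serveOneLine(ServerSocket server) {
        try (Socket peer = server.accept();
             Scanner in = new Scanner(peer.getInputStream());
             PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
            out.println("echo: " + in.nextLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The `true` passed to PrintWriter turns on auto-flush, so each println is actually sent before the program blocks waiting for the reply.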

Sharing Folders between Windows7 host and Ubuntu Guest in VirtualBox

I'm using Ubuntu as a guest on a Windows7 host using VirtualBox. I needed to share a folder between the host and the guest. I searched for a solution and found a good link that explains the process. Click here to go to the link. Basically, the following steps are to be taken:

1. Share a folder in Windows7 (say C:/SharedFolder_Win7) using VirtualBox's GUI
2. Start or restart Ubuntu
3. Run the following commands:
3.1 sudo mkdir /mnt/SharedFolder
3.2 sudo mount.vboxsf SharedFolder_Win7 /mnt/SharedFolder

Linked Data

Linked Data is a method to publish data on the Web and to interlink data between different data sources. Linked Data can be accessed using Semantic Web browsers, just as traditional Web documents are accessed using HTML browsers. However, instead of following document links between HTML pages, Semantic Web browsers enable surfers to navigate between different data sources by following RDF links. RDF links can also be followed by robots or Semantic Web search engines in order to crawl the Semantic Web.

The DBpedia data set is interlinked with various other data sources. The diagram below gives an overview of some of these data sources:

Introduction to SWRL

This slide explains what SWRL is and why we need it. It also shows how to write a SWRL rule in Protege. Though the slide shows the Protege 3.2 editor, rules can be written similarly in Protege 4.0 (View -> Ontology View -> Rules). Below is a line that explains a limitation of OWL and suggests the use of rules in an ontology:

In OWL it is not possible to establish that a person is the boss of a secretary, only that a person is a boss.

SWRL Tips:

A SWRL rule contains an antecedent part, which is referred to as the body, and a consequent part, which is referred to as the head.
Both the body and head consist of positive conjunctions of atoms. SWRL does not support negated atoms or disjunction. Thus, a SWRL rule may be read as meaning that if all the atoms in the antecedent are true, then the consequent must also be true.
How do we write a rule like Prop1(?x,?y) V Prop2(?y,?x) -> Prop3(?y,?x)? Answer: As disjunction is not allowed in SWRL, we can break it down into two sub-rules. R1: Prop1(?x,?y) -> Prop3(?y,?x) and R2: Prop2(?y,?x) -> Prop3(?y,?x).
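
Why the split into two sub-rules is safe can be checked with ordinary propositional logic. In the short derivation below, A, B and C are my own shorthand for Prop1(?x,?y), Prop2(?y,?x) and Prop3(?y,?x); it is a sketch of the reasoning, not something taken from the slides.

```latex
\begin{align*}
(A \lor B) \rightarrow C
  &\equiv \lnot (A \lor B) \lor C \\
  &\equiv (\lnot A \land \lnot B) \lor C \\
  &\equiv (\lnot A \lor C) \land (\lnot B \lor C) \\
  &\equiv (A \rightarrow C) \land (B \rightarrow C)
\end{align*}
```

The last line is exactly the pair of rules R1 and R2 taken together.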

SWRL Tutorial 01

Operations on Ontologies

Source: Ontologies and Semantic Web

Operations on ontologies include Merging, Mapping, Alignment, Refinement, Unification, Integration and Inheritance. Not all of these operations can be performed on all ontologies. In general, these are very difficult tasks that cannot be solved automatically, for example because of undecidability when using very expressive logical languages, or because an ontology is specified too loosely to find similarities with another ontology. For these reasons the tasks are usually done manually or semi-automatically: a machine helps to find possible relations between elements from different ontologies, but the final confirmation of each relation is left to a human. The human then decides based on the natural language descriptions of the ontology elements, or only on their natural language names and common sense.

Ontology Reasoning

Why do we need reasoning in ontology?

Reasoning in ontologies and knowledge bases is one of the reasons why a specification needs to be a formal one. By reasoning we mean deriving facts that are not expressed explicitly in the ontology or knowledge base. Reasoners are used to reason over the ontology.

Tasks of Ontology Reasoners
A few examples of the tasks required of a reasoner are as follows.
Satisfiability of a concept - determine whether a description of the concept is not contradictory, i.e., whether an individual can exist that would be instance of the concept.

Subsumption of concepts - determine whether concept C subsumes concept D, i.e., whether description of C is more general than the description of D.

Consistency of ABox with respect to TBox - determine whether individuals in ABox do not violate descriptions and axioms described by TBox.

Check an individual - check whether the individual is an instance of a concept

Retrieval of individuals - find all individuals that are instances of a concept

Realization of an individual - find all concepts which the individual belongs to, especially the most specific ones
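
To make three of these tasks concrete (subsumption, instance checking and retrieval), here is a toy Java sketch. It is deliberately not a real description-logic reasoner: it only follows explicitly asserted sub-concept links between named concepts. The class name, the method names and the Manager/Employee/Person example are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyReasoner {
    // TBox: each concept mapped to its directly asserted super-concepts
    private final Map<String, Set<String>> superConcepts = new HashMap<>();
    // ABox: each individual mapped to its directly asserted concepts
    private final Map<String, Set<String>> assertedTypes = new HashMap<>();

    void addSubClassOf(String sub, String sup) {
        superConcepts.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    void addInstance(String individual, String concept) {
        assertedTypes.computeIfAbsent(individual, k -> new HashSet<>()).add(concept);
    }

    // Subsumption: true if `general` is reachable from `specific` via super-concept links
    boolean subsumes(String general, String specific) {
        Deque<String> todo = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        todo.push(specific);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (current.equals(general)) {
                return true;
            }
            if (seen.add(current)) {
                todo.addAll(superConcepts.getOrDefault(current, Collections.<String>emptySet()));
            }
        }
        return false;
    }

    // Instance check: the individual has an asserted concept that `concept` subsumes
    boolean isInstance(String individual, String concept) {
        for (String asserted : assertedTypes.getOrDefault(individual, Collections.<String>emptySet())) {
            if (subsumes(concept, asserted)) {
                return true;
            }
        }
        return false;
    }

    // Retrieval: all individuals that are instances of `concept`, sorted by name
    Set<String> retrieve(String concept) {
        Set<String> result = new TreeSet<>();
        for (String individual : assertedTypes.keySet()) {
            if (isInstance(individual, concept)) {
                result.add(individual);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyReasoner reasoner = new ToyReasoner();
        reasoner.addSubClassOf("Manager", "Employee");
        reasoner.addSubClassOf("Employee", "Person");
        reasoner.addInstance("alice", "Manager");
        System.out.println(reasoner.subsumes("Person", "Manager"));
        System.out.println(reasoner.isInstance("alice", "Employee"));
        System.out.println(reasoner.retrieve("Person"));
    }
}
```

Here "alice" is only asserted to be a Manager, yet the sketch derives that she is also an Employee and a Person, which is the kind of implicit fact a reasoner is expected to surface.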

OWL Reasoners

A reasoner is a key component for working with OWL ontologies. In fact, virtually all querying of an OWL ontology (and its imports closure) should be done using a reasoner. This is because knowledge in an ontology might not be explicit, and a reasoner is required to deduce implicit knowledge so that the correct query results are obtained. The OWL API includes various interfaces for accessing OWL reasoners. In order to access a reasoner via the API, a reasoner implementation is needed. The following reasoners (in alphabetical order) provide implementations of the OWL API OWLReasoner interface:

FaCT++
HermiT
Pellet
RacerPro

Visitor Design Pattern

The Visitor pattern is a behavioral design pattern. Wikipedia says: the visitor design pattern is a way of separating an algorithm from an object structure it operates on. A practical result of this separation is the ability to add new operations to existing object structures without modifying those structures.

Example:
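
A minimal Java sketch of the pattern follows. The shapes and visitor names are hypothetical and chosen only for illustration. Note how AreaVisitor and NameVisitor add two operations without touching Circle or Square.

```java
class VisitorDemo {
    // One visit method per concrete element type
    interface ShapeVisitor<R> {
        R visitCircle(Circle circle);
        R visitSquare(Square square);
    }

    // Elements only know how to hand themselves to a visitor
    interface Shape {
        <R> R accept(ShapeVisitor<R> visitor);
    }

    static final class Circle implements Shape {
        final double radius;

        Circle(double radius) {
            this.radius = radius;
        }

        public <R> R accept(ShapeVisitor<R> visitor) {
            return visitor.visitCircle(this);
        }
    }

    static final class Square implements Shape {
        final double side;

        Square(double side) {
            this.side = side;
        }

        public <R> R accept(ShapeVisitor<R> visitor) {
            return visitor.visitSquare(this);
        }
    }

    // First operation: compute the area
    static final class AreaVisitor implements ShapeVisitor<Double> {
        public Double visitCircle(Circle circle) {
            return Math.PI * circle.radius * circle.radius;
        }

        public Double visitSquare(Square square) {
            return square.side * square.side;
        }
    }

    // Second operation, added later without modifying the shapes
    static final class NameVisitor implements ShapeVisitor<String> {
        public String visitCircle(Circle circle) {
            return "circle";
        }

        public String visitSquare(Square square) {
            return "square";
        }
    }

    public static void main(String[] args) {
        Shape[] shapes = {new Circle(1.0), new Square(2.0)};
        for (Shape shape : shapes) {
            System.out.println(shape.accept(new NameVisitor()) + " area: " + shape.accept(new AreaVisitor()));
        }
    }
}
```

The trade-off is the usual one for this pattern: adding a new operation is cheap (one new visitor class), while adding a new shape means touching every existing visitor.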

A few useful Unix commands

Replace one piece of text in a file with another using the perl command.
perl -pi -e 's/nobal/nobal niraula/g' a.xml

This command replaces "nobal" with "nobal niraula" in a.xml. We can do the same in multiple files, e.g. just give *.xml as the argument to replace the text in every XML file.

Find line(s) in a file containing a given text
grep "nobal" file.txt

RDF Revisited

Resource
The Resource Description Framework (RDF) is a standard (technically a W3C Recommendation) for describing resources.

Statements
Each arc in an RDF Model is called a statement. Each statement asserts a fact about a resource. A statement has three parts:

the subject is the resource from which the arc leaves
the predicate is the property that labels the arc
the object is the resource or literal pointed to by the arc

A statement is sometimes called a triple, because of its three parts.
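
A statement can be modelled as a small value class with three fields. The sketch below is my own illustration and uses no RDF library; the class names and the example.org URIs are hypothetical, and the N-Triples-style output does no escaping of special characters, which a real serializer would need.

```java
class RdfStatementDemo {
    static final class Statement {
        final String subject;          // resource the arc leaves from (a URI)
        final String predicate;        // property labelling the arc (a URI)
        final String object;           // resource URI or literal the arc points to
        final boolean objectIsLiteral; // true when `object` is a literal value

        Statement(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }

        // Renders the triple in an N-Triples-like form (simplified: no escaping)
        String toNTriples() {
            String renderedObject = objectIsLiteral ? "\"" + object + "\"" : "<" + object + ">";
            return "<" + subject + "> <" + predicate + "> " + renderedObject + " .";
        }
    }

    public static void main(String[] args) {
        Statement name = new Statement(
                "http://example.org/people/nobal",
                "http://example.org/terms/name",
                "Nobal",
                true);
        System.out.println(name.toNTriples());
    }
}
```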

RDF Syntax

RDF/XML
N-Triples
N3
Turtle
JSON
TRiX

Via: Cell Phones

False Positive vs False Negative

The terms false positive and false negative (along with true positive and true negative) come to us from the world of diagnostic tests. An anti-spam product is like a pregnancy test - it eventually comes down to yes or no.

A false positive means the test said the message was spam, when in reality it wasn't.
A false negative means that the test said a message was not spam, when in reality it was.
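
The two definitions above can be made concrete by counting each kind of outcome for a batch of messages. This is a small Java sketch under my own assumptions: the class name, the method name and the six sample messages are invented for illustration.

```java
class SpamFilterErrors {
    // Positions in the array returned by count()
    static final int TRUE_POSITIVE = 0;
    static final int FALSE_POSITIVE = 1;
    static final int TRUE_NEGATIVE = 2;
    static final int FALSE_NEGATIVE = 3;

    // actuallySpam[i]: whether message i really is spam
    // flaggedAsSpam[i]: what the filter said about message i
    static int[] count(boolean[] actuallySpam, boolean[] flaggedAsSpam) {
        if (actuallySpam.length != flaggedAsSpam.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int[] counts = new int[4];
        for (int i = 0; i < actuallySpam.length; i++) {
            if (flaggedAsSpam[i]) {
                // Filter said "spam": right if it was spam, false positive otherwise
                counts[actuallySpam[i] ? TRUE_POSITIVE : FALSE_POSITIVE]++;
            } else {
                // Filter said "not spam": false negative if it really was spam
                counts[actuallySpam[i] ? FALSE_NEGATIVE : TRUE_NEGATIVE]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical batch of six messages
        boolean[] actual = {true, true, false, false, false, true};
        boolean[] flagged = {true, false, true, false, false, true};
        int[] counts = count(actual, flagged);
        System.out.println("false positives (good mail flagged): " + counts[FALSE_POSITIVE]);
        System.out.println("false negatives (spam let through):  " + counts[FALSE_NEGATIVE]);
    }
}
```

In this sample batch there is one false positive and one false negative, so the overall error rate alone (2 out of 6) would hide which kind of mistake was made.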

We often think in terms of error rates, but with many diagnostic tests the kind of error is a big deal. It's not enough to know that the test is wrong 29% of the time. We want to know what kind of wrong. Spam tests are exactly like that. A false positive means that good mail might have gotten lost, while a false negative is just annoying. We care more about false positives than we do about false negatives (unless the CEO is getting inundated with false negatives). In addition to wanting to know how many errors there are, we also want to know what type they are.

Source

NetworkWorld.com

Twitter Language

tweets: Messages in Twitter (max 140 characters)
twitter alphabet soup : The Twitter characters with special meaning are: @, d, RT and #:
@: Talk publicly to another person
d: Talk privately to another person
RT: Repeat another person's tweet
#: Tag a message with a label

Google search tips

Searching is almost compulsory to get the job done. One can use Google, Yahoo!, Bing and other search engines to search for things on the web. Personally, I use Google more often than any other.

The faster one can search, the more productive one becomes. To find things quickly, we need to know search tips. Here I'm providing some URLs that discuss tips for searching the web using Google.

Actually, I haven't been using many of these tips so far. However, I'm now trying to use them. Hope I'll be more productive :) !

Tips for using Google Search:

Better Search using Solr and Lucene

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. It runs in a Java servlet container such as Apache Tomcat.

A good tutorial for beginners: Better Search with Apache Lucene and Solr
Tutorial at Solr HomePage
Slides:

Apache Solr

Lucene revisited

Lucene is an open-source full-text search library which makes it easy to add search functionality to an application or website. Want to understand Lucene in 5 minutes? Go here. The following slide provides a quick review of Lucene.

Figure: Steps in building applications using Lucene [Source: IBM ]

Lucene Introduction

Why Lucene? From this doc:

Incremental versus batch indexing
Data sources
Indexing Control
File Format
Content Tagging
Stop Word Processing
Stemming
Query Features
Concurrency
Non-English Support

Go through this document, which presents the fundamental concepts of Lucene, e.g. Index, Document, Field, Term, Segment and Query Term. I recommend it for beginners.

Searching and Indexing

Lucene is able to achieve fast search responses because, instead of searching the text directly, it searches an index. This is the equivalent of retrieving pages in a book related to a keyword by searching the index at the back of the book, as opposed to searching the words on each page of the book.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).
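
The page->words to word->pages inversion can be sketched in a few lines of Java. This is a toy illustration of the idea only, not Lucene's API or its on-disk format; the class name, the method name and the sample pages are assumptions of mine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexDemo {
    // Builds word -> sorted set of 1-based page numbers from a list of page texts.
    static Map<String, SortedSet<Integer>> build(List<String> pages) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int page = 0; page < pages.size(); page++) {
            // Very naive tokenizer: lower-case, then split on non-word characters
            for (String word : pages.get(page).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(page + 1);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "Lucene searches an index",
                "An index maps words to pages",
                "Pages contain words");
        Map<String, SortedSet<Integer>> index = build(pages);
        // A lookup is now a single map access instead of a scan of every page
        System.out.println("index -> " + index.get("index"));
        System.out.println("words -> " + index.get("words"));
    }
}
```

Building the index costs one pass over all the text, but after that each keyword lookup touches only the entry for that word, which is where the speed comes from.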

Lucene's Drawback and Nutch

Lucene provides a powerful indexing and search library which may be used as a base for online search engines. However, on its own the library doesn't include any form of web crawling or HTML parsing abilities. These features are necessary in order to create a fully functional online search engine. Several projects have modified Lucene with the intent of adding this missing functionality. One of the most notable of these efforts is Nutch, a SourceForge.net project.

More Resources:
