Random Vectors http://randomvectors.com I hope some people will find it to their advantage to decipher all this mess. Évariste Galois Mon, 26 Oct 2015 15:45:17 +0000 en-US hourly 1 Upgrading from XP to Windows 7 http://randomvectors.com/blog/2011/05/13/upgrading-from-xp-to-windows-7/ http://randomvectors.com/blog/2011/05/13/upgrading-from-xp-to-windows-7/#comments Fri, 13 May 2011 15:02:55 +0000 http://randomvectors.com/?p=278 Well, as you may notice, there is no clean way to get from Windows XP directly to Windows 7. However, the Windows XP -> Vista -> Windows 7 route is workable. As noted elsewhere, you can perform this two-step upgrade without having to activate Vista, which gives you the option not to buy Vista outright, as long as you have a friend or other source for a full or upgrade version of Vista with a key.



http://randomvectors.com/blog/2011/05/13/upgrading-from-xp-to-windows-7/feed/ 0
Audio Analysis with WaveSurfer http://randomvectors.com/blog/2011/03/07/audio-analysis-with-wavesurfer/ http://randomvectors.com/blog/2011/03/07/audio-analysis-with-wavesurfer/#comments Mon, 07 Mar 2011 19:49:24 +0000 http://randomvectors.com/?p=216 Wave Surfer is under active development again and I am delighted.


If you want a complete tool kit for audio analysis, this is it. The system runs on Windows, OS X and Linux; what more could you ask for? Precompiled binaries are available as well as source code.

[photo: Wave Surfer sample screen]

http://randomvectors.com/blog/2011/03/07/audio-analysis-with-wavesurfer/feed/ 0
Latent Semantic Indexing http://randomvectors.com/blog/2011/03/07/latent-semantic-indexing/ http://randomvectors.com/blog/2011/03/07/latent-semantic-indexing/#comments Mon, 07 Mar 2011 19:18:56 +0000 http://randomvectors.com/?p=205 Here is a toy program that demonstrates LSI using the SVD contained within the COLT linear algebra library. The method is taken from here.

import cern.colt.matrix.linalg.*;
import cern.colt.matrix.*;
import cern.colt.matrix.impl.*;
import java.util.*;
import java.io.*;

public class SVDText {

   public static void main(String[] args) {
      try {
         // term-document matrix: 11 terms (rows) x 3 documents (columns)
         DenseDoubleMatrix2D source = new DenseDoubleMatrix2D(11, 3);

         Scanner sc = new Scanner(new File("input.txt"));
         for (int row = 0; row < 11; row++) {
            for (int col = 0; col < 3; col++) {
               source.set(row, col, sc.nextDouble());
            }
         }
         sc.close();

         // the query is a single row of 11 term counts
         DenseDoubleMatrix2D query = new DenseDoubleMatrix2D(1, 11);

         sc = new Scanner(new File("query.txt"));
         for (int col = 0; col < 11; col++) {
            query.set(0, col, sc.nextDouble());
         }
         sc.close();

         Algebra alg = new Algebra();

         SingularValueDecomposition svd = new SingularValueDecomposition(source);

         // reduce rank: keep only the two largest singular values
         DoubleMatrix2D reducedU = alg.subMatrix(alg.transpose(svd.getU()), 0, 1, 0, 10); // U^T, 2 x 11
         DoubleMatrix2D reducedS = alg.subMatrix(alg.transpose(svd.getS()), 0, 1, 0, 1);  // S, 2 x 2
         DoubleMatrix2D reducedV = alg.subMatrix(alg.transpose(svd.getV()), 0, 1, 0, 2);  // V^T, 2 x 3

         // 3 x 2: one row per document in the reduced space
         DoubleMatrix2D reducedVt = alg.transpose(reducedV);

         DoubleMatrix2D inverseS = alg.inverse(reducedS);

         // fold the query into the reduced space: q = S^-1 * U^T * query^T
         DoubleMatrix2D q1 = alg.mult(inverseS, reducedU);
         System.out.println("q1 = " + q1);
         DoubleMatrix1D queryVector = alg.mult(q1, alg.transpose(query)).viewColumn(0);

         System.out.println("query vector " + queryVector);
         DoubleMatrix1D d1 = reducedVt.viewRow(0);
         System.out.println("d1 = " + d1);

         DoubleMatrix1D d2 = reducedVt.viewRow(1);
         System.out.println("d2 = " + d2);

         DoubleMatrix1D d3 = reducedVt.viewRow(2);
         System.out.println("d3 = " + d3);

         System.out.println("Doc 1 measure = " + cosine(queryVector, d1));
         System.out.println("Doc 2 measure = " + cosine(queryVector, d2));
         System.out.println("Doc 3 measure = " + cosine(queryVector, d3));
      } catch (Exception e) {
         e.printStackTrace();
      }
   }

   // cosine similarity: dot product divided by the product of the Euclidean norms
   static double cosine(DoubleMatrix1D a, DoubleMatrix1D b) {
      return a.zDotProduct(b) / Math.sqrt(a.zDotProduct(a) * b.zDotProduct(b));
   }
}
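The three "measure" values rank the documents by how closely each one points in the same direction as the query in the reduced space. As a minimal sketch of just that ranking step, here it is in plain Java with no COLT dependency. The class name CosineDemo and the sample vectors are made up for illustration; they are not taken from input.txt or query.txt.

```java
// Hypothetical helper, not part of COLT: cosine similarity between two vectors.
final class CosineDemo {

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Euclidean norms in the denominator; the result is NaN for a zero vector
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};                              // made-up 2-D query
        double[][] docs = {{2.0, 0.0}, {0.0, 3.0}, {1.0, 1.0}};   // made-up documents
        for (int d = 0; d < docs.length; d++) {
            System.out.println("Doc " + (d + 1) + " measure = " + cosine(query, docs[d]));
        }
    }
}
```

A value near 1 means the document lines up with the query, a value near 0 means they are unrelated, and the length of either vector does not matter.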
http://randomvectors.com/blog/2011/03/07/latent-semantic-indexing/feed/ 0
Curve Fitting via SVD http://randomvectors.com/blog/2011/03/07/curve-fitting-via-svd/ http://randomvectors.com/blog/2011/03/07/curve-fitting-via-svd/#comments Mon, 07 Mar 2011 19:02:43 +0000 http://randomvectors.com/?p=121 I used this method to delete a gradient effect from uneven lighting of some of my book scans.

Here is a simple method for curve fitting various polynomials. This works for nonlinear equations as long as the equation is linear in its coefficients.

For example you can NOT fit the following type of function:


Note that the coefficient a_2 is an argument to a nonlinear function. related link

The specific one done here is a bicubic polynomial. We are going to fit the following surface:

P(x,y) = a_{00} + a_{01}y + a_{02}y^2 + a_{03}y^3 + a_{10}x + a_{11}xy + a_{12}xy^2 + a_{13}xy^3 + a_{20}x^2 + a_{21}x^2y + a_{22}x^2y^2 + a_{23}x^2y^3 + a_{30}x^3 + a_{31}x^3y + a_{32}x^3y^2 + a_{33}x^3y^3

Or more compactly:

P(x,y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a_{ij} x^i y^j

import cern.colt.matrix.linalg.*;
import cern.colt.matrix.*;
import cern.colt.matrix.impl.*;
import java.util.ArrayList;

import edu.umbc.cs.maple.utils.*;

 // Point is your own data class holding a measured value P and its x, y position
 public DoubleMatrix2D calcFit(ArrayList<Point> measures) {
    int cols = 16; // number of coefficients
    int rows = measures.size();  // size of your dataset (at least 16 points are needed)

    DenseDoubleMatrix2D M = new DenseDoubleMatrix2D(rows,cols);
    DenseDoubleMatrix2D P = new DenseDoubleMatrix2D(rows,1);

    for (int k=0;k<measures.size();k++) {
       Point measure = measures.get(k);  // measures from your dataset
       P.set(k,0,measure.getValue());    // Point contains values P, x and y

       double x = measure.getX();
       double y = measure.getY();

       // populate the matrix with the appropriate polynomial values
       // from the bicubic above
       int col=0;
       for (int i=0;i<4;i++) {
         for (int j=0;j<4;j++) {
            double cal = Math.pow(x,i)*Math.pow(y,j);
            M.set(k,col++,cal);  // column 4*i+j holds x^i * y^j
         }
       }
    }

    // solve SVD
    SingularValueDecomposition svd = new SingularValueDecomposition(M);
    DoubleMatrix2D U = svd.getU();
    DoubleMatrix2D V = svd.getV();
    DoubleMatrix2D S = svd.getS();

    // S is the diagonal matrix
    // Invert by replacing its diagonal elements by their reciprocals
    for (int i=0;i<cols;i++) {
       double s = S.get(i,i);

       // leave a zero singular value at zero to avoid dividing by zero
       double val = (s == 0) ? 0.0 : 1.0/s;

       S.set(i,i, val);
    }

    Algebra alg = new Algebra();

    // pseudoinverse of M: V * S^-1 * U^T
    DoubleMatrix2D VS = alg.mult(V,S);

    DoubleMatrix2D Minv = alg.mult(VS,alg.transpose(U));

    // 16 x 1 matrix holding the coefficients
    DoubleMatrix2D C = alg.mult(Minv, P);

    // If you would like the closeness of the fit use the following

    DoubleMatrix2D eP = alg.mult(M,C); // estimated values

    ColtUtils utils = new ColtUtils();
    DoubleMatrix2D dP = utils.minus(P,eP); // difference between estimated and observed values
    double sum = utils.dotproduct(utils.getcol(dP,0),utils.getcol(dP,0)); // sum of squared residuals

    System.out.println("sum = " + sum);  // how close was our fit?

    return C;
 }
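Once the 16 coefficients are known, removing a lighting gradient comes down to evaluating the fitted surface at every pixel and subtracting it. The following is a minimal sketch of that idea in plain Java, not the code used for the book scans. It assumes the coefficients are stored in the order the nested loops visit them (index 4*i + j for the x^i y^j term), that x is the pixel column and y the pixel row, and that adding back the mean of the surface is an acceptable way to preserve overall brightness. The names BicubicSurface, evaluate and removeGradient are hypothetical.

```java
// Hypothetical helper: applies a fitted bicubic surface to an image.
final class BicubicSurface {

    // c holds the 16 coefficients; c[4*i + j] multiplies x^i * y^j
    static double evaluate(double[] c, double x, double y) {
        if (c.length != 16) {
            throw new IllegalArgumentException("expected 16 coefficients");
        }
        double sum = 0.0;
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                sum += c[4 * i + j] * Math.pow(x, i) * Math.pow(y, j);
            }
        }
        return sum;
    }

    // Subtracts the fitted surface from each pixel, then adds back the
    // mean of the surface so the overall brightness stays the same.
    static double[][] removeGradient(double[][] image, double[] c) {
        int height = image.length;
        int width = image[0].length;
        double[][] surface = new double[height][width];
        double mean = 0.0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                surface[y][x] = evaluate(c, x, y);
                mean += surface[y][x];
            }
        }
        mean /= (double) height * width;

        double[][] flattened = new double[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                flattened[y][x] = image[y][x] - surface[y][x] + mean;
            }
        }
        return flattened;
    }
}
```

If the image is exactly the fitted surface, every output pixel comes out equal to the surface mean, which is a quick sanity check for the index ordering.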


In practice you may need to normalize your measurements if they are large, to avoid numeric underflow or overflow.
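To see why this matters: with pixel coordinates around 3000, the x^3 y^3 term is on the order of 10^20 while the constant term is 1, which makes the matrix badly conditioned. One common remedy, sketched here under the assumption that a simple linear rescaling to [-1, 1] is good enough for your data, is to scale x and y before building the matrix. The class and method names are hypothetical.

```java
// Hypothetical helper: rescales coordinates before building the design matrix.
final class Normalizer {

    // Linearly maps the values onto [-1, 1]; a constant input maps to all zeros.
    static double[] scale(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = (range == 0.0) ? 0.0 : 2.0 * (values[i] - min) / range - 1.0;
        }
        return scaled;
    }
}
```

Keep in mind that the fitted coefficients then apply to the scaled coordinates, so the same min and max must be used again whenever the surface is evaluated.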

http://randomvectors.com/blog/2011/03/07/curve-fitting-via-svd/feed/ 0
Itchy Ear Syndrome http://randomvectors.com/blog/2011/03/04/ichy-ear-syndrome/ http://randomvectors.com/blog/2011/03/04/ichy-ear-syndrome/#comments Fri, 04 Mar 2011 21:41:18 +0000 http://randomvectors.com/?p=112 My family and I are afflicted with itchy ears. I’m not talking about occasional dryness; I’m talking about everyday itchiness that can drive you to distraction.

Here are my findings on controlling this condition.

There are two major ways to reduce itchiness:

1. Coverage: I use a product called Lansinoh. This is actually found in the maternity section of your local drug or big box stores. This product is 100% Lanolin and very effective for reducing that itchy feeling.

2. Reduction of the histamine response: Daily ingestion of Claritin or occasional children’s Benadryl prior to going to sleep will greatly reduce this problem. Now that generic Allegra is on the market, it is also a good choice; I find it more effective than Claritin but more expensive.

Occasionally your constant ear mining will cause tissue damage, usually minor abrasions of the ear canal. This can lead to what is called “gunk ear”: typically a buildup of fluid (not blood but possibly other liquid from the capillaries lining this area). I find simple vinegar or, in some extreme cases, isopropyl alcohol will reduce this symptom. You can purchase a product for swimmer’s ear (simply alcohol in an easy-to-use dropper), which works great for short-term use. Overuse of alcohol will dry out your ear, leading to severe dryness and scarring.

You may also get a prescription for a fluorinated topical steroid for short-term use. Long-term use of a steroid can cause permanent tissue damage. If the itching is bad enough I think a prescription can be justified.

There is also a prescription called Ciprofloxacin-hydrocortisone otic which can be effective for extreme cases, although it is VERY expensive: almost $200 per 10ml bottle! A cheaper alternative is Neomycin and Polymyxin B sulfate and Hydrocortisone Otic for about $15; I find this just as effective.

This has been a life long issue for me and others and I’ll post this in case others are looking for help.

A great link is also included below:


http://randomvectors.com/blog/2011/03/04/ichy-ear-syndrome/feed/ 1
Dell outlet http://randomvectors.com/blog/2011/03/04/dell-outlet-the-best-value-in-computing/ http://randomvectors.com/blog/2011/03/04/dell-outlet-the-best-value-in-computing/#comments Fri, 04 Mar 2011 21:09:22 +0000 http://randomvectors.com/?p=65 If you haven’t checked out the Dell outlet I would recommend that you do. For laptops you can get a slightly older technology laptop for around $300. A nice deal for what is quickly becoming a disposable item for some people. After one purchase you will start receiving coupons for 15-25% off.

My personal favorite is the Inspiron 15-1545 for  $379

The real values are the larger storage devices and servers. In many cases these are effectively 1K to half price, and well worth a look if you are in the market for a low cost dedicated web server or compute server. The other large advantage is that the server is already built and ready to ship, substantially reducing the time that you need to wait to get your equipment.

You get the same warranty as a new computer which makes it even more attractive. And you can call yourself green because you recycled some older equipment.

http://randomvectors.com/blog/2011/03/04/dell-outlet-the-best-value-in-computing/feed/ 0
Dialing in the Atiz DIY Scanner http://randomvectors.com/blog/2011/03/04/dailing-in-the-atiz-diy-scanner/ http://randomvectors.com/blog/2011/03/04/dailing-in-the-atiz-diy-scanner/#comments Fri, 04 Mar 2011 16:32:14 +0000 http://randomvectors.com/?p=49 My modifications to the Atiz DIY scanner to optimize image quality and speed.

As a rabid scanner and collector of books, I quickly realized that a flatbed scanner was not going to be cost effective. Reviewing the options, I decided upon the Atiz DIY Scanner. Two features were important to me. The first is speed: the Atiz uses Canon DSLR cameras to quickly capture the pages of the book. This naturally leads to the second feature, which is that the Atiz is non-destructive: you do not have to cut the binding from the book. This was especially important due to my interest in books that were out of copyright.

Once it arrived, building the scanner itself took about an hour. Time to plug it in and start scanning. If only it were that easy.

Choice of lens

The recommended lens setup is the Canon 50mm fixed focal length lens. This is probably my favorite photography lens due to its rather large aperture (light gathering capabilities) and price: in some cases less than $50, typically $80 retail. A great lens, just not great for my choice of camera. I chose the Canon XSi, which at the time was the highest resolution camera from Canon available at 12 megapixels. This equated to just less than 300 dpi for the documents I was scanning. Perfect for OCR.

However, the CMOS sensor on the low-end consumer grade cameras (i.e. the ones less than $1000) is not a full 35mm sensor but a cropped one. This turned my 50mm lens into an (50mm x 1.6 =) 80mm lens. There simply is not enough travel on the camera mounts to account for this conversion factor without cropping the document pages. I could indeed get the entire page on the scanner, but I had to move the camera so far back that the actual page was only a fraction of the usable image space, effectively reducing my DPI measure, which was not the objective.

So off to choose a different lens. At this point my total cost for the system was about $10,000, so price was an object. I ended up choosing the Tamron 55-200mm zoom lens. With this setup, I can move the camera far enough back and zoom onto the page to capture the page and fill the entire usable image space. There is some pin holing on the images but most modern OCR systems can handle this. The zoom lens actually makes the book setup much faster than with a fixed lens, and I can optically magnify books that are smaller formats. (In some cases, I have found valuable dictionaries that are less than 5 inches tall.)
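The arithmetic behind the lens choice can be sketched in a few lines. This is only an illustration: the 1.6 crop factor and 50mm focal length come from the text above, while the long-edge pixel count (4272, assumed here for a 12-megapixel Canon body), the 11 inch page, and the class and method names are assumptions to replace with your own numbers.

```java
// Hypothetical helper for the lens and resolution arithmetic.
final class ScanMath {

    // 35mm-equivalent focal length of a lens mounted on a cropped sensor
    static double equivalentFocalLength(double focalLengthMm, double cropFactor) {
        return focalLengthMm * cropFactor;
    }

    // Effective DPI along one page edge. fillFraction is the share of the
    // sensor edge that the page actually covers, between 0 and 1.
    static double effectiveDpi(int sensorPixels, double pageInches, double fillFraction) {
        return sensorPixels * fillFraction / pageInches;
    }

    public static void main(String[] args) {
        // 50mm lens on a 1.6x crop body
        System.out.println(equivalentFocalLength(50.0, 1.6) + " mm equivalent");
        // assumed 4272-pixel long edge and an 11 inch page that fills the frame
        System.out.println(effectiveDpi(4272, 11.0, 1.0) + " dpi");
        // the same page covering only half of the frame
        System.out.println(effectiveDpi(4272, 11.0, 0.5) + " dpi");
    }
}
```

The last two lines show why filling the frame matters: a page that covers half of the sensor edge is captured at half the DPI.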

Adjusting the self-centering mechanism.

After switching out the lenses, I found that the self-centering mechanism was not functioning properly. As you move through the book, the spine should self-center saving you readjustment of the cameras. This needed to be raised a bit to avoid binding on the lower support beam. This is accomplished by loosening the bolts under the bottom book support.

Self Centering Enlarged

Centering the cameras

The camera mounts are not self-centering. Therefore, you need to adjust the angle using a long yardstick or other measuring device. I simply lay it across the book mounts and take photos until the center measurement is in the center of the captured image and both images from my right and left camera appear at the same scale. I would have preferred to use a level but the Canon cameras do not have a level back plane on which to mount the level.

Overhead lighting

Capturing books went smoothly until I found an older book with the text rather close to the spine. It became obvious that I had some major glare issues where the seam of the clear scanner plate is bent at a 45 degree angle. This creates a concave surface that is guaranteed to shine the overhead light right into the camera. At first, I assumed I could simply polarize the light source and put a linear polarizer on the camera. This does not work in practice, as the autofocus mechanism fails if you use a polarizing filter. So removing the lighting from the top of the scanner was required. At first, I simply mounted a few Home Depot fluorescent lights on the wall where the scanner was located. This worked very well and eliminated the glare issue. Once I performed more OCR, however, the gradient of the lighting across the page, while not visible to the human eye, was creating artifacts for the OCR system. I then mounted linear LED lights on both sides of the scanner just above my head and replaced the fluorescent lights on the far wall. [photo] This works well but can be dark for color photography. Luckily, I am scanning only books with text in them.

[photo: Offset lights]

[photo: LED lighting closeup]

[photo: LED mounting method]

[photo: Wood clamp used to stabilize LEDs]

[photo: Plastic clamps closeup]

[photo: Stick holding LED lighting clamps to shade]



Protecting your Camera

To get the correct angle on the cameras I resorted to placing a few business cards at the bottom of the camera where it presses against the camera mount. This allows me to slightly adjust the angle of the camera. This worked out great until I loosened one of the cameras too much and it dropped onto the platen. Luckily the camera and platen were unharmed. Now I have some old shoelaces holding the camera up in addition to the camera mount. Stupid looking, but better safe than sorry. [photo]

Extending the shroud

I extended the original shroud to the tabletop to avoid glare from the white colored wall at the far end of the scanner.

Renfrew tape is your friend

Some of the wires I used for the LED lighting were white. Simple Renfrew tape, available from any sporting goods store, can easily recolor them black.

Software tweaks

The DIY software is a bit buggy but usable. However, in some cases they get out of sync with the client capture system. In this case, it is an easy matter to copy the Canon SDK (.dll files) to the home directory of the client software to allow control of new cameras.

I have had issues with memory leaks and race conditions that forced me to restart the capture software to scan large books. Some of these errors are random and are perhaps actually in the Canon SDK itself. This does not have much of an effect on my actual processing of documents, other than being annoying.

In the future, I will probably write my own capture software to avoid these issues. I already have an ImageJ script to post-process images.

Would I recommend you buy one of these? I am not sure. For the money, it is effective. Like all things automated, it is a bit finicky and less than you would expect for the price. However, given the alternatives, for me it is worth it.

http://randomvectors.com/blog/2011/03/04/dailing-in-the-atiz-diy-scanner/feed/ 3
Liquid Web http://randomvectors.com/blog/2011/03/04/liquidweb-heroic-support-for-the-most-part/ http://randomvectors.com/blog/2011/03/04/liquidweb-heroic-support-for-the-most-part/#comments Fri, 04 Mar 2011 15:46:37 +0000 http://randomvectors.com/?p=72 Liquid web is my new favorite web hosting company.

After using some of the big names in hosting including Verio and Dell, I feel I am pretty informed about what to expect from a hosting company. Moreover, after being hacked to the point of feeling violated and foolish I have come to respect the value of a support team that takes multi level security to heart.

To be fair, none of my other hosting companies was responsible for my hacking issues. I would blame those on WordPress, Joomla and Zen Cart. On the other hand, perhaps the real culprit is the fact that you need to keep these packages updated and follow the instructions to the letter to avoid issues. That said, it would be cost prohibitive to rewrite these packages, and in the short run perhaps even less secure.

The bottom line is that without multiple levels of protection any software package is a target for hackers.

Proactive Support

Liquid Web not only monitors the usual port 80 and other services but also profiles CPU and memory usage. This was brought home to me when they sent me a message that my Apache server was using more memory than it should. The reason was that my site was a low bandwidth site and the server was spawning too many children to service only a few web page requests.

How many hosting companies do you know that would send you a message like the one below?

    Our Sonar Server Health monitoring alerted me to high load and/or memory use
    on your server today. We were able to log in and prevent the server from
    crashing and requiring a reboot. The following actions were taken to prevent
    the crash:

    enabled apache's piped logging to address a memory leak issue as described
    here: http://kb.liquidweb.com/how-and-why-enabling-apaches-piped-logging/

Bottom line: we found a problem and fixed it without having to bother you. Now that is good business!

Responsive Customer support

I have found their support to be prompt and easily within their advertised response time of 30 minutes. I have had them reset my IP address when the firewall automatically blocked me, and had them install PHP packages with a minimum of fuss. I would have to say they are the most responsive in this regard.

I have asked their first line of support some tough questions about configuration and the like, and I can tell that they also have supervisors monitoring the communications, because sometimes they will enter the email stream to clarify the information I am requesting.

In addition, when they fix something they usually tell me what files they changed or what scripts they ran. This creates a smarter end user and empowers me to fix issues myself if need be.

Excellent Security (for the most part)

The good:

They have a very good security posture, provided by mod_apache for the most part, but they also use a system configuration that is accredited for credit card transactions. This provides a baseline for you to work with.

Here is an example of something I did not expect from them: a scan of my recent server modifications.

    Note: If this is the first time you received this mail, it contains the history for the entire month so far.

    Below are the recently upload scripts that contain code to send email.  You may wish to inspect them to ensure they are not sending out SPAM.

In this case, the change was known, but this would be an excellent tip-off if there were hacker activities that I did not know about.

The bad:

The only real issue I had with them is some strange default settings for cPanel.

For my virtual private server anonymous FTP was turned on by default! Whoops, I would not have expected that. For some reason the site name was blocked but the IP address was not (this is handled at the WHM panel level), which leads to a false sense of security that you actually turned it off in cPanel.
That and the sometimes overaggressive SQL injection filtering are the only real annoyances. When editing this article the mod_apache filter broke my request because I had an entry like:

There are nice things to s elect f rom. (that took a bit of investigating)

http://randomvectors.com/blog/2011/03/04/liquidweb-heroic-support-for-the-most-part/feed/ 2
NIST challenge http://randomvectors.com/blog/2011/02/15/the-nist-challenges/ http://randomvectors.com/blog/2011/02/15/the-nist-challenges/#comments Tue, 15 Feb 2011 16:45:53 +0000 http://randomvectors.com/?p=28 Can an amateur compete in the NIST challenges?

Absolutely! And have fun doing it as well.

The National Institute of Standards and Technology (NIST) sponsors a large number of ongoing technical challenges. Currently this is the domain of think tanks and colleges. This is my encouragement to those that are not.

Can a person with limited resources really compete with success against these giants? Yes and here is why.

  • The need to get published limits options
  • Over investment in one approach or theory
  • Lack of resources

In many cases the need to get published can lead to a not-invented-here syndrome. Theories and algorithms that are not brand new tend not to get published, even though an established or classical approach can be superior to what is already out there. This almost brutal need to find the next new thing can leave untapped gems lying at your feet, ready to be utilized. So you don’t have to come up with a fancy new idea or approach: survey the literature and resurrect a promising one.

In many cases researchers in a single field tend to stay in their field of research. This tends to generate a rather limited view of what is going on in other areas. Also, a university may be pursuing one area of research with an established code base and ready resources. They need to prove the value of these resources and thus are more inclined to lean on them in repeated attempts to improve their results. Since you are not a “professional” in this field, you can apply a wide variety of approaches to the problem; you have no preconceived notions or emotional investment in prior work. In this case having too much experience is a hindrance.

Also, you have one resource most of these folks do not have: time. In many cases students and companies have allotted a limited amount of time to work on these challenge problems. This gives you a distinct advantage: you can play with ideas and, more importantly, fail more often than the established competitors. And if you are anything like me you can spend a little more money on that new computer you’ve been looking at. For the most part that may be my actual motivation for competing. (Just don’t tell my wife.)

I’m not saying anyone can compete. You do have to have some basic skills. I am saying that if you have some basic knowledge you can in fact compete in these challenges. I have found the NIST personnel to be professional and accommodating. After all, they already fund a large body of basic research. They don’t need results on things they are already funding. They are looking for that new disruptive approach that you may develop.

Ok what did you really do?

It is easy to spout off on what someone can do.  Here is my practical experience with the NIST folks.

You can find current challenges at the NIST.GOV site by simply searching for challenge or benchmark.
The challenges I gravitate to can be found here:


I have a background in Natural Language Processing, so this challenge was not too much of a stretch. But keep in mind I am not part of a university and do not have a PhD. My practical experience is in Machine Translation and the integration of these technologies. Does that make me an expert or renowned authority? Not by a long shot.

The competition I signed up for was the Machine Translation Metrics Challenge.

In essence you need to write a program to rate the output of a machine translation system. “How well does Google translate?” is essentially the question you are trying to answer.

How hard was it? Actually the schedule was pretty short and I didn’t get everything done that I would have liked. In essence I researched and formulated an approach, generated new data and developed the code base within the time frame provided. In reality that is much more work than a university would need to do, as they probably already have code available or at least some framework to work within. So it is doable even starting from nothing.

The one lesson I would take away from this came in the first challenge, where I provided an automated installation program. They promptly ran into security issues with my installation program. I also provided a Graphical User Interface (GUI) for the system. Again this did nothing for them, as they wanted to automate a large amount of data processing to test my system. I ended up simply sending them a single java jar file that they could invoke on the command line. So my initial lesson was: don’t try to do too much to make your system fancy. Create a stand-alone binary that can be invoked on the command line. This will save you and the evaluators time in the long run. Think research quality, not commercial quality.
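To make the "single jar invoked on the command line" advice concrete, here is a minimal sketch of a batch-friendly entry point. It is not the program that was submitted: the class name MetricCli, the jar name metric.jar, the file arguments and the word-overlap score are all placeholders chosen for illustration. The points it tries to show are plain file arguments, one result per line on standard output, errors on standard error, and a nonzero exit code on failure so that a script can detect problems.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical command-line wrapper; the scoring function is a placeholder.
final class MetricCli {

    // Placeholder metric: share of distinct reference words found in the hypothesis.
    static double score(String hypothesis, String reference) {
        if (reference.trim().isEmpty()) {
            return 0.0;
        }
        Set<String> hyp = new HashSet<>(Arrays.asList(hypothesis.trim().toLowerCase().split("\\s+")));
        Set<String> ref = new HashSet<>(Arrays.asList(reference.trim().toLowerCase().split("\\s+")));
        int hits = 0;
        for (String word : ref) {
            if (hyp.contains(word)) {
                hits++;
            }
        }
        return (double) hits / ref.size();
    }

    // Returns a process exit code: 0 on success, nonzero on any failure.
    static int run(String[] args, PrintStream out, PrintStream err) {
        if (args.length != 2) {
            err.println("usage: java -jar metric.jar <hypothesis-file> <reference-file>");
            return 2;
        }
        try {
            List<String> hyp = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
            List<String> ref = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
            if (hyp.size() != ref.size()) {
                err.println("hypothesis and reference files have different line counts");
                return 1;
            }
            for (int i = 0; i < hyp.size(); i++) {
                out.println(i + "\t" + score(hyp.get(i), ref.get(i)));  // one segment per line
            }
            return 0;
        } catch (IOException e) {
            err.println("cannot read input: " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) {
        System.exit(run(args, System.out, System.err));
    }
}
```

Keeping the logic in run rather than main makes it easy to call from a test without terminating the JVM.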

After some back and forth with the evaluators on how to setup the program they were finally able to run my data through the test set.

Oh crap, I found a bug after reviewing the results. I notified the evaluators and sent them a new copy. They were able to rerun the results and didn’t even give me a hard time about wasting their valuable time. I think this may happen pretty regularly :)

So after the evaluation results were done, how did I do?

My metric’s name is Badger. (An inside joke with some of my old colleagues.)

2008 – After fully expecting to be in the middle of the pack after reviewing the sample data, I was pretty depressed to find my performance was almost at the bottom of the rankings.


2010 – Much better. Now I have a place in the middle of the pack, and I actually scored the highest in one data set!


But the most impressive thing for me was the fact that I was actually cited for my 2008 work by a team competing in the 2010 competition. The NCD (Normalized Compression Distance) was very similar to my earlier work of 2008. So much so that they cited me in their paper. Not bad for an amateur :)
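For readers who have not met Normalized Compression Distance: the idea is that two texts are similar if compressing them together costs little more than compressing one of them alone. With C(s) the compressed size of s, the usual definition is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The sketch below is a generic illustration, not Badger or the cited team's code, and it assumes DEFLATE (via java.util.zip.Deflater) as the compressor; the class name Ncd is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Generic NCD sketch using DEFLATE as the compressor.
final class Ncd {

    // Number of bytes the input occupies after DEFLATE compression.
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    static double distance(String x, String y) {
        byte[] bx = x.getBytes(StandardCharsets.UTF_8);
        byte[] by = y.getBytes(StandardCharsets.UTF_8);
        byte[] bxy = new byte[bx.length + by.length];
        System.arraycopy(bx, 0, bxy, 0, bx.length);
        System.arraycopy(by, 0, bxy, bx.length, by.length);

        int cx = compressedSize(bx);
        int cy = compressedSize(by);
        int cxy = compressedSize(bxy);
        return (double) (cxy - Math.min(cx, cy)) / Math.max(cx, cy);
    }
}
```

Used as a translation metric, a lower distance between a system's output and a reference translation would indicate a closer match. On very short sentences the compressor's fixed overhead dominates, so the values are only rough.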

Am I an amateur? OK, not really. Did I think I was going to win? No way. Can someone with limited resources and no university affiliations have fun working with other folks in a field of personal interest? Absolutely. Did I have fun? Yes. And it was a great way to challenge myself intellectually.

So check on the current challenges and see if you can find one that you may think is fun. Form a team if you need one and challenge yourself!

http://randomvectors.com/blog/2011/02/15/the-nist-challenges/feed/ 0