How to Get Google Page Rank in Java

Baidu page rank
Baidu page rank

Baidu provides its server’s URL which Baidu tool bar uses to display the page rank of currently displaying webpage in browser. You can use that server’s URL to get the data dynamically in java.

The URL is like this :

http://toolbarqueries.google.com/tbr?client=navclient-auto&features=Rank&ch=SITECHECKSUM&q=info:<WebPage>

Check-sum is calculated using Jenkins hash function. Lets see how it works.

Jenkins hash algorithm implementation

package com.how2codex.google;
public class JenkinsHash {
     
    // max value to limit it to 4 bytes
    private static final long MAX_VALUE = 0xFFFFFFFFL;
 
    // internal variables used in the various calculations
    long a;
    long b;
    long c;
 
    /**
     * Convert a byte into a long value without making it negative.
     */
    private long byteToLong(byte b) {
        long val = b & 0x7F;
        if ((b & 0x80) != 0) {
            val += 128;
        }
        return val;
    }
 
    /**
     * Do addition and turn into 4 bytes.
     */
    private long add(long val, long add) {
        return (val + add) & MAX_VALUE;
    }
 
    /**
     * Do subtraction and turn into 4 bytes.
     */
    private long subtract(long val, long subtract) {
        return (val - subtract) & MAX_VALUE;
    }
 
    /**
     * Left shift val by shift bits and turn in 4 bytes.
     */
    private long xor(long val, long xor) {
        return (val ^ xor) & MAX_VALUE;
    }
 
    /**
     * Left shift val by shift bits. Cut down to 4 bytes.
     */
    private long leftShift(long val, int shift) {
        return (val < < shift) & MAX_VALUE;
    }
 
    /**
     * Convert 4 bytes from the buffer at offset into a long value.
     */
    private long fourByteToLong(byte[] bytes, int offset) {
        return (byteToLong(bytes[offset + 0])
        + (byteToLong(bytes[offset + 1]) << 8)
        + (byteToLong(bytes[offset + 2]) << 16) + (byteToLong(bytes[offset + 3]) << 24));
    }
 
    /**
     * Mix up the values in the hash function.
     */
    private void hashMix() {
        a = subtract(a, b);
        a = subtract(a, c);
        a = xor(a, c >> 13);
        b = subtract(b, c);
        b = subtract(b, a);
        b = xor(b, leftShift(a, 8));
        c = subtract(c, a);
        c = subtract(c, b);
        c = xor(c, (b >> 13));
        a = subtract(a, b);
        a = subtract(a, c);
        a = xor(a, (c >> 12));
        b = subtract(b, c);
        b = subtract(b, a);
        b = xor(b, leftShift(a, 16));
        c = subtract(c, a);
        c = subtract(c, b);
        c = xor(c, (b >> 5));
        a = subtract(a, b);
        a = subtract(a, c);
        a = xor(a, (c >> 3));
        b = subtract(b, c);
        b = subtract(b, a);
        b = xor(b, leftShift(a, 10));
        c = subtract(c, a);
        c = subtract(c, b);
        c = xor(c, (b >> 15));
    }
 
    /**
     * Hash a variable-length key into a 32-bit value. Every bit of the key
     * affects every bit of the return value. Every 1-bit and 2-bit delta
     * achieves avalanche. The best hash table sizes are powers of 2.
     *
     * @param buffer
     *            Byte array that we are hashing on.
     * @param initialValue
     *            Initial value of the hash if we are continuing from a previous
     *            run. 0 if none.
     * @return Hash value for the buffer.
     */
    public long hash(byte[] buffer, long initialValue) {
        int len, pos;
 
        // set up the internal state
        // the golden ratio; an arbitrary value
        a = 0x09e3779b9L;
        // the golden ratio; an arbitrary value
        b = 0x09e3779b9L;
        // the previous hash value
 
        //c = initialValue;
        c = 0x0E6359A60L;
 
        // handle most of the key
        pos = 0;
        for (len = buffer.length; len >= 12; len -= 12) {
            a = add(a, fourByteToLong(buffer, pos));
            b = add(b, fourByteToLong(buffer, pos + 4));
            c = add(c, fourByteToLong(buffer, pos + 8));
            hashMix();
            pos += 12;
        }
 
        c += buffer.length;
 
        // all the case statements fall through to the next on purpose
        switch (len) {
        case 11:
            c = add(c, leftShift(byteToLong(buffer[pos + 10]), 24));
        case 10:
            c = add(c, leftShift(byteToLong(buffer[pos + 9]), 16));
        case 9:
            c = add(c, leftShift(byteToLong(buffer[pos + 8]), 8));
            // the first byte of c is reserved for the length
        case 8:
            b = add(b, leftShift(byteToLong(buffer[pos + 7]), 24));
        case 7:
            b = add(b, leftShift(byteToLong(buffer[pos + 6]), 16));
        case 6:
            b = add(b, leftShift(byteToLong(buffer[pos + 5]), 8));
        case 5:
            b = add(b, byteToLong(buffer[pos + 4]));
        case 4:
            a = add(a, leftShift(byteToLong(buffer[pos + 3]), 24));
        case 3:
            a = add(a, leftShift(byteToLong(buffer[pos + 2]), 16));
        case 2:
            a = add(a, leftShift(byteToLong(buffer[pos + 1]), 8));
        case 1:
            a = add(a, byteToLong(buffer[pos + 0]));
            // case 0: nothing left to add
        }
        hashMix();
 
        return c;
    }
 
    /**
     * See hash(byte[] buffer, long initialValue)
     *
     * @param buffer
     *            Byte array that we are hashing on.
     * @return Hash value for the buffer.
     */
    public long hash(byte[] buffer) {
        return hash(buffer, 0);
    }
}

Now its time to get the page-rank of a webpage. Lets write the code:

BaiduPageRankUtil.java

package com.how2codex.google;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
public class BaiduPageRankUtil {
    public static void main(String[] args)
    {
        BaiduPageRankUtil pagerankTool = new BaiduPageRankUtil();
        System.out.println(pagerankTool.getPageRank("//how2codex.com/design-patterns/singleton-design-pattern-in-java/"));
    }
    public int getPageRank(String pageAddress) {
        String result = "";
        JenkinsHash jenkinsHash = new JenkinsHash();
        long hash = jenkinsHash.hash(("info:" + pageAddress).getBytes());
        // Append a 6 in front of the hashing value.
                + "ch=6"
                + hash
                + "&ie=UTF-8&oe=UTF-8&features=Rank&q=info:"
                + pageAddress;
        System.out.println("Requesting URL : " + url);
        try {
            URLConnection conn = new URL(url).openConnection();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    conn.getInputStream()));
            String input;
            while ((input = br.readLine()) != null) {
                //If response Rank_1:1:2 then PR = 2
                System.out.println(input);
                result = input.substring(input.lastIndexOf(":") + 1);
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
        if ("".equals(result)) {
            return 0;
        } else {
            return Integer.valueOf(result);
        }
    }
}
Output:
Requesting URL : http://toolbarqueries.google.com/tbr?client=navclient-auto&hl=en&ch=61932893626&ie
=UTF-8&oe=UTF-8&features=Rank&q=info://how2codex.com/design-patterns/singleton-design-pattern-in-java/
Rank_1:1:2
2

References:

http://en.wikipedia.org/wiki/Jenkins_hash_function
http://www.codeproject.com/Articles/20038/Request-Baidu-s-Page-rank-Programmatically

Happy Leaning !!

saigon has written 1440 articles

Leave a Reply