Wednesday, December 23, 2009

Basics of Concurrent Programming - 1

I have some free time and its high time to revive my blog. Since multi-core (r)evolution is expected to make concurrent programming the norm than exception, that is our blog's focus area (though I've not completely bought into it). I'm planning to write a 'for dummies' series on this topic.

Let's start with the basics.

For quite a long period of time, we had concurrency constructs such as Semaphores and Monitors (and other less known/used stuff like conditional critical sections, CSP etc). Among these, monitors are more popular and provides better abstraction model over semaphores. However semaphores and monitors are like C and Java. One is more flexible/powerful; other one is easier to use. We will leave semaphores for now and start with Monitors.

Time for an example.

Let's take the classical Reader/Writer problem.
We want to have some shared data where multiple writers are allowed, as well as multiple readers, but at no time should readers and writers be allowed in at the same time. However, multiple readers can read the data at the same time.


We will start with a naive java solution (with no concurrency at all). The shared Data can be defined as follows:


public class Data {
    StringBuilder value = new StringBuilder();

    public String read() {
        return value.toString();
    }

    public void write(String s) {
        System.out.println("Writing " + s);                
        this.value.append(s);
        System.out.println("Wrote " + s);
    }
}


It's pretty simple. We use StringBuilder as the data type, but this can be generalized easily. Our aim is also straightforward - read and write methods should not execute in parallel. However, multiple reads can happen in parallel.

We also need Readers and Writers to use the data. That is easy too.


class Reader extends Thread {
    private Data data;
    private int readerId;

    public Reader(int readerId, Data data) {
        this.data = data;
        this.readerId = readerId;
    }

    @Override
    public void run() {
        //Each reader reads 10 times (say)
        for(int i = 0; i < 10; i++) {
            System.out.println("Reader "+ readerId + " : "+ data.read());
        }
    }
}

class Writer extends Thread {
    private Data data;
    private int writerId;

    public Writer(int writerId, Data data) {
        this.data = data;
        this.writerId = writerId;
    }

    @Override
    public void run() {
        //Each writer writes 10 times (say)
        for(int i = 0; i < 10; i++) {
            data.write("W" + writerId + ":"+ i);
        }
    }
}

public class Main {
    public static void main(String[] args) {
        Data d = new Data();
        //We have 5 readers and writers as an example.
        for (int i=0; i < 5; ++i) {
            new Reader(i, d).start();
            new Writer(i, d).start();
        }
    }
}



If we fire up the Main class, we can see the following output (or something similar):
Writing W0:0
Wrote W0:0
Writing W0:1
Wrote W0:1
Reader 3 : W0:0W0:1
Reader 3 : W0:0W0:1
...

As you can guess this solution does not work as read and write can be called simultaneously. So you may see output like:
Writing W0:0
Reader 3 :
Wrote W0:0

That is, some one is reading while you are writing. There are many other race conditions that can happen like two writers writing simultaneously etc.

Ok, let's make our Data class ready for concurrent access. A naive way of doing that would be to make read and write methods synchronized. But that would block multiple readers accessing the data too, which is restrictive. We will be effectively serializing the access to Data.
Digression: Java initially had Hashtable and Vector classes which suffered from this problem. All the methods were synchronized, so if someone was checking the length of the collection, he will block others who are doing get (or any other read operation). This problem was finally solved from Java 1.5 onwards with concurrent collections.

Back to Reader/Writer. We need to track who is accessing the data and synchronize accordingly. So we maintain no.of readers currently reading. Writers wait if someone is reading. The last reader and writer notifies any waiting writer. Readers can go in simultaneously. Our modified Data class will look like:


public class Data {
    private StringBuilder value = new StringBuilder();
    private int readers = 0;

    public String read() {
        synchronized (this) {
            readers++;
        }
        String val = value.toString();
        synchronized (this) {
            readers--;
            if (readers == 0) notify();
        }
        return val;
    }

    public synchronized void write(String s) {
        while (readers > 0)
            try {
                wait();
            } catch (InterruptedException e) {
                return;
            }
        System.out.println("Writing " + s);                
        this.value.append(s);
        System.out.println("Wrote " + s);
        notify();
    }
}



Reader and writer remains the same. However, this is already getting complicated. Java has ReadWriteLock from 1.5 onwards, so you can use it directly instead of reinventing it. With that our solution will look like:



import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Data {
    private StringBuilder value = new StringBuilder();
    private ReadWriteLock lock = new ReentrantReadWriteLock(true);

    public String read() {
        try {
            lock.readLock().lock();
            return value.toString();
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(String s) {
        try {
            lock.writeLock().lock();
            System.out.println("Writing " + s);
            this.value.append(s);
            System.out.println("Wrote " + s);
        } finally {
            lock.writeLock().unlock();
        }
    }
}



This is much simpler. In the background it will be doing stuff similar to what we did above. The advantage is we don't have to maintain the queue and logic is much simpler. However, there is limited control on the fairness. In the above example, we had specified the ReadWriteLock to be fair, which means the system will provide a weak FIFO semantics for waiting threads. Other option is no fairness at all. But if we keep track of readers our-self (like in previous implementation) we can provide arbitrary priorities like Reader have more priority over writers and vice verse.

Anyway, that's all for the first installment. Let me know your comments.

No comments: