

26.4 Programming with POSIX threads

To use POSIX threads, you first need the Pthread library. Under Linux it can be installed with the distribution's package manager. A port for Windows is also available; the first port of call is the website http://sourceware.org/pthreads-win32/. To avoid spending several pages on installation instructions, the book CD contains a description of how to build applications with the Pthread library. In addition to Linux, it covers the use of Pthreads with development environments such as Code::Blocks and Microsoft Visual C++.


26.4.1 A serial example

To demonstrate threads, I chose a deliberately simple example. We use two int arrays with 100,000 elements each, both filled in descending order, which means that every element of both arrays has to be moved during sorting. The point is simply to have two CPU-intensive tasks that can be sped up. As the sorting algorithm I used bubble sort, which is rather slow. The following serial example is the one that will later be parallelized:

/* bubblesort.c */
#include <stdio.h>
#include <stdlib.h>

The sorted elements can then be found in the file myoutput.txt, in the directory in which the program is executed. While the program runs, my dual-core CPU is at about 50% utilization (see Figure 26.5). The per-core graphs next to the overall utilization show the same thing: only one CPU is busy sorting at any given time. I also kept an eye on the other processes, so that no other compute-intensive application could distort the result.

Figure 26.5 "bubblesort.c" when running without threads

Over several runs, the execution time was always between 25 and 28 seconds (which of course also depends on the computing power of the machine).

The goal now is to sort the two arrays in parallel using the POSIX thread library; in other words, each CPU should sort one array. The hope is that this leads to a considerably shorter execution time.


26.4.2 The framework for a multi-threaded program

Before we turn to the program that sorts the arrays in parallel, the basic Pthreads functions it needs are described in more detail here.


Note

Almost all functions of the Pthread library return 0 on success and a positive error number on failure. They do not return -1 and do not set errno.


"pthread_create" - create a new thread

You can create a new thread with the pthread_create() function:

#include <pthread.h>
int pthread_create(pthread_t *thread, const pthread_attr_t *attribute, void *(*function)(void *), void *arguments);

Each thread has its own identifier of the data type pthread_t, which pthread_create() stores at the address passed as the first parameter. With the second parameter you can assign attributes to the thread; if NULL is specified, the default attributes are used. Thread attributes are dealt with separately. With the third parameter you specify the function that the new thread executes, that is, the start address of the thread's code. The fourth parameter is the argument that is passed to this function. It is usually used to hand data to the thread; in practice this is mostly a pointer to a structure variable.

Ending a thread

There are basically two ways to end a thread: either the thread function returns with an ordinary return, or the thread calls the function pthread_exit(). In both cases the return value must be of the type void *. The syntax of pthread_exit() looks like this:

#include <pthread.h>
void pthread_exit(void *value);

With both options, only the respective thread is terminated. You can then query the return value with the pthread_join function.


Note

As is typical for C, the return value of a thread must not point to a local (automatic) object of the thread function, just as with ordinary functions, since that memory is no longer valid after the thread has terminated.


However, if any thread calls the standard function exit() anywhere in the program, the whole process ends, that is, all threads including the main thread.

You can use the pthread_cleanup_push() and pthread_cleanup_pop() functions to set up a cleanup handler, so that cleanup work such as releasing resources is taken care of when a thread ends. A handler set up in this way is executed when the thread terminates with pthread_exit() or is canceled; it is not executed when the thread function simply returns with return. Basically, these functions can be compared to the standard library function atexit(). The suffixes _push and _pop already suggest that the handlers are managed on a stack. Here is the syntax of the two functions:

#include <pthread.h>
void pthread_cleanup_push(void (*function)(void *), void *arg);
void pthread_cleanup_pop(int exec);

Use pthread_cleanup_push() to register a cleanup handler. The first parameter is the function to be executed; the second parameter is the argument that is passed to that function. pthread_cleanup_pop() removes the most recently registered handler from the stack. If you pass a nonzero value as the exec parameter, the handler is also executed as it is removed; with 0 it is removed without being executed.


Note

pthread_cleanup_push() and pthread_cleanup_pop() may be implemented as macros: pthread_cleanup_push() then contains an opening curly bracket and pthread_cleanup_pop() the matching closing one. That means you have to use both in the same statement block, always as a pair: every _push needs a _pop, even if you know that the _pop will never be reached.


"pthread_join" - waiting for the thread to end

The pthread_join() function is used to wait for a particular thread to end and to fetch its return value, typically from the main thread:

#include <pthread.h>
int pthread_join(pthread_t thread, void **thread_return);

pthread_join() suspends the calling thread (usually the main thread that created the other thread with pthread_create()) until the thread identified by the first parameter, of type pthread_t, has ended. The exit status (the return value) of that thread is written to the address given in thread_return. If you are not interested in the return value, you can pass NULL here.

The resources of a terminated thread are not released until another thread calls pthread_join() for it. You should therefore call pthread_join() exactly once for each thread you create, unless you have detached the thread with pthread_detach().

A parallel example

With these few functions it is now possible to write a truly parallel application. The bubble sort example follows once more, with the difference that each CPU now gets one array to sort.

/* thread1.c */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

/* 100000 elements */
#define MAX 100000

/* two arrays, filled from large to small values */
int test_array1[MAX];
int test_array2[MAX];

/* fill in reverse order: MAX, MAX-1, ..., 1 */
void init_test_array(int *array) {
   int i, j;
   /* i > 0 (not i >= 0), otherwise array[MAX] would be written */
   for (i = MAX, j = 0; i > 0; i--, j++)
      array[j] = i;
}

// Thread 1
static void *bubble1(void *val) {
   int i, temp, elements = MAX;

   printf("Thread bubble1() started\n");
   while (elements--)
      for (i = 1; i <= elements; i++)
         if (test_array1[i-1] > test_array1[i]) {
            temp = test_array1[i];
            test_array1[i] = test_array1[i-1];
            test_array1[i-1] = temp;
         }
   printf("Thread bubble1() has ended\n");
   // We are not interested in the return value.
   return NULL;
}

// Thread 2
static void *bubble2(void *val) {
   int i, temp, elements = MAX;

   printf("Thread bubble2() started\n");
   while (elements--)
      for (i = 1; i <= elements; i++)
         if (test_array2[i-1] > test_array2[i]) {
            temp = test_array2[i];
            test_array2[i] = test_array2[i-1];
            test_array2[i-1] = temp;
         }
   printf("Thread bubble2() has ended\n");
   // We are not interested in the return value.
   return NULL;
}

int main(void) {
   pthread_t thread1, thread2;
   int i, rc;

   // output to a text file
   freopen("myoutput.txt", "w+", stdout);
   printf("Main thread main() started\n");

   // initialize both arrays with values
   init_test_array(test_array1);
   init_test_array(test_array2);

   // create thread 1
   rc = pthread_create(&thread1, NULL, &bubble1, NULL);
   if (rc != 0) {
      printf("Couldn't create thread 1\n");
      return EXIT_FAILURE;
   }
   // create thread 2
   rc = pthread_create(&thread2, NULL, &bubble2, NULL);
   if (rc != 0) {
      printf("Couldn't create thread 2\n");
      return EXIT_FAILURE;
   }

   // Main thread is waiting for both threads.
   pthread_join(thread1, NULL);
   pthread_join(thread2, NULL);

   // write the result of the sorting to the file myoutput.txt
   // (the listing breaks off inside this loop in this copy;
   //  the loop body and the return are a plausible reconstruction)
   for (i = 0; i < MAX; i++)
      printf("[%d-%d]", test_array1[i], test_array2[i]);
   return EXIT_SUCCESS;
}

The program is run in the same way as the serial example. The arrays are sorted here as well, and the result is written to the file myoutput.txt. What interests us more is the utilization of the CPUs and, of course, the time the sorting takes in the parallel version. A look at the CPU utilization now shows the desired result: both CPUs are fully utilized and do their work at the same time.

Figure 26.6 Bubble sort when running in parallel

The execution time has also improved considerably. Instead of the previous 25–28 seconds, the program now does its work in 11–14 seconds. The execution time has roughly been halved.


Note

The example could certainly be optimized further. The processor's cache line size, which is typically 64 bytes on current x86 processors (some architectures use 128 bytes), has not been taken into account. It can happen, for example, that variables used by two different threads lie in the same cache line. If one thread changes its variable, the cache line is marked invalid for the other processor, which then has to reload it into its cache. This effect, known as false sharing, can slow down the application considerably.



26.4.3 Summary

This brief introduction to POSIX threads shows how topical the subject is. However, threads should really only be used where they are actually needed. Don't let this chapter fool you: threads are not always this easy to use. As soon as several threads share data, synchronization mechanisms usually have to be added. If you don't know exactly what you are doing here, the threads can run out of control or the data can end up corrupted. Some problems can be parallelized very well and easily. Others require simple synchronization mechanisms, and still others require so much administrative effort that parallelization is hardly worthwhile.


Note

You can find more about POSIX threads on the book CD in a chapter from the book "Linux-UNIX-Programming" from the same publisher (and from my pen ;-)). The examples can of course also be run and used under Windows, provided you have installed the Pthread library. How to do this is also described on the book CD.



