Process, Thread and Fiber

In the past few months I have been asked about the differences between a process and a thread. No one so far has mentioned fibers, even though fibers are related to threads. Let me describe all three and then give some examples of how I have used, and continue to use, threads in my work.

I searched online for descriptions and definitions and distilled them into single, simple sentences. Of course, as one continues to talk about each topic, the definitions expand.

I recall a college course that used “Operating Systems Design and Implementation” by Andrew Tanenbaum. The book came with a copy of MINIX, including its source code. The course, the book and MINIX gave me a very good foundation in operating systems.

A process is an instance of a program being executed; it contains the executable image of the program's code. To have a computer execute an application or service, you could type the name of the program at a command prompt, followed by optional arguments. A product that I developed and continue to work on is a distributed storage server named iCAS. On a Windows console, one can enter the following command to start the iCAS program:

C:\> sdm -debug

The operating system creates a process with a single thread of execution, which is scheduled to run by the Windows scheduler. In a C program this single thread of execution starts at main(), as illustrated by the following code snippet:

int __cdecl   sdmMain              (
                int    argc,
                char   *argv[]
                )
//     ***************************************************************@@
//     - This function is the main entry point for the SDM server part
//     of the iCAS.
//     *****************************************************************
{
::::::::

The reason the entry point is not named main() is that I used wrapper code that allows the software to run either interactively for debugging / maintenance purposes (as previously illustrated) or as a Windows service:

C:\> net start sdm

As discussed, the process contains the image of the program to be executed and the memory that holds the executable code, the heap and the stack of each of its threads. The process also holds references to the resources the operating system has assigned to the program; this matters for the iCAS, which is able to manage billions of files in local and distributed systems. The process has security credentials to access resources and to keep them private from other processes or users. Keep in mind that the processor state (registers, program counter and stack pointer) belongs to each individual thread, while the address space and resources of the process are shared by all of its threads.

As soon as the main thread of the iCAS starts, it verifies that the necessary resources are available (e.g., databases, file systems) and starts a set of threads to handle different operations. A main socket is opened to listen for requests, and RequestThread() is started to listen for client requests (e.g., store, query and retrieve).

A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler. Typically, each CPU core executes a single sequence of instructions at a time, and those instructions belong to a thread of execution. After a short period of time (a time slice, typically on the order of milliseconds), the scheduler in the operating system switches the core to execute a different thread. To a human, this gives the impression that multiple threads and processes are executing simultaneously; in reality, the threads sharing a core take turns. Of course, modern CPUs have multiple cores and some computers (like the one I use for software development) have multiple CPUs with multiple cores. On such hardware, multiple threads may execute in parallel.

RequestThread() creates a new thread to service each request. The following code illustrates how a new thread is spawned:

// **** START a thread to process this request ****
threadHandle = (HANDLE)_beginthread(   (THREAD_START)ThisRequestThread,
                                        (unsigned)0,         // stack size (0 = use the default)
                                        (void *)request);
if (threadHandle == (HANDLE)-1)                        // something went wrong
{
    strcpy(errorString, strerror(errno));
    EventLog(EVENT_ERROR,
             "RequestThread <<< _beginthread ThisRequestThread errno: %d errorString ==>%s<== line: %d file ==>%s<==\n",
             errno, errorString, __LINE__, __FILE__);
    retVal = WAR_COULD_NOT_START_THREAD;               // flag the issue
    goto done;                                         // to perform clean up
}

A new thread is created to handle each client request, and it is scheduled by the operating system scheduler. Because the threads of a process share its memory and resources, synchronization mechanisms are needed to ensure that data and resource integrity is maintained. Mutexes and semaphores are typically used.

For performance reasons, it is possible to create a set of idle threads in advance and dispatch them as needed. This eliminates the expense (resources and time) of creating and destroying a thread for every request. I also developed a large library of primitives that different applications and services use, to avoid reinventing the wheel and to reduce development and testing time. One such primitive manages thread pools.

// **** start the pool of worker threads ****
status = ThreadWorkerThreads(  totalThreads,       // # of threads requested
                               &workQ,             // queue allocated by this call
                               &workMutex,         // mutex allocated by this call
                               &workSema,          // semaphore allocated by this call
                               &WorkerThread);     // address of the worker thread
CDP_CHECK_STATUS("TestThreadWorkerThreads <<< ThreadWorkerThreads", status);

In a nutshell, the developer (or the program at run time) decides the number of threads that need to be created to handle a specific task. The threads are created and then wait for work to arrive in a queue.

I would like to clarify that development of the iCAS code and its associated libraries started some time ago. Today you can find equivalents of all the primitives that I developed in libraries for most programming languages. At this time, I am architecting a next-generation product which will be written in an object-oriented programming language (e.g., C++, C# or Java). I will be using libraries / modules that are readily available. This will allow the team to concentrate on the actual storage server and not on the development of auxiliary / supporting functions and methods.

So far, a process appears to be static and a thread appears to be dynamic. A fiber uses co-operative multitasking while a thread uses pre-emptive multitasking: a fiber runs until it explicitly yields control to another fiber, and scheduling of fibers is performed by user code rather than by the operating system. This further reduces the resources and management overhead associated with a thread. One may think of a fiber as a lightweight thread.

The concept and use of threads take some time to sink in, and becoming proficient with them takes practice. Of course, the syntax to handle a thread is different in C, C# or Java. That aside, the concerns when using threads are generally similar.

My suggestion to anyone who wishes to learn about threads is to get a good book on the subject and practice. Modern services and applications require the proper use of threads in order to be extensible and reliable.

As usual, if you have comments or questions, please do not hesitate to send me a message at my email account.

John

john.canessa@gmail.com
