Process Scheduling

Who gets to run next?

Paul Krzyzanowski

February 18, 2015

The finest eloquence is that which gets things done; the worst is that which delays them.
— David Lloyd George (speech, Jan. 1919)

If you look at any process, you’ll notice that it spends some time executing instructions (computing) and then makes some I/O request, for example, to read or write data to a file or to get input from a user. After that, it executes more instructions and then, again, waits on I/O. The period of computation between I/O requests is called the CPU burst.


Interactive processes spend more time waiting for I/O and generally experience short CPU bursts. A text editor is an example of an interactive process with short CPU bursts.


Compute-intensive processes, conversely, spend more time running instructions and less time on I/O. They exhibit long CPU bursts. A video transcoder is an example of a process with long CPU bursts. Even though it reads and writes data, it spends most of its time processing that data.


Most interactive processes, in fact, spend the vast bulk of their existence doing nothing but waiting on data. As I write this on my Mac, I have 44 processes running just under my user account. This includes a few browser windows, a word processor, spreadsheet, several shell windows, Photoshop, iTunes, and various monitors and utilities. Most of the time, all these processes collectively are using less than 3% of the CPU. This is not surprising since most of these programs are waiting for user input, a network message, or sleeping and waking up periodically to check some state.

Consider a 2.4 GHz processor. It executes approximately 2,400 million instructions per second (and this does not even count multi-core or hyperthreaded processors). It can run 24 billion instructions in the ten seconds it might take you to skim a web page before you click on a link — or 1.2 billion instructions in the half second it might take you to hit the next key as you’re typing. The big idea in increasing overall system throughput was the realization that we could keep several programs in memory and switch the processor to run another process when a process has to wait on an I/O operation. This is multiprogramming. The next big idea was realizing that you could preempt a process and let another process run and do this quickly enough to give the illusion that many processes are running at the same time. This is multitasking.

Most systems have a large number of processes with short CPU bursts interspersed between I/O requests and a small number of processes with long CPU bursts. To provide good time-sharing performance, we may preempt a running process to let another one run. The ready list, also known as a run queue, in the operating system keeps a list of all processes that are ready to run and not blocked on input/output or another blocking system request, such as a semaphore. The entries in this list are pointers to a process control block, which stores all information and state about a process.

When an I/O request for a process is complete, the process moves from the waiting state to the ready state and gets placed on the run queue.

The process scheduler is the component of the operating system that is responsible for deciding whether the currently running process should continue running and, if not, which process should run next. There are four events that may occur where the scheduler needs to step in and make this decision:

  1. The current process goes from the running to the waiting state because it issues an I/O request or some operating system request that cannot be satisfied immediately.

  2. The current process terminates.

  3. A timer interrupt causes the scheduler to run and decide that a process has run for its allotted interval of time and it is time to move it from the running to the ready state.

  4. An I/O operation is complete for a process that requested it and the process now moves from the waiting to the ready state. The scheduler may then decide to preempt the currently-running process and move this newly-ready process into the running state.

A scheduler is a preemptive scheduler if it has the ability to get invoked by an interrupt and move a process out of the running state to let another process run. The last two events in the above list may cause this to happen. If a scheduler cannot take the CPU away from a process, then it is a cooperative, or non-preemptive, scheduler. Old operating systems such as Microsoft Windows 3.1 or Apple Mac OS prior to Mac OS X used cooperative scheduling. Older batch processing systems had run-to-completion schedulers, where a process ran to termination before any other process would be allowed to run.

The decisions that the scheduler makes concerning the sequence of processes and the length of time they may run constitute the scheduling algorithm (or scheduling policy). These decisions are not easy ones, as the scheduler has only a limited amount of information about the processes that are ready to run. A good scheduling algorithm should:

  • Be fair – give each process a fair share of the CPU, allow each process to run in a reasonable amount of time.
  • Be efficient – keep the CPU busy all the time.
  • Maximize throughput – service the largest possible number of jobs in a given amount of time; minimize the amount of time users must wait for their results.
  • Minimize response time – interactive users should see good performance.
  • Be predictable – a given job should take about the same amount of time to run when run multiple times. This keeps users sane.
  • Minimize overhead – don’t waste too many resources. Keep scheduling time and context switch time at a minimum.
  • Maximize resource use – favor processes that will use underutilized resources. There are two motives for this. Most devices are slow compared to CPU operations. We’ll achieve better system throughput by keeping devices busy as often as possible. The second reason is that a process may be holding a key resource and other, possibly more important, processes cannot use it until it is released. Giving the process more CPU time may free up the resource quicker.
  • Avoid indefinite postponement – every process should get a chance to run eventually.
  • Enforce priorities – if the scheduler allows a process to be assigned a priority, it should be meaningful and enforced.
  • Degrade gracefully – as the system becomes more heavily loaded, performance should deteriorate gradually, not abruptly.

It is clear that some of these goals are contradictory. For example, minimizing overhead means that jobs should run longer, thus hurting interactive performance. Enforcing priorities means that high-priority processes will always be favored over low-priority ones, causing indefinite postponement. These factors make scheduling a task for which there can be no perfect algorithm.

To make matters even more complex, the scheduler does not know much about the behavior of each process and certainly has no idea of what the process will do in the future. As we saw earlier, some processes perform a lot of input/output operations but use little CPU time (examples are web browsers, shells, and editors). They spend much of their time in the blocked state in between little bursts of computation. The overall performance of these I/O bound processes is constrained by the speed of the I/O devices. CPU-bound processes, in contrast, spend most of their time computing (examples are ray-tracing programs and circuit simulators). Their execution time is largely determined by the speed of the CPU and the amount of CPU time they can get.

To help the scheduler monitor processes and the amount of CPU time that they use, a programmable interval timer interrupts the processor periodically (typically 100 times per second). This timer is programmed when the operating system initializes itself. At each interrupt, the operating system’s scheduler gets to run and decide whether the currently running process should be allowed to continue running or whether it should be suspended and another ready process allowed to run. This is the mechanism that enables preemptive scheduling.

Preemptive scheduling allows the scheduler to control response times by taking the CPU away from a process that it decided has been running too long in order to let another process run. It incurs more overhead than nonpreemptive scheduling since it has to deal with the overhead of context switching processes instead of allowing a process to run to completion or run until the next I/O operation or other system call. However, it allows for higher degrees of concurrency and better interactive performance.

The scheduling algorithm has the task of figuring out whether a process should be switched out for another process and which process should get to run next. The dispatcher is the component of the scheduler that handles the mechanism of actually getting that process to run on the processor. This requires loading the saved context of the selected process, which is stored in the process control block and comprises the set of registers, stack pointer, flags (status word), and a pointer to the memory mapping (typically a pointer to the page table). Once this context is loaded, the dispatcher switches to user mode via a return from interrupt operation that causes the process to execute from the location that was saved on the stack at the time that the program stopped running — either via an interrupt or a system call.

In the following sections, we will explore a few scheduling algorithms.

Let’s first introduce some terms.

Turnaround time
Turnaround time is the elapsed time between the time the job arrives (e.g., you type a command) and the time that it terminates. This includes the delay of waiting for the scheduler to start the job because some other process is still running and others may be queued ahead.
Start time

Also known as release time, the start time is the time when the task is scheduled to run and actually gets to start running on the CPU.

If we look at a process as a series of CPU bursts, the start time applies to each CPU burst. It is the time when each CPU burst starts to run.

Response time
This is the delay between submitting a process and it being scheduled to run (its start time). Again, if we look at a process as a series of CPU bursts, the response time applies to each CPU burst. It is the delay between a task being ready to run and actually running.
Completion time
This is the time when the process terminates.
Throughput
Throughput refers to the number of processes that complete over some unit of time. By comparing throughput on several schedulers, we can get a feel for whether one scheduler is able to get more processes to complete than another over some period of time. This can be due to several factors: keeping the CPU busy, scheduling I/O as early as possible to keep disks and other slow devices busy, and the amount of overhead spent doing all of this.

Possibly the most straightforward approach to scheduling processes is to maintain a FIFO (first-in, first-out) run queue. New processes go to the end of the queue. When the scheduler needs to run a process, it picks the process that is at the head of the queue. This scheduler is non-preemptive. If the process has to block on I/O, it enters the waiting state and the scheduler picks the next process from the head of the queue. When I/O is complete and that waiting (blocked) process is ready to run again, it gets put at the end of the queue.


With first-come, first-served scheduling, a process with a long CPU burst will hold up other processes, increasing their turnaround time. Moreover, it can hurt overall throughput since I/O on processes in the waiting state may complete while the CPU-bound process is still running; devices then sit idle instead of being used effectively. To increase throughput, it would have been better if the scheduler could instead have let some I/O-bound process run briefly, request its I/O, and then wait for that I/O to complete. Because CPU-bound processes don’t get preempted, they hurt interactive performance because the interactive process won’t get scheduled until the CPU-bound one has completed.

Advantage: FIFO scheduling is simple to implement. It is also intuitively fair (the first one in line gets to run first).

Disadvantage: The greatest drawback of first-come, first-served scheduling is that it is not preemptive. Because of this, it is not suitable for interactive jobs. Another drawback is that a long-running process will delay all jobs behind it.

Round robin scheduling is a preemptive version of first-come, first-served scheduling. Processes are dispatched in a first-in, first-out sequence but each process is allowed to run for only a limited amount of time. This time interval is known as a time slice or quantum. If a process does not complete or get blocked because of an I/O operation within the time slice, the time slice expires and the process is preempted. This preempted process is placed at the back of the run queue, where it must wait for all the processes that were already in the queue to cycle through the CPU.

If a process gets blocked due to an I/O operation before its time slice expires, it, of course, enters the blocked state because of that I/O operation. Once that operation completes, it is placed on the end of the run queue and waits its turn.
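
To make the mechanics concrete, here is a minimal round-robin simulation in C. The quantum, the number of processes, and their CPU demands are made-up values for illustration, and blocking on I/O is ignored for simplicity.

    /* Minimal round-robin simulation: each process has a remaining CPU
     * demand; it runs for at most one quantum, then goes to the back of
     * the queue unless it has finished. Values are illustrative only. */
    #include <stdio.h>

    #define NPROC   4
    #define QUANTUM 3

    int main(void)
    {
        int remaining[NPROC] = { 7, 2, 5, 3 };   /* CPU time each process still needs */
        int queue[NPROC]     = { 0, 1, 2, 3 };   /* circular run queue, FIFO order    */
        int head = 0, count = NPROC, clock = 0;

        while (count > 0) {
            int p = queue[head % NPROC];         /* pick the process at the head      */
            head++; count--;

            int slice = remaining[p] < QUANTUM ? remaining[p] : QUANTUM;
            printf("t=%2d: run P%d for %d\n", clock, p, slice);
            clock += slice;
            remaining[p] -= slice;

            if (remaining[p] > 0) {              /* quantum expired: back of the queue */
                queue[(head + count) % NPROC] = p;
                count++;
            } else {
                printf("t=%2d: P%d done\n", clock, p);
            }
        }
        return 0;
    }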


A big advantage of round robin scheduling over non-preemptive schedulers is that it dramatically improves average response times. By limiting each task to a certain amount of time, the operating system can ensure that it can cycle through all ready tasks, giving each one a chance to run.

With round robin scheduling, interactive performance depends on the length of the quantum and the number of processes in the run queue. A very long quantum makes the algorithm behave very much like first come, first served scheduling since it’s very likely that a process will block or complete before the time slice is up. A small quantum lets the system cycle through processes quickly. This is wonderful for interactive processes. Unfortunately, there is an overhead to context switching and having to do so frequently increases the percentage of system time that is used on context switching rather than real work.

Advantage: Round robin scheduling is fair in that every process gets an equal share of the CPU. It is easy to implement and, if we know the number of processes on the run queue, we can know the worst-case response time for a process.

Disadvantage: Giving every process an equal share of the CPU is notalways a good idea. For instance, highly interactive processes willget scheduled no more frequently than CPU-bound processes.

Setting the quantum size

What should the length of a quantum be to get “good” performance? A short quantum is good because it allows many processes to circulate through the processor quickly, each getting a brief chance to run. This way, highly interactive jobs that usually do not use up their quantum will not have to wait as long before they get the CPU again, hence improving interactive performance. On the other hand, a short quantum is bad because the operating system must perform a context switch whenever a process gets preempted. This is overhead: anything that the CPU does other than executing user code is essentially overhead. A short quantum implies many such context switches per unit time, taking the CPU away from performing useful work (i.e., work on behalf of a process).

The overhead associated with a context switch can be expressed as:

context switch overhead = C / (Q+C)

where Q is the length of the time slice and C is the context switch time. An increase in Q improves efficiency but hurts average response time. As an example, suppose that there are ten processes ready to run, Q = 100 ms, and C = 5 ms. Process 0 (at the head of the run queue, the list of processes that are in the ready state) gets to run immediately. Process 1 can run only after Process 0’s quantum expires (100 ms) and the context switch takes place (5 ms), so it starts to run at 105 ms. Likewise, process 2 can run only after another 105 ms. We can compute the amount of time that each process will be delayed and compare the delays between a small quantum (10 ms) and a long quantum (100 ms):

Proc #    delay (ms), Q = 100 ms    delay (ms), Q = 10 ms
0         0                         0
1         105                       15
2         210                       30
3         315                       45
4         420                       60
5         525                       75
6         630                       90
7         735                       105
8         840                       120
9         945                       135

We can see that with a quantum of 100 ms and ten processes, a process at the end of the queue will have to wait almost a second before it gets a chance to run. This is much too slow for interactive tasks. When the quantum is reduced to 10 ms, the last process has to wait less than 1/7 second before it gets the CPU. The downside of this is that with a quantum that small, the context switch overhead (5/(10+5)) is 33%. This means that we are wasting a third of the CPU just switching processes! With a quantum of 100 ms, the context switch overhead is just under 5%.
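
The numbers above can be reproduced with a few lines of C. The sketch below computes the overhead from C/(Q+C) and the delay before each of the ten processes first runs, for both quantum sizes used in the example.

    /* Reproduce the overhead and delay figures from the example above:
     * ten ready processes, a 5 ms context-switch time, and a quantum of
     * either 100 ms or 10 ms. */
    #include <stdio.h>

    int main(void)
    {
        const int nproc = 10, c = 5;          /* context-switch time (ms) */
        const int quanta[] = { 100, 10 };

        for (int q = 0; q < 2; q++) {
            int Q = quanta[q];
            double overhead = (double)c / (Q + c);
            printf("Q = %3d ms: overhead = %.1f%%\n", Q, overhead * 100.0);
            for (int i = 0; i < nproc; i++)   /* delay before process i first runs */
                printf("  P%d waits %d ms\n", i, i * (Q + c));
        }
        return 0;
    }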

The shortest remaining time first (SRTF) scheduling algorithm is a preemptive version of an older non-preemptive algorithm known as shortest job first (SJF) scheduling. Shortest job first scheduling runs a process to completion before running the next one. The queue of jobs is sorted by estimated job length so that short programs get to run first and not be held up by long ones. This minimizes average response time.

Here’s an extreme example. It’s the 1950s and three users submit jobs (decks of punched cards) to an operator. Two of the jobs are estimated to run for 3 minutes each, while the third user estimates that it will take about 48 hours to run the program. With a shortest job first approach, the operator will run the three-minute jobs first and then let the computer spend time on the 48-hour job.

With the shortest remaining time first algorithm, we take into account the fact that a process runs as a series of CPU bursts: processes may leave the running state because they need to wait on I/O or because their quantum expired. The algorithm sorts the run queue by the process’ anticipated CPU burst time, picking the process with the shortest burst time. Doing so will optimize the average response time of processes.

Let’s consider an example of five processes in the run queue. If we process them in a FIFO manner, we see that all the CPU bursts add up to 25 (pick your favorite time unit; this is just an example). The mean run time for a process, however, is the mean of all the run times, where the run time is the time spent waiting to run plus the CPU burst time of the process. In this example, our mean run time is (8 + 11 + 21 + 23 + 25)/5, or 17.6.


If we reorder the processes in the queue by the estimated CPU burst time, we still have the same overall total (the processes take the same amount of time to run) but the mean run time changes. It is now (2 + 4 + 7 + 15 + 25)/5, or 10.6. We reduced the average run time for our processes by 40%!
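
A short C program can verify these means. The individual burst lengths (8, 3, 10, 2, 2) are inferred from the completion times in the example above; the mean is taken over each process’s completion time.

    /* Recompute the mean run (completion) times from the example: five
     * CPU bursts processed in FIFO order versus sorted shortest-first. */
    #include <stdio.h>

    static double mean_completion(const int burst[], int n)
    {
        int t = 0, sum = 0;
        for (int i = 0; i < n; i++) {
            t += burst[i];        /* completion time of process i */
            sum += t;
        }
        return (double)sum / n;
    }

    int main(void)
    {
        int fifo[]   = { 8, 3, 10, 2, 2 };    /* completes at 8, 11, 21, 23, 25 */
        int sorted[] = { 2, 2, 3, 8, 10 };    /* completes at 2, 4, 7, 15, 25   */

        printf("FIFO mean run time:   %.1f\n", mean_completion(fifo, 5));   /* 17.6 */
        printf("Sorted mean run time: %.1f\n", mean_completion(sorted, 5)); /* 10.6 */
        return 0;
    }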


Estimating future CPU burst time

The biggest problem with sorting processes this way is that we’re trying to optimize our schedule using data that we don’t even have! We don’t know what the CPU burst time will be for a process when it’s next run. It might immediately request I/O or it might continue running for minutes (or until the expiration of its time slice).

The best that we can do is guess and try to predict the next CPU burst time by assuming that it will be related to past CPU bursts for that process. All interactive processes follow this sequence of operations:

compute — I/O — compute — I/O — compute — I/O

Suppose that a CPU burst (compute) period is measured as T0. The next compute period is measured as T1, and so on. The common approach to estimating the length of the next CPU burst is to use a time-decayed exponential average of previous CPU bursts for the process. We will examine one such function, although there are variations on the theme. Let Tn be the measured time of the nth burst, sn be the predicted size of the nth CPU burst, and a be a weighting factor, 0 ≤ a ≤ 1. Define s0 as some default system average burst time. The estimate of the next CPU burst period is:

sn+1 = aTn + (1 - a)sn

The weighting factor, a, controls how much weight is given to past history versus the most recent observation. If a = 1, then only the last observation of the CPU burst period counts. If a = ½, then the last observation has as much weight as the historical weight. As a gets smaller than ½, the historical weight counts more than the recent observation.
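
Here is a small sketch of the estimator in C; the initial estimate s0 and the sequence of measured bursts are made-up values for illustration.

    /* Exponential averaging of CPU burst lengths, as in the formula above:
     * estimate = a * last_measured + (1 - a) * previous_estimate. */
    #include <stdio.h>

    int main(void)
    {
        double a = 0.5;               /* weighting factor                   */
        double estimate = 10.0;       /* s0: default system average burst   */
        double bursts[] = { 8, 12, 4, 3, 9, 11 };  /* measured bursts T0..T5 */

        for (int n = 0; n < 6; n++) {
            printf("burst %d: measured %.1f, predicted %.1f\n",
                   n, bursts[n], estimate);
            estimate = a * bursts[n] + (1.0 - a) * estimate;  /* s(n+1)     */
        }
        return 0;
    }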

Here is an example with a = 0.5. The blue bars represent the actual CPU burst over time. The red bars represent the estimated value. With a weighting value of ½, we can see how the red bars are strongly influenced by the previous actual value but factor in the older values.

[Figure: actual (blue) vs. estimated (red) CPU burst times with a = 0.5]

Now let’s see what happens when we set a = 1. This ignores history and only looks at the previous CPU burst. We can see that each red bar (current estimate) has exactly the same value as the previous blue bar (latest actual CPU burst).

[Figure: actual (blue) vs. estimated (red) CPU burst times with a = 1]

For a final example, let’s set a = 0.25. Here, the last measured value only counts for 25% of the estimated CPU burst, with 75% being dictated by history. We can see how immediate changes in CPU burst have less impact on the estimate when compared with the above graph of a = 0.5. Note how the estimates at 2, 3, 13, and 14 still remain relatively high despite the rapid plunge of actual CPU burst values.

[Figure: actual (blue) vs. estimated (red) CPU burst times with a = 0.25]

Advantage of shortest remaining time first scheduling: This scheduling is optimal in that it always produces the lowest mean response time. Processes with short CPU bursts are given priority and hence run quickly (are scheduled frequently).

Disadvantages: Long-burst (CPU-intensive) processes are hurt with a long mean waiting time. In fact, if short-burst processes are always available to run, the long-burst ones may never get scheduled. The situation where a process never gets scheduled to run is called starvation. Another problem with the algorithm is that the effectiveness of meeting the scheduling criteria relies on our ability to estimate the length of the next CPU burst.

Round robin scheduling assumes that all processes are equally important. This generally is not the case. We would sometimes like to see long CPU-intensive (non-interactive) processes get a lower priority than interactive processes. These processes, in turn, should get a lower priority than jobs that are critical to the operating system.

In addition, different users may have different status. A system administrator’s processes may rank above those of a student.


These goals led to the introduction of priority scheduling. The idea here is that each process is assigned a priority (just a number). Of all processes ready to run, the one with the highest priority gets to run next. (There is no general agreement across operating systems on whether a high number represents a high or low priority; UNIX-derived systems tend to use smaller numbers for high priorities while Microsoft systems tend to use higher numbers for high priorities.)

With a priority scheduler, the scheduler simply picks the highest priority process to run. If the system uses preemptive scheduling, a process is preempted whenever a higher priority process is available in the run queue.

Priorities may be internal or external. Internal priorities are determined by the system using factors such as time limits, a process’ memory requirements, its anticipated I/O-to-CPU ratio, and any other system-related factors. External priorities are assigned by administrators.

Priorities may also be static or dynamic. A process with a static priority keeps that priority for the entire life of the process.

A process with a dynamic priority will have that priority changed by the scheduler during its course of execution. The scheduler would do this to achieve its scheduling goals. For example, the scheduler may decide to decrease a process’ priority to give a chance for a lower-priority job to run. If a process is I/O bound (spending most of its time waiting on I/O), the scheduler may give it a higher priority so that it can get off the run queue quickly and schedule another I/O operation.

Static and dynamic priorities can coexist. A scheduler would know that a process with a static priority cannot have its priority adjusted throughout the course of its execution.

Ignoring dynamic priorities, the priority scheduling algorithm is straightforward: each process has a priority number assigned to it and the scheduler simply picks the process with the highest priority.
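
To make this concrete, here is a minimal sketch in C of how a priority scheduler might select the next process. The struct fields and the smaller-number-is-higher-priority convention (the UNIX-style convention mentioned above) are illustrative assumptions, not a specific system’s data structures.

    /* Straightforward priority selection: scan the ready list and pick
     * the process with the highest priority (smallest number here). */
    #include <stddef.h>

    struct proc {
        int pid;
        int priority;    /* smaller value = higher priority     */
        int ready;       /* nonzero if the process is runnable  */
    };

    struct proc *pick_next(struct proc table[], size_t n)
    {
        struct proc *best = NULL;
        for (size_t i = 0; i < n; i++) {
            if (!table[i].ready)
                continue;
            if (best == NULL || table[i].priority < best->priority)
                best = &table[i];
        }
        return best;     /* NULL if nothing is ready to run */
    }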

Advantage: priority scheduling provides a good mechanism where the relative importance of each process may be precisely defined.

Disadvantage: If high priority processes use up a lot of CPU time, lower priority processes may be postponed indefinitely, leading to starvation. Another problem with priority scheduling is deciding which priority level gets assigned to each process.

Dealing with starvation

One approach to the problem of indefinite postponement is to use dynamic priorities. At the expiration of each quantum, the scheduler can decrease the priority of the currently running process (thereby penalizing it for taking that much CPU time). Eventually its priority will fall below that of the next highest process, and that process will be allowed to run.

Another approach is to have the scheduler keep track of low priority processes that do not get a chance to run and increase their priority so that eventually the priority will be high enough for the processes to get scheduled to run. Once a process runs for its quantum, its priority can be brought back to the previous low level.

This periodic boosting of a process’ priority to ensure it gets a chance to run is called process aging. A simple way to implement aging is to periodically increase every process’ priority and then have them get readjusted as they execute.
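
A minimal sketch of how aging might be implemented, assuming the same smaller-number-is-higher-priority convention as above. The constants, field names, and the once-per-interval boost are illustrative, not from any particular system.

    /* One simple aging policy: periodically boost the priority of every
     * ready process so that low-priority processes eventually run; a
     * process's priority drops again as it consumes CPU time. */
    #define MAX_PRIORITY 0      /* smaller number = higher priority */
    #define MIN_PRIORITY 31

    struct task {
        int priority;
        int base_priority;
    };

    /* Called periodically (e.g., once a second) on every ready task. */
    void age(struct task *t)
    {
        if (t->priority > MAX_PRIORITY)
            t->priority--;                 /* raise priority one step */
    }

    /* Called when a task uses up its quantum. */
    void penalize(struct task *t)
    {
        if (t->priority < MIN_PRIORITY)
            t->priority++;                 /* lower priority one step */
    }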

What happens if several processes get assigned the same priority? This is a realistic possibility since picking a unique priority level for each of possibly hundreds or thousands of processes on a system may not be feasible.

We can group processes into priority classes and assign a separate run queue for each class. This allows us to categorize and separate system processes, interactive processes, low-priority interactive processes, and background non-interactive processes. The scheduler picks the highest-priority queue (class) that has at least one process in it. In this sense, it behaves like a priority scheduler.

Each queue may use a different scheduling algorithm, if desired. Round-robin scheduling per priority level is the most common. As long as processes are ready in a high priority queue, the scheduler will let each of them run for its time slice. Only when no processes are available to run at that priority level will the scheduler look at lower levels. Alternatively, some very high priority levels might implement the non-preemptive first-come, first-served scheduling approach to ensure that a critical real-time task gets all the processing it needs.

The scheduler may also choose a different quantum for each priority level. For example, it is common to give low-priority non-interactive processes a longer quantum. They won’t get to run as often since they are in a low priority queue but, when they do, the scheduler will let them run longer. Linux, on the other hand, does the opposite. It gives a longer quantum to high-priority processes on the assumption that they are important and that they are likely to be interactive, so they will usually block long before using up their time slice.


One problem with multilevel queues is that the process needs to be assigned to the most suitable priority queue a priori. If a CPU-bound process is assigned to a short-quantum, high-priority queue, that’s not optimal for either the process or overall throughput.

Multi-level queues are generally used as a top-level scheduling discipline to separate broad classes of processes, such as real-time, kernel threads, interactive, and background processes. Specific schedulers within each class determine which process gets to run within that class. Most operating systems, including Windows, Linux, and OS X, support a form of multilevel queues and scheduling classes.

A variation on multilevel queues is to allow the scheduler to adjust the priority of a process (that is, use dynamic priorities) during execution in order to move it from one queue to another based on the recent behavior of the process.

The goal of multilevel feedback queues is to automatically place processes into priority levels based on their CPU burst behavior. I/O-intensive processes will end up on higher priority queues and CPU-intensive processes will end up on low priority queues.

A multilevel feedback queue uses two basic rules:

  1. A new process gets placed in the highest priority queue.

  2. If a process does not use up its quantum (that is, it blocks on I/O before the quantum expires), then it stays at the same priority level (round robin within that level); otherwise it moves to the next lower priority level.

With this approach, a process with long CPU bursts will use its entire time slice, get preempted, and get placed in a lower-priority queue. A highly interactive process will not use up its quantum and will remain at a high priority level.

Although not strictly necessary for the algorithm, each successive lower-priority queue may be given a longer quantum. This allows a process to remain in the queue that corresponds to its longest CPU burst.
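
A minimal sketch of the two rules above, assuming a fixed number of priority levels. The function names are hypothetical hooks that a scheduler might call when a process is created, uses up its quantum, or blocks; they are not taken from any real kernel.

    /* Sketch of the multilevel feedback queue rules: a new process starts
     * at the highest priority; a process that uses up its quantum drops
     * one level; one that blocks before the quantum expires stays put. */
    #define LEVELS 8

    struct task {
        int level;             /* 0 = highest priority queue */
    };

    void admit(struct task *t)
    {
        t->level = 0;          /* rule 1: new processes enter the top queue */
    }

    void on_quantum_expired(struct task *t)
    {
        if (t->level < LEVELS - 1)
            t->level++;        /* rule 2: used the whole slice, demote      */
    }

    void on_blocked(struct task *t)
    {
        (void)t;               /* rule 2: blocked early, stay at this level */
    }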

Process aging

One problem with multilevel feedback queues is starvation. If there are a lot of interactive processes or if new processes are frequently created, there is always a task available in a high-priority queue and the CPU-bound processes in a low-priority queue will never get scheduled.

A related problem is that an interactive process may end up at a low priority level. If a process ever has a period where it becomes CPU-intensive, it trickles down to a low priority level and is forever doomed to remain there. An example is a game that needs to spend considerable CPU time initializing itself but then becomes interactive, spending most of its time waiting for user input.

Both these problems can be solved with process aging. As we saw earlier, we periodically increase the priority of a process to ensure that it will be scheduled to run. A simple approach is to periodically bring all processes to the highest priority queue.

An advantage of a multilevel feedback queue is that the algorithm is designed to adjust the priority of a process whenever it runs, so a CPU-bound process will quickly trickle back down to a low priority level while an interactive process will remain at a high level.

Gaming the system

If a programmer knows how the scheduler works and wants to write software that will ensure that the process remains at a high priority level, she can write code that will force the system to block on some low-latency I/O operation (e.g., sleep for a few milliseconds) just before the quantum expires. That way, the process will be rewarded for not using up its quantum even if it repeatedly uses up a large chunk of it.

A solution to this approach is to modify the way the scheduler decides to demote the priority of a process. Instead of simply checking whether a process uses up its time slice, it keeps track of the total time that the process spent running over a larger time interval (several time slices). Each priority queue will have a maximum CPU time allotment associated with it. If a process uses up that allotment over that multi-time-slice interval, the process will be punished by being moved to a lower priority level.


Advantages: Multi-level feedback queues are good for separating processes into categories based on their need for a CPU. They favor I/O bound processes by letting them run often. Versions of this scheduling policy that increase the quantum at lower priority levels also favor CPU bound processes by giving them a larger chunk of CPU time when they are allowed to run.

Disadvantages: The priority scheme here is one that is controlled by the system rather than by the administrator or users. A process is deemed important not because it necessarily is important, but because it happens to do a lot of I/O.

This scheduler also has the drawback that I/O-bound processes that become CPU-bound or CPU-bound processes that become I/O-bound will not get scheduled well. Moreover, CPU-bound processes are in danger of starving on a system with many interactive processes. Both these problems can be dealt with by applying process aging.

Another drawback is the ability of a programmer to keep the priority of a process high by performing bogus I/O operations periodically. Again, we have a solution for this by measuring CPU use over a larger, multi-quantum time interval and punishing processes that use more of the CPU.

With lottery scheduling (also known as fair share scheduling), the goal is to allow a process to be granted a proportional share of the CPU: a specific percentage. Conceptually, lottery scheduling works by allocating a specific number of “tickets” to each process. The more tickets a process has, the higher its chance of being scheduled.

For example, suppose that we have 100 tickets in total and three processes to run: A, B, and C. We would like to schedule process A twice as frequently as processes B and C. To do this, we assign A twice as many tickets. With tickets numbered in the range 0…99, we might assign:

Process A: 50 tickets (0...49)
Process B: 25 tickets (50...74)
Process C: 25 tickets (75...99)

The scheduler then picks a random number in the range 0…99. That number becomes the “winning ticket” and the process holding that ticket gets to run. When its time slice is up, or if it blocks, the scheduler picks another ticket and that process gets to run. Over time, processes will run with a random distribution but one that is weighted by the per-process ticket allocation.
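
Here is a small sketch of the ticket draw for this example. The ticket counts match the allocation above; the use of rand() is just for illustration, since a real scheduler would use its own random number source.

    /* Lottery draw: 100 tickets, with A holding 0-49, B holding 50-74,
     * and C holding 75-99. Repeats the draw ten times. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        const char *name[] = { "A", "B", "C" };
        int tickets[]      = { 50, 25, 25 };   /* per-process ticket counts */
        int total = 100;

        srand((unsigned)time(NULL));
        for (int draw = 0; draw < 10; draw++) {
            int winner = rand() % total;       /* winning ticket: 0..99 */
            int i = 0;
            while (winner >= tickets[i]) {     /* walk the allocations  */
                winner -= tickets[i];
                i++;
            }
            printf("draw %d: process %s runs\n", draw, name[i]);
        }
        return 0;
    }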

The benefit of the algorithm is that each process is given a proportional share of the CPU. The difficulty is determining ticket distribution, particularly in an environment where processes come and go and get blocked. This isn’t a useful algorithm for general-purpose scheduling but is more useful for environments with long-running processes that may need to be allocated shares of CPUs, such as running multiple virtual machines on a server.

Our discussions thus far assumed an environment where a single process gets to run at a time. Other ready processes wait until the scheduler switches their context in and gives them a chance to run. With multiple CPUs, multiple cores on one CPU, or hyperthreaded processors, more than one thread of execution can be scheduled at a time. The same scheduling algorithms apply; the scheduler simply allows more than one process to be in the running state at one time.

The environment we will consider here is the common symmetric multiprocessing (SMP) one, where each processor has access to the same memory and devices. Each processor is running its own process and may, at any time, invoke a system call, terminate, or be interrupted with a timer interrupt. The scheduler executes on that processor and decides which process should be context switched to run. It is common for the operating system to maintain one run queue per processor. This allows one processor to manipulate the queue (e.g., when context switching) without having to worry about the delay of having the queue locked by another processor.

Processors are designed with cache memory that holds frequently-used regions of memory that processes have accessed. This avoids the time delay of going out to the external memory bus to access memory and provides a huge boost to performance. As we will see in our forthcoming discussion on memory management, processors also contain a translation lookaside buffer, or TLB, that stores recent virtual-to-physical address translations. This also speeds up memory access dramatically.

When a scheduler reschedules a process onto the same processor, there is a chance that some of the cached memory and TLB lines are still present. This allows the process to run faster since it will make fewer references to main memory. If a scheduler reschedules a process onto a different processor, then no part of the process will be present in that processor’s cache and the program will start slowly as it populates its cache.

Processor affinity is the aspect of scheduling on a multiprocessor system where the scheduler keeps track of what processor the process ran on previously and attempts to reschedule the process onto that same processor. There are two forms of processor affinity. Hard affinity ensures that a process always gets scheduled onto the same processor. Soft affinity is a best-effort approach. A scheduler will attempt to schedule a process onto the same CPU but in some cases may move the process onto a different processor. The reason for doing this is that, even though there may be an initial performance penalty to start a process on another CPU, it’s probably better than having the CPU sit idle with no process to run. The scheduler tries to load balance the CPUs to make sure that they have a sufficient number of tasks in their run queue. There are two approaches to load balancing among processors:

  1. Push migration is where the operating system checks the load (number of processes in the run queue) on each processor periodically. If there’s an imbalance, some processes will be moved from one processor onto another.

  2. Pull migration is where a scheduler finds that there are no more processes in the run queue for the processor. In this case, it raids another processor’s run queue and transfers a process onto its own queue so it will have something to run.

It is common to combine both push and pull migration (Linux does it, for example).
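
A rough sketch of the two approaches, assuming per-CPU run queues represented only by their lengths. The threshold, structures, and function names are illustrative and ignore the locking and affinity constraints that a real kernel must handle.

    /* Push migration moves work from the busiest queue to the least
     * loaded one; pull migration lets an idle CPU take work from a
     * busy neighbor. */
    #define NCPUS 4

    struct runqueue {
        int length;                    /* number of ready tasks queued */
    };

    /* Push: run periodically; move one task if the imbalance is large. */
    void push_balance(struct runqueue rq[])
    {
        int busiest = 0, idlest = 0;
        for (int i = 1; i < NCPUS; i++) {
            if (rq[i].length > rq[busiest].length) busiest = i;
            if (rq[i].length < rq[idlest].length)  idlest  = i;
        }
        if (rq[busiest].length - rq[idlest].length > 1) {
            rq[busiest].length--;      /* move one task...             */
            rq[idlest].length++;       /* ...to the less loaded CPU    */
        }
    }

    /* Pull: a CPU with an empty queue takes a task from another. */
    void pull_one(struct runqueue rq[], int self)
    {
        if (rq[self].length > 0)
            return;
        for (int i = 0; i < NCPUS; i++) {
            if (i != self && rq[i].length > 1) {
                rq[i].length--;
                rq[self].length++;
                return;
            }
        }
    }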

Scheduling domains

The real world is not as simple as deciding whether to run a task on the same processor or not. Many systems have multiple processors and some are preferable to others when rescheduling a ready task. The categories of processors include:

Virtual CPUs in a hyperthreaded core
Many Intel CPUs support hyperthreading (HT technology). A single processor core presents itself as two virtual CPUs to the operating system. The processor core has a separate set of registers for each virtual CPU and multiple instructions can execute in parallel as long as they don’t compete for the same section of the processor. Although one execution thread may hold up the performance of another one, the threads share access to all processor caches.
Multiple cores in a processor
Many processors, particularly those on laptops, desktops, and servers, contain several processor cores (often 2, 4, 6, or 8) within a single chip. In this case, the TLB and fastest instruction and data caches are not shared across cores. However, all the cores share access to a common memory cache. This cache is slower than the high-speed per-core cache but still much faster than accessing main memory.
Multiple processors in an SMP architecture
Multiple processors in one computer system share common access to memory in the system. However, they do not share any caches: one processor does not have access to cached data on another processor.
Multiple processors in a NUMA architecture
NUMA (Non-Uniform Memory Access) is a multiprocessor computer architecture designed for large numbers of processors, where each processor or group of processors has access to a portion of memory via a high-speed memory interface (e.g., on the same circuit board) while other regions of memory are slower to access since they reside on other processors and are accessed via a secondary, slower interface.

What we have now is the realization that if a task is to be migrated to another processor, migrating it to some processors is preferable to others. For example, scheduling a task on a different core on the same chip is preferable to scheduling it onto a core on a different chip.

Linux introduces the concept of scheduling domains to handle this. Scheduling domains allow the system to organize processors into a hierarchy, where processors are grouped from the most desirable migration groups at the lowest layer (e.g., hyperthreaded CPUs on the same core) through to the least desirable migration groups at the highest layers of the hierarchy (e.g., migrating across processors on different circuit boards in a NUMA system).

A scheduling domain contains one or more CPU groups. Each CPU group is treated as one entity by the domain. A higher-level domain treats lower-level domains as a group. For example, two hyperthreaded CPUs sharing the same core will be placed in one group that has two subgroups, one for each CPU. All the per-core groups will make up a higher-level domain that represents the entire processor.

Each CPU has a runqueue structure associated with it. In addition to the structures needed to keep track of ready processes (e.g., a balanced tree or a set of queues), this per-CPU structure keeps track of scheduling data, including statistics about CPU load. Each scheduling domain has a balancing policy associated with it that defines the balancing rules for that specific level of the hierarchy. This policy answers questions such as:

– How often should attempts be made to balance load across groups in the domain?

– How far can the loads in the domain get unbalanced before balancing across groups is needed?

– How long can a group in the domain sit idle?

Linux performs active load balancing periodically. The scheduler moves up the scheduling domain hierarchy and checks each group along the way. If any group is deemed to be out of balance based on the policy rules, tasks will be moved from one CPU’s run queue to another’s to rebalance the domain.

Linux supports a modular scheduling system that can accommodate different schedulers. A scheduling class defines a common set of functions that define the behavior of that scheduler (e.g., add a task, remove a task, choose the next task to run). Multiple schedulers can run concurrently. Each task in the system belongs to one scheduling class. A task will belong to one of two scheduling classes:

  1. sched_fair: implements the CFS (Completely Fair Scheduler), a general-purpose scheduler that dynamically adjusts priority based on how much “virtual runtime” each task used over a period of time. Tasks that spend more time running (using the CPU) are given a lower priority than those that spend less time running.

  2. sched_rt: implements a simple multilevel priority-based round-robin scheduler for real-time tasks.

To pick a task to run, the scheduler iterates through the list of scheduling classes to find the class with the highest priority that has a runnable task.
