See also
Data Center Doctests
The term ‘data import’ actually understates the range of functions importers really have. As already stated, many importers do not only restore data once backed up by exporters or, in other words, take values from CSV files and write them one-on-one into the database. The data undergo a complex staged data processing algorithm. Therefore, we prefer calling them ‘batch processors’ instead of importers. The stages of the import process are as follows.
Stage 1: File Upload
Users with permission waeup.manageDataCenter are allowed to access the data center and also to use the upload page. On this page they can access an overview of all available batch processors. When clicking on a processor name, required, optional and non-schema fields show up in a modal window. Also a CSV file template, which can be filled and uploaded to avoid header errors, is provided in this window.
Many importer fields are of type ‘Choice’, which means only defined keywords (tokens) are allowed; see schema fields. An overview of all sources and vocabularies which feed the choices can also be accessed from the data center upload page and shows up in a modal window. Sources and vocabularies of the base package can be viewed here.
Data center managers can upload any kind of CSV file from their local computer. The uploader does not check the integrity of the content but the validity of its CSV encoding (see check_csv_charset). It also checks the filename extension and allows only a limited number of files in the data center.
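An encoding check of this kind might look roughly like the following sketch. The name check_csv_charset comes from the text above, but the body here is an illustrative assumption, not Kofa's actual implementation:

```python
import csv

def check_csv_charset(raw_bytes):
    """Return None if raw_bytes looks like a decodable CSV file,
    else an error message.  Illustrative sketch only; the real
    check_csv_charset in Kofa may differ."""
    try:
        text = raw_bytes.decode('utf-8')
    except UnicodeDecodeError as err:
        return 'Invalid encoding: %s' % err
    try:
        # Try to parse the first few rows as CSV.
        for i, row in enumerate(csv.reader(text.splitlines())):
            if i > 2:
                break
    except csv.Error as err:
        return 'Invalid CSV file: %s' % err
    return None
```

Note that only decodability and basic CSV syntax are checked here; as the text says, the integrity of the content itself is deliberately left to the later import stages.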
DatacenterUploadPage.max_files = 20
If the upload succeeded, the uploader sends an email to all import managers (users with role waeup.ImportManager) of the portal that a new file was uploaded.
The uploader changes the filename. An uploaded file foo.csv will be stored as foo_USERNAME.csv, where USERNAME is the user id of the currently logged-in user. Spaces in the filename are replaced by underscores. Pending data filenames remain unchanged (see below).
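The renaming rule can be sketched in a few lines. The helper name uploaded_filename is hypothetical; the real Kofa code may differ in detail:

```python
import os

def uploaded_filename(filename, username):
    """Derive the stored name for an uploaded file: insert the
    uploader's user id before the extension and replace spaces
    with underscores.  Sketch of the rule described above, not
    Kofa's actual implementation."""
    base, ext = os.path.splitext(filename)
    name = '%s_%s%s' % (base, username, ext)
    return name.replace(' ', '_')
```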
After file upload the data center manager can click the ‘Process data’ button to open the page where files can be selected for import (import step 1). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (import step 2). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user is able to proceed to the next step (import step 3) by selecting the appropriate processor and an import mode. In create mode, new objects are added to the database; in update mode, existing objects are modified; and in remove mode, they are deleted.
Stage 2: File Header Validation
Import step 3 is the stage where the file content is assessed for the first time and checked whether the column titles correspond with the fields of the processor chosen. The page shows the header and the first record of the uploaded file. The page allows to change column titles or to ignore entire columns during import. It might have happened that one or more column titles are misspelled or that the person who created the file ignored the case-sensitivity of field names. Then the data import manager can easily fix this by selecting the correct title and clicking the ‘Set headerfields’ button. Setting the column titles is temporary; it does not modify the uploaded file. Consequently, it does not make sense to set new column titles if the file is not imported afterwards.
The page also calls the checkHeaders method of the batch processor, which checks for required fields. If a required column title is missing, a warning message is raised and the user can’t proceed to the next step (import step 4).
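A checkHeaders-style validation can be sketched as follows. Function name, signature, and messages are assumptions for illustration, not Kofa's actual API:

```python
def check_headers(headers, required):
    """Return a list of warnings for a CSV header row: missing
    required columns and duplicate column titles.  Illustrative
    sketch of what a checkHeaders-style method does."""
    warnings = []
    for field in required:
        if field not in headers:
            warnings.append('Missing required column: %s' % field)
    seen = set()
    for title in headers:
        if title in seen:
            warnings.append('Duplicate column title: %s' % title)
        seen.add(title)
    return warnings
```

An empty result means the user may proceed; any warning blocks the next import step, mirroring the behaviour described above.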
Important
Data center managers, who are only charged with uploading files but not with the import of files, are requested to proceed up to import step 3 and verify that the data format meets all the import criteria and requirements of the batch processor.
Stage 3: Data Validation and Import
Import step 4 is the actual data import. The import is started by clicking the ‘Perform import’ button. This action requires the waeup.importData permission. If data managers don’t have this permission, they will be redirected to the login page.
Kofa does not validate the data in advance. It tries to import the data row-by-row while reading the CSV file. The reason is that import files very often contain thousands or even tens of thousands of records. It is not feasible for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: The import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then can it be edited by the batch processor.
The core import method is:
BatchProcessor.doImport()
In contrast to most other methods, doImport is not supposed to be customized, neither in custom packages nor in derived batch processor classes. Therefore, this is the only place where we do import data.
Before this method starts creating or updating persistent data, it prepares two more files in a temporary folder of the filesystem: (1) a file for pending data with file extension .pending and (2) a file for successfully processed data with file extension .finished. Then the method starts iterating over all rows of the CSV file. Each row is treated as follows: An empty row is skipped.
Empty strings or lists ([]) in the row are replaced by ignore markers. The BatchProcessor.checkConversion method validates and converts all values in the row. Conversion means the transformation of strings into Python objects. For instance, number expressions have to be transformed into integers, dates into datetime objects, phone number expressions into phone number objects, etc. The converter returns a dictionary with converted values or, if the validation of one of the elements fails, an appropriate warning message. If the conversion fails, a pending record is created and stored in the pending data file together with the warning message the converter has raised.
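The conversion step just described can be sketched like this. The names convert_row, IGNORE_MARKER, and the converters mapping are illustrative assumptions, not Kofa's actual API:

```python
IGNORE_MARKER = '<IGNORE>'

def convert_row(row, converters):
    """Convert one CSV row (a dict of strings) into Python objects.
    `converters` maps field names to callables that return a
    converted value or raise ValueError.  Returns (converted,
    errors); a non-empty errors list means the record is pending.
    Sketch of the checkConversion idea, not Kofa's real code."""
    converted, errors = {}, []
    for field, value in row.items():
        if value in ('', '[]'):
            # Empty strings or lists are replaced by ignore markers.
            converted[field] = IGNORE_MARKER
            continue
        try:
            converted[field] = converters.get(field, str)(value)
        except ValueError as err:
            errors.append('%s: %s' % (field, err))
    return converted, errors
```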
In create mode only:
The parent object must be found and a child object with the same object id must not exist. Otherwise the row is skipped, a corresponding warning message is raised and a record is stored in the pending data file.
The BatchProcessor.checkCreateRequirements method checks additional requirements the parent object must fulfill before a new subobject is being added. These requirements are not imposed by the data type but by the context of the object. For example, the course results of graduated students must not be changed by import, neither by creating nor updating nor removing course tickets.
Now doImport tries to add the new object with the data from the conversion dictionary. In some cases this may fail and a DuplicationError is raised. For example, a new payment ticket is created but the same payment for the same session has already been made. In this case the object id is unique, no other object with the same id exists, but making the ‘same’ payment twice does not make sense. The import is skipped and a record is stored in the pending data file.
In update mode only:
If the object can’t be found, the row is skipped, a ‘no such entry’ warning message is raised and a record is stored in the pending data file.
The BatchProcessor.checkUpdateRequirements method checks additional requirements the object must fulfill before being updated. These requirements are not imposed by the data type but by the context of the object. For example, post-graduate students have a different registration workflow. With this method we forbid certain workflow transitions or states.
Finally, doImport updates the existing object with the data from the conversion dictionary.
In remove mode only:
If the object can’t be found, the row is skipped, a ‘no such entry’ warning message is raised and a record is stored in the pending data file.
The BatchProcessor.checkRemoveRequirements method checks additional requirements the object must fulfill before being removed. These requirements are not imposed by the data type but by the context of the object. For example, the course results of graduated students must not be changed by import, neither by creating nor updating nor removing course tickets.
Finally, doImport removes the existing object.
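The per-row control flow of the three import modes can be summarized in a short sketch. All names here are hypothetical stand-ins for Kofa's internal machinery; only the dispatch logic described above is illustrated:

```python
def do_import_row(row, mode, find_object, create, update, remove, pending):
    """Dispatch one converted row according to the import mode.
    `find_object` looks up an existing object for the row; `create`,
    `update` and `remove` apply the change; `pending` records a
    skipped row with a warning message.  Returns True on success."""
    obj = find_object(row)
    if mode == 'create':
        if obj is not None:
            # An object with the same id already exists.
            pending(row, 'object already exists')
            return False
        create(row)
    elif mode in ('update', 'remove'):
        if obj is None:
            pending(row, 'no such entry')
            return False
        update(row) if mode == 'update' else remove(row)
    return True
```

The checkCreateRequirements / checkUpdateRequirements / checkRemoveRequirements hooks and the DuplicationError handling described above would slot into the corresponding branches; they are omitted here for brevity.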
Stage 4: Post-Processing
The data import is finalized by calling distProcessedFiles. This method moves the .pending and .finished files as well as the originally imported file from their temporary to their final location in the storage path of the filesystem, from where they can be accessed through the browser user interface.