Reference guide
Data handling for input function
data_loader (csv_filename, params[, labels, …]) |
Loads, preprocesses (data augmentation, padding) and feeds the data |
padding_inputs_width (image, target_shape, …) |
Given an input image, will pad it to return a target_shape size padded image. |
augment_data (image, max_rotation) |
Data augmentation on an image (padding, brightness, contrast, rotation) |
random_rotation (img, max_rotation, crop) |
Rotates an image with a random angle. |
random_padding (image, max_pad_w, max_pad_h) |
Given an image will pad its border adding a random number of rows and columns |
serving_single_input (fixed_height, min_width) |
Serving input function needed for export (in TensorFlow). |
Config for training
Alphabet (lookup_alphabet_file, blank_symbol) |
Object for alphabet / symbols units. |
TrainingParams (**kwargs) |
Object for parameters related to the training. |
Params (**kwargs) |
Object for general parameters |
import_params_from_json (model_directory, …) |
Read the exported json file with parameters of the experiment. |
Model
deep_cnn (input_imgs, input_channels, …) |
CNN part of the CRNN network. |
deep_bidirectional_lstm (inputs, params, …) |
Recurrent part of the CRNN network. |
crnn_fn (features, labels, mode, params) |
CRNN model definition for tf.Estimator. |
get_words_from_chars (characters_list, …[, …]) |
Joins separated characters to form words. |
Loading exported model
PredictionModel (model_dir, session, signature) |
Helper class to load an exported model and apply it to image segments for transcription. |
-
tf_crnn.data_handler.augment_data(image: tf.Tensor, max_rotation: float = 0.1) → tf.Tensor
Data augmentation on an image (padding, brightness, contrast, rotation)
Parameters: - image – Tensor
- max_rotation – float, maximum permitted rotation (in radians)
Returns: Tensor
-
tf_crnn.data_handler.data_loader(csv_filename: Union[List[str], str], params: tf_crnn.config.Params, labels=True, batch_size: int = 64, data_augmentation: bool = False, num_epochs: int = None, image_summaries: bool = False)
Loads, preprocesses (data augmentation, padding) and feeds the data
Parameters: - csv_filename – filename or list of filenames
- params – Params object containing all the parameters
- labels – transcription labels
- batch_size – number of samples per batch
- data_augmentation – flag to enable or disable data augmentation
- num_epochs – feeds the data ‘num_epochs’ times
- image_summaries – flag to show image summaries or not
Returns: data_loader function
-
tf_crnn.data_handler.padding_inputs_width(image: tf.Tensor, target_shape: Tuple[int, int], increment: int) → Tuple[tf.Tensor, tf.Tensor]
Given an input image, will pad it to return a target_shape size padded image. There are 3 cases:
- image width > target width : simple resizing to shrink the image
- image width >= 0.5*target width : pad the image
- image width < 0.5*target width : replicates the image segment and appends it
Parameters: - image – Tensor of shape [H,W,C]
- target_shape – final shape after padding [H, W]
- increment – reduction factor due to pooling between input width and output width, this makes sure that the final width will be a multiple of increment
Returns: (image padded, output width)
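The three cases above can be sketched in plain Python. This only illustrates the decision rule, not the library's TensorFlow implementation: the helper names `padding_case` and `round_up_to_multiple` are hypothetical, and rounding the width up to the next multiple of `increment` is an assumed way of meeting the constraint described for that parameter.

```python
def padding_case(image_width: int, target_width: int) -> str:
    """Return which of the three documented cases applies (hypothetical helper)."""
    if image_width > target_width:
        return "resize"      # wider than the target: shrink the image
    if image_width >= 0.5 * target_width:
        return "pad"         # at least half the target width: pad the image
    return "replicate"       # narrower than half: replicate the segment and append it


def round_up_to_multiple(width: int, increment: int) -> int:
    """Smallest multiple of `increment` that is >= width (assumed rounding rule)."""
    return -(-width // increment) * increment


for width in (300, 128, 40):
    print(width, padding_case(width, target_width=256))
```

An image whose width equals the target falls in the "pad" case under this rule, with nothing left to add.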
-
tf_crnn.data_handler.random_padding(image: tf.Tensor, max_pad_w: int = 5, max_pad_h: int = 10) → tf.Tensor
Given an image will pad its border adding a random number of rows and columns
Parameters: - image – image to pad
- max_pad_w – maximum padding in width
- max_pad_h – maximum padding in height
Returns: a padded image
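As a rough illustration of border padding by a random amount, here is a sketch on a nested list instead of a Tensor. It is not the library code: the function name `random_padding_sketch`, the fill value of 0, and the choice to draw each side's padding independently are all assumptions.

```python
import random


def random_padding_sketch(image, max_pad_w=5, max_pad_h=10, fill=0, rng=None):
    """Pad a 2D list with a random number of rows and columns (assumed behaviour)."""
    rng = rng or random.Random()
    width = len(image[0])
    # Draw the padding on each side independently (assumption).
    left, right = rng.randint(0, max_pad_w), rng.randint(0, max_pad_w)
    top, bottom = rng.randint(0, max_pad_h), rng.randint(0, max_pad_h)
    new_width = left + width + right
    padded = [[fill] * new_width for _ in range(top)]
    for row in image:
        padded.append([fill] * left + list(row) + [fill] * right)
    padded.extend([fill] * new_width for _ in range(bottom))
    return padded


image = [[1, 1, 1], [1, 1, 1]]
padded = random_padding_sketch(image, rng=random.Random(0))
print(len(padded), len(padded[0]))
```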
-
tf_crnn.data_handler.random_rotation(img: tf.Tensor, max_rotation: float = 0.1, crop: bool = True) → tf.Tensor
Rotates an image with a random angle. See https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders for formulae
Parameters: - img – Tensor
- max_rotation – maximum angle to rotate (radians)
- crop – boolean to crop or not the image after rotation
Returns:
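The linked Stack Overflow question gives a closed-form size for the largest axis-aligned rectangle that fits inside a rotated image without black borders. A plain-Python sketch of that formula follows; how `random_rotation` applies it internally is not shown in this reference, and the name `largest_rotated_rect` is hypothetical.

```python
import math


def largest_rotated_rect(w: float, h: float, angle: float):
    """Width and height of the largest axis-aligned rectangle inside a
    w x h rectangle rotated by `angle` (radians)."""
    if w <= 0 or h <= 0:
        return 0.0, 0.0
    width_is_longer = w >= h
    side_long, side_short = (w, h) if width_is_longer else (h, w)
    sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two corners of the crop touch the longer side.
        x = 0.5 * side_short
        wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
    else:
        # Fully constrained case: the crop touches all four sides.
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr = (w * cos_a - h * sin_a) / cos_2a
        hr = (h * cos_a - w * sin_a) / cos_2a
    return wr, hr


print(largest_rotated_rect(100, 32, 0.1))
```

With an angle of 0 the function returns the original width and height, and the crop shrinks as the angle grows.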
-
tf_crnn.data_handler.serving_single_input(fixed_height: int = 32, min_width: int = 8)
Serving input function needed for export (in TensorFlow). Features to serve :
- images : greyscale image
- input_filename : filename of image segment
- input_rgb: RGB image segment
Parameters: - fixed_height – height of the image to format the input data with
- min_width – minimum width to resize the image
Returns: serving_input_fn
-
class tf_crnn.config.Alphabet(lookup_alphabet_file: str = None, blank_symbol: str = '$')
Object for alphabet / symbols units.
Variables: - _blank_symbol (str) – Blank symbol used for CTC
- _alphabet_units (List[str]) – list of elements composing the alphabet. The units may be a single character or multiple characters.
- _codes (List[int]) – Each alphabet unit has a unique corresponding code.
- _nclasses (int) – number of alphabet units.
-
alphabet_units
-
blank_symbol
-
check_input_file_alphabet(csv_filenames: List[str], discarded_chars: str = ';|\t\n\r\x0b\x0c', csv_delimiter: str = ';') → None
Checks if labels of input files contain only characters that are in the Alphabet.
Parameters: - csv_filenames – list of csv filenames
- discarded_chars – discarded characters
- csv_delimiter – character delimiting field in the csv file
Returns:
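The kind of check described above can be sketched in plain Python. This is not the method's actual implementation: the function name `find_unknown_chars` is hypothetical, it takes lines of text instead of filenames, it returns the offending characters instead of `None`, and it assumes every alphabet unit is a single character.

```python
def find_unknown_chars(csv_lines, alphabet_units,
                       discarded_chars=';|\t\n\r\x0b\x0c', csv_delimiter=';'):
    """Collect label characters that are neither discarded nor in the alphabet."""
    unknown = set()
    for line in csv_lines:
        # Each line is assumed to have the form "path<delimiter>label".
        _, _, label = line.rstrip('\n').partition(csv_delimiter)
        for char in label:
            if char not in discarded_chars and char not in alphabet_units:
                unknown.add(char)
    return unknown


lines = ["img_001.png;abc", "img_002.png;abd|"]
print(find_unknown_chars(lines, alphabet_units={"a", "b", "c"}))
```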
-
codes
-
classmethod create_lookup_from_labels(csv_files: List[str], export_lookup_filename: str, original_lookup_filename: str = None)
Create a lookup dictionary for csv files containing labels. Exports a json file with the Alphabet.
Parameters: - csv_files – list of files to get the labels from (should be of format path;label)
- export_lookup_filename – filename to export alphabet lookup dictionary
- original_lookup_filename – original lookup filename to update (optional)
Returns:
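To make the idea of a lookup dictionary concrete, the sketch below builds a unit-to-code mapping from a few labels and serializes it as JSON. The layout of the exported file is not specified in this reference, so the structure here is an assumption, as are the helper name `build_lookup`, the single-character units, and the assignment of codes in sorted order starting at 0.

```python
import json


def build_lookup(labels):
    """Map each distinct character found in the labels to an integer code."""
    units = sorted({char for label in labels for char in label})
    return {unit: code for code, unit in enumerate(units)}


lookup = build_lookup(["cab", "bad"])
# Serialize the mapping; a real export would write this string to a file.
print(json.dumps(lookup, ensure_ascii=False))
```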
-
n_classes
-
class tf_crnn.config.Params(**kwargs)
Object for general parameters
Variables: - input_shape (Tuple[int, int]) – input shape of the image to batch (this is the shape after data augmentation). The original will either be resized or padded depending on its original size
- input_channels (int) – number of color channels for input image
- csv_delimiter (str) – character to delimit csv input files
- string_split_delimiter (str) – character that delimits each alphabet unit in the labels
- num_gpus (int) – number of gpus to use
- lookup_alphabet_file (str) – json file that contains the mapping alphabet units <-> codes
- csv_files_train (str) – csv filename which contains the (path;label) of each training sample
- csv_files_eval (str) – csv filename which contains the (path;label) of each eval sample
- output_model_dir (str) – output directory where the model will be saved and exported
- keep_prob_dropout (float) – keep probability for dropout
- num_beam_paths (int) – number of paths (transcriptions) to return for ctc beam search (only used when predicting)
- data_augmentation (bool) – if True augments data on the fly
- data_augmentation_max_rotation (float) – max permitted rotation to apply to image during training (radians)
- input_data_n_parallel_calls (int) – number of parallel calls to make when using Dataset.map()
-
keep_prob_dropout
-
class tf_crnn.config.TrainingParams(**kwargs)
Object for parameters related to the training.
Variables: - n_epochs (int) – number of epochs to run the training (default: 50)
- train_batch_size (int) – batch size during training (default: 64)
- eval_batch_size (int) – batch size during evaluation (default: 128)
- learning_rate (float) – initial learning rate (default: 1e-4)
- learning_decay_rate (float) – decay rate for exponential learning rate (default: .96)
- learning_decay_steps (int) – decay steps for exponential learning rate (default: 1000)
- evaluate_every_epoch (int) – evaluate every ‘evaluate_every_epoch’ epoch (default: 5)
- save_interval (int) – save the model every ‘save_interval’ step (default: 1e3)
- optimizer (str) – which optimizer to use (‘adam’, ‘rms’, ‘ada’) (default: ‘adam’)
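The interaction of learning_rate, learning_decay_rate and learning_decay_steps can be illustrated with the standard exponential decay formula, the same form used by TensorFlow 1.x `tf.train.exponential_decay`. Whether the library uses the continuous or the staircase variant is not stated here, so the sketch exposes both; the function name `decayed_learning_rate` is hypothetical.

```python
def decayed_learning_rate(step, learning_rate=1e-4, decay_rate=0.96,
                          decay_steps=1000, staircase=False):
    """learning_rate * decay_rate ** (step / decay_steps)."""
    # Staircase decay only changes the rate at multiples of decay_steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return learning_rate * decay_rate ** exponent


for step in (0, 1000, 5000):
    print(step, decayed_learning_rate(step))
```

With the defaults listed above, the rate is multiplied by 0.96 every 1000 steps.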
-
tf_crnn.config.import_params_from_json(model_directory: str = None, json_filename: str = None) → dict
Read the exported json file with parameters of the experiment.
Parameters: - model_directory – Directory where the model was exported
- json_filename – filename of the file
Returns: a dictionary containing the parameters of the experiment
-
tf_crnn.model.crnn_fn(features, labels, mode, params)
CRNN model definition for tf.Estimator. Combines deep_cnn and deep_bidirectional_lstm to define the model and adds loss computation and CTC decoder.
Parameters: - features – dictionary with keys : ‘images’, ‘images_widths’, ‘filenames’
- labels – string containing the transcriptions. Flattened (1D) array with encoded label (one code per character)
- mode – TRAIN, EVAL, PREDICT
- params – dictionary with keys: ‘Params’, ‘TrainingParams’
Returns:
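To illustrate what a CTC decoder does with per-frame predictions, here is a minimal greedy (best-path) collapse in plain Python: merge consecutive repeats, then drop blanks. This is a simpler technique than the beam search mentioned for num_beam_paths, and it is not the library's decoder; the function name `ctc_greedy_collapse` and the blank code used in the example are assumptions.

```python
def ctc_greedy_collapse(frame_codes, blank_code):
    """Collapse repeated codes, then remove blanks (CTC best-path decoding)."""
    decoded = []
    previous = None
    for code in frame_codes:
        # Keep a code only when it starts a new run and is not the blank.
        if code != previous and code != blank_code:
            decoded.append(code)
        previous = code
    return decoded


# The blank between the two runs of 1 keeps them as separate symbols.
print(ctc_greedy_collapse([1, 1, 0, 1, 2, 2, 0], blank_code=0))
```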
-
tf_crnn.model.deep_bidirectional_lstm(inputs: tf.Tensor, params: tf_crnn.config.Params, summaries: bool = True) → tf.Tensor
Recurrent part of the CRNN network. Uses a bidirectional LSTM.
Parameters: - inputs – output of deep_cnn
- params – parameters of the model
- summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns: Tuple : (tensor [width(time), batch, n_classes], raw transcription codes)
-
tf_crnn.model.deep_cnn(input_imgs: tf.Tensor, input_channels: int, is_training: bool, summaries: bool = True) → tf.Tensor
CNN part of the CRNN network.
Parameters: - input_imgs – input images [B, H, W, C]
- input_channels – input channels, 1 for greyscale images, 3 for RGB color images
- is_training – flag to indicate training or not
- summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns: tensor of shape [batch, final_width, final_height x final_features]
-
class tf_crnn.loader.PredictionModel(model_dir: str, session: tf.Session = None, signature: str = 'predictions')
Helper class to load an exported model and apply it to image segments for transcription.
Variables: - session (tf.Session) – tf.Session within which to run the loading process
- model – loaded exported model
Parameters: - model_dir – directory containing the saved model files.
- session – tf.Session to load the model
- signature – which signature to use to select the type of input :
- predictions (default) : input a grayscale image
- rgb_images : input an RGB image
- filename : input the filename of the image segment