
8. Content Handling

To process documents and other files in a pipeline, you first need to load the file into the pipeline. After loading, the file is automatically converted into a so-called content object. A content object wraps the document and carries the information needed to process it inside the pipeline, such as its name, size and MIME type. The content object provides these attributes:

Attributes

| Attribute | Type | Description |
|---|---|---|
| name | string | The name of the document. |
| created | long | The Unix timestamp (in milliseconds) when this document was created. |
| lastUpdated | long | The Unix timestamp (in milliseconds) when this document was last modified. |
| mimeType | string | The MIME type of this document. If null, it defaults to text/plain. For a list of official MIME types, see https://www.iana.org/assignments/media-types/media-types.xhtml |
| size | long | The size of the document in bytes, or -1 if the size cannot be determined. |
| data | object | The data of the document. Its format depends on the MIME type. For example, if the MIME type is application/json, the data object holds a JSON document. |

The following example loads a file from the drive service into the body scope and then accesses the attributes of its content object from there:

```yaml
pipeline:
  # Load document from drive and set it as content object in the body
  - drive.read:
      path: "invoice.pdf"
  # Access the attributes of the content object in the body
  - log:
      message: "Name: #{body.name}, Size: #{body.size}"
```
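The remaining attributes from the table can presumably be read the same way. The following sketch assumes that `mimeType`, `created` and `lastUpdated` are accessible with the same `#{body.<attribute>}` syntax that the example above uses for `name` and `size`; it reuses only the `drive.read` and `log` commands shown there and the same sample file.

```yaml
pipeline:
  # Load the document from drive; the body then holds its content object
  - drive.read:
      path: "invoice.pdf"
  # Assumption: these attributes use the same #{body.<attribute>} syntax as name and size
  - log:
      message: "Type: #{body.mimeType}, Created: #{body.created}, Modified: #{body.lastUpdated}"
```

Since `created` and `lastUpdated` are Unix timestamps in milliseconds, they are likely logged as plain numbers rather than as formatted dates.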

Collection

When multiple documents are loaded into a pipeline, they are grouped together in a so-called content collection. A collection plays a role similar to a folder in a local file system.

| Attribute | Type | Description |
|---|---|---|
| parent | ContentCollection | The parent collection if this is a nested collection, or null if this is the root collection. |
| path | string | The path to this collection, for example /rootCol/subCol. For the root collection, / is returned. |
| children | list of ContentObject | All content objects contained in this collection. A child can be a document or, if collections are nested, another content collection. |

A content collection is itself a content object and therefore also has all attributes of a content object.
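Because a collection is itself a content object, both kinds of attributes should be available on it. The fragment below is a hypothetical sketch: it assumes that an earlier step, not shown because this page does not describe how multiple documents are loaded, has placed a content collection into the body. It then reads `name` (a content object attribute) and `path` (a collection attribute) with the same `#{body.<attribute>}` syntax and `log` command used in the earlier example.

```yaml
pipeline:
  # Hypothetical: assumes an earlier command (not shown) has loaded several
  # documents, so that the body now holds a content collection
  - log:
      # name comes from the content object, path from the collection
      message: "Collection: #{body.name}, Path: #{body.path}"
```

For the root collection, the logged path would be `/`.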
