# Files (WIP)

{% hint style="danger" %}
Use ONLY the file connector included in the [edge-connector](/~/changes/Q625pQNc0nXqVGPtyjAY/building-apps/integration/edge-connector.md) to import and manage files on your own devices. The file connector under data connectors in the app builder does not store files permanently and should not be used.
{% endhint %}

The Files connector can read specific file types, write `.csv` files, and list folder contents. It can also move, copy, or delete files. The connector operates on the Heisenware server in your files directory. To access files on your own device, first deploy the [edge-connector](/~/changes/Q625pQNc0nXqVGPtyjAY/building-apps/integration/edge-connector.md), then return here.

{% hint style="info" %}
All path input fields are relative to the folder the file connector operates in. To see the contents of this root folder and all its subfolders, use the [`browse`](#browse-a-directory) function and enter `.` in the path input box.

To [go up a level](https://linuxize.com/post/linux-cd-command/#the-parent-directory) in the folder structure, use `..` at the beginning of the input field.
{% endhint %}
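The way `.` and `..` resolve against a starting folder can be sketched in a few lines of Python. This is only an illustration of relative-path semantics, not the connector's implementation; the root `/data/files` and the helper `resolve` are hypothetical.

```python
import posixpath

# Hypothetical root folder; the real location of your files directory may differ.
ROOT = "/data/files"

def resolve(path_input: str, root: str = ROOT) -> str:
    """Join a path input onto the root and collapse '.' and '..' segments."""
    return posixpath.normpath(posixpath.join(root, path_input))

print(resolve("."))                  # /data/files
print(resolve("reports/q1.csv"))     # /data/files/reports/q1.csv
print(resolve("../shared/a.txt"))    # /data/shared/a.txt
```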

## Read a file

The `readFile` function bundles the type-specific read functions described below. It currently supports `.csv`, `.xlsx`, `.txt`, `.pdf` and `.xml` files. Insert the path relative to the root folder into the input box. Parameters can be added in object format (`parameter: value`). The possible parameters for each file type are listed in the sections below.
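One way to picture `readFile` is as a lookup from file extension to the matching type-specific function. The sketch below uses the function names from this page; that `readFile` delegates in exactly this way is an assumption, and `pick_reader` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Extension-to-function mapping built from the function names on this page.
# Whether readFile dispatches like this internally is an assumption.
READERS = {
    ".csv": "readCsv",
    ".xlsx": "readXlsx",
    ".txt": "readTxt",
    ".pdf": "readPdf",
    ".xml": "readXml",
}

def pick_reader(path: str) -> str:
    """Return the name of the read function that matches the file extension."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]

print(pick_reader("reports/Q1.XLSX"))  # readXlsx
```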

## Read a .csv file

To read a character-separated values (CSV) file, use the `readCsv` function and insert the path into the first input box. Parameters can be added in object format in the second box.

<table><thead><tr><th width="165">Parameter</th><th width="88">Default</th><th>Description</th></tr></thead><tbody><tr><td>delimiter</td><td>,</td><td>Delimiter used for separating columns. Use "auto" if the delimiter is unknown in advance. Use an array to give a list of potential delimiters, e.g. [",", "|", "$"].</td></tr><tr><td>noheader</td><td>false</td><td>Indicates that the CSV data has no header row and the first row is a data row.</td></tr><tr><td>checkColumn</td><td>false</td><td>Check whether the number of columns in a row matches the number of headers. On a mismatch, a "mismatched_column" error is emitted.</td></tr><tr><td>checkType</td><td>false</td><td>Turns type interpretation on or off.</td></tr><tr><td>quote</td><td>"</td><td>If a column contains the delimiter, the quote character can be used to surround the column content, e.g. "hello, world" won't be split into two columns while parsing. Setting this to "off" ignores all quotes.</td></tr><tr><td>trim</td><td>true</td><td>Indicates whether the parser trims spaces surrounding column content, e.g. " content " will be trimmed to "content".</td></tr><tr><td>ignoreEmpty</td><td>false</td><td>Ignore empty values in CSV columns. Set this to true to skip columns for which no value is given.</td></tr><tr><td>includeColumns</td><td></td><td>Instructs the parser to include only the columns matched by the regular expression. Example: /(name|age)/ will parse and include columns whose header contains "name" or "age".</td></tr><tr><td>ignoreColumns</td><td></td><td>Instructs the parser to ignore the columns matched by the regular expression. Example: /(name|age)/ will ignore columns whose header contains "name" or "age".</td></tr></tbody></table>

See <https://www.npmjs.com/package/csvtojson#parameters> for more parameter options.
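To get a feel for what `delimiter`, `trim`, and `includeColumns` do, the following Python sketch mimics their effect on a small sample. It is not the connector's parser (the parameters above come from csvtojson); `read_csv_text` and its argument names are hypothetical.

```python
import csv
import io
import re

def read_csv_text(text, delimiter=",", trim=True, include_columns=None):
    """Parse CSV text into a list of dicts, mimicking three of the parameters above."""
    rows = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    result = []
    for row in rows:
        record = {}
        for key, value in row.items():
            if include_columns and not re.search(include_columns, key):
                continue  # header does not match the regular expression
            # With trim enabled, surrounding spaces are removed from headers and values.
            record[key.strip() if trim else key] = value.strip() if trim else value
        result.append(record)
    return result

sample = "name; age; city\nAda; 36; London\nLinus; 54; Portland\n"
print(read_csv_text(sample, delimiter=";", include_columns=r"(name|age)"))
```

Values stay strings in this sketch, which corresponds to leaving `checkType` at its default of `false`.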

## Write a .csv file

You can write `.csv` files with the `writeCsv` function. Insert or link a JSON string in the first input box, then specify the path and file name under which the file should be saved. Values are always delimited by commas.

{% hint style="warning" %}
Parameters from the [`readCsv`](#read-a-.csv-file) function can currently NOT be used for writing `.csv` files.
{% endhint %}
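The conversion from JSON to comma-delimited text can be sketched as follows. This assumes the JSON string holds an array of flat objects, which is not stated on this page; `json_to_csv` is a hypothetical helper and not the connector's code.

```python
import csv
import io
import json

def json_to_csv(json_string: str) -> str:
    """Convert a JSON array of flat objects into comma-delimited CSV text."""
    records = json.loads(json_string)
    # Collect column names in first-seen order across all records.
    columns = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

data = '[{"machine": "press-1", "temp": 71.5}, {"machine": "press-2", "temp": 68.0}]'
print(json_to_csv(data))
```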

## Read an Excel sheet (.xlsx)

Use the `readXlsx` function to ingest Excel files.

Output Example:

```
{
    sheet1: [{
        A: 'data of cell A1',
        B: 'data of cell B1',
        C: 'data of cell C1'
    },
    {
        A: 'data of cell A2',
        B: 'data of cell B2',
        C: 'data of cell C2'
    }],
    sheet2: [{
        A: 'data of cell A1',
        B: 'data of cell B1',
        C: 'data of cell C1'
    }]
}
```

<table><thead><tr><th width="146">Parameter</th><th width="301">Example</th><th>Description</th></tr></thead><tbody><tr><td>header</td><td><code>{rows: 1}</code></td><td>This is the number of rows that will be skipped and will not be present in the resulting object. Counting from top to bottom.</td></tr><tr><td>sheets</td><td><code>['sheet2']</code></td><td>Only get the data from a specific sheet.</td></tr><tr><td>columnToKey</td><td><p>static example:</p><p><code>{ A: 'id',</code></p><p><code>B: 'firstName' }</code> <br>dynamic example:<br><code>{ '*': '{{columnHeader}}' }</code></p></td><td>Name columns in the output.<br>It is possible to use a value from the sheet with e.g. <code>'{{A1}}'</code> or <code>{{columnHeader}}</code>, which will follow the header parameter settings. To dynamically name every column, use <code>'*'</code>. Omitting columns ignores them for the output, even if specified in the range parameter.</td></tr><tr><td>range</td><td><code>'A2:B3'</code></td><td>Defines the range from which to include data. If your column range goes into double characters (e.g.: AG), set the range from A until Z, because the submodule currently works with alphabetical sorting internally.</td></tr><tr><td>sheetStubs</td><td>false</td><td>To include empty cells (NULL values), set this to <code>true</code>.</td></tr></tbody></table>

It is also possible to nest most parameters for different options per sheet. See [https://github.com/DiegoZoracKy/convert-excel-to-json](https://github.com/DiegoZoracKy/convert-excel-to-json?tab=readme-ov-file#identifying-header-rows) for all parameter options and examples.
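The combined effect of `header: {rows: 1}` and a static `columnToKey` mapping can be illustrated on rows shaped like the output example above. This is a sketch of the resulting data shape only; `apply_column_to_key` is a hypothetical helper, not part of the connector.

```python
def apply_column_to_key(rows, column_to_key, header_rows=0):
    """Rename column letters to keys and drop columns that are not listed."""
    data_rows = rows[header_rows:]  # skip header rows, counted from the top
    return [
        {column_to_key[col]: value for col, value in row.items() if col in column_to_key}
        for row in data_rows
    ]

sheet1 = [
    {"A": "id", "B": "firstName", "C": "notes"},
    {"A": 1, "B": "Ada", "C": "n/a"},
    {"A": 2, "B": "Linus", "C": "n/a"},
]
print(apply_column_to_key(sheet1, {"A": "id", "B": "firstName"}, header_rows=1))
```

Column `C` is absent from the result because it is not listed in the mapping.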

## Read a Word file (.docx)

Use the `readDocx` function to ingest Word documents. It uses [textract](https://www.npmjs.com/package/textract) in the background.

<table><thead><tr><th width="195">Parameter</th><th width="90">Default</th><th>Description</th></tr></thead><tbody><tr><td>preserveLineBreaks</td><td>false</td><td>Pass this in as <code>true</code> and textract will not strip any line breaks. Line breaks are preserved most of the time even with <code>false</code>, but to make sure to preserve line breaks, set this to <code>true</code>.</td></tr><tr><td>preserveOnlyMultipleLineBreaks</td><td>false</td><td>Some extractors, like PDF, insert line breaks at the end of every line, even in the middle of a sentence. If this option is set to <code>true</code>, then any instances of a single line break are removed but multiple line breaks are preserved. Check your output with this option, though, as this doesn't preserve paragraphs unless there are multiple breaks.</td></tr></tbody></table>

For more options visit <https://www.npmjs.com/package/textract#configuration>.
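The effect of `preserveOnlyMultipleLineBreaks` can be approximated with a single regular expression. The sketch below replaces each single line break with a space and leaves runs of two or more untouched; the exact replacement textract performs is an assumption here, and `keep_only_multiple_line_breaks` is a hypothetical helper.

```python
import re

# A line break that is neither preceded nor followed by another line break.
SINGLE_LINE_BREAK = re.compile(r"(^|[^\n])\n(?!\n)")

def keep_only_multiple_line_breaks(text: str) -> str:
    """Replace single line breaks with a space; keep multiple line breaks."""
    return SINGLE_LINE_BREAK.sub(r"\1 ", text).strip()

extracted = "line one\nline two\n\nnew paragraph"
print(repr(keep_only_multiple_line_breaks(extracted)))
# 'line one line two\n\nnew paragraph'
```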

{% hint style="info" %}
The following functions rely on the same textract component, but different parameters may be useful for different file types.
{% endhint %}

## Read a PowerPoint file (.pptx)

The `readPptx` function allows you to ingest PowerPoint files and also works with textract. For parameter options see [Read a Word file](#read-a-word-file-.docx).

## Read an HTML file (.html & .htm)

The `readHtml` function reads HTML files in `.html` and `.htm` format. It also works with textract, so it has the same parameter options as the function for [Word](#read-a-word-file-.docx) files.

Additionally, the `includeAltText` parameter controls whether `alt` text is included in the extracted text. The default is `false`.

## Read a text file (.txt)

Reading a text file is also textract based (see [Word file](#read-a-word-file-.docx)) and can be done with the function `readTxt`.

## Read a markdown file (.md)

The function `readMd` is also textract based (see [Word file](#read-a-word-file-.docx)) and can be used to ingest markdown files.

## Read a PDF file

PDFs can be read with the function `readPdf`, which is textract based like the [Word](#read-a-word-file-.docx) function. In addition to the parameters explained above, `readPdf` accepts `pdftotextOptions`. These options must be nested, with `pdftotextOptions` as the main key and all sub-options inside it, like `pdftotextOptions: {ownerPassword: 123}`.

`pdftotextOptions`: This is a proxy options object to the library textract uses for PDF extraction: [pdf-text-extract](https://github.com/nisaacson/pdf-text-extract). Options include `ownerPassword` and `userPassword`, which are needed when extracting text from password-protected PDFs.

{% hint style="warning" %}
Textract modifies the pdf-text-extract `layout` default so that, instead of `layout: layout`, it uses `layout: raw`. Do not modify this without understanding what problems might arise. See [this GH issue](https://github.com/dbashford/textract/issues/75) for why textract overrides that library's default.
{% endhint %}

## Read a .xml file

To read Extensible Markup Language (XML) files, use the `readXml` function. It is also textract based and therefore offers the same parameters as the [Word](#read-a-word-file-.docx) function.

## Move a file

Use the `moveFile` function to remove a file from one location and insert it in a new one. Insert the old path, including the file name, in the first input box and the new path, also including the file name, in the second input box. A file can only be moved into a folder that already exists.

<figure><img src="/files/yShli0E9vjGyjrDOSaHd" alt=""><figcaption><p>Moving a file from the root folder into a subfolder while keeping the name.</p></figcaption></figure>
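The rule that the target folder must already exist can be reproduced on a local file system. The sketch below uses a temporary directory as a stand-in for the connector's root folder; `move_file` is a hypothetical helper that imitates the described behavior and is not the connector's code.

```python
import shutil
import tempfile
from pathlib import Path

def move_file(root: Path, old_path: str, new_path: str) -> Path:
    """Move root/old_path to root/new_path; the target folder must already exist."""
    source, target = root / old_path, root / new_path
    if not target.parent.is_dir():
        raise FileNotFoundError(f"target folder does not exist: {target.parent}")
    return Path(shutil.move(str(source), str(target)))

root = Path(tempfile.mkdtemp())        # stand-in for the connector's root folder
(root / "file1.txt").write_text("hello")
(root / "subfolder").mkdir()

moved = move_file(root, "file1.txt", "subfolder/file1.txt")
print(moved.relative_to(root).as_posix())  # subfolder/file1.txt
```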

## Copy a file

With the `copyFile` function, you can duplicate files and simultaneously move the copy to a new location.

<figure><img src="/files/NGgEP7BC95IKyWDDXQF1" alt=""><figcaption><p>Duplicating a file into the same folder.</p></figcaption></figure>

## Delete a file

To delete a file, use the `deleteFile` function.

<figure><img src="/files/rjfGvBSgbT2adwIPQOdz" alt="" width="563"><figcaption><p>Deleting the file2.txt file.</p></figcaption></figure>

{% hint style="danger" %}
Files get deleted directly and cannot be recovered!
{% endhint %}

## Create a new folder

With the `createFolder` function, you can create a new subfolder inside an existing folder.

<figure><img src="/files/GkkodyOIRU5OKWzmsgH3" alt="" width="563"><figcaption><p>Creating a subfolder named "subfolder" in the root folder.</p></figcaption></figure>

## Delete a folder

Delete subfolders with the `deleteFolder` function.

<figure><img src="/files/brG53DmhJwgQfvKGc6Gp" alt="" width="563"><figcaption><p>Deleting a folder named "subfolder" from the root directory.</p></figcaption></figure>

{% hint style="danger" %}
Folders, even if they are not empty, get deleted completely, including all of the content. They cannot be recovered afterwards!
{% endhint %}
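A local analogue of this behavior is Python's `shutil.rmtree`, which also removes a folder together with everything inside it. The sketch runs against a temporary directory and only illustrates recursive deletion; it says nothing about how `deleteFolder` is implemented.

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())            # stand-in for the connector's root folder
folder = root / "subfolder"
(folder / "nested").mkdir(parents=True)
(folder / "nested" / "file2.txt").write_text("data")

shutil.rmtree(folder)                      # removes the folder and all of its content
print(folder.exists())                     # False
```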

## Browse a directory

To show the content of a folder and all its subfolders, use the `browse` function and insert the path to the folder in the input field.

<figure><img src="/files/yCjQhYEjxA1Hor7GhOJl" alt="" width="563"><figcaption><p>Displaying the contents of the root folder of the file connector.</p></figcaption></figure>
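A recursive listing of this kind can be sketched locally with `os.walk`. The helper `list_tree` is hypothetical, and the flat list of relative paths it returns is an assumption for illustration; the connector's actual output format may differ.

```python
import os
import tempfile
from pathlib import Path

def list_tree(folder: Path) -> list[str]:
    """Return every file and subfolder below `folder` as sorted relative paths."""
    entries = []
    for current, dirnames, filenames in os.walk(folder):
        for name in dirnames + filenames:
            entries.append((Path(current) / name).relative_to(folder).as_posix())
    return sorted(entries)

root = Path(tempfile.mkdtemp())            # stand-in for the connector's root folder
(root / "subfolder").mkdir()
(root / "file1.txt").write_text("a")
(root / "subfolder" / "file2.txt").write_text("b")

print(list_tree(root))  # ['file1.txt', 'subfolder', 'subfolder/file2.txt']
```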


