# DataSet

![DataSample Process][datasample-process]

A DataSet is defined by the settings and information listed below. Each entry gives the field's display name, its identifier in parentheses, and what the field controls.

- **Name (name)**: Set the name of the DataSet. This name is used to uniquely identify the DataSet.

- **Description (description)**: Set a description of the DataSet. Used to explain the purpose, content, or features of the DataSet.

- **Type (type)**: Set the type of DataSet. You can choose one of two options:

  - **Manual (Manual)**: Represents a DataSet where users enter data directly.
  - **Automatic (Automatic)**: Represents a DataSet where data is collected through automated methods, such as sensors.

- **Data Key Set (dataKeySet)**: Apply items registered on the Data Key Set master page to the DataSet. The Data Key Set defines the data fields and properties to be used in the DataSet.

- **Partition Keys (partitionKeys)**: Set the partition keys used to divide and organize data. Partition keys group stored data logically, so queries that filter on them can retrieve data quickly.

- **Schedule (schedule)**: Set the data input cycle. Once the cycle is set, data input tasks will be performed according to that cycle.

- **Timezone (timezone)**: Set the reference timezone for the schedule. Used when times need to be converted between time zones.

- **Normal Data Scenario (normalScenario)**: Set the scenario that starts when normal data is generated. Used when subsequent data processing is needed.

- **Outlier Scenario (outlierScenario)**: Set the scenario that starts when data outside the normal range is generated. Used when subsequent data processing is needed.

- **Review Approval Line (reviewApprovalLine)**: Set the approval line for the data review process when normal data is generated. Applies only if review work is needed.

- **Outlier Approval Line (outlierApprovalLine)**: Set the approval line for the data action process when data outside the normal range is generated.

- **Supervisory Role (supervisoryRole)**: Set the supervisory role for the DataSet. The supervisor is responsible for reviewing collected data and outlier data. They also have the authority to assign data collection tasks.

- **Entry Role (entryRole)**: Set the role with entry permissions. This role is responsible for performing data entry tasks.

- **Assignees (assignees)**: Set the people in charge of handling data collection tasks.

- **Review Approval Line (reviewApprovalLine)**: Set the approval line for verifying the data collection results by the supervisor.

- **Outlier Approval Line (outlierApprovalLine)**: Set the approval line for handling outlier data.

- **Requires Review (requiresReview)**: Set whether data review is necessary. If set, the administrator's review and review approval process automatically begins.

- **Entry Type (entryType)**: Set the type of entry screen. You can choose one of the following options:

  - **Generated**: Automatically use the implemented screen.
  - **Board**: Use the Board screen.
  - **Page**: Move to the implemented page. A sub URL (suburl) is required.
  - **External URL**: Move to an external page. A full URL is required.

- **Entry View (entryView)**: Set the value according to the type of entry screen. If the Board screen type is selected, set the corresponding value.

- **Monitor Type (monitorType)**: Set the type of monitor screen. Provides the same options as the entry screen type.

- **Monitor View (monitorView)**: Set the value corresponding to the type of monitor screen.

- **Report Type (reportType)**: Set the type of report screen. Shares some options with the entry screen type and adds the following:

  - **Jasper**: Render a Jasper report page according to Jasper server settings to generate a report.
  - **Shiny**: Render a Shiny application page according to Shiny server settings to generate a report.

- **Report View (reportView)**: Set the value according to the type of report screen. In the case of Jasper or Shiny, a valid sub URL (suburl) on the respective server is needed.

- **Report Template (reportTemplate)**: Upload the template file required for the report screen.

- **Use Case (useCase)**: Indicates the purpose or use of the DataSet. Common options include "QA", "CCP", and "SPC".

- **Data Items (dataItems)**: Set the data items for the DataSet.

- **Data Entry Schedule (schedule)**: Set the data entry cycle.

- **Data Entry Timezone (timezone)**: Set the timezone related to data entry for the DataSet.

- **Schedule ID (scheduleId)**: Set the schedule ID for the DataSet.

- **Summary Period (summaryPeriod)**: Setting a summary period automatically registers a summary task in the scheduler. The available periods are hour (hour), shift (shift), workdate (workdate), and daily (daily). Weekly and monthly summaries are currently not provided because of the workload they would add.

- **Next Summary Execution Date (summarySchedule)**: Shows the next summary execution date, derived from the summary period. Summary tasks run according to that period.

- **Creation Date (createdAt)**: Indicates the date and time the DataSet was created.

- **Last Update Date (updatedAt)**: Indicates the date and time the DataSet was last updated.

- **Creator (creator)**: Indicates the user who created the DataSet.

- **Updater (updater)**: Indicates the user who last updated the DataSet.
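
The entry, monitor, and report view settings in the list above each pair a type with a value whose form depends on that type. The following is a minimal Python sketch of those pairing rules. The function name `check_view`, the use of the display labels as type strings, and the URL check are assumptions made for illustration; they are not the product's API, and the identifiers the system actually stores may differ.

```python
from urllib.parse import urlparse

# Type labels as they appear in the field list above (assumption: the
# identifiers actually stored by the system may be spelled differently).
SUB_URL_TYPES = {"Page", "Jasper", "Shiny"}   # value must be a sub URL
FULL_URL_TYPES = {"External URL"}             # value must be a full URL
KNOWN_TYPES = SUB_URL_TYPES | FULL_URL_TYPES | {"Generated", "Board"}


def check_view(view_type: str, view_value: str) -> list[str]:
    """Return the problems found for one type/value pair (empty if none)."""
    if view_type not in KNOWN_TYPES:
        return [f"unknown view type: {view_type!r}"]

    problems = []
    parsed = urlparse(view_value or "")
    is_full_url = bool(parsed.scheme and parsed.netloc)

    if view_type in FULL_URL_TYPES and not is_full_url:
        problems.append("External URL needs a full URL")
    if view_type in SUB_URL_TYPES and not view_value:
        problems.append(f"{view_type} needs a sub URL")
    if view_type == "Board" and not view_value:
        problems.append("Board needs a value identifying the board")
    # "Generated" uses the implemented screen, so no value is checked.
    return problems


# Hypothetical values, for illustration only.
print(check_view("External URL", "https://example.com/form"))
print(check_view("Page", ""))
```

Returning a list of problems rather than raising on the first one lets a caller check the entry, monitor, and report pairs together and report everything at once.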

Together, these settings and information define the DataSet and make its purpose and use clear.
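
To make the relationship between the summary period and the next summary execution date concrete, here is a rough Python sketch. It assumes, purely for illustration, that an hourly summary runs at the start of the next hour and a daily summary at the next midnight in the DataSet's timezone; the actual scheduler may use different boundaries. The shift and workdate periods depend on site-specific definitions that this page does not describe, so the sketch leaves them unimplemented. The function name `next_summary_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Period identifiers taken from the Summary Period field description.
SUMMARY_PERIODS = ("hour", "shift", "workdate", "daily")


def next_summary_time(period: str, now: datetime) -> datetime:
    """Return the assumed next run time for a summary period.

    `now` should be a timezone-aware datetime in the DataSet's timezone.
    """
    if period not in SUMMARY_PERIODS:
        # Weekly and monthly summaries are not provided.
        raise ValueError(f"unsupported summary period: {period!r}")
    if period == "hour":
        # Assumption: run at the top of the next hour.
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if period == "daily":
        # Assumption: run at the next midnight.
        start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return start_of_day + timedelta(days=1)
    # Shift and workdate boundaries are site-specific; not sketched here.
    raise NotImplementedError(f"{period!r} needs site-specific boundaries")


now = datetime(2023, 12, 1, 10, 25, tzinfo=timezone.utc)
print(next_summary_time("hour", now))
print(next_summary_time("daily", now))
```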

## Partition Key Utilization Example (Athena API)

Amazon Athena is an Amazon Web Services (AWS) query service that lets you query and analyze data stored in S3 using SQL. This example shows how to use partition keys to search and filter data quickly in Athena.

Assumptions:

- The DataSet stores daily order data.
- The partition key is set to "Date."

**Athena Query Example:**

The following is an example query using Athena to search for order data on a specific date.

```sql
-- "partition_key" stands for the partition key column ("Date" in this example).
SELECT *
FROM "my_dataset"
WHERE "partition_key" = '2023-12-01';
```

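This query retrieves all order data in the "my_dataset" DataSet where the partition key "Date" (written as "partition_key" in the query) is '2023-12-01'. Because the filter is on a partition key, Athena can read only the matching partition instead of scanning the whole table, so the desired data is retrieved quickly even from large data sets.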
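
When such queries are built in code, the partition filter can be assembled from the DataSet's partition keys. The helper below is a minimal Python sketch under that assumption. The function name `build_partition_query` is hypothetical, and the string quoting is only for illustration; where the client supports parameterized queries, prefer them over building SQL text by hand.

```python
def build_partition_query(table: str, partition_filters: dict[str, str]) -> str:
    """Build a SELECT statement that filters on the given partition keys."""

    def quote_ident(name: str) -> str:
        # Double-quote identifiers, doubling any embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    def quote_literal(value: str) -> str:
        # Single-quote string literals, doubling any embedded single quotes.
        return "'" + value.replace("'", "''") + "'"

    conditions = [
        f"{quote_ident(column)} = {quote_literal(value)}"
        for column, value in partition_filters.items()
    ]
    query = f"SELECT *\nFROM {quote_ident(table)}"
    if conditions:
        query += "\nWHERE " + "\n  AND ".join(conditions)
    return query + ";"


# Reproduces the example query, using the same placeholder column name.
print(build_partition_query("my_dataset", {"partition_key": "2023-12-01"}))
```

Passing more than one partition key adds one `AND` condition per key, which narrows the data read further when the DataSet is partitioned on several keys.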

[datasample-process]: ./images/datasample-process.png