This vignette will show how to create and edit Data Packages.
The new_datapackage() function creates a new Data Package:
> library(datapackage)
> dir <- tempfile()
> dp <- new_datapackage(dir, name = "example", 
+   title = "An Example Data Package")
> dp
[example] An Example Data Package
Location: </tmp/RtmpWGVvWg/file90411d09a4f8>
<NO RESOURCES>
This returns an editabledatapackage. This means that any changes to
the Data Package are immediately saved to the datapackage.json file,
and that properties are read from the file each time they are
accessed. It is therefore possible to edit the datapackage.json file
manually while working with the Data Package in R.
> list.files(dir)
[1] "datapackage.json"
Using methods such as dp_title() and dp_description(), the properties
of the Data Package can be modified:
> dp_description(dp) <- "This is a description of the Data Package"
The dp_description<-() method also accepts a character vector of
length > 1. This makes it easy to read the description from a file,
which helps because long descriptions are awkward to write directly in
R code. It is possible to use markdown in the description.
> dp_description(dp) <- readLines("description.md")
The following methods are currently (at the time of writing this vignette) supported:
- dp_title<-()
- dp_contributors<-() and dp_add_contributor<-()
- dp_description<-()
- dp_id<-()
- dp_name<-()
- dp_created<-()
- dp_keywords<-()
- dp_property<-(): this function also allows custom properties.
For an up-to-date list, run the following:
> methods(class = "datapackage") |> (\(x) x[grep("<-", x)])()
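The filtering idiom above can be tried without the datapackage package. The sketch below applies the same pattern to the data.frame class from base R, purely as an illustration; the variable names all_methods and setters are made up for this example.

```r
# List all S3 methods registered for a class that ships with base R.
all_methods <- methods(class = "data.frame")

# Keep only replacement methods, i.e. those whose name contains "<-".
# as.character() drops the extra attributes of the methods() result;
# fixed = TRUE treats "<-" as a literal string, not a regular expression.
setters <- grep("<-", as.character(all_methods), fixed = TRUE, value = TRUE)
print(setters)
```

Replacing "data.frame" with "datapackage" (after loading the package) should give the list of replacement methods described above.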
Below is an example of adding a contributor to the package:
> dp_add_contributor(dp) <- new_contributor("Jane Doe", role = "author",
+   email = "j.doe@organisation.org")
In this example we will save the iris dataset to a new Data Package.
> data(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
In order to store a new dataset in a Data Package we need to do two things. First, we need to create a new Data Resource in the package. Second, using the specification of the Data Resource, we need to save the actual dataset at the location specified in the Data Resource.
It is possible to edit the datapackage.json file to
create the new Data Resource. The package also has a function
dp_generate_dataresource() to generate a skeleton Data
Resource for a given dataset:
> res <- dp_generate_dataresource(iris, "iris")
Again, this can be further modified using methods such as
dp_title() and dp_property():
> dp_title(res) <- "The Iris dataset"
Let’s add the resource to the Data Package:
> dp_resources(dp) <- res
In this case the Data Package does not yet contain any Data Resources. Should the Data Package already contain a Data Resource with the same name, it will be overwritten by the new Data Resource.
We are now ready to write the dataset. For this we can use the
dp_write_data() method:
> dp_write_data(dp, resource_name = "iris", data = iris)
When some of the fields in the Data Resource have categories that are stored in a separate Data Resource, this function will by default also write the lists of categories associated with the Data Resource.
> readLines(file.path(dir, "iris.csv"), n = 10) |> writeLines()
"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
5.1,3.5,1.4,0.2,1
4.9,3,1.4,0.2,1
4.7,3.2,1.3,0.2,1
4.6,3.1,1.5,0.2,1
5,3.6,1.4,0.2,1
5.4,3.9,1.7,0.4,1
4.6,3.4,1.4,0.3,1
5,3.4,1.5,0.2,1
4.4,2.9,1.4,0.2,1
And of course we can open the Data Package and read the data back in:
> dp2 <- open_datapackage(dir)
> iris2 <- dp2 |> dp_resource("iris") |> dp_get_data(convert_categories = "to_factor")
> all.equal(iris, iris2, check.attributes = FALSE)
[1] TRUE
By default, dp_generate_dataresource() will generate a
categories property for factor fields:
> data(chickwts)
> res <- dp_generate_dataresource(chickwts, "chickwts") 
> dp_resources(dp) <- res
> (feed_name <- dp_resource(dp, "chickwts") |> 
+   dp_field("feed") |> dp_property("categories"))
[[1]]
[[1]]$value
[1] 1
[[1]]$label
[1] "casein"
[[2]]
[[2]]$value
[1] 2
[[2]]$label
[1] "horsebean"
[[3]]
[[3]]$value
[1] 3
[[3]]$label
[1] "linseed"
[[4]]
[[4]]$value
[1] 4
[[4]]$label
[1] "meatmeal"
[[5]]
[[5]]$value
[1] 5
[[5]]$label
[1] "soybean"
[[6]]
[[6]]$value
[1] 6
[[6]]$label
[1] "sunflower"
Here, the list of categories is stored directly in the
categories property. It is also possible to store the list
of categories in a separate Data Resource:
> res <- dp_generate_dataresource(chickwts, "chickwts", 
+   categories_type = "resource") 
> dp_resources(dp) <- res
> (feed_name <- dp_resource(dp, "chickwts") |> 
+   dp_field("feed") |> dp_property("categories"))
$resource
[1] "feed-categories"
Here the categories property points to a Data Resource.
By default, dp_write_data() will automatically create this resource
when writing the data:
> dp_write_data(dp_resource(dp, "chickwts"), data = chickwts, write_categories = TRUE)
> list.files(dir)
[1] "chickwts.csv"        "datapackage.json"    "feed-categories.csv"
[4] "iris.csv"           
> dp_resource(dp, "feed-categories") |> dp_get_data()
  value     label
1     1    casein
2     2 horsebean
3     3   linseed
4     4  meatmeal
5     5   soybean
6     6 sunflower
By default the package will generate a list of categories for factor variables. The levels are numbered using sequential integers starting from 1. The example below shows how different codes can be used.
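Before changing the codes, it may help to see what the default numbering amounts to. The following base R sketch builds a value/label table from the levels of a factor; it is an illustration only and not necessarily how the package derives the list internally. The names feed_levels and default_categories are made up for this example.

```r
# Built-in example data; the feed column is a factor.
data(chickwts)

# Levels of the factor in their stored order.
feed_levels <- levels(chickwts$feed)

# Sequential integer codes starting from 1, paired with the level labels.
default_categories <- data.frame(
  value = seq_along(feed_levels),
  label = feed_levels,
  stringsAsFactors = FALSE
)
print(default_categories)

# The integer codes underlying the factor use the same numbering.
head(as.integer(chickwts$feed))
```

The resulting table has the same values and labels as the feed-categories resource shown above.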
In order to write the correct codes, we first have to generate and save the list of categories with the correct codes. In the example below we do this using R, but it is of course also possible to create the CSV file in other ways (e.g. by manual editing):
> codelist <- data.frame(
+   value = c(101, 102, 103, 202, 203, 204),
+   label = c("casein", "horsebean", "linseed", "meatmeal", 
+     "soybean", "sunflower")
+ )
> res <- dp_generate_dataresource(codelist, "feed-categories")
> res
[feed-categories] 
Fields:
[value] <number> 
[label] <string> 
Selected properties:
path     :"feed-categories.csv"
format   :"csv"
mediatype:"text/csv"
encoding :"utf-8"
> dp_resources(dp) <- res
> codelistres <- dp |> dp_resource("feed-categories")
> dp_write_data(codelistres, data = codelist, write_categories = FALSE)
This creates the correct CSV file:
> readLines(file.path(dir, "feed-categories.csv")) |> writeLines()
"value","label"
101,"casein"
102,"horsebean"
103,"linseed"
202,"meatmeal"
203,"soybean"
204,"sunflower"
When we now write the chickwts dataset to file, it will use this list
of categories, as long as we don’t overwrite it. Hence the
write_categories = FALSE argument:
> dp_write_data(dp, resource_name = "chickwts", data = chickwts, write_categories = FALSE)
We can see that the correct codes are used in the CSV file:
> readLines(file.path(dir, "chickwts.csv"), n = 10) |> writeLines()
"weight","feed"
179,102
160,102
136,102
227,102
217,102
168,102
108,102
124,102
143,102
Editing existing Data Packages is also possible. Use the
readonly = FALSE argument when opening the Data Package:
> edit <- open_datapackage(dir, readonly = FALSE)
> dp_id(edit) <- "iris_chkwts"
> dp_created(edit) <- Sys.time() |> as.Date()
The complete datapackage.json file after all of the edits in this
vignette:
> readLines(file.path(dir, "datapackage.json")) |> writeLines()
{
  "name": "example",
  "title": "An Example Data Package",
  "resources": [
    {
      "name": "iris",
      "path": "iris.csv",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "fields": [
          {
            "name": "Sepal.Length",
            "type": "number"
          },
          {
            "name": "Sepal.Width",
            "type": "number"
          },
          {
            "name": "Petal.Length",
            "type": "number"
          },
          {
            "name": "Petal.Width",
            "type": "number"
          },
          {
            "name": "Species",
            "type": "integer",
            "categories": [
              {
                "value": 1,
                "label": "setosa"
              },
              {
                "value": 2,
                "label": "versicolor"
              },
              {
                "value": 3,
                "label": "virginica"
              }
            ]
          }
        ]
      },
      "title": "The Iris dataset"
    },
    {
      "name": "chickwts",
      "path": "chickwts.csv",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "fields": [
          {
            "name": "weight",
            "type": "number"
          },
          {
            "name": "feed",
            "type": "integer",
            "categories": {
              "resource": "feed-categories"
            }
          }
        ]
      }
    },
    {
      "name": "feed-categories",
      "path": "feed-categories.csv",
      "format": "csv",
      "mediatype": "text/csv",
      "encoding": "utf-8",
      "schema": {
        "fields": [
          {
            "name": "value",
            "type": "number"
          },
          {
            "name": "label",
            "type": "string"
          }
        ]
      }
    }
  ],
  "description": "This is a description of the Data Package",
  "contributors": [
    {
      "title": "Jane Doe",
      "role": "author",
      "email": "j.doe@organisation.org"
    }
  ],
  "id": "iris_chkwts",
  "created": "2025-04-18"
}
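In the file above, the feed field stores integer codes and refers to the feed-categories resource for the matching labels. As a minimal base R sketch of how such codes relate to labels, match() can look the codes up in the code list. This is an illustration only and not necessarily how dp_get_data() works internally; the vector codes is a made-up sample.

```r
# The custom code list used for the feed field in this vignette.
codelist <- data.frame(
  value = c(101, 102, 103, 202, 203, 204),
  label = c("casein", "horsebean", "linseed", "meatmeal",
    "soybean", "sunflower"),
  stringsAsFactors = FALSE
)

# A few codes as they would appear in the feed column of chickwts.csv
# (hypothetical sample, not read from the file).
codes <- c(102, 102, 101, 204)

# Look up each code in the code list and build a factor from the labels,
# keeping the level order of the code list.
feed <- factor(codelist$label[match(codes, codelist$value)],
  levels = codelist$label)
print(feed)
```

A code that does not occur in the code list has no match and therefore becomes NA in the resulting factor.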