Introduction to sjdatar

Author

Robson Bruno Dutra Pereira

The sjdatar package was created to share datasets collected by students of the Supervised Learning course taught in the Industrial Engineering undergraduate program at the Federal University of São João del-Rei (UFSJ). The students used these datasets in regression and classification modeling activities. They are now available to new students and to anyone interested in supervised learning. Datasets collected by future classes will be added in new versions of the package.

Installing and loading the package

# Make sure devtools is installed
# if (!require("devtools")) install.packages("devtools")

# Install the package from GitHub
# devtools::install_github("robsonpro/sjdatar")
library(sjdatar)
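
Since new datasets are added with each release, it can be useful to check which ones the installed version actually provides. The following is a minimal sketch based on base R's `data(package = ...)`; the helper name `list_datasets` is an assumption introduced here for illustration and is not part of sjdatar.

```r
# Hypothetical helper (not part of sjdatar): returns the names of the
# datasets shipped with an installed package, or character(0) if the
# package is not installed.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  # data(package = ...) returns an object whose $results matrix has
  # one row per dataset; the "Item" column holds the dataset names.
  unname(data(package = pkg)$results[, "Item"])
}

list_datasets("sjdatar")
```

Any name returned this way can then be loaded with `data()` and inspected with `str()`, as in the sections below.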

Dataset aluguel2025sjdr

Dataset containing information on property rentals in São João del-Rei for the year 2025, including prices, property features, and location.

data(aluguel2025sjdr)
str(aluguel2025sjdr)
'data.frame':   191 obs. of  11 variables:
 $ bairro          : chr  "Tejuco" "" "Fabricas" "Dom Bosco" ...
 $ numero_quartos  : int  1 1 1 1 1 1 1 2 2 2 ...
 $ numero_banheiros: int  1 1 1 1 1 1 1 2 2 1 ...
 $ vagas_carro     : int  NA 0 0 0 0 0 0 0 2 0 ...
 $ area_gourmet    : chr  "N" "N" "N" "N" ...
 $ mobiliado       : chr  "N" "N" "S" "N" ...
 $ varanda         : chr  "N" "N" "N" "N" ...
 $ imobiliaria     : chr  "N" "N" "N" "S" ...
 $ tipo            : chr  "Casa" "Apartamento" "Apartamento" "Apartamento" ...
 $ preco           : num  600 750 900 900 800 600 800 1200 1500 900 ...
 $ Link            : chr  "https://www.facebook.com/marketplace/item/1445122673503046?ref=browse_tab&referral_code=marketplace_top_picks&r"| __truncated__ "https://www.facebook.com/marketplace/item/1593219631374355?ref=browse_tab&referral_code=marketplace_top_picks&r"| __truncated__ "https://www.facebook.com/marketplace/item/600487866484792?ref=browse_tab&referral_code=marketplace_general&refe"| __truncated__ "https://www.facebook.com/marketplace/item/593497410362949?ref=browse_tab&referral_code=marketplace_general&refe"| __truncated__ ...
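
The output above shows an empty string in bairro, an NA in vagas_carro, and "S"/"N" flags stored as character, so some cleaning is likely needed before modeling. The sketch below illustrates one possible approach on a toy data frame rebuilt from the first four rows printed above. The name `aluguel_amostra` is hypothetical, and reading "S"/"N" as yes/no (sim/não) is an assumption, since the source does not document the coding.

```r
# Toy sample: first four rows of selected columns, copied from the
# str() output above (hypothetical object name).
aluguel_amostra <- data.frame(
  bairro      = c("Tejuco", "", "Fabricas", "Dom Bosco"),
  vagas_carro = c(NA, 0L, 0L, 0L),
  mobiliado   = c("N", "N", "S", "N"),
  preco       = c(600, 750, 900, 900),
  stringsAsFactors = FALSE
)

# Treat empty neighbourhood names as missing values
aluguel_amostra$bairro[aluguel_amostra$bairro == ""] <- NA

# Recode the flag as logical (assumption: "S" = yes, "N" = no)
aluguel_amostra$mobiliado <- aluguel_amostra$mobiliado == "S"

# Count missing values per column
colSums(is.na(aluguel_amostra))
```

With the package loaded, the same steps could be applied to `aluguel2025sjdr` itself, repeating the recoding for the other "S"/"N" columns.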

Dataset apartamentos2024mg

Dataset containing information about apartments available for sale in Minas Gerais, for the year 2024, including sale prices, property characteristics, and location.

data(apartamentos2024mg)
str(apartamentos2024mg)
'data.frame':   632 obs. of  14 variables:
 $ Cidade      : chr  "Barbacena" "Barbacena" "Barbacena" "Barbacena" ...
 $ Bairro      : chr  "Santa Tereza" "Boa Morte" "Serra Verde" "Centro" ...
 $ Area        : num  90 74 54 88 210 133 90 19 104 208 ...
 $ Valor       : num  320000 296000 198500 349000 590000 ...
 $ Quartos     : int  2 2 2 3 3 2 3 1 3 3 ...
 $ Banheiros   : int  2 2 1 3 3 2 2 1 2 2 ...
 $ Vaga        : int  1 1 0 0 1 0 1 0 1 1 ...
 $ Varanda     : chr  "N" "S" "N" "N" ...
 $ Suite       : int  0 0 0 1 1 0 0 0 0 1 ...
 $ Area.Gourmet: chr  "S" "N" "N" "S" ...
 $ Terraco     : chr  "N" "N" "N" "N" ...
 $ Sala        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Copa        : chr  "S" "S" "S" "S" ...
 $ Piscina     : chr  "N" "N" "N" "N" ...

Dataset carrosusados2025webmotors

Dataset containing information about used cars collected from the Webmotors platform in 2025, including model, manufacturer, prices, vehicle characteristics, year, and mileage.

data(carrosusados2025webmotors)
str(carrosusados2025webmotors)
'data.frame':   200 obs. of  9 variables:
 $ Marca      : chr  "TOYOTA" "TOYOTA" "TOYOTA" "TOYOTA" ...
 $ Carro      : chr  "COROLLA" "COROLLA" "COROLLA" "COROLLA" ...
 $ Ano        : int  2024 2024 2024 2024 2023 2024 2019 2020 2022 2023 ...
 $ Km         : int  5000 31445 28416 31445 63346 31469 101455 56400 59957 64000 ...
 $ Cambio     : chr  "Automatico" "Automatico" "Automatico" "Automatico" ...
 $ Motor      : num  2 2 2 2 2 2 2 1.8 2 2 ...
 $ Valvulas   : num  16 16 16 16 16 16 16 16 16 16 ...
 $ Combustivel: chr  "Flex" "Flex" "Flex" "Flex" ...
 $ Preco      : int  175900 149400 135690 147990 134990 147990 108900 127990 129490 143900 ...
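
As an illustration of the regression activities these datasets are intended for, the sketch below fits a linear model of price on mileage and year. To stay self-contained it uses only the ten rows printed above, stored in a hypothetical object `carros_amostra`; all ten are Toyota Corollas, so the coefficients are not meaningful estimates and only show the mechanics.

```r
# Toy sample: the first ten values of Ano, Km and Preco, copied from
# the str() output above (hypothetical object name).
carros_amostra <- data.frame(
  Ano   = c(2024, 2024, 2024, 2024, 2023, 2024, 2019, 2020, 2022, 2023),
  Km    = c(5000, 31445, 28416, 31445, 63346, 31469, 101455, 56400,
            59957, 64000),
  Preco = c(175900, 149400, 135690, 147990, 134990, 147990, 108900,
            127990, 129490, 143900)
)

# Ordinary least squares: price explained by mileage and model year
ajuste <- lm(Preco ~ Km + Ano, data = carros_amostra)

# Estimated intercept and slopes
coef(ajuste)
```

With the package loaded, replacing `carros_amostra` with `carrosusados2025webmotors` would fit the same model to all 200 observations.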

Dataset celularesnovos2025

Dataset containing information about new cell phones available in 2025, including price, model, and technical specifications.

data(celularesnovos2025)
str(celularesnovos2025)
'data.frame':   200 obs. of  10 variables:
 $ modelo       : chr  "Samsung Galaxy Z Fold 6" "Samsung Galaxy A56" "Samsung Galaxy S24 FE" "Samsung Galaxy A36" ...
 $ marca        : chr  "Samsung" "Samsung" "Samsung" "Samsung" ...
 $ ram          : int  16 8 8 8 4 6 4 4 8 12 ...
 $ armazenamento: int  1024 256 512 256 128 128 128 128 256 512 ...
 $ camerah      : int  8165 8165 8165 8165 8000 9238 9000 8000 8165 6330 ...
 $ cameral      : int  6124 6124 6124 6124 6000 6928 7000 6000 6124 4247 ...
 $ ano          : int  2024 2025 2024 2025 2025 2021 2021 2021 2023 2025 ...
 $ resolucao    : chr  "8K UHD" "4K (2160p)" "8K UHD" "4K (2160p)" ...
 $ bateria      : int  4400 5000 4700 5000 4000 4500 5000 5000 5000 3900 ...
 $ Preco        : int  6999 1833 2498 1599 1574 1529 1169 1559 764 6399 ...

Dataset celularesusados

Dataset containing information about used cell phones, including price, technical specifications, condition, and device features.

data(celularesusados)
str(celularesusados)
'data.frame':   200 obs. of  8 variables:
 $ modelo       : chr  "Poco X3 NFC 128GB" "Moto G10 64GB" "Galaxy A52 128GB" "Moto G30 128GB" ...
 $ marca        : chr  "Xiaomi" "Motorola" "Samsung" "Motorola" ...
 $ anolancamento: int  2020 2021 2021 2021 2021 2020 2021 2020 2020 2019 ...
 $ armazenamento: int  128 64 128 128 64 128 128 128 128 64 ...
 $ estado       : chr  "Usado" "Com avarias" "Usado" "Novo" ...
 $ notafiscal   : chr  "Sim" "Nao" "Nao" "Sim" ...
 $ fonte        : chr  "OLX" "OLX" "OLX" "OLX" ...
 $ preco        : int  926 1630 880 2364 2355 2040 2540 2130 2144 838 ...

Dataset feijoes

Dataset with measurements of different types of beans found in the Brazilian market.

data(feijoes)
str(feijoes)
'data.frame':   250 obs. of  5 variables:
 $ feijao     : chr  "fradinho" "fradinho" "fradinho" "fradinho" ...
 $ comprimento: num  9.16 8.76 8.81 8.11 9.08 ...
 $ largura    : num  6.26 6.59 6.78 6.39 6.42 6.4 6.28 6.68 7.26 6.5 ...
 $ espessura  : num  5.26 5.01 5.43 4.78 5.22 4.85 5.12 5.24 5.97 5.08 ...
 $ massa      : num  0.21 0.21 0.24 0.18 0.23 0.16 0.17 0.23 0.27 0.2 ...
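
For a classification dataset like this one, a common first step is to compare the numeric measurements across classes. The sketch below defines a hypothetical helper, `resumo_por_classe` (not part of sjdatar), that computes per-class means with base R's `aggregate()`. It is demonstrated on the built-in `iris` data so that it runs without the package installed.

```r
# Hypothetical helper (not part of sjdatar): mean of every numeric
# column, computed separately for each level of the class column.
resumo_por_classe <- function(dados, coluna_classe) {
  # Keep only numeric columns, excluding the class column itself
  numericas <- names(dados)[vapply(dados, is.numeric, logical(1))]
  numericas <- setdiff(numericas, coluna_classe)
  aggregate(dados[numericas],
            by = list(classe = dados[[coluna_classe]]),
            FUN = mean)
}

# Demonstration on a built-in dataset
resumo_por_classe(iris, "Species")
```

With the package loaded, `resumo_por_classe(feijoes, "feijao")` would give the mean comprimento, largura, espessura, and massa for each bean type.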

Dataset folhasfrutas

Dataset with measurements obtained through image analysis of leaves from fruit trees.

data(folhasfrutas)
str(folhasfrutas)
'data.frame':   235 obs. of  9 variables:
 $ especie     : chr  "goiaba" "goiaba" "goiaba" "goiaba" ...
 $ area        : num  2.359 2.291 1.887 1.232 0.971 ...
 $ perimeter   : num  5.58 5.82 5.1 5.22 3.68 ...
 $ radius.mean : num  0.91 0.912 0.824 0.748 0.601 ...
 $ radius.sd   : num  0.157 0.161 0.168 0.285 0.154 ...
 $ radius.min  : num  0.695 0.651 0.572 0.249 0.311 ...
 $ radius.max  : num  1.16 1.25 1.1 1.24 0.86 ...
 $ majoraxis   : num  2.31 2.23 2.15 2.34 1.68 ...
 $ eccentricity: int  789 763 819 941 868 909 841 798 908 883 ...