2.3 Example: WoS data

Example script wosdata.R in wosdata.zip. See the scimetr package.

Text files downloaded from WoS (which by default are limited to 500 records each) were imported with the function ImportSources.wos(). A database was then created as a list of tables with the function CreateDB.wos(), and finally saved to the file db_udc_2015.rds.

db <- readRDS("data/wosdata/db_udc_2015.rds")
str(db, 1)
## List of 11
##  $ Docs      :'data.frame':  856 obs. of  26 variables:
##  $ Authors   :'data.frame':  4051 obs. of  4 variables:
##  $ AutDoc    :'data.frame':  5511 obs. of  2 variables:
##  $ Categories:'data.frame':  189 obs. of  2 variables:
##  $ CatDoc    :'data.frame':  1495 obs. of  2 variables:
##  $ Areas     :'data.frame':  121 obs. of  2 variables:
##  $ AreaDoc   :'data.frame':  1364 obs. of  2 variables:
##  $ Addresses :'data.frame':  3655 obs. of  5 variables:
##  $ AddAutDoc :'data.frame':  7751 obs. of  3 variables:
##  $ Journals  :'data.frame':  520 obs. of  12 variables:
##  $ label     : chr ""
##  - attr(*, "variable.labels")= Named chr [1:62] "Publication type" "Author" "Book authors" "Editor" ...
##   ..- attr(*, "names")= chr [1:62] "PT" "AU" "BA" "BE" ...
##  - attr(*, "class")= chr "wos.db"
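
The storage step can be illustrated with base R alone. The following is a minimal sketch, assuming a toy list of tables (the object toy_db, its contents, its label and the class name "toy.db" are hypothetical and unrelated to scimetr). It shows that saveRDS() and readRDS() round-trip a list of data frames together with its attributes and class:

```r
# Toy "database": a list of tables (hypothetical data, not the real WoS export)
toy_db <- list(
  Docs = data.frame(idd = 1:2, idj = c(1L, 1L)),
  Journals = data.frame(idj = 1L, SO = "TOY JOURNAL", stringsAsFactors = FALSE)
)
attr(toy_db, "variable.labels") <- c(SO = "Publication name")
class(toy_db) <- "toy.db"  # hypothetical class name, for illustration only

# Save to a temporary file and read it back
path <- tempfile(fileext = ".rds")
saveRDS(toy_db, path)
toy_db2 <- readRDS(path)

# Attributes and class survive the round trip
identical(toy_db, toy_db2)
str(toy_db2, 1)
```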

It can be useful to add to the data a variable.labels attribute holding a character vector of variable labels, with the variable names themselves as the names of its elements:

variable.labels <- attr(db, "variable.labels")
knitr::kable(head(as.data.frame(variable.labels)),
             caption = "Variable labels")
Table 2.1: Variable labels
     variable.labels
PT   Publication type
AU   Author
BA   Book authors
BE   Editor
GP   Group author
AF   Author full

Data frames with this attribute are compatible with RStudio. For example, the labels are also shown when the data frame is opened with View().

Docs <- db$Docs # No copy is made yet (the new object points to the same data until it is modified)
attr(Docs, "variable.labels") <- variable.labels[names(Docs)]
# View(Docs)
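
Indexing a named vector by names it does not contain returns NA, so variables without a label (such as the identifiers idd and idj in Docs) end up with NA labels. A minimal sketch of one possible way to handle this, assuming a toy data frame (toy, labels and lab are hypothetical names) and falling back to the variable name when no label exists:

```r
# Toy data frame (hypothetical, not part of db)
toy <- data.frame(id = 1:3, PY = c(2014L, 2015L, 2015L), TC = c(0L, 4L, 1L))

# Labels keyed by variable name; "id" deliberately has no label
labels <- c(PY = "Year published", TC = "Times cited")

# Look up each variable; missing names give NA
lab <- unname(labels[names(toy)])

# Fall back to the variable name where no label was found
lab[is.na(lab)] <- names(toy)[is.na(lab)]
attr(toy, "variable.labels") <- setNames(lab, names(toy))

attr(toy, "variable.labels")
```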

To combine tables we can use match(x, table). For example, the following code adds the journal name to the documents table by combining it with the journals table:

str(Docs)
## 'data.frame':    856 obs. of  26 variables:
##  $ idd: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ idj: int  260 37 86 272 64 429 14 408 333 214 ...
##  $ TI : chr  "Fractionation of Miscanthus x giganteus via modification of the Formacell process" "Role of Temperature and Pressure on the Multisensitive Multiferroic Dicyanamide Framework [TPrA][Mn(dca)(3)] wi"| __truncated__ "Methane and carbon dioxide emissions from constructed wetlands receiving anaerobically pretreated sewage" "Exceptionally Inert Lanthanide(III) PARACEST MRI Contrast Agents Based on an 18-Membered Macrocyclic Platform" ...
##  $ PT : Factor w/ 2 levels "Journal","Series": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DT : Factor w/ 9 levels "Article","Book Review",..: 1 1 1 1 5 1 6 9 1 1 ...
##  $ NR : int  40 45 35 78 2 59 0 53 20 50 ...
##  $ TC : int  1 5 2 2 1 0 0 10 0 1 ...
##  $ Z9 : int  1 5 2 2 1 0 0 10 0 1 ...
##  $ U1 : int  5 0 10 4 0 1 0 26 0 3 ...
##  $ U2 : int  6 0 61 21 2 5 0 66 2 4 ...
##  $ PD : chr  "DEC 23" "DEC 21" "DEC 15" "DEC 14" ...
##  $ PY : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ VL : chr  "77" "54" "538" "21" ...
##  $ IS : chr  "" "24" "" "51" ...
##  $ PN : chr  "" "" "" "" ...
##  $ SU : chr  "" "" "" "" ...
##  $ SI : chr  "" "" "" "" ...
##  $ MA : chr  "" "" "" "" ...
##  $ BP : chr  "275" "11680" "824" "18662" ...
##  $ EP : chr  "281" "11687" "833" "18670" ...
##  $ AR : chr  "" "" "" "" ...
##  $ DI : chr  "10.1016/j.indcrop.2015.08.066" "10.1021/acs.inorgchem.5b01652" "10.1016/j.scitotenv.2015.08.090" "10.1002/chem.201502937" ...
##  $ D2 : chr  "" "" "" "" ...
##  $ PG : int  7 8 10 9 2 28 3 16 4 25 ...
##  $ UT : num  3.66e+11 3.67e+11 3.63e+11 3.68e+11 3.66e+11 ...
##  $ an : num  3 9 4 8 3 5 8 5 3 3 ...
##  - attr(*, "variable.labels")= Named chr [1:26] NA NA "Title" "Publication type" ...
##   ..- attr(*, "names")= chr [1:26] NA NA "TI" "PT" ...
str(db$Journals) # View(db$Journals)
## 'data.frame':    520 obs. of  12 variables:
##  $ idj: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ SO : chr  "TISSUE ANTIGENS" "ACTA NEUROLOGICA SCANDINAVICA" "ACTA PSYCHOLOGICA" "AFINIDAD" ...
##  $ SE : chr  "" "" "" "" ...
##  $ BS : chr  "" "" "" "" ...
##  $ LA : chr  "English" "English" "English" "English" ...
##  $ PU : chr  "WILEY-BLACKWELL" "WILEY-BLACKWELL" "ELSEVIER SCIENCE BV" "ASOC QUIMICOS" ...
##  $ PI : chr  "HOBOKEN" "HOBOKEN" "AMSTERDAM" "BARCELONA" ...
##  $ PA : chr  "111 RIVER ST, HOBOKEN 07030-5774, NJ USA" "111 RIVER ST, HOBOKEN 07030-5774, NJ USA" "PO BOX 211, 1000 AE AMSTERDAM, NETHERLANDS" "INST QUIMICO SARRIA, VIA AUGUSTA, 390, 08017 BARCELONA, SPAIN" ...
##  $ SN : chr  "0001-2815" "0001-6314" "0001-6918" "0001-9704" ...
##  $ EI : chr  "1399-0039" "1600-0404" "1873-6297" "" ...
##  $ J9 : chr  "TISSUE ANTIGENS" "ACTA NEUROL SCAND" "ACTA PSYCHOL" "AFINIDAD" ...
##  $ JI : chr  "Tissue Antigens" "Acta Neurol. Scand." "Acta Psychol." "Afinidad" ...
ii <- match(Docs$idj, db$Journals$idj)
docs2 <- Docs[, c("PY", "TI")]
docs2$Journal <- db$Journals$SO[ii]
head(docs2)
##     PY
## 1 2015
## 2 2015
## 3 2015
## 4 2015
## 5 2015
## 6 2015
##                                                                                                                                            TI
## 1                                                           Fractionation of Miscanthus x giganteus via modification of the Formacell process
## 2 Role of Temperature and Pressure on the Multisensitive Multiferroic Dicyanamide Framework [TPrA][Mn(dca)(3)] with Perovskite-like Structure
## 3                                    Methane and carbon dioxide emissions from constructed wetlands receiving anaerobically pretreated sewage
## 4                               Exceptionally Inert Lanthanide(III) PARACEST MRI Contrast Agents Based on an 18-Membered Macrocyclic Platform
## 5                                                                                      Community-Acquired Pneumonia Requiring Hospitalization
## 6                                                                             Low-latency Java communication devices on RDMA-enabled networks
##                                             Journal
## 1                     INDUSTRIAL CROPS AND PRODUCTS
## 2                               INORGANIC CHEMISTRY
## 3                  SCIENCE OF THE TOTAL ENVIRONMENT
## 4                      CHEMISTRY-A EUROPEAN JOURNAL
## 5                   NEW ENGLAND JOURNAL OF MEDICINE
## 6 CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
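
To see what match() does on a small case, here is a minimal sketch with toy tables (docs, journals and their values are hypothetical). match() returns, for each element of x, the position of its first match in table, or NA when there is none, so the lookup keeps the row order of the documents table. For comparison, merge() with all.x = TRUE gives the same information but may reorder the rows:

```r
# Toy tables (hypothetical data); journal id 9 has no entry in journals
docs <- data.frame(idd = 1:4, idj = c(2L, 3L, 2L, 9L))
journals <- data.frame(idj = 1:3, SO = c("A", "B", "C"),
                       stringsAsFactors = FALSE)

# Position in journals of each document's journal (NA if not found)
ii <- match(docs$idj, journals$idj)
ii
# Lookup: unmatched documents get NA
docs$Journal <- journals$SO[ii]
docs

# Alternative with merge(): a left join, re-sorted by document id
m <- merge(docs[, c("idd", "idj")], journals, by = "idj", all.x = TRUE)
m <- m[order(m$idd), ]
m
```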

If we only need to filter, it may be more convenient to use the %in% operator (see ?'%in%'). For example, we can look for the documents published in journals whose ISO abbreviated name contains "Chem". To do so we use the function grepl(), which searches for matches of a pattern within each element of a character vector:

iidj <- with(db$Journals, idj[grepl('Chem', JI)])
db$Journals$JI[iidj]
##  [1] "J. Am. Chem. Soc."                 
##  [2] "Inorg. Chem."                      
##  [3] "J. Chem. Phys."                    
##  [4] "J. Chem. Thermodyn."               
##  [5] "J. Solid State Chem."              
##  [6] "Chemosphere"                       
##  [7] "Antimicrob. Agents Chemother."     
##  [8] "Trac-Trends Anal. Chem."           
##  [9] "Eur. J. Med. Chem."                
## [10] "J. Chem. Technol. Biotechnol."     
## [11] "J. Antimicrob. Chemother."         
## [12] "Food Chem."                        
## [13] "Cancer Chemother. Pharmacol."      
## [14] "Int. J. Chem. Kinet."              
## [15] "Chem.-Eur. J."                     
## [16] "J. Phys. Chem. A"                  
## [17] "New J. Chem."                      
## [18] "Chem. Commun."                     
## [19] "Chem. Eng. J."                     
## [20] "Comb. Chem. High Throughput Screen"
## [21] "Mini-Rev. Med. Chem."              
## [22] "Phys. Chem. Chem. Phys."           
## [23] "Org. Biomol. Chem."                
## [24] "J. Chem Inf. Model."               
## [25] "ACS Chem. Biol."                   
## [26] "Environ. Chem. Lett."              
## [27] "Anal. Bioanal. Chem."              
## [28] "J. Cheminformatics"                
## [29] "J. Mat. Chem. B"
idd <- with(Docs, idj %in% iidj)
which(idd)
##  [1]   2   4  16  23  43  69 119 126 138 175 188 190 203 208
## [15] 226 240 272 337 338 341 342 357 382 385 386 387 388 394
## [29] 411 412 428 460 483 518 525 584 600 604 605 616 620 665
## [43] 697 751 753 775 784 796 806 808 847 848
# View(Docs[idd, ])
head(Docs[idd, 1:3])
##    idd idj
## 2    2  37
## 4    4 272
## 16  16 195
## 23  23 436
## 43  43 455
## 69  69  37
##                                                                                                                                                                                                                                 TI
## 2                                                                                      Role of Temperature and Pressure on the Multisensitive Multiferroic Dicyanamide Framework [TPrA][Mn(dca)(3)] with Perovskite-like Structure
## 4                                                                                                                    Exceptionally Inert Lanthanide(III) PARACEST MRI Contrast Agents Based on an 18-Membered Macrocyclic Platform
## 16 Reduced susceptibility to biocides in Acinetobacter baumannii: association with resistance to antimicrobials, epidemiological behaviour, biological cost and effect on the expression of genes encoding porins and efflux pumps
## 23                                                       Two Catechol Siderophores, Acinetobactin and Amonabactin, Are Simultaneously Produced by Aeromonas salmonicida subsp salmonicida Sharing Part of the Biosynthetic Pathway
## 43                                                                                                                                                                        Conservation of stony materials in the built environment
## 69                                                                                                                                                         Gd3+-Based Magnetic Resonance Imaging Contrast Agent Responsive to Zn2+
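
Two details of the filtering above are worth illustrating. First, db$Journals$JI[iidj] uses the idj values as row positions, which works here because idj appears to coincide with the row number (the str() output shows idj: 1 2 3 ...). Second, the pattern "Chem" also matches names such as "Chemosphere" or "Antimicrob. Agents Chemother.". The following minimal sketch, assuming toy tables with ids that are not row positions (all names and values are hypothetical), indexes with the logical vector instead and shows a stricter pattern:

```r
# Toy tables (hypothetical data); idj values are NOT row positions here
journals <- data.frame(idj = c(10L, 20L, 30L),
                       JI = c("Inorg. Chem.", "Chemosphere", "Acta Psychol."),
                       stringsAsFactors = FALSE)
docs <- data.frame(idd = 1:5, idj = c(20L, 30L, 10L, 30L, 10L))

# fixed = TRUE treats the pattern as a literal string, not a regular expression
sel <- grepl("Chem", journals$JI, fixed = TRUE)
iidj <- journals$idj[sel]

# Index rows with the logical vector; journals$JI[iidj] would give NA here
journals$JI[sel]

# Logical filter on the documents table
keep <- docs$idj %in% iidj
docs[keep, ]

# Stricter pattern: the abbreviation "Chem." at a word boundary only
strict <- grepl("\\bChem\\.", journals$JI)
journals$JI[strict]
```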

As a further example, we look for the documents by authors whose name contains "Abad":

# View(db$Authors)
iida <- with(db$Authors, ida[grepl('Abad', AF)])
db$Authors$AF[iida]
## [1] "Mato Abad, Virginia" "Abad, Maria-Jose"   
## [3] "Abad Vicente, J."    "Abada, Sabah"
idd <- with(db$AutDoc, idd[ida %in% iida])
idd
## [1] 273 291 518 586
# View(Docs[idd, ])
head(Docs[idd, 1:3])
##     idd idj
## 273 273 282
## 291 291 141
## 518 518 272
## 586 586 311
##                                                                                                                                                                                      TI
## 273                                 Classification of mild cognitive impairment and Alzheimer's Disease with machine-learning techniques using H-1 Magnetic Resonance Spectroscopy data
## 291 Identifying a population of patients suitable for the implantation of a subcutaneous defibrillator (S-ICD) among patients implanted with a conventional transvenous device (TV-ICD)
## 518           Importance of Outer-Sphere and Aggregation Phenomena in the Relaxation Properties of Phosphonated Gadolinium Complexes with Potential Applications as MRI Contrast Agents
## 586                                                                      Enhanced thermal conductivity of rheologically percolated carbon nanofiber reinforced polypropylene composites
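
Because authors and documents are related through a link table (AutDoc), the same document id can appear more than once when several matching authors share a document, and a loose pattern such as "Abad" also matches surnames like "Abada". A minimal sketch of both points, assuming toy tables (authors, autdoc and their values are hypothetical):

```r
# Toy tables (hypothetical data)
authors <- data.frame(ida = 1:3,
                      AF = c("Abad, Ana", "Lopez, Luis", "Abada, Sam"),
                      stringsAsFactors = FALSE)
# Link table: one row per (author, document) pair
autdoc <- data.frame(ida = c(1L, 2L, 1L, 3L, 3L),
                     idd = c(5L, 5L, 7L, 7L, 9L))

# Loose match: selects "Abad" and also "Abada"
iida <- authors$ida[grepl("Abad", authors$AF, fixed = TRUE)]

# Document 7 has two matching authors, so its id appears twice
idd <- autdoc$idd[autdoc$ida %in% iida]
idd

# Remove duplicates before indexing the documents table
idd_unique <- sort(unique(idd))
idd_unique

# Anchored pattern: surname exactly "Abad" in "Surname, Name" format
exact <- authors$ida[grepl("^Abad,", authors$AF)]
exact
```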