Authors: Paul Murrell
ISBN-13: 9781420065176, ISBN-10: 1420065173
Format: Hardcover
Publisher: Taylor & Francis, Inc.
Date Published: March 2009
Edition: New Edition
Paul Murrell is a Senior Lecturer in the Department of Statistics at the University of Auckland, New Zealand. Author of the bestselling R Graphics (2006), he is also part of the development team for the R and Omegahat statistical computing projects. Dr. Murrell’s research interests include computational and graphical statistics.
Providing key information on how to work with research data, Introduction to Data Technologies presents ideas and techniques for performing critical, behind-the-scenes tasks that take up so much time and effort yet typically receive little attention in formal education. With a focus on computational tools, the book shows readers how to improve their awareness of what tasks can be achieved and describes the correct approach to performing these tasks.
Practical examples demonstrate the most important points
The author first discusses how to write computer code using HTML as a concrete example. He then covers a variety of data storage topics, including different file formats, XML, and the structure and design issues of relational databases. After illustrating how to extract data from a relational database using SQL, the book presents tools and techniques for searching, sorting, tabulating, and manipulating data. It also introduces some very basic programming concepts as well as the R language for statistical computing. Each of these topics has supporting chapters that offer reference material on HTML, CSS, XML, DTD, SQL, R, and regular expressions.
One-stop shop of introductory computing information
Written by a member of the R Development Core Team, this resource shows readers how to apply data technologies to tasks within a research setting. Collecting material otherwise scattered across many books and the web, it explores how to publish information via the web, how to access information stored in different formats, and how to write small programs to automate simple, repetitive tasks.
List of Figures xv
List of Tables xvii
Preface xix
1 Introduction 1
1.1 Case study: Point Nemo 1
2 Writing Computer Code 9
2.1 Case study: Point Nemo (continued) 11
2.2 Syntax 13
2.2.1 HTML syntax 13
2.2.2 Escape sequences 17
2.3 Semantics 18
2.3.1 HTML semantics 19
2.4 Writing code 21
2.4.1 Text editors 21
2.4.2 Important features of a text editor 21
2.4.3 Layout of code 22
2.4.4 Indenting code 24
2.4.5 Long lines of code 25
2.4.6 Whitespace 26
2.4.7 Documenting code 26
2.4.8 HTML comments 28
2.5 Checking code 29
2.5.1 Checking HTML code 29
2.5.2 Reading error information 30
2.5.3 Reading documentation 32
2.6 Running code 32
2.6.1 Running HTML code 33
2.6.2 Debugging code 33
2.7 The DRY principle 35
2.7.1 Cascading Style Sheets 36
2.8 Further reading 41
3 HTML Reference 43
3.1 HTML syntax 43
3.1.1 HTML comments 44
3.1.2 HTML entities 45
3.2 HTML semantics 45
3.2.1 Common HTML elements 46
3.2.2 Common HTML attributes 51
3.3 Further reading 51
4 CSS Reference 53
4.1 CSS syntax 53
4.2 CSS semantics 54
4.2.1 CSS selectors 54
4.2.2 CSS properties 56
4.3 Linking CSS to HTML 59
4.4 CSS tips 60
4.5 Further reading 61
5 Data Storage 63
5.1 Case study: YBC 7289 64
5.2 Plain text formats 69
5.2.1 Computer memory 71
5.2.2 Files and formats 71
5.2.3 Case study: Point Nemo (continued) 72
5.2.4 Advantages and disadvantages 73
5.2.5 CSV files 76
5.2.6 Line endings 76
5.2.7 Text encodings 78
5.2.8 Case study: The Data Expo 80
5.3 Binary formats 83
5.3.1 More on computer memory 84
5.3.2 Case study: Point Nemo (continued) 86
5.3.3 NetCDF 87
5.3.4 PDF documents 90
5.3.5 Other types of data 91
5.4 Spreadsheets 94
5.4.1 Spreadsheet formats 94
5.4.2 Spreadsheet software 95
5.4.3 Case study: Over the limit 96
5.5 XML 99
5.5.1 XML syntax 102
5.5.2 XML design 105
5.5.3 XML schema 110
5.5.4 Case study: Point Nemo (continued) 110
5.5.5 Advantages and disadvantages 114
5.6 Databases 118
5.6.1 The database data model 119
5.6.2 Database notation 121
5.6.3 Database design 122
5.6.4 Flashback: The DRY principle 132
5.6.5 Case study: The Data Expo (continued) 133
5.6.6 Advantages and disadvantages 138
5.6.7 Flashback: Database design and XML design 139
5.6.8 Case study: The Data Expo (continued) 139
5.6.9 Database software 141
5.7 Further reading 142
6 XML Reference 145
6.1 XML syntax 145
6.2 Document Type Definitions 147
6.2.1 Element declarations 148
6.2.2 Attribute declarations 149
6.2.3 Including a DTD 150
6.2.4 An example 151
6.3 Further reading 152
7 Data Queries 153
7.1 Case study: The Data Expo (continued) 154
7.2 Querying databases 158
7.2.1 SQL syntax 159
7.2.2 Case study: The Data Expo (continued) 159
7.2.3 Collations 165
7.2.4 Querying several tables: Joins 166
7.2.5 Case study: Commonwealth swimming 166
7.2.6 Cross joins 169
7.2.7 Inner joins 170
7.2.8 Case study: The Data Expo (continued) 171
7.2.9 Subqueries 175
7.2.10 Outer joins 176
7.2.11 Case study: Commonwealth swimming (continued) 176
7.2.12 Self joins 179
7.2.13 Case study: The Data Expo (continued) 179
7.2.14 Running SQL code 180
7.3 Querying XML 182
7.3.1 XPath syntax 182
7.3.2 Case study: Point Nemo (continued) 182
7.4 Further reading 185
8 SQL Reference 187
8.1 SQL syntax 187
8.2 SQL queries 187
8.2.1 Selecting columns 188
8.2.2 Specifying tables: The FROM clause 189
8.2.3 Selecting rows: The WHERE clause 190
8.2.4 Sorting results: The ORDER BY clause 192
8.2.5 Aggregating results: The GROUP BY clause 192
8.2.6 Subqueries 193
8.3 Other SQL commands 194
8.3.1 Defining tables 194
8.3.2 Populating tables 195
8.3.3 Modifying data 197
8.3.4 Deleting data 197
8.4 Further reading 197
9 Data Processing 199
9.1 Case study: The Population Clock 204
9.2 The R environment 214
9.2.1 The command line 214
9.2.2 The workspace 217
9.2.3 Packages 218
9.3 The R language 219
9.3.1 Expressions 219
9.3.2 Constant values 219
9.3.3 Arithmetic 220
9.3.4 Conditions 221
9.3.5 Function calls 222
9.3.6 Symbols and assignment 224
9.3.7 Keywords 226
9.3.8 Flashback: Writing for an audience 227
9.3.9 Naming variables 227
9.4 Data types and data structures 229
9.4.1 Case study: Counting candy 232
9.4.2 Vectors 234
9.4.3 Factors 237
9.4.4 Data frames 237
9.4.5 Lists 239
9.4.6 Matrices and arrays 241
9.4.7 Flashback: Numbers in computer memory 242
9.5 Subsetting 243
9.5.1 Assigning to a subset 250
9.5.2 Subsetting factors 251
9.6 More on data structures 252
9.6.1 The recycling rule 252
9.6.2 Type coercion 253
9.6.3 Attributes 256
9.6.4 Classes 259
9.6.5 Dates 261
9.6.6 Formulas 262
9.6.7 Exploring objects 263
9.6.8 Generic functions 264
9.7 Data import/export 266
9.7.1 The working directory 267
9.7.2 Specifying files 267
9.7.3 Text formats 268
9.7.4 Case study: Point Nemo (continued) 269
9.7.5 Binary formats 275
9.7.6 Spreadsheets 278
9.7.7 XML 280
9.7.8 Databases 284
9.7.9 Case study: The Data Expo (continued) 285
9.8 Data manipulation 287
9.8.1 Case study: New Zealand schools 287
9.8.2 Transformations 289
9.8.3 Sorting 293
9.8.4 Tables of counts 295
9.8.5 Aggregation 297
9.8.6 Case study: NCEA 302
9.8.7 The "apply" functions 304
9.8.8 Merging 309
9.8.9 Flashback: Database joins 312
9.8.10 Splitting 312
9.8.11 Reshaping 314
9.8.12 Case study: Utilities 318
9.9 Text processing 326
9.9.1 Case study: The longest placename 326
9.9.2 Regular expressions 333
9.9.3 Case study: Rusty wheat 335
9.10 Data display 343
9.10.1 Case study: Point Nemo (continued) 343
9.10.2 Converting to text 345
9.10.3 Results for reports 348
9.11 Programming 351
9.11.1 Case study: The Data Expo (continued) 352
9.11.2 Control flow 354
9.11.3 Writing functions 356
9.11.4 Flashback: Writing functions, writing code, and the DRY principle 359
9.11.5 Flashback: Debugging 360
9.12 Other software 361
10 R Reference 365
10.1 R syntax 365
10.1.1 Constants 365
10.1.2 Arithmetic operators 366
10.1.3 Logical operators 366
10.1.4 Function calls 366
10.1.5 Symbols and assignment 367
10.1.6 Loops 367
10.1.7 Conditional expressions 368
10.2 Data types and data structures 368
10.3 Functions 369
10.3.1 Session management 370
10.3.2 Generating vectors 370
10.3.3 Numeric functions 371
10.3.4 Comparisons 372
10.3.5 Type coercion 373
10.3.6 Exploring data structures 373
10.3.7 Subsetting 374
10.3.8 Data import/export 375
10.3.9 Transformations 378
10.3.10 Sorting 379
10.3.11 Tables of counts 379
10.3.12 Aggregation 380
10.3.13 The "apply" functions 380
10.3.14 Merging 381
10.3.15 Splitting 382
10.3.16 Reshaping 382
10.3.17 Text processing 384
10.3.18 Data display 385
10.3.19 Debugging 386
10.4 Getting help 386
10.5 Packages 388
10.6 Searching for functions 389
10.7 Further reading 390
11 Regular Expressions Reference 391
11.1 Literals 391
11.2 Metacharacters 392
11.2.1 Character sets 392
11.2.2 Anchors 393
11.2.3 Alternation 394
11.2.4 Repetitions 395
11.2.5 Grouping 396
11.2.6 Backreferences 396
11.3 Further reading 397
12 Conclusion 399
Attributions 401
Bibliography 403
Index 407