I’ve been taking courses from Coursera’s Data Science specialization for the last three months already. All the tasks are being performed using R and its contemporary packages like dplyr, lubridate, ggplot2, etc. I’ve gained really lots of hands-on experience throughout the courses – and in a very engaging way.
In my project, I needed to connect from R to Teradata to visualize data quality metrics and their trends. What could be better than doing this directly in R, right? 🙂 Reasons:
- I’m learning R :).
- Can start right away.
- R and R Studio are free, so no need to take weeks to acquire a license in the corporate world (depending on who you know of course) takes weeks.
- High usability through scriptability. Profiling a different table is a matter of changing a parameter.
- Repeatable – I can produce the same output multiple times.
So I when started, first I needed to figure out how to connect to Teradata – which is what this blog post is about.
There’s a package for connecting to Teradata database made by the company. The package is available on the Teradata’s github page. We can download it using a browser or (the preferred way) fetch it directly from R using:
destfile = "teradataR_1.1.0.tar.gz")
Then we need to install it along with RODBC package which is available through the CRAN network.
install.packages("c:/temp/teradataR_1.1.0.tar.gz", repos=NULL, type="source")
Finally, we can load the packages using library command, connect to the Teradata engine and test the connection by querying the sys_caledar table.
tdConnect("DSN_NAME_HERE",uid = "USER_NAME_HERE", pwd = "PASSWORD_HERE")
df <- tdQuery("select * from sys_calendar.calendar sample 100;")
Be nice and close the connection when you’re done 🙂