Research Methodology Summary
MCT 624 - Thesis Fundamentals
Thesis Advisor - Darl Kuhn
NoSQL databases have attained success in large-scale, niche-based internet implementations, but have yet to experience widespread acceptance in ecommerce. This study will compare MongoDB and Oracle 11g to identify performance patterns in ecommerce scenarios. The goal is to ascertain a set of conditions that indicate whether an ecommerce database project would be better served by a NoSQL database or a traditional Relational Database Management System (RDBMS).
The original intent was to compare NoSQL databases in general to Relational Database Management Systems (RDBMS), but that scope proved too broad. NoSQL databases vary dramatically from an architectural perspective, even when compared to each other. It would be erroneous to assume that experiments done with MongoDB would reflect those done with Cassandra or CouchDB. Testing multiple NoSQL databases would be challenging and time-consuming; therefore, a single NoSQL database had to be chosen.
MongoDB has been chosen as the NoSQL subject for this study for several reasons. First, it has been advocated for use in ecommerce by several authors, including “MongoDB in Action” author Kyle Banker. Steve Francia and Dwight Merriman also describe MongoDB as “well suited” for ecommerce (Francia, Merriman 2011). Additionally, MongoDB's reputed ability to scale horizontally and its flexible schema make it a good candidate for a web product database.
Oracle 11g was chosen to represent the RDBMS side of this study. Oracle is widely considered to be the front-runner in the current RDBMS market, with a 48% market share (Mullins 2011). Its status as an industry leader makes it the most attractive option.
The study was originally scoped to “traditional data processing environments.” That scope was too vague to support a meaningful study and had to be refined. Given the author's experience and qualifications, the use cases for this study were chosen to relate to the ecommerce industry.
This study is significant for two reasons. First, NoSQL databases have proven to be a viable solution for some unique scaling problems, yet the idea of implementing a NoSQL solution seems premature to many information technology organizations. Experienced professionals will often decide to “live with” a process that is lengthy or slow because of an inability to scale horizontally.
Second, NoSQL is a known “buzzword,” which leads people to seek it out even when it may not be the best solution. There have been instances of early adopters who did not fully understand NoSQL technology (Banker, 2010) and ran into issues trying to implement a relational model with it. It is the author’s opinion that there is a high degree of confusion surrounding the appropriate use of NoSQL databases.
A testing framework (to be developed) will simulate an ecommerce website. As part of the testing framework, a series of common ecommerce functions will be written to run concurrently on multiple threads.
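As an illustration of how such multi-threaded functions might be driven and timed, the following is a minimal sketch in Python. It is not the framework itself, which has yet to be developed. The names `timed_call`, `run_workload`, `summarize`, and `fake_lookup_customer` are hypothetical, and the database call is replaced by a short sleep so that the sketch runs without either database installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed_call(operation, *args):
    """Run one operation and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    operation(*args)
    return time.perf_counter() - start


def run_workload(operation, inputs, workers=8):
    """Run `operation` once per input on a thread pool and collect latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(timed_call, operation, item) for item in inputs]
        # result() re-raises any exception thrown inside a worker thread.
        return [future.result() for future in futures]


def summarize(latencies):
    """Reduce a list of latencies to a few summary statistics."""
    return {
        "count": len(latencies),
        "mean_s": mean(latencies),
        "max_s": max(latencies),
    }


def fake_lookup_customer(customer_id):
    """Hypothetical stand-in for a real database read; it only sleeps."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(summarize(run_workload(fake_lookup_customer, range(100))))
```

In an actual run, the stand-in function would be replaced by the same ecommerce function implemented once against MongoDB and once against Oracle 11g, so that both are measured by identical harness code.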
Test data will be generated to simulate customers, customer addresses, and products. The data will then be loaded into each database (MongoDB and Oracle 11g), so that each will have the same product, customer, and address data. The database instances will be running on Linux machines with identical hardware configurations. Next, a series of experiments will be run using the aforementioned ecommerce functions of the testing framework. Performance statistics will be tracked for the CRUD (create, read, update, delete) transactions typical of an ecommerce website.
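One way to keep the two data sets identical is to generate the records once from a fixed random seed and derive both load formats from the same output. The sketch below assumes this approach; the field names, value ranges, and function names are hypothetical and do not represent the study's actual data model.

```python
import random


def generate_customers(count, seed=42):
    """Generate reproducible customer records with one to three addresses each."""
    rng = random.Random(seed)  # fixed seed: every run yields the same data
    first_names = ["Ann", "Bob", "Cara", "Dev"]
    last_names = ["Lee", "Ng", "Ortiz", "Patel"]
    customers = []
    for customer_id in range(1, count + 1):
        customers.append({
            "customer_id": customer_id,
            "name": f"{rng.choice(first_names)} {rng.choice(last_names)}",
            "addresses": [
                {
                    "street": f"{rng.randint(1, 9999)} Main St",
                    "postal_code": f"{rng.randint(0, 99999):05d}",
                }
                for _ in range(rng.randint(1, 3))
            ],
        })
    return customers


def generate_products(count, seed=42):
    """Generate reproducible product records with prices in cents."""
    rng = random.Random(seed)
    return [
        {
            "product_id": product_id,
            "name": f"Product {product_id}",
            "price_cents": rng.randint(100, 100000),
        }
        for product_id in range(1, count + 1)
    ]


def flatten_addresses(customers):
    """Flatten nested addresses into one row per address, keyed by customer."""
    return [
        {"customer_id": customer["customer_id"], **address}
        for customer in customers
        for address in customer["addresses"]
    ]
```

Under this assumption, the nested customer records could be loaded into MongoDB as documents, while the rows from `flatten_addresses` could feed a separate address table in the relational model.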
Some of the experiments will be focused on testing the ACID (atomicity, consistency, isolation, and durability) properties of each database. It is expected that all of the ACID tests run against the Oracle 11g instance will succeed. The results of the MongoDB tests, however, will be recorded and scrutinized for adherence to the ACID properties.
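A consistency check of this kind could compare the state of the data before and after a concurrent test run. The sketch below assumes a hypothetical customer order test in which every order line decrements a product's stock count; the function name, field names, and the stock invariant itself are illustrative assumptions, not the study's defined test cases.

```python
def check_stock_invariant(initial_stock, final_stock, order_lines):
    """Return the product ids whose final stock disagrees with recorded orders.

    initial_stock and final_stock map product_id -> units on hand.
    order_lines is a list of {"product_id": ..., "quantity": ...} records.
    """
    # Total units sold per product, according to the recorded order lines.
    sold = {}
    for line in order_lines:
        product_id = line["product_id"]
        sold[product_id] = sold.get(product_id, 0) + line["quantity"]

    violations = []
    for product_id, start in initial_stock.items():
        expected = start - sold.get(product_id, 0)
        # A mismatch suggests a lost or partial update; a negative expected
        # value means more units were sold than were ever in stock.
        if expected < 0 or final_stock.get(product_id) != expected:
            violations.append(product_id)
    return sorted(violations)
```

An empty result would indicate that every recorded order is reflected in the final stock counts; any product id returned would mark a transaction worth examining in detail.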
Once the data has been recorded, select variables (including but not limited to column size, number of rows indexed, and size of database) will be analyzed with a correlational approach. The presence of meaningful correlation coefficients will help in deriving conclusions for this study.
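The study does not yet specify which coefficient will be used. As one common choice, the sketch below computes the Pearson correlation coefficient between two recorded variables; the function name `pearson_r` and the sample values in the usage note are illustrative only and are not measured results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(spread_x * spread_y)
```

For example, `pearson_r(rows_indexed, mean_latency_ms)` could be called with two hypothetical lists holding, for each experiment, the number of rows indexed and the mean latency observed. A value near 1 or -1 would suggest a strong linear relationship, while a value near 0 would suggest none.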
A deliverable of this study will be a list of specific, concrete instances where a NoSQL database is a better choice for an ecommerce back-end. In this case, “better” means a favorable trade-off between performance and ACID compliance.
Final draft of thesis
|Task #||Task Name||Deliverable||Completion Date||Dependency #|
|Phase: Planning / Preliminary Research|
|1||Thesis proposal||Final draft of proposal, initial draft of project plan||02/17/2012||-|
|2||Advisor approval||MCT-624 grade of "PASS"||02/27/2012||1|
|3||Proposal bibliography||Annotated bibliography of works read thus far||02/26/2012||-|
|4||Articulate the context||Refined research question||03/05/2012||2|
|5||Prepare for the search||Context list||03/09/2012||4|
|6||Conduct the search||List of cited works||07/14/2012||5|
|7||Obtain materials cited||-||07/14/2012||6|
|8||Evaluation of materials||-||07/17/2012||7|
|9||Critical analysis of source materials||Annotated bibliography||09/09/2012||8|
|10||Build Data Tools to generate test data||Generated test data||02/26/2012||-|
|11||Build Data Loaders||Data loaded into Oracle and MongoDB||05/09/2012||10|
|12||Design relational model for Oracle 11g instance||DDL statements||03/04/2012||10|
|13||Build testing framework||Completed testing software||07/14/2012||11|
|14||Build Ubuntu Linux machine(s) with install of MongoDB||A running machine with DB instance||08/04/2012||-|
|15||Build Ubuntu Linux machine(s) with install of Oracle 11g||A running machine with DB instance||08/09/2012||-|
|16||Execute tests for customer maintenance||Performance and transaction data||09/08/2012||10,11,12,13|
|17||Execute tests for product maintenance||Performance and transaction data||09/08/2012||10,11,12,13|
|18||Execute tests for customer orders||Performance and transaction data||09/08/2012||10,11,12,13|
|19||Analyze transaction statistics||Conclusions drawn from data||01/23/2013||16,17,18|
|20||Introduction||Section detailing the identification of the problem||09/29/2012||19|
|21||Methodology||Description of research methodologies used||10/05/2012||20|
|22||Results and Evaluation||Sections describing the meanings inferred from data||01/09/2013||21|
|23||Discussion||Conclusions of study will be presented||01/23/2013||22|
|24||Annotated Bibliography||List of previous works which influence this study||01/01/2013||23|
|25||Revise Intro||Make revisions to introduction||01/09/2013||24|
|Phase: Thesis Refinement and Conclusion|
|26||Initial Draft of Thesis||First draft presented to advisor (and others)||03/09/2013||25|
|27||Revise Thesis Draft||Apply suggested revisions||05/11/2013||26|
|28||Draft of Thesis||Draft presented to advisor||06/29/2013||27|
|29||Final draft, submission, and presentation||-||07/17/2013||28|
Banker, K. (2010), “MongoDB and E-commerce”, (retrieved from: http://kylebanker.com/blog/2010/04/30/mongodb-and-ecommerce/)
Francia, S., Merriman, D. (2011), “MongoDB: Use Cases”, 10gen Inc., (retrieved from: http://www.mongodb.org/display/DOCS/Use+Cases)
Leedy, P. D., Ormrod, J. E. (2010), “Practical Research: Planning and Design” (9th Edition), Pearson Education, Boston, MA, (pp. 121-124)
Mullins, C. (2011), “The Database Report – July 2011”, The Database Administration Newsletter (retrieved from: http://www.tdan.com/view-articles/15299)