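Plain text files usually contain one record per line (row). Separating fields with delimiters, as in CSV or space-delimited files, incurs some overhead in size (the delimiter characters themselves) and in performance: because delimited records vary in length, locating a particular record means scanning the file, whereas a fixed-size record can be found directly from its position.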
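To make the lookup difference concrete, here is a minimal sketch that stores the same three records in a delimited layout and in a fixed-width layout. The sample data and the column widths (name 8, age 3, city 6 characters) are assumptions chosen for illustration. The delimited lookup has to walk the lines, while the fixed-width lookup computes a byte offset and seeks straight to it.

```python
import io

# Illustrative sample data: the same three records stored two ways.
DELIMITED = "alice,30,Paris\nbob,4,Rome\ncarol,27,Oslo\n"

# Assumed fixed-width layout: name 8 chars, age 3 chars, city 6 chars,
# plus a newline, so every record occupies exactly RECORD_LEN bytes.
FIELDS = [("name", 8), ("age", 3), ("city", 6)]
RECORD_LEN = sum(width for _, width in FIELDS) + 1  # +1 for "\n"


def to_fixed_width(rows):
    """Pad each field to its column width and join the records."""
    lines = []
    for row in rows:
        cells = []
        for value, (name, width) in zip(row, FIELDS):
            if len(value) > width:
                raise ValueError(f"{name!r} value {value!r} exceeds {width} chars")
            cells.append(value.ljust(width))
        lines.append("".join(cells))
    return ("\n".join(lines) + "\n").encode("ascii")


def nth_delimited(text, n):
    """Delimited records vary in length, so scan line by line to reach record n."""
    for i, line in enumerate(io.StringIO(text)):
        if i == n:
            return line.rstrip("\n").split(",")
    raise IndexError(n)


def nth_fixed(data, n):
    """Fixed-width records: compute the byte offset directly, no scanning."""
    f = io.BytesIO(data)
    f.seek(n * RECORD_LEN)
    record = f.read(RECORD_LEN).decode("ascii")
    values, pos = [], 0
    for _, width in FIELDS:
        values.append(record[pos:pos + width].rstrip())
        pos += width
    return values


rows = [line.split(",") for line in DELIMITED.splitlines()]
fixed = to_fixed_width(rows)
print(nth_delimited(DELIMITED, 2))  # ['carol', '27', 'Oslo']
print(nth_fixed(fixed, 2))          # ['carol', '27', 'Oslo']
```

Note the trade-off: the fixed-width file needs padding bytes, which often makes it larger than the delimited version, in exchange for constant-time record lookup.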
Common, easy examples of flat files are the group and password files on Linux-like operating systems, namely /etc/group and /etc/passwd.[1]
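As a sketch of how such a file works as a flat-file database, the snippet below parses text in the /etc/passwd layout: one record per line, seven fields separated by colons (name, password placeholder, UID, GID, GECOS comment, home directory, login shell). The sample lines are illustrative, not taken from a real system, and the "UID 1000 and above" rule for regular users is a convention on many distributions rather than a universal one.

```python
# Illustrative lines in /etc/passwd format (not from a real system).
SAMPLE_PASSWD = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice Example:/home/alice:/bin/bash
"""

FIELD_NAMES = ["name", "password", "uid", "gid", "gecos", "home", "shell"]


def parse_passwd(text):
    """Each line is one record; fields are separated by ':'."""
    users = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        record = dict(zip(FIELD_NAMES, line.split(":")))
        record["uid"] = int(record["uid"])
        record["gid"] = int(record["gid"])
        users.append(record)
    return users


users = parse_passwd(SAMPLE_PASSWD)
# On many distributions, regular user accounts start at UID 1000.
print([u["name"] for u in users if u["uid"] >= 1000])  # ['alice']
```

On a real system the same function could be fed `open("/etc/passwd").read()`; Python's standard `pwd` module also exposes these entries directly.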
Practical examples of text-file-based database implementations include TextDB, a file-based database designed to handle high loads; MySQL CSV, a storage engine for MySQL 5.x; Berkeley DB, a robust flat-file database for critical applications that supports ACID transactions; Borland Reflex; and Mimesis, a flat-file database (FFDB) written in PHP4 that uses multiple files and a heap method of storage. Related resources include TheIntegrationEngineer, which explains how delimited and fixed-position (fixed-width) files compare and are used, and Flat File Checker, an open-source data validation application for flat files.[2]
Apart from these examples of flat text files used as databases, October CMS is an example of a content management system that features a flat-file database and is built on the Laravel web framework.[3][4][5]
Programming language support for ASCII text and CSV
Any programming language that supports file I/O and string processing can read and write CSV files. Common examples include BASIC, C++, C#, Pascal, Delphi, Go, Java, MATLAB, Lisp, Perl, PHP, Python, R, Tcl, JavaScript, VBScript, VB and PowerShell. Many utilities exist for reformatting or comparing CSV files, such as the Perl utilities csvprint and csvdiff. Over the years many editors have also been developed to work effectively with the CSV format, such as CSVed, CSVJSON, CSV Editor Pro and Microsoft Excel.[7]
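Basic string processing is enough to start, but CSV has quoting rules that plain splitting misses. The sketch below, using Python's standard csv module and an illustrative record, shows a field containing a comma: a naive `split(",")` breaks it apart, while a real CSV parser keeps it intact and quotes it again on output.

```python
import csv
import io

line = 'Doe,"Smith, Jane",42\n'

# Plain string splitting treats the comma inside the quotes as a separator.
naive = line.rstrip("\n").split(",")
print(naive)   # ['Doe', '"Smith', ' Jane"', '42']

# csv.reader honours the quoting rules, so the quoted field stays intact.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['Doe', 'Smith, Jane', '42']

# csv.writer re-quotes the field that contains the delimiter.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerow(["Doe", "Smith, Jane", 42])
print(out.getvalue(), end="")  # Doe,"Smith, Jane",42
```

Most of the languages listed above have a comparable CSV library or module for exactly this reason.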
Tools for converting spatial and non-spatial data to CSV
For converting spatial and non-spatial data to CSV, one can refer to Safe Software's FME Workbench converter and its related tutorials; it makes complex data integration tasks look simple.[8] A desktop version and a trial version of FME Workbench are available for hands-on evaluation. Another example is Mailjet's safesent email solution, a utility and app that works with CSV files and data sets.[9]
Python library for CSV
Python has been one of the most widely discussed languages in recent years. The Python community recognizes the importance of CSV in the data community, which is why Python ships a dedicated csv module in its standard library; see docs.python.org/3/library/csv.html for the full documentation. The module's csv.reader and csv.writer (with its writerow() and writerows() methods) and csv.DictReader and csv.DictWriter, together with general string methods such as str.splitlines() and str.split() and the json module for converting between JSON and CSV, make Python a favorable choice.[10]
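A minimal round trip through the csv module might look like the sketch below: csv.DictWriter writes dictionaries as rows under a header, csv.DictReader reads them back keyed by that header, and the json module converts the result to JSON. The sample records are illustrative, and an in-memory io.StringIO buffer stands in for a real file.

```python
import csv
import io
import json

rows = [
    {"name": "alice", "city": "Paris"},
    {"name": "bob", "city": "Rome"},
]

# Write: DictWriter maps dict keys to columns; writeheader() emits the header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "city"])
writer.writeheader()
writer.writerows(rows)  # writerow() would write a single row
csv_text = buffer.getvalue()

# Read: DictReader uses the header row as the keys of each record.
records = list(csv.DictReader(io.StringIO(csv_text)))

# Convert the parsed records to JSON with the standard json module.
json_text = json.dumps(records)
print(json_text)
```

When working with real files, the csv documentation recommends opening them with `newline=""` so the module controls line endings itself.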
csvkit, a tool suite hosted on GitHub
In today's world, where data science and data crunching are among the most discussed topics, tool suites hosted on GitHub such as csvkit cannot be ignored. csvkit is a suite of command-line tools for converting to and working with CSV, the so-called king of tabular file formats. Its commands fall broadly into three groups: input, processing and output. csvkit is inspired by GDAL, pdftk and the original csvcut tool by Joe Germuska and Aaron Bycoffe. For input, csvkit offers commands such as in2csv and sql2csv; for processing, commands such as csvclean, csvcut, csvgrep, csvjoin and csvsort; and for output and analysis, commands such as csvformat, csvjson, csvlook, csvpy and csvsql. Christopher Groskopf, Joe Germuska, Aaron Bycoffe, Travis Mehlinger, Alejandro Companioni, Benjamin Wilson, Bryan Silverthorn and many others have contributed to the csvkit code. For details, see https://csvkit.readthedocs.org.[11]
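To illustrate what the processing commands do, here is a sketch in plain Python that mimics a csvgrep-then-csvcut pipeline: keep the rows whose column matches a regular expression, then keep only selected columns. It does not use csvkit itself; the helper names `cut` and `grep` and the sample data are hypothetical and exist only for this example.

```python
import csv
import io
import re

# Illustrative input; csvkit itself is not used here.
DATA = """id,name,country
1,Alice,FR
2,Bob,IT
3,Carol,FR
"""


def cut(rows, columns):
    """Hypothetical helper: keep only the named columns (similar in spirit to csvcut -c)."""
    return [{c: row[c] for c in columns} for row in rows]


def grep(rows, column, pattern):
    """Hypothetical helper: keep rows whose column matches a regex (similar in spirit to csvgrep -r)."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row[column])]


rows = list(csv.DictReader(io.StringIO(DATA)))
result = cut(grep(rows, "country", r"^FR$"), ["id", "name"])

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "name"], lineterminator="\n")
writer.writeheader()
writer.writerows(result)
print(out.getvalue(), end="")
```

With csvkit installed, roughly the same result comes from `csvgrep -c country -r '^FR$' data.csv | csvcut -c id,name`, where data.csv is a hypothetical file holding the sample data.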
Text database – q
A strong alternative to csvsql is q, which can be understood as "text as a database". q treats CSV and other flat files as database tables, letting you query text files whose data is separated by commas, spaces or any other delimiter character using SQL, the declarative Structured Query Language. Its broader objective is to act as a bridge between flat files and SQL.[15] q is free software, distributed under the terms of the GNU General Public License as published by the Free Software Foundation. Its purpose is to bring the expressive power of SQL to the command line and to let text data be treated as database tables. Because q runs SQL queries, standard SQL expressions and clauses such as WHERE, GROUP BY, HAVING and ORDER BY are supported. Joins are supported, and subqueries are supported in the WHERE clause but not in the FROM clause. The SQL syntax follows SQLite's. For full documentation, see the usage page at github.io/q. q also provides a Python API, so Python modules can use all of q's command-line capabilities, including query execution and analysis; it is published on PyPI and can be installed with pip install.[12][13][14]
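The core idea behind q, running SQL directly over delimited text, can be sketched with Python's built-in sqlite3 module, which uses the same SQLite dialect. This is not q's API: the function `query_csv`, the table name `sales` and the sample data are assumptions for illustration. The sketch loads the CSV into an in-memory table whose columns have NUMERIC affinity, so numeric-looking text such as "10" is stored as an integer, and then runs an aggregate query.

```python
import csv
import io
import sqlite3

# Illustrative CSV text standing in for a file on disk.
DATA = """region,amount
north,10
south,5
north,7
"""


def query_csv(text, table, sql):
    """Hypothetical helper: load CSV text into an in-memory SQLite table, then run SQL on it.

    Assumes header names contain no double quotes. NUMERIC affinity makes
    SQLite store integer-looking text such as "10" as an INTEGER.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(f'"{name}" NUMERIC' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


result = query_csv(
    DATA,
    "sales",
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC",
)
print(result)
```

q itself lets the file name stand in for the table name in the FROM clause and handles the loading step for you.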
Linux column utility as a CSV viewer
On Linux, the column utility can serve as a simple CSV viewer: column -s, -t < CSV.txt reads the comma-separated file and displays it as aligned columns, which is easier to read than the raw output of vi or cat.[18] Alternatively, as discussed, one can install and use csvtool, for example on Ubuntu.
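The alignment that `column -s, -t` performs can be sketched in a few lines of Python: find the widest value in each column and pad every cell to that width. The function name `align_columns` and the two-space column gap are choices made for this example. Unlike `column -s,`, this sketch parses input with the csv module, so quoted fields containing commas are kept whole.

```python
import csv
import io


def align_columns(text, delimiter=","):
    """Hypothetical helper: pad each field to its column's widest value, similar in spirit to `column -s, -t`."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    widths = [0] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
    # Join padded cells with two spaces and strip trailing padding from each line.
    return "\n".join(
        "  ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    )


print(align_columns("name,city\nalice,Paris\nbob,Rome\n"))
```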
CSV in MySQL
MySQL has a pluggable storage-engine architecture and supports a variety of engines, one of which is the CSV engine. The CSV storage engine stores data in text files in CSV format. It is compiled into the MySQL server; to examine its source, look in the storage/csv directory of the MySQL source distribution. As with other tables in MySQL 5.x, a CSV table has a .frm file for its definition, but the engine also creates a .CSV data file that holds the rows as comma-separated values and a .CSM file that records the table's state and row count. Because the data file is readable text, it can be opened in any CSV editor, including spreadsheet programs such as Excel (provided no other program has it locked). MySQL can also write its general query log and slow query log to tables that use the CSV engine, which can then be queried with SQL.[16]
Managing data using databases
Today's personal and professional life depends heavily on data: personal contacts, professional networking contacts, transactions with banking institutions, emails, social media, logs and much more.
The digital lifestyle of the new era, with its growing dependence on computing and the popularity of social media, produces huge volumes of data, and handling and reusing that data is a new challenge.
To address this, the industry has introduced many types of database management systems, and they have largely solved the problem. New features and techniques are introduced continually to do more at lower cost with higher performance and efficiency.
There are close to 350 databases in existence, falling into row-oriented, column-oriented and document-oriented categories. Every database shares some common basic features while offering specific features, architecture or functionality that make it the best fit for particular business needs.
Databases – Understanding the DB-World
A database is an organized collection of data that is stored and accessed using software. The data is kept for future use in files, text or binary in some standard format, which gives a permanent, persistent copy on disk or other media. When a program accesses it, retrieval can be fast and accurate, though this depends heavily on the combination of program and hardware and on the logic and algorithms used. The way data is logically structured is called the data model, and the most common and best-known data model is the relational model. It is table-based and was first described by E. F. Codd in 1969; all data is represented as tuples grouped into relations.[20] The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want from it. Most relational database management systems use SQL (Structured Query Language) as their data definition and query language.
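To show what "tuples grouped into relations" means, the sketch below models a relation as a Python set of named tuples and implements two basic relational-algebra operators, selection and projection. The `Employee` relation and its data are hypothetical. The query states what is wanted (names of employees in Sales) and corresponds roughly to `SELECT name FROM employees WHERE dept = 'Sales'`.

```python
from collections import namedtuple

# A relation is a set of tuples sharing the same attributes (hypothetical data).
Employee = namedtuple("Employee", ["emp_id", "name", "dept"])
employees = {
    Employee(1, "Alice", "Sales"),
    Employee(2, "Bob", "IT"),
    Employee(3, "Carol", "Sales"),
}


def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, *attrs):
    """Projection (pi): keep only the named attributes; duplicates collapse because the result is a set."""
    return {tuple(getattr(t, a) for a in attrs) for t in relation}


# Roughly: SELECT name FROM employees WHERE dept = 'Sales'
sales_names = project(select(employees, lambda t: t.dept == "Sales"), "name")
print(sorted(sales_names))  # [('Alice',), ('Carol',)]
```

Because relations are sets, projecting onto `dept` yields just two tuples even though three employees exist, which mirrors the set semantics of the relational model (SQL itself keeps duplicates unless `DISTINCT` is used).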
Alternatives to the relational model include the hierarchical and network models; a newer alternative is the object-oriented database model.[20]