As the scope of our data (i.e., the different kinds of data objects included in the resource) and our data timeline (i.e., the data accrued from the deep past and into the future) are broadened, we need to take care not to confuse one data object with another.
Big Data V’s
- Volume: large amounts of data;
- Variety: various types of data;
- Velocity: constantly accumulating new data.
We can add a few more V's to this list:
- Vision: having a clear purpose and a plan;
- Verification: ensuring that the data conform to a set of specifications;
- Validation: checking that the data fulfill their intended purpose (see the sketch after this list).
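To make the difference between verification and validation concrete, here is a minimal sketch in Python. The specification (field names, types, and ranges) and the validation rule (a hypothetical minimum of 100 verified records) are illustrative assumptions, not part of any standard.

```python
# Sketch: verification vs. validation of incoming records.
# SPEC, its fields, and the thresholds are illustrative assumptions.

SPEC = {
    "patient_id": str,   # required identifier
    "age": int,          # required, 0..120
    "weight_kg": float,  # required, positive
}

def verify(record: dict) -> bool:
    """Verification: does a record conform to the specification?"""
    for field, expected_type in SPEC.items():
        if field not in record or not isinstance(record[field], expected_type):
            return False
    return 0 <= record["age"] <= 120 and record["weight_kg"] > 0

def validate(records: list) -> bool:
    """Validation: can the data fulfill their stated purpose?
    The hypothetical purpose here requires at least 100 verified records."""
    verified = [r for r in records if verify(r)]
    return len(verified) >= 100

sample = [{"patient_id": "p001", "age": 42, "weight_kg": 70.5}]
print(verify(sample[0]))  # True: the record conforms to the spec
print(validate(sample))   # False: too few records to serve the purpose
```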
Just get a more powerful machine
Faster and more powerful computers are nice to have, but these devices cannot compensate for deficiencies in data preparation. Computers, no matter how fast they are, do not process data and return results without a scientist's effort. In hospitals, for example, we see many computers, but none of them is capable of diagnosing a patient. These computers are more involved in collecting, storing, receiving, and delivering medical records than in other tasks.
It is important to know how to distinguish Big Data
It is the size, complexity, and restlessness of Big Data resources that shape the models by which we can operate on these resources.
Big Data resources are not gigantic spreadsheets or telephone directories of an entire country (catalogs or matrices of considerable size). Even though the name suggests something grandiose, we must take into account the V's mentioned above.
We can say that Big Data resources cannot be analyzed in their entirety, all at once. Big Data analysis is a multi-step process in which data are extracted, filtered, and transformed through repeated analysis, which can be fragmented or recursive.
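One way to picture this is stream processing: extract a chunk, filter and transform it, fold the result into a running summary, and move on, so the resource is never loaded whole. In the sketch below, the file name, the one-value-per-line format, and the running mean are all placeholder assumptions.

```python
# Sketch: incremental analysis; the resource is never loaded all at once.
# The file name and one-value-per-line format are hypothetical.

def process(chunk, total, count):
    """Filter and transform one chunk, folding it into the running sums."""
    for line in chunk:
        try:
            value = float(line.strip())
        except ValueError:
            continue            # filter step: drop malformed rows
        total += value          # transform/accumulate step
        count += 1
    return total, count

def analyze_in_chunks(path, chunk_size=10_000):
    total, count, chunk = 0.0, 0, []
    with open(path) as f:
        for line in f:          # extract step: one chunk at a time
            chunk.append(line)
            if len(chunk) == chunk_size:
                total, count = process(chunk, total, count)
                chunk = []
    if chunk:                   # fold in the final partial chunk
        total, count = process(chunk, total, count)
    return total / count if count else None

# Usage (hypothetical file): mean = analyze_in_chunks("measurements.txt")
```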
Big Data vs. small data
Big Data is also not small data that no longer fits in a spreadsheet, nor a database that has become too large. Typically, small data aims to answer a specific question and is located in a single institution, on a single computer, or sometimes in a single file.
Criteria to characterize Big Data
- Goals: It is designed with a goal in mind, but that goal is characterized by “spanning a range”: it has a certain flexibility, and the questions it answers will span various domains. It is not possible to specify with any degree of certainty the final destination of a project; this usually comes as a surprise.
- Location: Typically spread throughout electronic space, divided among distinct networks and servers owned by many different organizations.
- Data structure and content: It must be capable of absorbing unstructured data, and the general context of the information will probably be multidisciplinary. A data object can point to data contained in another object without being directly connected to it (see the first sketch after this list).
- Data preparation: Data can come from many different sources and are prepared by many different people; rarely will the person who uses the data be the same person who prepared them.
- Longevity: Big Data projects typically contain data that need to be stored, in the best-case scenario, perpetually. Ideally, a stored resource should be absorbed into another resource when the original one is “depleted”. Many projects extend into the future and the past, acquiring data by prediction or in retrospect.
- Measurements: When present, measurements can be obtained through various control protocols; verifying the quality of Big Data is one of the most difficult tasks for data managers.
- Reproducibility: Replicating a project is rarely feasible. In most cases, one cannot wait for inconsistent or bad data to be flagged as such.
- Stakes: Big Data projects can be obscenely expensive. A failed Big Data effort can lead to bankruptcy, institutional collapse, mass firings, and the sudden disintegration of all the data held in the resource. Though the costs of failure can be high in terms of money, time, and labor, Big Data failures may have some redeeming value: each failed effort lives on as intellectual remnants consumed by the next Big Data effort.
- Introspection: Unless the Big Data resource is exceptionally well designed, the contents and organization of the resource can be inscrutable, even to the data managers. Complete access to the data, to information about the data values, and to information about the organization of the data is achieved through a technique referred to here as introspection (see the second sketch after this list).
- Analysis: Big Data is ordinarily analyzed in incremental steps. The data are extracted, reviewed, reduced, normalized, transformed, visualized, interpreted, and reanalyzed with different methods (see the third sketch after this list).
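First sketch (for the data structure and content item above): a common way to let one data object point to another without a direct connection is a unique identifier; the referencing object stores only the identifier, and resolution happens later. The in-memory dictionary below is a stand-in for servers spread across distinct networks, and the payload fields are hypothetical.

```python
import uuid

# Sketch: data objects reference each other by identifier only.
# The in-memory "store" stands in for servers on distinct networks.

store = {}

def create_object(payload):
    """Register a data object under a fresh unique identifier."""
    object_id = str(uuid.uuid4())
    store[object_id] = payload
    return object_id

image_id = create_object({"kind": "image", "path": "scan_001.png"})
# The report holds only the image's identifier, not the image itself.
report_id = create_object({"kind": "report", "refers_to": image_id})

referenced = store[store[report_id]["refers_to"]]
print(referenced["kind"])  # -> image
```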
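Second sketch (for the introspection item): one minimal way to support introspection is to make every data object self-describing, carrying an identifier and metadata alongside its value, so the resource can be queried about what it holds and how it is organized. The layout and the two query helpers below are illustrative choices, not a prescribed format.

```python
# Sketch: self-describing data objects that support introspection.
# The identifier/metadata/value layout is an illustrative assumption.

objects = [
    {"id": "obj-1", "metadata": {"type": "temperature", "unit": "C"}, "value": 21.5},
    {"id": "obj-2", "metadata": {"type": "pressure", "unit": "kPa"}, "value": 101.3},
]

def describe(resource):
    """Introspection over organization: what kinds of data are held?"""
    return sorted({obj["metadata"]["type"] for obj in resource})

def lookup(resource, object_id):
    """Introspection over values: recover a value with its metadata."""
    return next(obj for obj in resource if obj["id"] == object_id)

print(describe(objects))         # ['pressure', 'temperature']
print(lookup(objects, "obj-1"))  # the full self-describing record
```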
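Third sketch (for the analysis item): the incremental steps can be expressed as a chain of small, named functions, so reanalysis with different methods amounts to rerunning a different list of steps. The specific steps below (filtering out non-positive values, max-normalization, a log transform) are placeholders.

```python
import math

# Sketch: analysis as a chain of small, replaceable steps.
# Each step below is a placeholder for a real method.

def extract(raw):
    return [float(x) for x in raw]

def reduce_step(values):
    return [v for v in values if v > 0]   # drop non-positive values

def normalize(values):
    top = max(values)
    return [v / top for v in values]

def transform(values):
    return [math.log1p(v) for v in values]

pipeline = [extract, reduce_step, normalize, transform]

data = ["3.2", "0.0", "7.5", "1.1"]
for step in pipeline:
    data = step(data)   # reanalysis: swap steps in or out and rerun
print(data)
```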