
Typical Cases of Cloud Computing and Big Data (Cloud Computing Technology Applications and Big Data)

What are some typical big data application cases?

The cases are as follows:

1. Traffic big data makes travel smoother

Transportation is an important part of human activity and one of the areas where big data perception is most urgently needed. In recent years, China's intelligent transportation has developed rapidly, and many of its technologies have reached an internationally leading level. Big data is applied to transportation mainly in two ways. On the one hand, sensor data can be used to measure vehicle traffic density and support rational road planning, including one-way street planning. On the other hand, live data can be used to schedule traffic signals in real time and improve the capacity of existing roads, as the sketch below illustrates.
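As a minimal illustration of the first application, the sketch below aggregates hypothetical roadside sensor counts into per-segment traffic density. The segment IDs, readings, and congestion threshold are all invented for demonstration, not real traffic data.

```python
from collections import defaultdict

# Hypothetical roadside sensor readings: (road_segment_id, vehicles_counted, minutes_observed)
readings = [
    ("segment_A", 120, 10),
    ("segment_A", 95, 10),
    ("segment_B", 40, 10),
    ("segment_B", 55, 10),
]

# Aggregate vehicle counts and observation time per road segment
totals = defaultdict(lambda: [0, 0])
for segment, vehicles, minutes in readings:
    totals[segment][0] += vehicles
    totals[segment][1] += minutes

# Density = vehicles per minute; flag congested segments for planners
CONGESTION_THRESHOLD = 8.0  # vehicles/minute, an assumed planning threshold
for segment, (vehicles, minutes) in totals.items():
    density = vehicles / minutes
    status = "congested" if density > CONGESTION_THRESHOLD else "normal"
    print(f"{segment}: {density:.1f} vehicles/min ({status})")
```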

2. Big data in education teaches students in accordance with their aptitude

In the classroom, data can help improve teaching, and big data is even more useful in major educational decision-making and education reform. Data can be used to identify students at risk of dropping out, to explore the relationship between education spending and improvements in student academic performance, and to examine the link between student absenteeism and grades, as sketched below.
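As a toy illustration of diagnosing dropout risk from data, the sketch below scores hypothetical student records on absenteeism and grades. The fields, weights, and cutoff are assumptions made for demonstration, not an actual district model.

```python
# Hypothetical student records: (name, absence_rate, grade_average)
students = [
    ("student_1", 0.02, 88),
    ("student_2", 0.25, 61),
    ("student_3", 0.10, 74),
]

def dropout_risk(absence_rate: float, grade_average: float) -> float:
    """Toy risk score: high absenteeism and low grades raise the risk."""
    return 0.6 * absence_rate + 0.4 * (1 - grade_average / 100)

for name, absence, grade in students:
    score = dropout_risk(absence, grade)
    flag = "at risk" if score > 0.2 else "ok"   # assumed cutoff for follow-up
    print(f"{name}: risk={score:.2f} ({flag})")
```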

3. Environmental protection big data combats PM2.5

In the United States, NOAA (the National Oceanic and Atmospheric Administration) has been using big data services for a long time. It collects more than 3.5 billion observations every day via satellites, ships, aircraft, buoys, sensors, and more. NOAA then aggregates the atmospheric, oceanic, and geological data, takes direct measurements, builds complex high-fidelity prediction models, and provides them to the NWS (National Weather Service) as reference data for weather forecasts.


Characteristics of big data

1. Large capacity

For example, a recent IDC report predicts that by 2020 the world's data volume will have expanded 50-fold. At present, the scale of big data is still a moving target, with single data sets ranging from tens of terabytes to several petabytes. To put this in perspective, storing 1PB of data would require roughly 20,000 PCs with 50GB hard drives (a quick check of this arithmetic follows). In addition, data can be generated from a variety of unexpected sources.
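The 20,000-PC figure follows directly from the units; the quick check below reproduces it using binary prefixes (1PB = 1024TB = 1,048,576GB).

```python
PB_IN_GB = 1024 * 1024   # 1 PB = 1024 TB = 1,048,576 GB (binary prefixes)
DISK_GB = 50             # hard drive size per PC, from the text

pcs_needed = PB_IN_GB / DISK_GB
print(f"PCs needed to store 1 PB: {pcs_needed:,.0f}")  # ~20,972, i.e. about 20,000
```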

2. Diversity

The increase in data diversity is mainly driven by new data types such as web logs, social media, web search, mobile phone call records, and sensor networks.

3. High speed

High speed describes the rate at which data is created and moved. In the era of high-speed networks, creating real-time data streams with high-speed processors and performance-optimized server software has become a popular trend. Enterprises must not only know how to create data quickly, but also how to process and analyze it quickly and return results to users to meet their real-time needs (a minimal streaming sketch follows).
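A minimal sketch of the "process it as it arrives" idea behind velocity: a moving average computed over a simulated event stream, so each record is handled once and a result is available immediately. The event source and window size are assumptions for illustration.

```python
import random
from collections import deque

def event_stream(n=20):
    """Simulate a high-speed stream of measurements (e.g. sensor values)."""
    for _ in range(n):
        yield random.uniform(0, 100)

WINDOW = 5                       # assumed sliding-window size
window = deque(maxlen=WINDOW)
for value in event_stream():
    window.append(value)         # process each record as it arrives
    moving_avg = sum(window) / len(window)
    print(f"latest={value:6.1f}  moving_avg={moving_avg:6.1f}")
```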

The relationship between cloud computing and big data

Cloud computing is a model for the addition, use, and delivery of Internet-based services, usually involving dynamically scalable and often virtualized resources provided over the Internet. "Cloud" is a metaphor for the network and the Internet: in the past, a cloud shape was often used in diagrams to represent telecommunications networks, and later it came to represent the abstraction of the Internet and its underlying infrastructure. In the narrow sense, cloud computing refers to the delivery and use model of IT infrastructure, obtaining the required resources over the network in an on-demand, easily scalable manner; in the broad sense, it refers to the delivery and use model of services in general, obtained over the network in the same on-demand, easily scalable way. Such services can be IT services, software, Internet-related services, or other services. It means that computing power can also circulate as a commodity over the Internet.

Big data, or massive data, refers to data sets so large that they cannot, within a reasonable time, be captured, managed, processed, and organized by current mainstream software tools into information that helps companies make more proactive business decisions. The 4V characteristics of big data are Volume, Velocity, Variety, and Veracity.

Technically, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed on a single computer; it requires a distributed computing architecture. Its strength lies in mining massive data, but this necessarily relies on the distributed processing, distributed databases, cloud storage, and virtualization technologies of cloud computing (a minimal MapReduce-style sketch follows).
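To make "distributed computing architecture" concrete, here is a minimal MapReduce-style word count in plain Python: the data is split into chunks, each chunk is mapped independently, and the partial results are reduced into a total. A real deployment would run the same map and reduce logic on a framework such as Hadoop; this single-process sketch only illustrates the programming model.

```python
from collections import Counter
from functools import reduce

documents = [                      # stand-ins for file chunks on a distributed file system
    "big data needs cloud computing",
    "cloud computing stores big data",
]

def map_phase(chunk: str) -> Counter:
    """Map: emit a local word count for one chunk (runs independently per node)."""
    return Counter(chunk.split())

def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Reduce: merge partial counts produced by different nodes."""
    return a + b

partial_counts = [map_phase(doc) for doc in documents]   # could run on many machines
total = reduce(reduce_phase, partial_counts, Counter())
print(total.most_common(3))
```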

For big data management, distributed file systems such as Hadoop handle storage while MapReduce handles data partitioning and access execution; at the same time, SQL support, represented by Hive's SQL interface on Hadoop, has made first-generation data warehouses built on big data technology and cloud computing a hot topic. From the perspective of system requirements, big data architectures pose new challenges to systems:

1. Higher integration. A standard chassis completes a specific task as fully as possible.

2. More rational configuration and higher speed. A balanced design of storage, controllers, I/O channels, memory, CPU, and network, optimized for data warehouse access, delivers performance more than an order of magnitude higher than comparable traditional platforms.

3. Lower overall energy consumption. For the same computing task, energy consumption is minimized.

4. Greater stability and reliability. The system can eliminate all single points of failure while unifying the quality and standards of components and devices.

5. Lower management and maintenance costs. Routine management of data collections is fully integrated.

6. A plannable and foreseeable roadmap for system expansion and upgrades.

What is big data and what are the typical cases of big data?
"Big data" is a data set with a very large volume and a very large data category, and such a data set cannot be used in traditional databases Tools capture, manage and process its content. "Big data" first refers to large data volumes (volumes), which refers to large data sets, usually around 10TB in size. However, in practical applications, many enterprise users put multiple data sets together, forming a PB-level data volume; secondly, it refers to the large variety of data. Data comes from a variety of data sources. Data types and formats are becoming increasingly rich. It has broken through the previously limited scope of structured data and includes semi-structured and unstructured data. ized data. Next is the data processing speed (Velocity), which enables real-time processing of data even when the amount of data is very large. The last feature refers to the high veracity of data. With the interest in new data sources such as social data, enterprise content, transaction and application data, the limitations of traditional data sources are broken, and enterprises increasingly need effective information to Ensure its authenticity and safety.
Data collection: ETL tools extract data from distributed, heterogeneous sources, such as relational databases and flat files, into a temporary staging layer, where it is cleaned, transformed, and integrated before finally being loaded into a data warehouse or data mart, becoming the basis for online analytical processing and data mining (a minimal sketch appears after this list).
Data access: relational databases, NoSQL, SQL, etc.
Infrastructure: cloud storage, distributed file storage, etc.
Data processing: natural language processing (NLP) is the discipline that studies language issues in the interaction between humans and computers. The key to processing natural language is enabling the computer to "understand" it, so the field is also called natural language understanding (NLU) or computational linguistics. It is at once a branch of language information processing and one of the core topics of artificial intelligence (AI) (a toy tokenization sketch appears after this list).
Statistical analysis: hypothesis testing, significance testing, difference analysis, correlation analysis, t-tests, analysis of variance, chi-square analysis, partial correlation analysis, distance analysis, regression analysis, simple regression analysis, multiple regression analysis, stepwise regression, regression prediction and residual analysis, ridge regression, logistic regression analysis, curve estimation, factor analysis, cluster analysis, principal component analysis, fast clustering and other clustering methods, discriminant analysis, correspondence analysis, multivariate correspondence analysis (optimal scaling analysis), bootstrap techniques, etc. (a small worked example appears after this list).
Data mining: classification (Classification), estimation (Estimation), prediction (Prediction), affinity grouping or association rules (Affinity grouping or Association rules), clustering (Clustering), description and visualization (Description and Visualization), and complex data type mining (text, Web, graphics and images, video, audio, etc.) (a minimal clustering sketch appears after this list).
Model prediction: predictive models, machine learning, modeling and simulation.
Result presentation: cloud computing, tag cloud, relationship diagram, etc.
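For the "Data collection" step above, here is a minimal sketch of the extract-clean-load flow using only the standard library. The source records, cleaning rule, and in-memory SQLite target are assumptions standing in for real heterogeneous sources and a data warehouse.

```python
import sqlite3

# Extract: records from two heterogeneous "sources" (stand-ins for real systems)
source_a = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": " 7.0 "}]
source_b = [{"id": 3, "amount": None}, {"id": 4, "amount": "3.25"}]

# Transform: clean and unify types, dropping unusable rows
cleaned = []
for row in source_a + source_b:
    if row["amount"] is None:
        continue                         # cleaning rule: discard missing values
    cleaned.append((row["id"], float(str(row["amount"]).strip())))

# Load: write into a warehouse table (SQLite as a stand-in)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
print(db.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())
```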
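For the "Data processing" step, a toy natural-language-processing example: tokenizing text and counting word frequencies with the standard library. Real NLU systems go far beyond this; the sketch only shows the first, shallow processing step.

```python
import re
from collections import Counter

text = "Natural language processing lets computers understand natural language."

# Tokenize: lowercase the text and keep alphabetic runs as words
tokens = re.findall(r"[a-z]+", text.lower())

# A first, shallow "understanding": term frequencies
print(Counter(tokens).most_common(3))
```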
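As one concrete instance from the "Statistical analysis" list, a two-sample t-test and a simple linear regression. The sample data are invented, and SciPy is assumed to be available.

```python
from scipy import stats

# Invented samples: e.g. page load times (ms) under two server configurations
group_a = [120, 132, 118, 125, 130]
group_b = [140, 138, 145, 150, 142]

# Two-sample t-test: is the difference in means significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t={t_stat:.2f}, p={p_value:.4f}")

# Simple linear regression of y on x
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
result = stats.linregress(x, y)
print(f"slope={result.slope:.2f}, intercept={result.intercept:.2f}, r={result.rvalue:.3f}")
```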
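And for the "Data mining" step, a minimal clustering example using scikit-learn's KMeans on made-up two-dimensional points; the data and the cluster count are assumptions for illustration.

```python
from sklearn.cluster import KMeans

# Made-up 2-D points forming two loose groups
points = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
          [8.0, 8.1], [7.9, 8.3], [8.2, 7.8]]

# Cluster into k=2 groups (assumed cluster count)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("labels:", kmeans.labels_.tolist())
print("centers:", kmeans.cluster_centers_.round(2).tolist())
```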
To understand the concept of big data, start with "big". "Big" refers to the scale of the data: big data generally means data volumes above 10TB (1TB = 1024GB). Big data differs from the massive data of the past; its basic characteristics can be summarized by four Vs (Volume, Variety, Value, and Velocity): large volume, great diversity, low value density, and high velocity.
First, the volume of data is huge, ranging from the TB level to the PB level.
Second, there are many types of data, such as the web logs, videos, pictures, and geographic location information mentioned earlier.
Third, the value density is low. Taking video as an example, during continuous, uninterrupted monitoring, the potentially useful data may amount to only one or two seconds of footage.
Fourth, the processing speed is fast, following the "1-second rule." This last point also fundamentally distinguishes big data from traditional data mining technology. The Internet of Things, cloud computing, the mobile Internet, the Internet of Vehicles, mobile phones, tablets, PCs, and the various sensors spread across every corner of the earth are all data sources or carriers.
Big data technology refers to technology for quickly extracting valuable information from huge volumes of many kinds of data, and it is the core of solving big data problems. The term "big data" now refers not only to the scale of the data itself but also to the tools, platforms, and data analysis systems used to collect and process it. The purpose of big data research and development is to advance big data technology, apply it in related fields, and drive breakthroughs by solving huge data-processing problems. The challenges of the big data era are therefore reflected not only in how to handle huge amounts of data, but also in how to extract valuable information from it quickly.