I. Mid-Stage Project Experience Examples
1.1 Building the system environment for new servers
1. Set up the deployment tooling (PXE + kickstart) to fit the existing architecture.
2. Customize the deployment templates to the requirements of the application systems.
3. Write one-click scripts for production system optimization and similar tasks.
4. Carry out the automated deployment.
5. Verify the result of the automated deployment against the customized optimization items.
1.2 Building the software environment for new servers
1. Deploy the LNMP environment on the newly installed servers.
2. Test the batch-deployed environment to confirm that it works.
3. Prepare the Nginx configuration files and deploy them in batch.
4. Apply Nginx optimizations as required (expires, gzip, etc.).
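Preparing Nginx configuration files for batch deployment, with the expires and gzip optimizations mentioned above, can be sketched as a small template renderer. This is only an illustration: `gzip`, `gzip_types` and `expires` are standard Nginx directives, but the template layout, the function name `render_nginx_conf`, the file extensions and the default values are all assumptions, not the configuration actually used in the project.

```python
from string import Template

# Hypothetical server block. "$$" is an escaped literal "$" for string.Template.
NGINX_TEMPLATE = Template("""\
server {
    listen 80;
    server_name $server_name;
    root $site_root;

    gzip on;
    gzip_types text/css application/javascript;

    location ~* \\.(gif|jpg|png|css|js)$$ {
        expires $static_expires;
    }
}
""")


def render_nginx_conf(server_name, site_root, static_expires="30d"):
    """Return the text of one server block for the given site."""
    return NGINX_TEMPLATE.substitute(
        server_name=server_name,
        site_root=site_root,
        static_expires=static_expires,
    )


if __name__ == "__main__":
    print(render_nginx_conf("www.example.com", "/data/www/example"))
```

Rendering one file per site from a single template keeps the batch-deployed servers consistent; the rendered text can then be pushed out by whatever distribution tool is in use.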
1.3 Web server architecture adjustment (from single point to cluster)
Requirement: Eliminate the single point of failure in the website's web server.
Responsibilities:
1. Research several load balancing schemes, mainly LVS + Keepalived and Nginx + Keepalived.
2. Write the implementation plan and schedule for the new architecture.
3. Deploy the new system and carry out routine maintenance.
Result: Most of the company's single-point servers were converted to clusters, improving the website's stability and its ability to handle high-concurrency scenarios.
1.4 Server user permission management overhaul: plan and implementation
Requirement: Solve the problem of root privileges being handed out too widely within the company.
1. Propose a permission overhaul to improve the state of root privileges in the company.
2. Bring everyone together to discuss and agree on the plan, then drive its implementation.
3. After implementation, permission management became much clearer (and easier to maintain), which fundamentally reduced irregular internal operations and the associated security risks.
Question 1: How does your company manage user permissions?
A: We manage permissions with sudo. Neither operations nor development staff are normally given root. Only core developers, or staff at R&D director level and above, are given the corresponding server-level permissions. Root itself is given only to operations leads or the operations director.
Question 2: When planning servers, how many ordinary users run on each server?
A: Our ordinary users are created per project. The number of product lines differs from company to company; ours has only a dozen or so. We create one ordinary user for each project, so both nginx and tomcat run as ordinary users.
Question 3: What about public services such as memcached or redis?
A: These public services can also run as ordinary users, and in general that is what we do. My view of operations is that operations does operations work and development does development work. Operations is responsible for the network and the systems: as long as the systems and the network are not faulty and system resources are sufficient, operations has done its job. Our company uses a project responsibility system, which means development carries most of the responsibility for each project: operations accounts for about 30%-40% and development for about 60%. When a service goes live, it runs as an ordinary user. Each site directory is owned by that ordinary user with mode 700, which is the safest arrangement. Starting and stopping services, releasing code, and collecting and analyzing logs are all done as that ordinary user through our processes. When managing a project, we can add the responsible developers' accounts to the project group, so that they have full permissions on that project.
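The rule that each site directory belongs to the project's ordinary user with mode 700 is easy to audit. The following is a minimal sketch under that assumption; the function name `check_site_dir` and the way the expected uid is passed in are hypothetical and not taken from the company's actual tooling.

```python
import os
import stat


def check_site_dir(path, expected_uid):
    """Return a list of problems found with a site directory's owner and mode."""
    info = os.stat(path)
    problems = []
    if info.st_uid != expected_uid:
        problems.append(f"owner uid is {info.st_uid}, expected {expected_uid}")
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    mode = stat.S_IMODE(info.st_mode)
    if mode != 0o700:
        problems.append(f"mode is {mode:o}, expected 700")
    return problems
```

An empty list means the directory matches the policy. The uid for a project user could be looked up with `pwd.getpwnam(name).pw_uid` from the standard library.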
1.5 Server log audit project: proposal and implementation
1. With permission control in place, go a step further and log the operations of all users.
2. Combine sudo and rsyslog to audit all users' commands and manage the records centrally.
3. After implementation, every command executed by operations and development staff is recorded, which greatly reduces security risks from internal personnel.
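Once sudo records are collected centrally through rsyslog, they can be summarized per user. The sketch below assumes the usual sudo syslog message layout (`user : TTY=... ; PWD=... ; USER=... ; COMMAND=...`); the exact prefix depends on the rsyslog template in use, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout of a sudo record as it appears in a syslog file.
SUDO_LINE = re.compile(
    r"sudo(?:\[\d+\])?:\s+(?P<user>\S+) : TTY=(?P<tty>\S+) ; PWD=(?P<pwd>.+?) ; "
    r"USER=(?P<runas>\S+) ; COMMAND=(?P<command>.+)$"
)


def parse_sudo_line(line):
    """Return the fields of one sudo log line as a dict, or None if it is not one."""
    match = SUDO_LINE.search(line.rstrip("\n"))
    return match.groupdict() if match else None


def commands_by_user(lines):
    """Group the commands found in the given log lines by invoking user."""
    grouped = defaultdict(list)
    for line in lines:
        record = parse_sudo_line(line)
        if record is not None:
            grouped[record["user"]].append(record["command"])
    return dict(grouped)
```

Lines that are not sudo records, or sudo messages in another layout such as authentication failures, are simply skipped.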
1.6 Network-wide batch distribution and batch management of server data
Requirement: The number of company servers kept growing and they became troublesome to manage, so a batch distribution and management solution was proposed to distribute and manage data across all servers.
1. Researched two distribution options, Ansible and SSH key + rsync, and chose SSH key + rsync as simple, easy to maintain, and powerful.
2. Chose an IDC intranet server as the distribution host and set up SSH key authentication for a fixed ordinary user (note: not root). Where root privileges are needed, they are granted through sudo to reduce security risk.
3. Hardened the distribution host, for example by removing its external IP and enabling the firewall.
After implementation, the efficiency of operations management improved considerably, and the project won a company award.
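The SSH key + rsync approach can be sketched as a helper that builds one rsync command per target host. The user name `deploy`, the key path, the host addresses and the function names are placeholders for illustration only; `-a`, `-z` and `-e` are standard rsync options and `BatchMode=yes` is a standard ssh option that prevents password prompts.

```python
import shlex


def build_rsync_command(src, host, dest, user="deploy",
                        key_path="/home/deploy/.ssh/id_rsa"):
    """Return the argv list for pushing src to dest on one host over SSH."""
    # rsync receives the remote shell as a single string via -e.
    ssh = f"ssh -i {shlex.quote(key_path)} -o BatchMode=yes"
    return ["rsync", "-az", "-e", ssh, src, f"{user}@{host}:{dest}"]


def build_distribution_plan(src, hosts, dest, **kwargs):
    """Return one rsync command per host, in the order given."""
    return [build_rsync_command(src, host, dest, **kwargs) for host in hosts]


if __name__ == "__main__":
    for cmd in build_distribution_plan("/data/conf/", ["10.0.0.8", "10.0.0.9"], "/data/conf/"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each command can be executed with `subprocess.run(cmd, check=True)`. Building the commands separately from running them makes a dry run easy before touching every server.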
1.7 Network-wide server data backup plan: proposal and implementation
Requirement: Build a complete backup system for the company's data.
1. In response to the chaotic state of the company's important data backups, proposed a network-wide backup solution to the leadership.
2. Data is first packaged and backed up locally, then rsync combined with inotify backs up data from all servers to a fixed storage server. A script on the storage server checks the backups and reports the results to the administrator.
3. IDC data is regularly copied to a server inside the company, to prevent data loss caused by earthquakes and similar disasters.
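The check that runs on the storage server can be sketched as follows. The directory layout (`<root>/<host>/backup_<date>.tar.gz` with a checksum file beside it), the use of SHA-256 and the function names are assumptions made for this example; the original script's conventions are not described in the text.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backups(backup_root, hosts, date_tag):
    """Return {host: status}, where status is 'ok', 'missing' or 'corrupt'."""
    results = {}
    for host in hosts:
        archive = Path(backup_root) / host / f"backup_{date_tag}.tar.gz"
        checksum_file = archive.with_name(archive.name + ".sha256")
        if not archive.is_file() or not checksum_file.is_file():
            results[host] = "missing"
            continue
        # The checksum file is assumed to hold "<hexdigest>  <filename>".
        fields = checksum_file.read_text().split()
        expected = fields[0] if fields else ""
        results[host] = "ok" if sha256_of(archive) == expected else "corrupt"
    return results
```

Any host whose status is not `ok` is a candidate for the alert sent to the administrator.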
1.8 MySQL master-slave synchronization and a complete backup solution
1. Before I joined the company, a former operations engineer had lost data, so the boss attached great importance to data security.
2. I proposed and launched a MySQL backup solution together with a MySQL architecture solution.
3. The plan mainly enables binlog on the slave and takes full backups on the slave, one per database, which are pushed to the backup server.
4. The backup data is periodically restored to a test database for development use.
5. Established a process and rules for manual database updates.
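Taking a full backup per database, as in step 3 above, can be sketched as a helper that builds one mysqldump command for each database. `--single-transaction`, `--databases` and `--result-file` are standard mysqldump options; the file naming scheme and the function name are assumptions, and credentials are assumed to come from an option file rather than the command line.

```python
def build_dump_commands(databases, backup_dir, date_tag):
    """Return one mysqldump argv list per database, each writing its own file."""
    commands = []
    for name in databases:
        outfile = f"{backup_dir}/{name}_{date_tag}.sql"
        commands.append([
            "mysqldump",
            "--single-transaction",   # consistent snapshot for InnoDB tables
            "--databases", name,      # include CREATE DATABASE / USE statements
            f"--result-file={outfile}",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_dump_commands(["shop", "blog"], "/backup/mysql", "2016-03-01"):
        print(" ".join(cmd))
```

One file per database makes it possible to restore a single database to the test environment without touching the others.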
1.9 LNMP architecture optimization
1. The company ran an LNMP architecture with little optimization and poor performance.
2. I proposed an optimization plan for the LNMP architecture.
3. The plan covers Linux system optimization, Nginx service optimization, PHP service optimization, and MySQL optimization.
4. After the optimization was completed, the performance of the LNMP architecture improved greatly.
1.10 Network-wide server monitoring solution
Requirement: When I joined the company there was no monitoring system, so failures raised no alarm and each one had a large impact on the company's website. Drawing on the monitoring skills I already had and on further research, I wrote a proposal and submitted it to the company's leaders, with the goal of fixing the late server alarms and making sure website failures are handled promptly.
1. Evaluated Zabbix, the most popular monitoring software, against our requirements.
2. Customized monitoring templates to the specific needs of different servers, with real-time alerting.
After implementation, most fault alarms reach the administrator promptly and reliably, which helps keep the website stable.
1.11 Deploying a jumpserver jump host to fix chaotic account management
Start and end time: 2016/03-2016/04
Software environment: CentOS 6.5
Development tools: jumpserver
Project description: A few months into the job, I found that server account management in the company's operations work was very disorganized. Some operations staff had several working accounts and could log in as root at any time. As a result, whenever an operations engineer transferred or left, every account password on the servers had to be changed again, which was time-consuming, laborious, and left everyone with passwords that were hard to remember. After thinking it over, I suggested to the leadership that we adopt the open source jump host jumpserver to fix the situation.
Project responsibilities:
1. Deploy a server to act as the jumpserver jump host.
2. Log in to the jump host with Xshell to test authorization.
1.12 NFS + Keepalived high availability architecture
Shown in the mid-stage architecture diagram.
1.13 MySQL multi-instance and master-slave replication
Shown in the mid-stage architecture diagram.
1.14 Improving server storage
Requirement: Reduce storage pressure during peak traffic.
Responsibilities:
1. Web front-end storage uses an NFS primary/standby structure.
2. Data written by users, such as pictures and attachments, goes to the primary NFS; user reads are also served from NFS.
3. The NFS primary and standby are kept in sync with rsync + inotify.
4. Because the volume of data on NFS is small, sersync pushes the data to the web front ends, so that the front ends send as few requests as possible to the back-end server and the pressure on NFS storage is reduced.
5. Backups keep the data safe, so data loss is not a concern.
II. Late-Stage Project Experience Examples
2.1 The First Industry Department of the First Aerospace Institute: Integrated Service Cluster
Project requirements:
The project's main purpose is to build the internal service platform of the First Aerospace Institute. The goal is a secure, efficient, and stable server cluster architecture that provides a comprehensive platform for the institute's services.
The front end uses load balancing together with a Squid cluster and a hardware firewall to isolate the internal network from the external network. It can also monitor the network, record transmitted information, and strengthen the security of the local area network. The design provides high availability for the front-end scheduling servers, load balancing for the middle-tier web servers, high availability for the back-end database servers, and a monitoring server that watches the private and public data of every server in the cluster.
The front-end scheduling servers run Keepalived and Nginx. The middle-tier web servers run Nginx, which handles a high number of concurrent connections and is relatively stable, combined with Inotify + Rsync to synchronize website data. The back-end database servers use read-write separation: the write side runs MySQL + MHA in dual-master (mutual master-slave) mode, reads go to slaves load-balanced through LVS + Keepalived + MySQL, and Memcached caches data from the databases. The monitoring server uses Zabbix to monitor the running status and service status of each server.
Duty description: In this project I was mainly responsible for building the server service platform. To keep things uniform, I wrote shell scripts that make server deployment more standardized.
2.2 NFS cluster upgrade
Requirement analysis:
1. The original NFS shared storage server was both a performance bottleneck and a single point of failure.
2. When the primary NFS storage system went down, an alert was sent and an administrator manually promoted the most up-to-date NFS storage system to primary, judging by the synchronization logs. This approach is simple and workable, but it depends on manual handling, so operator error and long recovery times are hard to avoid.
Solution:
1. Replace NFS with MFS, a distributed file storage management system.
2. The MFS metadata server is currently a single point of failure, so DRBD provides real-time disk synchronization and Heartbeat provides failover to achieve high availability.
3. The MFS + DRBD + Heartbeat high availability solution effectively removes the single point of failure in the primary MFS storage system. When the primary MFS storage goes down, the primary role switches from the master node to a standby node. The new primary then synchronizes automatically with all other MFS storage systems, and its data is almost identical to that of the failed primary. The switchover is fully automatic, which gives the MFS storage system a hot standby, fast fault recovery, and better service reliability.
Duty description: I was the main person responsible for this project, covering on-site coordination, construction of all server service platforms, and writing shell scripts to make server deployment more standardized.
2.3 MySQL cluster read-write separation and high availability solution
Requirement analysis:
1. The new solution must deliver the service performance and I/O needed for fast responses to the enterprise's many terminals.
2. Ensure long-term, uninterrupted, stable operation of the system at a reasonable cost.
3. Meet the high availability and reliability requirements of the database system.
Solution:
1. Five MySQL databases at the bottom layer, one master and four slaves, with semi-synchronous replication enabled to improve data safety.
2. The Atlas middleware provides read-write separation and read load balancing, and decouples this logic from the application.
3. Two servers running LVS + Keepalived provide load balancing and high availability for the Atlas servers.
4. A dedicated MHA manager server handles hot standby for the master database.
5. The solution greatly reduces wasted server resources, achieves failover within 30 seconds, and does much to protect database consistency.
Duty description: Mainly responsible for building all server service platforms, designing the solution, and writing scripts.
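The idea behind read-write separation with read load balancing can be illustrated with a toy router. This is not how Atlas is implemented; it is a simplified sketch in which SELECT statements rotate across the slaves and every other statement goes to the master. The class name, method name and node labels are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to slaves in round-robin order, the rest to the master."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        """Return the node that should execute the given SQL statement."""
        words = sql.split(None, 1)
        first_word = words[0].lower() if words else ""
        if first_word == "select":
            return next(self._slaves)
        # Writes, DDL and anything unrecognized go to the master to stay safe.
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["slave1", "slave2", "slave3", "slave4"])
    print(router.route("SELECT * FROM orders"))
    print(router.route("UPDATE orders SET paid = 1 WHERE id = 7"))
```

A real middleware also has to deal with transactions, replication lag and statements such as `SELECT ... FOR UPDATE`, all of which this sketch ignores.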
2.4 NFS + DRBD + Heartbeat high availability solution
Software environment: CentOS 6.8
Hardware environment: DELL R710
Implementation time: March 2015
Shortly after I joined the company, the back-end NFS server occasionally crashed during peak request periods. The web servers' mounts could not switch automatically to the backup server, so the web servers stopped working normally and the network service was interrupted. The company's leaders asked me to come up with a solution to avoid similar incidents in the future. After observing the CPU and memory load of the NFS server, reviewing the historical load data of its main hardware, and analyzing the results carefully, I submitted a DRBD + Heartbeat + NFS solution. It was approved by the leadership and I implemented it.
Project responsibilities:
1. Responsible for the overall planning and deployment of the project.
2. Responsible for writing the Heartbeat automatic switchover script.
3. Responsible for setting up the primary and standby NFS servers.
4. Simulated failures, observed the operating data of the metadata server and the data storage servers, compared it with the earlier situation, and compiled a report.
5. Wrote the project implementation report.
Later improvements: Multiple independent physical links were configured so that the Heartbeat communication line itself is not a single point of failure, minimizing the chance of split-brain. The switchover time between the master and slave servers was shortened by tuning options such as keepalive in the ha.cf configuration file. In DRBD, the replication process was adjusted to handle bad blocks on the master side.
2.5 Mobile site deployment, tuning, and launch
Operating environment: CentOS 6.6, DELL R730
Main function: Separate the mobile and PC services
Applied technology: Nginx layer-7 load balancing, tomcat8 + jdk1.8, MHA for MySQL high availability (mysql-5.6.17), Php-5.6.30, shell scripts that send data check results
Technical points:
1. Nginx layer-7 scheduling separates PC and mobile traffic and separates dynamic from static content.
2. Layer-7 scheduling + Keepalived high availability.
3. Nginx integrated with the Taobao health check module.
4. Read-write separation in code, plus the mature MHA high availability solution for the database.
5. Scheduled scripts back up and check MySQL data and send the results to the administrator's phone.
6. Web service optimization, PHP optimization, and Tomcat optimization.
2.6 Squid transparent proxy
1. System environment: CentOS 6.5
2. Software tools: squid-3.0
3. Project description: Previously the company used SNAT for Internet access. Employees used the company's bandwidth during working hours to browse websites and videos unrelated to work, which reduced work efficiency. Heavy use of applications such as Thunder (Xunlei) and other P2P software caused network congestion and left the enterprise's bandwidth in short supply.
4. Responsibilities:
a) Use the Squid proxy service to control employees' online behavior;
b) Develop a corporate online behavior control plan;
c) Implement security controls on the intranet: filter malicious web pages and prevent malicious attacks;
d) Limit network behavior, with intelligent control of download software such as Thunder and P2P clients;
e) Manage online behavior in a fine-grained, intelligent way.
5. Project achievements: After the project was implemented, employees' work efficiency improved noticeably and the enterprise's network bandwidth was protected.