I recently had the opportunity to develop a Proof of Concept (PoC) for an idea in which data from PDF and Excel files is consumed, analyzed, and presented in a responsive web application running in the AWS cloud. The scope included development of a responsive UI (User Interface) to reflect the idea, back-end APIs supporting the user interface, database storage, a data loading process, and a Continuous Integration/Continuous Deployment (CI/CD) approach for deploying it on the cloud. I will share some details about the architecture, techniques, tooling, libraries, and code snippets used in this effort.
Environment:
Development: Atom IDE, AWS SDK for Node.js, AWS CLI
Source code Repository: Bitbucket
AWS Services used:
Data extraction and loading: AWS Step Functions, AWS Lambda, Amazon S3, Amazon DynamoDB
Back-end: Amazon API Gateway, AWS Lambda, Amazon DynamoDB
Front-end: Amazon EC2, Amazon EC2 Auto Scaling, Elastic Load Balancing, Amazon Route 53
CI/CD: AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, AWS CloudFormation, Amazon S3, Amazon Elastic Container Service (ECS), Amazon Elastic Container Registry (ECR)
Data extraction and loading component: The raw data was available in PDF and Excel files; Node.js was used in conjunction with AWS Lambda and Step Functions to extract and load it.
Here are some key statements used to process spreadsheets:
var Excel = require('exceljs');
var wb = new Excel.Workbook();
var filename = 'filename.xlsx';
wb.xlsx.readFile(filename).then(function() {
  var ws = wb.getWorksheet('Sheet1'); // Read the sheet
  // Iterate over all rows that have values in the worksheet
  ws.eachRow(function(row, rowNumber) {
    // Iterate over all cells in a row (including empty cells)
    row.eachCell({ includeEmpty: true }, function(cell, colNumber) {
      console.log(cell.value);
    });
  });
});
Here are some key statements used to process PDF files, first reading the file from Amazon S3:
var pdfText = require('pdf-text');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
var params = { Bucket: 'bucketfolder.pdffiles', Key: 'filename.pdf' };
s3.getObject(params, function(error, data) {
  var buf = Buffer.from(data.Body); // data.Body holds the raw PDF bytes
  pdfText(buf, function(err, chunks) {
    // chunks is an array of text strings extracted from the PDF
    chunks.forEach(function(value) {
      console.log(value);
    });
  });
});
And here, fetching the PDF from a web URL instead:
var pdfText = require('pdf-text');
var request = require('request');
var url = 'https://validurlvalue.pdf'; // Substitute with the Web URL pointing to the PDF file
request({ url: url, encoding: null, strictSSL: false }, function (error, response, body) {
  if (!error && response.statusCode === 200) {
    pdfText(body, function(err, chunks) {
      chunks.forEach(function(value) {
        console.log(value);
      });
    });
  }
});
The following explains some of the reasoning behind this design and the choice of DynamoDB as the database:
A variety of data needs to be loaded from the Excel and PDF sources into the database, refreshed periodically by checking the data source, and processed in a specific sequence. The individual data extraction and loading components are implemented as AWS Lambda functions, while AWS Step Functions provides a simple, natural way to orchestrate those Lambda functions and manage the sequential and parallel steps of the extraction and loading workflow. Together with DynamoDB, these services are part of the AWS serverless platform and keep the solution fully managed and completely serverless. A sketch of such a state machine follows.
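As an illustration only, here is a minimal sketch of what such a Step Functions state machine definition could look like, expressed in Amazon States Language as a JavaScript object; the state names, function names, and ARNs are hypothetical:
// Hypothetical state machine: run the Excel and PDF loaders in parallel, then consolidate
var definition = {
  Comment: 'Extract and load PDF/Excel data into DynamoDB',
  StartAt: 'ExtractInParallel',
  States: {
    ExtractInParallel: {
      Type: 'Parallel',
      Branches: [
        { StartAt: 'LoadExcel',
          States: { LoadExcel: { Type: 'Task',
            Resource: 'arn:aws:lambda:us-east-1:123456789012:function:loadExcelData', End: true } } },
        { StartAt: 'LoadPdf',
          States: { LoadPdf: { Type: 'Task',
            Resource: 'arn:aws:lambda:us-east-1:123456789012:function:loadPdfData', End: true } } }
      ],
      Next: 'WriteToDynamoDB'
    },
    WriteToDynamoDB: { Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:writeItemsToDynamoDB',
      End: true }
  }
};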
UI Layer:
React 15.5.4 (react, react-dom, react-scripts)
Less dynamic stylesheet language ("less": "^2.5.3", "less-loader": "^2.2.1")
.jsx (or .js) files define the components that serve as building blocks for the different parts of the UI; a minimal component sketch follows the setup steps below.
Basic steps:
// CLI tool to get started with React
npm install -g create-react-app
create-react-app app-name
cd app-name
npm install (To install the dependencies)
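To illustrate the component-as-building-block idea, here is a minimal sketch of one such component; the component name and props are hypothetical:
import React, { Component } from 'react';

// Hypothetical building-block component that renders a list of items
class ItemList extends Component {
  render() {
    return (
      <ul>
        {this.props.items.map(item => (
          <li key={item.id}>{item.name}</li>
        ))}
      </ul>
    );
  }
}

export default ItemList;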
We intend to leverage Storybook to browse a component library, view the different states of each component, and interactively develop and test components; a sample story is sketched after the setup steps. The initial steps to install it in the project are:
npm i -g @storybook/cli
cd my-react-app
getstorybook
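Once set up, stories describe each state of a component. A minimal sketch, assuming a hypothetical Button component:
import React from 'react';
import { storiesOf } from '@storybook/react';
import Button from '../src/components/Button'; // hypothetical component

// Each .add() call registers one viewable state of the component
storiesOf('Button', module)
  .add('default', () => <Button label="Click me" />)
  .add('disabled', () => <Button label="Disabled" disabled />);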
Packaging was accomplished through webpack, which bundles the JavaScript files for use in a browser. While developing, we tested the application locally against the backend layer without needing to deploy to the server through the CI/CD process.
Backend/API layer:
The back-end APIs required to support the UI layer are enabled through API Gateway, AWS Lambda, and DynamoDB.
The creation and integration of backend resources is achieved through the AWS Mobile CLI to expedite the development of the required AWS components. This project uses the AWS Amplify JavaScript library to add cloud support to the application.
We used a separate project for the backend APIs. Here are the prerequisites:
They include installing the AWS Mobile CLI, configuring AWS credentials, and creating a React Native project.
npm install -g awsmobile-cli
awsmobile configure
npm install -g create-react-native-app
Create the project and execute the init command for your app's backend project:
create-react-native-app BackendProject
cd BackendProject
awsmobile init
awsmobile cloud-api enable
awsmobile cloud-api configure
NOTE: If you go to AWS Mobile Hub, you can see this project. However, AWS Mobile Hub is needed only to support mobile clients; here we simply leverage the API Gateway and Lambda functions generated as part of this process.
To create new APIs and paths (e.g., /items), use the following command, which can also be used to edit APIs and manage different paths.
awsmobile cloud-api configure
NOTE: This will create the required path and actions (GET, POST, PUT, DELETE…) under the awsmobilejs/backend folder. The generated app.js can then be modified with the required backend database access or business logic code, as sketched below.
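As a hedged sketch only, a GET handler added to the generated Express app might read from DynamoDB like this; the table name and route are hypothetical:
var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient();

// Hypothetical handler added to the generated app.js under awsmobilejs/backend
app.get('/items', function(req, res) {
  docClient.scan({ TableName: 'ItemsTable' }, function(err, data) {
    if (err) {
      res.status(500).json({ error: err.message });
    } else {
      res.json(data.Items); // Return all items from the table
    }
  });
});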
Save and push to cloud.
awsmobile push
This will create the required API in API Gateway along with its Lambda function, and will surface the APIs under Mobile Hub -> Cloud Logic, where they can be tested as well.
This Node example can be used to test the APIs launched in API Gateway:
NOTE: Refer to aws-exports.js to find the parameter values used in this reference code.
var apigClientFactory = require('aws-api-gateway-client').default;
var apigClient = apigClientFactory.newClient({
  invokeUrl: 'https://xxxxxxxxxxx.execute-api.us-east-1.amazonaws.com/Development',
  accessKey: process.env.AWS_ACCESS_KEY_ID,
  secretKey: process.env.AWS_SECRET_ACCESS_KEY,
  region: 'us-east-1', // OPTIONAL: The region where the API is deployed
  systemClockOffset: 0, // OPTIONAL: An offset value in milliseconds to apply to signing time
  retries: 4, // OPTIONAL: Number of times to retry before failing. Uses the axios-retry plugin.
  retryCondition: (err) => { // OPTIONAL: Callback to further control if request should be retried
    return err.response && err.response.status === 500;
  }
});
var params = {
  // This is where any header, path, or querystring request params go.
  // The key is the parameter name as defined in the API.
  // userId: '1234',
};
// Template syntax follows url-template: https://www.npmjs.com/package/url-template
var pathTemplate = '/items';
var method = 'GET';
var additionalParams = {};
var body = {
  // This is where you define the body of the request
};
apigClient.invokeApi(params, pathTemplate, method, additionalParams, body)
  .then(function(result) { console.log(result); })
  .catch(function(result) { console.log(result); });
The recommended approach for interacting from the UI is to use aws-amplify for cloud services.
import Link from 'link-react';
import { Table } from 'semantic-ui-react';
import awsmobile from './configuration/aws-exports';
import Amplify,{API} from 'aws-amplify';
Amplify.configure(awsmobile);
To make calls to API Gateway through AWS Amplify, you need your IdentityPoolId in aws-exports.js; for further documentation, refer to AWS Amplify. Modify the App component like this:
class App extends Component {
  state = { data: [] }

  // Call this.fetch() (e.g., from componentDidMount) to load the data
  fetch = async () => {
    this.setState(() => {
      return { loading: true };
    });
    API.get('abtestAPI', '/items')
      .then(resp => {
        this.setState({ data: resp });
        console.log("response is : ", resp);
      })
      .catch(err => console.log(err));
  }
}
// The this.state.data array will reflect the returned data
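For completeness, here is a minimal sketch of a render method that could be added to the App component to display the fetched data using the imported semantic-ui-react Table; the item field name is hypothetical:
render() {
  return (
    <Table celled>
      <Table.Body>
        {this.state.data.map((item, i) => (
          <Table.Row key={i}>
            <Table.Cell>{item.name /* hypothetical field */}</Table.Cell>
          </Table.Row>
        ))}
      </Table.Body>
    </Table>
  );
}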
CI/CD approach:
Repository – A Bitbucket repository is used for managing the application source code. In addition, the CloudFormation template files and the configuration files required to integrate with the AWS platform are maintained there, so the infrastructure is managed as code.
Bitbucket offers an integrated CI/CD environment with Pipelines that can automate the build, test, and deploy processes, managing the entire workflow from checked-in code to deployment into target environments. Since our target build, test, and deployment platform is AWS, the AWS CodeCommit, CodeBuild, and CodePipeline CI/CD services are leveraged to accomplish application and infrastructure updates in a faster, more reliable, and more native manner. We leverage Bitbucket Pipelines to push code to AWS CodeCommit on each commit, making the integration between Bitbucket and AWS seamless to the developer.
A “Templates” folder is maintained in the Bitbucket repository with the CloudFormation template stacks.
The Bitbucket pipeline (configured via bitbucket-pipelines.yml) is set up with tasks that push each commit to AWS CodeCommit to integrate with AWS; an illustrative sketch of bitbucket-pipelines.yml follows.
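A minimal sketch of what such a bitbucket-pipelines.yml could look like, assuming AWS credentials are stored as secured repository variables; the image, repository name, and region are hypothetical:
# Hypothetical bitbucket-pipelines.yml: mirror each commit to AWS CodeCommit
image: python:3.6
pipelines:
  default:
    - step:
        script:
          - pip install awscli
          - git config --global credential.helper '!aws codecommit credential-helper $@'
          - git config --global credential.UseHttpPath true
          - git remote add codecommit https://git-codecommit.us-east-1.amazonaws.com/v1/repos/poc-app
          - git push codecommit master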
NOTE: AWS CodeBuild supports Bitbucket as a source, but CodePipeline does not. That is why we chose AWS CodeCommit, as opposed to GitHub or S3.
AWS CodePipeline integrates CodeCommit, AWS CodeBuild, and code deployment through CloudFormation to deliver the application.
AWS CodeBuild compiles the source code using Docker and creates the output artifacts for deployment. The webpack build is included in the Docker-based CodeBuild step, so CodeBuild generates fresh bundles on every run; a sample build spec sketch follows.
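As an assumption-laden sketch of such a build, a buildspec.yml might run the webpack build and push the resulting Docker image to ECR; the account ID, region, and image name are hypothetical:
# Hypothetical buildspec.yml for CodeBuild
version: 0.2
phases:
  pre_build:
    commands:
      - $(aws ecr get-login --no-include-email --region us-east-1)
  build:
    commands:
      - npm install
      - npm run build   # webpack build generates fresh bundles every run
      - docker build -t poc-app .
      - docker tag poc-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/poc-app:latest
  post_build:
    commands:
      - docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/poc-app:latest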
Infrastructure deployment occurs through a CloudFormation script that creates the required infrastructure, after which the code runs in an ECS container (Fargate). The application is started or restarted on each successful commit/build/test/deployment, with ECS pulling and running the specific image created during that run of the pipeline.
The major advantage of this architecture is that it allows us to go completely serverless by simply modifying the CI/CD process already in place. Currently, S3 is used only as storage supporting other services. In a fully serverless setup, S3 would host the front-end website behind Amazon CloudFront, with the same API Gateway/Lambda/DynamoDB backend replacing ECS in the current solution.
This article should give you some useful ideas, tools, tips, and techniques for architecting and developing a PoC or an application spanning several layers on the AWS platform.