Secure RAG for Safe AI Deployments Using F5 Distributed Cloud and NetApp ONTAP
Retrieval Augmented Generation (RAG) is one of the most discussed techniques for empowering Large Language Models (LLMs) to deliver niche, hyper-focused responses pertaining to specialized, sometimes proprietary, bodies of knowledge. Two simple examples might include highly detailed company-specific information distilled from years of internal financial reporting by financial controllers, or helpdesk-type queries where the LLM harvests only relevant knowledge base (KB) articles, release notes, and private engineering documents not normally exposed in their entirety. RAG is discussed in numerous good articles; its two principal values are these.

First, LLM responses to prompts (queries) are based upon specific, niche knowledge as opposed to the general, vast pre-training that generic LLMs are taught with; in fact, it is common to instruct the LLM not to answer with any pre-trained knowledge at all, only with the content "augmenting" the prompt.

Second, attribution is a key deliverable with RAG. Answers drawn from an LLM's pre-trained knowledge are generally difficult to trace back to a root source of truth, whereas prompts augmented with specific assistive knowledge normally elicit responses that clearly call out the source of the answers provided.

Why is the Security of RAG Source Content Particularly Important?

To maximize the efficacy of LLM solutions in the realm of artificial intelligence (AI), an often-repeated adage is "garbage in, garbage out," which succinctly states an obvious fact about RAG: valuable and actionable content must be fed into the model to expect valuable, tactical outcomes. This means exposing key forms of data, for example patented knowledge and intellectual property that must not be exposed in raw form to competitors, or actual trade secrets, which will infuse the LLM but need to remain confidential in their native form. In one example around trade secrets, the Government of Canada spells out a series of items courts will look at in determining compensation for misuse (theft) of intellectual property. It is notable that the first item listed is not the cost associated with creation of the secret material ("the cost in money or time of creating or developing the information") but rather how much effort was made to keep the content secure ("the measures taken to maintain secrecy").

With RAG, incoming queries are augmented with rich, semantically similar enterprise content. That content has already been populated into a vector database by converting documents (PDF or DOCX, for example) into raw text and converting chunks of that text into vectors. The vectors are long sequences of numbers with similar mathematical attributes for similar content. As a trivial example, one-word chunks such as glass, cup, bucket, and jar are semantically related, meaning similarities can be construed by both human minds and LLMs. On the other hand, empathy, joy, and thoughtfulness maintain similarities of their own. This semantic approach means a phrase, sentence, or paragraph (chunk) using bow to mean "to bend in respect" will be highly distinct from chunks referring to the "front end of a ship," "something to tie one's hair back with," or even the tool every violinist needs. The list goes on; all of these semantic meanings of bow are very different and would have distinctive embeddings within a vector database. The word embedding is likely derived from "fixing" or "planting" an object; in this case, words are "embedded" into a contextual understanding.
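As a rough illustration of this semantic similarity, the short Python sketch below embeds a few one-word chunks with the sentence-transformers library (the same family of embedding model used later in this article) and compares them with cosine similarity. The model name and the surrounding code are illustrative assumptions, not output from the environment built in this article.

import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative embedding model; any sentence-transformers model behaves similarly here
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

chunks = ["glass", "cup", "bucket", "jar", "empathy", "joy", "thoughtfulness"]
vectors = model.encode(chunks, normalize_embeddings=True)  # one vector per chunk

# With normalized vectors, the dot product is the cosine similarity
print("glass vs cup:", float(np.dot(vectors[0], vectors[1])))
print("glass vs empathy:", float(np.dot(vectors[0], vectors[4])))

Chunks for glass and cup land near each other in the embedding space, while glass and empathy do not; that property is exactly what similarity search relies upon.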
The length of the number sequence describing the meaning of an item has typically been more than 700 values, but the number of "dimensions" applied is always a matter of research, and the entire vector database is arrived at with an embedding LLM, distinct from the main LLM that will produce generative AI responses to our queries. Incoming queries destined for the main generative AI LLM can, in turn, be converted to vectors themselves by the very same text-embedding "helper" LLM, and through retrieval (the "R" in RAG) similar textual content can buttress the prompt presented to the main LLM. Since a critical cog in the wheel of the RAG architecture is the ingestion of valuable and sensitive source documents into the vector database using the embedding LLM, it is not just prudent but critical that this source content be brought securely over networks to the embedding engine.

F5 Distributed Cloud Secure Multicloud Networking and NetApp ONTAP

For many practical, time-to-market reasons, modern LLMs, both the main and embedding instances, may not be collocated with the data vaults of modern enterprises. LLMs benefit from cloud compute and GPU access, something often in short supply for on-premises production rollouts. A typical approach assisted by economies of scale is to harness public cloud providers, such as Azure, AWS, and Google Cloud Platform, for the compute side of AI projects. Azure, as one example, can turn up virtual machines with NVIDIA GPUs such as the A100, A2, and Tesla T4, to name a few. The documents needed to feed an effective RAG solution may well be on-premises, and this is unlikely to change for reasons including governance, regulatory requirements, and the weight of decades of sound security practice. One of the leading on-premises storage solutions of the last 25 years is the NetApp ONTAP storage appliance family, a standing reflected in this quote from NVIDIA:

"Nearly half of the files in the world are stored on-prem on NetApp." — Jensen Huang, CEO of NVIDIA

A key deliverable of F5 Distributed Cloud is providing encrypted interconnectivity of disparate physical sites and heterogeneous cloud instances such as Azure VNETs or AWS VPCs. As such, there are two immediate, concurrent F5 features that come to mind.

First, secure interconnectivity of on-premises NetApp volumes (NAS) or LUNs (block) containing critical documents for ingestion into RAG, utilizing encrypted L3 connectivity between the enterprise location and the cloud instance where the LLM and RAG are instantiated. TCP load balancers are another alternative for volume-sharing NAS protocols like NFS or SMB/CIFS.

Second, secure access to the LLM web interface or RESTful API endpoints, with HTTPS load balancers including key features like WAF, anti-bot mechanisms, and automatic API rate limiting for abusive prompt sources.

The following diagram presents the topology this article set out to create. REs are "regional edge" sites maintained internationally by F5 that harness private, high-speed RE-to-RE global communication links. DNS names, such as the target name of an LLM service, leverage mappings to anycast IP addresses; thus users entering the RE network from Southeast Asia might, for example, enter the Singapore RE, while users in Switzerland might enter via a Paris or Frankfurt RE. Complementing the REs are Customer Edge (CE) nodes. These are virtual or physical appliances which act as security demarcation points.
For instance, a CE placed in an Azure VNET can protect access to the server supporting the LLM, removing any need for Internet access to the server, which is now accessible only through a private, RFC 1918-type address. External access to the LLM, whether for employees only, for employees and contractors, or potentially for the entire Internet community, is enabled by a distributed HTTPS load balancer. In the example depicted above, oriented towards full Internet access, the FQDN of the LLM is projected by the load balancer into the global DNS, and consumers of the service resolve the name to one IP address and are attracted to the closest RE by BGP-4's support for anycast. As the name "distributed" load balancer suggests, the origin pool can be in an entirely different site than the incoming RE; in this case the origin pool is the LLM behind the CE in the Azure VNET. The LLM requests travel from RE to CE via a high-speed networking underlay.

The portion of the solution that securely ties the LLM to the source content required for RAG to embed vectors is, in this case, layer 3 multicloud networking (MCN). The solution is turnkey: routing tables automatically connect the members of the L3 MCN, in this case the inside interfaces of the Azure CE and the Redmond, Washington on-premises CE, and traffic flows over an encrypted underlay network. As such, the NetApp ONTAP cluster can securely expose volumes containing key files via a protocol like Network File System (NFS), with no risk of data exposure to third-party prying eyes. The following diagram drills into the RE, CE, and NetApp interplay.

F5 Distributed Cloud App Connect and LLM Setup

This article speaks to hands-on experience with web-driven LLM inferencing using augmented prompts derived from a RAG implementation. The AI compute was instantiated on an Azure-hosted Ubuntu 20.04 virtual machine with 4 virtual cores. Installed software included Python 3.10 and libraries such as LangChain, pypdf (for converting PDF documents to text), and FAISS (for similarity searching via a vector database), among others. The actual open-source LLM utilized for the generative AI is found here on huggingface.co. The binary, which exceeds 4 GB, is considered effective for CPU-based deployments. The embedding LLM, critical to seed the vector database with entries derived from secured enterprise documentation, and then used again per incoming query for the RAG similarity searches that build augmented prompts, was from Hugging Face: sentence-transformers/all-MiniLM-L6-v2, and can be found here.

The AI RAG solution was implemented in Python 3, and as such the Azure Ubuntu machine can be accessed either by SSH or via Jupyter Notebooks. The latter was utilized, as it is the preferred final delivery mechanism for standard users, rather than a web chatbot design or the requirement to issue API commands through solutions like Postman or curl. This design choice, steering the user experience towards Jupyter Notebook consumption, is in keeping with the fact that notebooks have become a standard in AI LLM usage where the LLM is tactical and vital to an enterprise's lines of business (LOBs). Jupyter Notebooks are web-accessed with a browser like Chrome or Edge, and as such F5's WAF, anti-bot, and L7 DDoS protections, all part of the F5 WAAP offering, can easily be laid upon an HTTPS load balancer with a few mouse clicks in XC to provide premium security to the user experience.
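As a hedged sketch of how those pieces might be wired together in the notebook, the snippet below loads the all-MiniLM-L6-v2 embedding model and a local, CPU-friendly generative model through LangChain. The use of llama-cpp-python (LlamaCpp), the model path, and the parameter values are assumptions for illustration; the article does not name the exact loader used, and LangChain module paths vary by version.

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp

# Embedding "helper" LLM that turns document chunks and queries into vectors
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Main generative LLM, loaded from a local multi-gigabyte binary suited to CPU inference
llm = LlamaCpp(
    model_path="/home/ubuntu/models/local-llm.gguf",  # placeholder path, not the actual model file
    n_ctx=4096,       # context window; governs how much RAG context fits into a prompt
    temperature=0.2,  # a low temperature keeps answers close to the supplied context
)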
NetApp and F5 Distributed Cloud Secure Multicloud Networking

The secure access to files for ingestion into the vector database, and for similarity searches when user queries are received, makes use of an encrypted L3 multicloud networking relationship between the Azure VNET and the on-premises LAN in Redmond, Washington hosting the NetApp ONTAP cluster. The specific protocol chosen was NFS, and the simplicity is demonstrated by the single Linux command needed to present key, high-value documents to the AI steps that populate the database:

# mount -t nfs <IP address of NetApp LIF interface on-prem>:/Secure_docs_for_RAG /home/ubuntu_restriced_user/rag_project/docs/Secure_docs_for_RAG

This address is available nowhere else in the world except behind this F5 CE in the Azure VNET. After the PDF files are converted to text and chunked to reasonable sizes, with some overlap suggested between the end of one chunk and the start of the next, the embedding LLM populates the vector database. The files are only ever accessed remotely by NFS through the mounted volume, and the mount may be terminated until new documents are ready to be added to the solution.
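A minimal sketch of that ingestion flow is shown below: PDFs are read from the NFS-mounted ONTAP volume, converted to text, split into overlapping chunks, embedded, and stored in a FAISS index. The loader, splitter, chunk sizes, and the choice to persist the index locally are illustrative assumptions rather than the exact notebook code.

from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Read every PDF exposed through the NFS mount of the ONTAP volume
docs = PyPDFDirectoryLoader(
    "/home/ubuntu_restriced_user/rag_project/docs/Secure_docs_for_RAG"
).load()

# Chunk with overlap so an idea spanning a chunk boundary is not lost
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)

# Embed the chunks and build the vector database used for similarity search
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_db = FAISS.from_documents(chunks, embeddings)

# Persist locally so the NFS mount can be dropped until new documents arrive
vector_db.save_local("faiss_index")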
The Objective RAG Implementation - Described

In order to have a reasonable facsimile of the real-world use cases this solution will empower today, but without any sensitive documents to inject, it was decided to use some seminal "Internet boom"-era IETF Requests for Comment (RFCs) as source content. With the rise of multi-port routing and switching devices, it became apparent the industry badly needed specific and highly precise definitions around network device (router and switch) performance benchmarking to allow purchasers "apples-to-apples" comparisons. These documents recommend testing parameters, such as what frame or packet sizes to test with, test iteration time lengths, and when to use FIFO vs LIFO vs LILO definitions of latency. RFC 1242 (terminology) and RFC 2544 (methodologies), chaired by Scott Bradner of Harvard University, and the later RFC 2285 (LAN switching terminology), chaired by Bob Mandeville, then of European Network Laboratories, are three prominent examples to which test and measurement solutions aspired to be compliant. Detailed LLM answers for quality assurance engineers in the network equipment manufacturing (NEM) space are the intended use case of the design: answers that must be distilled specifically by generative AI considering queries augmented by RAG, and based only upon these industry-approved documents. These documents, of course, contain no trade secrets or patented engineering designs; they are in fact publicly available from the IETF. However, they are nicely representative of the value offered in sensitive environments.

Validating RAG – Watching the Context Provided to the LLM

To ensure RAG was working, the content being augmented into the prompt was displayed to screen; we would expect to see relevant clauses and sentences from the RFCs being provided to the generative AI LLM. Also, if we were to start by asking questions outside the purview of this testing and benchmarking topic, we should see the LLM struggle to provide users a meaningful answer. To achieve this, rather than asking, say, what 802.3/Ethernet v2 frame sizes should be used in throughput measurements, or what precisely the industry-standard definition of the term "throughput" is, the question instead pertained to a recent Netflix release featuring Lindsay Lohan. Due to the recency of the film, even if the LLM leaned upon its pre-trained model, it would come up with nothing meaningful.

"Question: Important, only use information provided as context in the prompt, do not use other trained knowledge. Please identify who played Heather in the March 2024 Lindsay Lohan Netflix movie titled Irish Wish?"

As seen in the following Jupyter screenshot, the RAG solution can only provide augmented prompts from the database; in this case it holds some test and measurement clauses and some rules pertaining to the winter ice sport of curling. The supportive context data augmenting the prompt was, as expected, not going to help with this off-the-wall query, as only the RFCs and some sports rules had been introduced to the vector database. What is especially noteworthy is that the LLM did not simply admit it could not possibly answer the question but instead hallucinated a very articulate response indicating that "Heather played for her team against Lindsay Lohan's Team Ireland."

To validate the RAG architecture for a working scenario actually pertinent to the movie, a film review in PDF format was added to the ONTAP cluster-housed volume. The supportive documentation was consumed (using pypdf from the Jupyter Notebook it became text), automatically chunked, and introduced to the FAISS vector database through the Hugging Face embedding LLM. With the RAG vector database populated by the embedding LLM, the responses then became meaningful. The following exchange shows the value of RAG in producing meaningful, thought-out answers:

Question: Important, only use information provided as context in the prompt, do not use other pre-trained knowledge. Who does Maddie edit novels for in the Lindsay Lohan March 2024 Netflix movie titled Irish Wish?

Answer: Let's think step by step. We know that Maddie Kelly plays a book editor who harbors a big secret, and we have been given information about her job as an editor for Paul Kennedy (played by Alexander Vlahos), which is also part of the question context provided above. Therefore it seems logical to infer that she edits novels in general but specifically those written or co-written by author Paul Kennedy - who happens to be a bestselling romance novelist as stated earlier on, so we can safely conclude Maddie works for Paul himself (as opposed to any other book publishing company).

This time we see relevant, meaningful context provided by RAG, with relevant portions of the PDF review document augmenting the prompt. The augmentations are normally not displayed unless requested by the user or programmer; unlike humans, LLMs do not require superfluous text formatting (advantage: LLMs), and thus the content is packed tightly and efficiently. Fewer characters also mean fewer tokens consumed by the targeted AI model, which allows more data before a token context limit is reached. In pay-per-use LLM approaches, fewer tokens also help the enterprise's bottom line financially. Also, note that the answer will likely not be identical across subsequent asks of the same question, as per normal LLM behavior. Features like the "temperature" setting can also allow more "creative" ideas in responses, injecting humor and even outlandishness if desired.

The RAG workflow is now validated, but the LLMs in question (embedding and main generative LLM) can still be made better with these suggestions:

Increase "chunk" sizes so ideas are not lost when excessive breaks make for short chunks.
Increase "overlap" so an idea or concept is not lost at the demarcation point between two chunks.

Most importantly, provide more context from the vector database, as context lengths (maximum tokens in a request/response) are generally increasing in size. Llama 2, for instance, typically has a 4,096-token context length but can now be used with larger values, such as 32,768. This article used only 3 augmentations to the user query; better results could be attained by increasing this value at a potential cost of more CPU cycles.

Using Secure RAG – F5 L3 MCN, HTTPS Load Balancers and NetApp ONTAP Together

With the RAG architecture validated as working, the solution was used to assist the target user entering queries to the Azure server by means of Jupyter Notebooks, with RAG documents ingested over encrypted, private networking from the on-premises ONTAP cluster NFS volumes. The question posed, which is answerable by reading and understanding key portions spread throughout the Scott Bradner RFCs, was:

"Important, only use information provided as context in the prompt, do not use other pre-trained knowledge. Please explain the specific definition of throughput? What 802.3 frame sizes should be used for benchmarking? How long should each test iteration last? If you cannot answer the questions exclusively with the details included in the prompt, simply say you are unable to answer the question accurately. Thank you."

The Jupyter Notebook representation of this query, which is written in Python and issued from the user's local browser anywhere in the world directly against the Azure-hosted LLM, looks like the following. The next screenshot demonstrates the result, based upon the provided secure documents.

The response is decent; more importantly, the fact that it clearly uses the provided augmentations to the prompt is the key objective of this article. The accuracy of the response can be questioned in some areas. The Bradner RFCs highlighted the importance of 64-byte 802.3/Ethernet v2 frame sizes in testing, as line-rate forwarding with this minimum size produces the highest theoretically possible frames-per-second load, something very demanding in the era of software-driven forwarding in switches and routers. Sixty-four-byte frames result in 14,881 fps (frames per second) for 10BaseT, 148,809 fps for 100BaseT, and 1.48 million fps for Gigabit Ethernet. These values were frequently more aspirational in earlier times and were also a frequent metric used in network equipment purchasing cycles. Suspiciously, the LLM response calls out 64 kB in 802.3 testing, not 64 B, which appears to be an error.

Again, with this architecture, the actual LLM providing the generative AI responses is increasingly viewed as a commodity; alternative LLMs can be plugged quickly and easily into the RAG approach of this Jupyter Notebook. The end user, and thus the enterprise itself, is empowered to utilize different LLMs, purchased or open source from sites like Hugging Face, to determine optimal results. The other key change that can affect the overall accuracy of results is to experiment with different embedding models. In fact, there are online leaderboards strictly for embedding LLMs, so one can quickly swap various popular embedding LLMs in and out to see the impact on results.
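To make those swap points concrete, the sketch below rebuilds the retrieval step from the persisted index: the number of augmentations per query (3 in this article) is a single parameter, and either the generative or the embedding model can be replaced in one line. The chain type, paths, and model loader are illustrative assumptions rather than a copy of the notebook used here, and LangChain module paths vary by version.

from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp
from langchain.vectorstores import FAISS

# Swap either model here to compare results; embedding leaderboards are a useful guide
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
llm = LlamaCpp(model_path="/home/ubuntu/models/local-llm.gguf", n_ctx=4096)  # placeholder path

vector_db = FAISS.load_local("faiss_index", embeddings)
retriever = vector_db.as_retriever(search_kwargs={"k": 3})  # number of augmentations per query

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,  # expose the augmenting chunks for attribution
)

result = qa_chain({"query": "Please explain the specific definition of throughput?"})
print(result["result"])
for doc in result["source_documents"]:  # the context RAG supplied alongside the prompt
    print(doc.page_content[:120])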
Summary and Conclusions on F5 and NetApp as Enablers for Secure RAG

This article demonstrated an approach to AI usage that leveraged the compute and GPU availability found today within cloud providers such as Azure. To safely access such an AI platform for a production-grade enterprise requirement, F5 Distributed Cloud (XC) provided HTTPS load balancers to connect worker browsers to a Jupyter Notebook service on the AI platform; this service applies advanced security to the traffic within XC, from WAF to anti-bot to L3/L7 DDoS protections. Utilizing Secure Multicloud Networking (MCN), F5 provided a private L3 connectivity service between the inside interface of an Azure VNET-based CE (Customer Edge) node and the inside interface of an on-premises CE node in a building in Redmond, Washington. This secure network allowed an NFS remote volume, with content on spindles/flash in the on-premises NetApp ONTAP cluster, to be remotely mounted on the Azure server. This secure file access provided peace of mind when exposing potentially critical and private materials from NetApp ONTAP volumes to the AI offering.

RAG was configured and files were ingested, populating a vector database within the Azure server, which allowed details, ideas, and recommendations to be harnessed by a generative AI LLM by augmenting user prompts with text gleaned from the vector database. Simple examples were used to first demonstrate that RAG was working, by posing queries that should not have been addressable by the loaded secure content; such a query was, as expected, not suitably answered. The feeding of meaningful content from ONTAP was then demonstrated to unleash the potential of AI to address queries based upon meaningful PDF files. Opportunities to improve results by swapping in and out the main generative AI model, as well as the embedding model, were also considered.

Securing and Scaling Hybrid apps with F5 NGINX (Part 2)
If you attended a cybersecurity trade show lately, you may have noticed the term "Zero Trust (ZT)" advertised on almost every booth. It may seem like most security companies are offering the same value proposition: 'Securing apps with ZT'. Its commonality stems from the fact that ZT is a broad term that can span endless use cases. ZT is not a feature or capability but rather a philosophy embraced by IT security leaders based on the idea that all traffic entering and exiting a system is untrusted and must be scrutinized before passing through. Organizations are shifting to a zero-trust mindset due to the increased complexity of cyber-attacks. Perimeter-based firewalls are no longer sufficient for securing digital resources.

In Part 1 of our series, we configured NGINX Plus as an external load balancer to route and terminate TLS traffic to cluster nodes. In this part of the series, we leverage the same NGINX Plus deployment to enable ZT use cases that will improve the security posture of your hybrid applications.

NOTE: Part 1 of the series is a prerequisite for enabling the ZT use cases in our examples. Please ensure that part 1 is completed before starting with part 2: Part 1 of the series

ZT Use case #1: OIDC Authentication

OIDC (OpenID Connect) is an authentication layer on top of the OAuth 2.0 framework. Many organizations choose OIDC to authenticate digital identities and enable SSO (Single Sign-On) for consumer applications. With single sign-on technologies, users gain access to multiple applications with one set of user credentials by authenticating their identities through an IdP (Identity Provider). I can configure NGINX Plus to operate as an OIDC relying party to exchange and validate ID tokens with the IdP, in addition to the basic reverse-proxy load balancing configured in part 1. I will extend the architecture in part 1 with an IdP and configure NGINX Plus as the identity-aware proxy.

Prerequisites for NGINX Plus

Before configuring NGINX Plus as the OIDC identity-aware proxy:

1. Install the NJS module.

$ sudo apt-get install nginx-plus-module-njs

2. Load the NJS module into the NGINX configuration by adding the following line at the top of your nginx.conf file.

load_module modules/ngx_http_js_module.so;

3. Clone the OIDC GitHub repository into your directory of choice.

cd /home/ubuntu && git clone --branch R28 https://github.com/nginxinc/nginx-openid-connect.git

Setting up the IdP

The IdP manages and stores digital identities to prevent attackers from impersonating users in order to steal sensitive information. There are many IdP vendors to choose from: Okta, Ping Identity, and Azure AD, among others. We chose Okta as the IdP in our example moving forward. If you do not have access to an IdP, you can quickly get started with the Okta Command Line Interface (CLI) and run the okta register command to sign up for a new account. Once account creation is successful, we will use the Okta CLI to preconfigure Okta as the IdP, creating what Okta calls an app integration. Other IdPs have different nomenclature for an application integration; for example, Azure AD calls them app registrations. If you are not using Okta, you can follow the documentation of the IdP you are using and skip to the next section (Configuring NGINX as the OpenID Connect relying party).

1. Run the okta login command to authenticate the Okta CLI with your Okta developer account.
Enter your Okta domain and API token at the prompts:

$ okta login
Okta Org URL: https://your-okta-domain
Okta API token: your-api-token

2. Create the app integration:

$ okta apps create --app-name=mywebapp --redirect-uri=https://<nginx-plus-hostname>:443/_codexch

where
--app-name defines the application name (here, mywebapp)
--redirect-uri defines the URI to which sign-ins are redirected on NGINX Plus. <nginx-plus-hostname> should resolve to the NGINX Plus external IP configured in part 1. We use port 443 since TLS termination is configured on NGINX Plus from part 1. Recall that we used self-signed certificates and keys to configure TLS on NGINX Plus; in a production environment, we recommend using certificates and keys issued by a trusted certificate authority such as Let's Encrypt.

Once the command from step #2 is completed, the client ID and secret generated from the app integration can be found in ${HOME}/.okta.env

Configuring NGINX as the OpenID Connect relying party

Now that we have finished setting up our IdP, we can start configuring NGINX Plus as the OpenID Connect relying party. Once logged into the NGINX Plus instance, simply run the configuration script from your home directory:

$ ./nginx-openid-connect/configure.sh -h <nginx-plus-hostname> -k request -i <YOURCLIENTID> -s <YOURCLIENTSECRET> -x https://dev-xxxxxxx.okta.com/.well-known/openid-configuration

where
-h defines the hostname of NGINX Plus
-k defines how NGINX will retrieve JWK files to validate JWT signatures. The JWK file is retrieved from a subrequest to the IdP
-i defines the Client ID generated from the IdP
-s defines the Client Secret generated from the IdP
-x defines the URL of the OpenID configuration endpoint. Using Okta as the example, the URL starts with your Okta organization domain, followed by the path URI /.well-known/openid-configuration

The configure script will generate OIDC config files for NGINX Plus. We will copy the generated config files into the /etc/nginx/conf.d directory from part 1:

$ sudo cp frontend.conf openid_connect.js openid_connect.server_conf openid_connect_configuration.conf /etc/nginx/conf.d/

You will notice that by default frontend.conf listens on port 8010 with clear-text HTTP. We need to merge kube_lb.conf into frontend.conf to enable both use cases from part 1 and part 2. The resulting frontend.conf should look something like this: https://gist.github.com/nginx-gists/af067326734063da6a4ff42146873262

Finally, edit the openid_connect_configuration.conf file and change the client secret to the one generated by the Okta IdP. Reload NGINX Plus for the new config to take effect:

$ nginx -s reload

Testing the Environment

Now we are ready to test our environment in action. To summarize, we set up an IdP and configured NGINX Plus as the identity-aware proxy to validate user ID tokens before traffic enters the Kubernetes cluster. To test the environment, we will open a browser and enter the hostname of NGINX Plus into the address field. You should be redirected to your IdP login page. Note: the hostname should resolve to the public IP of the NGINX Plus machine. Once you are prompted with the IdP login page from your browser, you can access the Kubernetes pods once the user credentials are validated. User credentials should be defined in the IdP. Once you are logged into your application, the ID token of the authenticated user is stored in the NGINX Plus key-value store.
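If you want to confirm that the token really landed in the key-value store, one option is to query the NGINX Plus API. The short sketch below assumes the API is exposed on localhost port 9000 (as in the key-value example later in this article) and that the zone is named oidc_id_tokens, the name used in the nginx-openid-connect reference configuration; adjust both to match your openid_connect_configuration.conf.

import requests

# List the contents of the ID-token key-value zone (assumed port and zone name; see above)
resp = requests.get("http://localhost:9000/api/7/http/keyvals/oidc_id_tokens")
resp.raise_for_status()

# Each entry maps a session cookie value to the JWT issued by the IdP
for session_cookie, id_token in resp.json().items():
    print(session_cookie[:12], "->", id_token[:40], "...")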
Enabling PKCE with OIDC

In the previous section, we learned how to configure NGINX Plus as the OIDC relying party to authenticate user identities attempting connections to protected Kubernetes applications. However, there are cases where attackers can intercept code exchange requests issued from the IdP, hijack your ID tokens, and gain access to your sensitive applications. PKCE is an extension of the OIDC Authorization Code flow designed to protect against authorization code interception and theft. PKCE provides an extra layer of security where the attacker must provide a code verifier in addition to the authorization code in exchange for the ID token from the IdP. In our current setup, NGINX Plus generates a random PKCE code verifier and sends its hashed value, the code challenge, as a query parameter when redirecting users to the IdP login page. The IdP uses this PKCE exchange as extra validation when the authorization code is exchanged for the ID token.

PKCE needs to be enabled on both NGINX Plus and the IdP. To enable PKCE verification on NGINX Plus, edit the openid_connect_configuration.conf file, set $oidc_pkce_enable to 1, and reload NGINX Plus. Depending on the IdP you are using, a checkbox should be available to enable PKCE.

Testing the Environment

To test that PKCE is working, we will open a browser and enter the NGINX Plus hostname once again. You should be redirected to the login page, only this time you will notice the URL has slightly changed, with additional query parameters:

code_challenge_method – Method used to hash the plain code verifier (most likely SHA256)
code_challenge – The hashed value of the plain code verifier

NGINX Plus will provide the plain code verifier along with the authorization code in exchange for the ID token. NGINX Plus will then validate the ID token and store it in cache.
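For readers who want to see what those two parameters actually contain, the short sketch below derives a PKCE pair the way RFC 7636 describes it. This is purely illustrative; NGINX Plus generates its own verifier inside the njs code of the reference implementation.

import base64
import hashlib
import secrets

# Plain code verifier: a high-entropy random string held by the relying party
code_verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()

# Code challenge: SHA-256 hash of the verifier, base64url-encoded without padding
digest = hashlib.sha256(code_verifier.encode()).digest()
code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

# The challenge travels in the authorization request; the verifier is only revealed
# during the code exchange, so an intercepted authorization code alone is useless.
print("code_challenge_method:", "S256")
print("code_challenge:", code_challenge)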
Extending OIDC with 3rd party Systems

Customers may need to integrate their OIDC workflow with proprietary Auth/Auth systems already in production. For example, additional metadata pertaining to a user may need to be collected from an external Redis cache or JFrog Artifactory. We can fill this gap by extending our diagram from the previous section. In addition to token validation with NGINX Plus, I pull response data from JFrog Artifactory and pass it to the backend applications once users are authenticated. Note: I am using JFrog Artifactory as an example 3rd party endpoint here; I can technically use any endpoint I want.

Testing the Environment

To test our environment in action, I will make a few updates to my NGINX OIDC configuration. You can pull our updated GitHub repository and use it as a reference for updating your NGINX configuration.

Update #1: Adding the njs source code

The first update extends NGINX Plus with njs to retrieve response data from our 3rd party system. Add the KvOperations.js source file to your /etc/nginx/njs directory.

Update #2: frontend.conf

I am adding lines 37-39 to frontend.conf so that NGINX Plus will initiate the sub-request to our 3rd party system after users are authenticated with OIDC. We are setting the URI of this sub-request to /kvtest. More on this in the next update.

Update #3: openid_connect.server_conf

We are adding lines 35-48 to openid_connect.server_conf, consisting of two internal redirects in NGINX:

/kvtest; internal redirect for sub-requests with URI /kvtest, which calls functions in KvOperations.js
/auth; internal redirect for sub-requests with URI /auth, which are proxied to the 3rd party endpoint

You can replace the <artifactory-url> in line 47 with your own endpoints.

Update #4: openid_connect_configuration.conf

This update is optional and applies when passing dynamic variables to your 3rd party endpoints. You can dynamically update variables on the fly by sending POST requests to the NGINX Plus key-value store:

$ curl -iX POST -d '{"admin": "<input-value>"}' http://localhost:9000/api/7/http/keyvals/<keyval-zone-name>

We are defining/instantiating the new key-value store in lines 102-104. Once the updates are complete, I can test the optimized OIDC environment by verifying that the application is on the receiving end of my dynamic input values.

Wrapping it up

I covered a subset of ZT use cases with NGINX in hybrid architectures. The use cases presented in this article center around authentication. In the next part of our series, I will cover more ZT use cases, including:

Alternative authentication methods (SAML)
Encryption
Authorization and Access Control
Monitoring/Auditing

PowerPoint, ArcaneDoor, the Z80 and Kaiser Permanente
Notable security news from the week of April 21st, with a small side of nostalgia for the Z80 CPU; we'll dive into the exploitation of an old PowerPoint CVE from 2017, ArcaneDoor and the targeting of Cisco perimeter devices, and an enormous breach of Kaiser Permanente user information!
This weekKoichi is back as editor for another round-up of the news. This time I chose these security news: GhostStripe, Security Clearance bill, and RISS, Suspected attack on Japan Railway (JR) East, Vulnrichment; and Solar Storm.114Views1like0CommentsMitigating Application Threats with BIG-IP Next WAF
Overview of BIG-IP Next In today's modern world where the digital landscape is continuously evolving and security threats are becoming more sophisticated, the need for a robust and adaptive security solution is essential. BIG-IP Next is a next-generation solution which is setting a new standard for safeguarding your digital assets, protecting your applications, and empowering enterprises with the highest security efficacy.BIG-IP Next is the modernized solution optimized to simplify operations, enhance performance, and strengthen security. As per the official website, BIG-IP Next simplifies day-to-day ADC operations and accelerates application time-to-market through automation so that you can focus more on getting your apps online. BIG-IP Next’s modern, highly scalable software architecture is designed for maximum resiliency to support vast, dynamic application portfolios and their most complex traffic management and security policies, ensuring that applications are always available to end users. BIG-IP Next also provides deep insights into your application health, network performance, traffic patterns, and security threats to improve business decision-making. For a quick overview of BIG-IP Next and how the next-generation attributes can help you with your existing or new deployments, check out the video below. Here are some of the key capabilities that you can checkout and learn how you can mitigate app threats and security complexity with BIG-IP Next WAF: 1. Deploy HTTPS application with WAF Protection The first step in protecting your applications starts with onboarding your application in BIG-IP Next instance and creating a WAF security policy as per application requirements. Finally creating load balancers and applying the above-created WAF policies. Next, users can monitor the application traffic by navigating to their respective security dashboards and take necessary steps as per security insights. For more details, see this video. 2. Create and Manage Security Policies Sometimes creating security policies can be a time-consuming job, and BIG-IP Next has made this user-friendly for creating and managing security policies from a centralized UI. Users can create, delete or update their existing policies in fewer steps and can apply them directly to the applications, thereby decreasing the application delivery time to market. You can check out the video below for more details. 3. Create Security Policies using Templates One more advantage of BIG-IP Next is the support for creating security policies using templates and it’s just a one-click action using 'F5 BIG-IP Next’. Users can make use of default templates and protect their applications with zero effort, for ex. Using the Violation Rating Template. For more information, check below video. 4. Security Policy Migration Going through existing BIG-IP security policies and then creating the same ones in BIG-IP Next solution can be time-consuming. This is made easy so that users can migrate their security policy from 'F5 Advanced WAF' to 'F5 BIG-IP Next WAF' in a simple manner. With fewer steps, you can have your entire WAF security posture up without going through the rough step of creating them from scratch. Please refer to the video below for more insights. 5. Signatures and Threat Campaigns Update Regular update of attack signatures and threat campaigns is a vital step in safeguarding your applications against the latest attacks. 
This process is straightforward with F5 BIG-IP Next, so applications can mitigate the latest threats without the need for downtime. For a step-by-step procedure to update signatures and threat campaigns, please check the video below.

You can also check out the demo link below for detailed insights into how BIG-IP Next WAF enables the migration of apps and policies between BIG-IP TMOS and BIG-IP Next. The demo also shows how to deploy new web applications with WAF security policies included within BIG-IP Next Central Manager, and finally how to analyze and respond to security incidents within the Next WAF dashboard.

Reference links

What is BIG-IP Next? | DevCentral
Getting Started with BIG-IP Next: Fundamentals | DevCentral
https://www.f5.com/products/big-ip-services/big-ip-next