Got an Error Reading Communication Packets (RDS)

Recently, we noticed a ton of errors in AWS CloudWatch Logs with messages similar to this:

2020-08-21T17:43:32.142855Z 8812 [Note] Aborted connection 8812 to db: 'dbname' user: 'db_user' host: 'x.x.x.x' (Got an error reading communication packets)

"Aborted connection" is a common MySQL communication error.  The possible reasons are numerous includes:

  1. It takes more than connect_timeout seconds to obtain a connect packet.
  2. The client had been sleeping more than wait_timeout or interactive_timeout seconds without issuing any requests to the server.
  3. The max_allowed_packet variable value is too small, or queries require more memory than you have allocated for mysqld.
  4. The client program did not call mysql_close() before exiting.
  5. The client program ended abruptly in the middle of a data transfer.
  6. A connection packet does not contain the right data.
  7. DNS-related issues, or the hosts are authenticated against their IP addresses instead of hostnames.

Yes, it's a long list, unfortunately.
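Before touching any settings, it helps to see how often these aborted connections actually happen. Here is a quick sketch (not from our original setup; the credentials are placeholders) that reads MySQL's Aborted_clients and Aborted_connects counters with the same mysql driver we use later:

    // check-aborted.js - a minimal sketch; replace the placeholder credentials with your own
    const mysql = require('mysql');

    const connection = mysql.createConnection({
      host: 'your-rds-endpoint',   // placeholder
      user: 'db_user',             // placeholder
      password: 'secret',          // placeholder
      database: 'dbname'           // placeholder
    });

    // Aborted_clients counts connections dropped mid-session (e.g. no mysql_close()),
    // Aborted_connects counts failed connection attempts (e.g. connect_timeout, bad handshake).
    connection.query(
      "SHOW GLOBAL STATUS WHERE Variable_name IN ('Aborted_clients', 'Aborted_connects')",
      (err, rows) => {
        if (err) throw err;
        rows.forEach((row) => console.log(`${row.Variable_name}: ${row.Value}`));
        connection.end();
      }
    );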

MySQL Tuning

We went through the list. The first thing we tried was increasing these three variables (see the sketch after the list for how such changes are applied on RDS):

  1. max_allowed_packet = 500M
  2. innodb_log_buffer_size = 32M
  3. innodb_log_file_size = 2047M
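Note that on RDS we cannot edit my.cnf directly; these variables are changed through the DB parameter group attached to the instance. Here is a rough sketch using the AWS SDK for JavaScript; the parameter group name and region are placeholders, and static parameters such as innodb_log_file_size only take effect after a reboot:

    // update-param-group.js - rough sketch, not from the original setup
    const AWS = require('aws-sdk');
    const rds = new AWS.RDS({ region: 'us-east-1' }); // placeholder region

    rds.modifyDBParameterGroup({
      DBParameterGroupName: 'my-mysql57-params', // placeholder parameter group name
      Parameters: [
        // max_allowed_packet is dynamic, so 'immediate' works;
        // static parameters (e.g. innodb_log_file_size) require ApplyMethod: 'pending-reboot'
        {
          ParameterName: 'max_allowed_packet',
          ParameterValue: `${500 * 1024 * 1024}`, // 500M
          ApplyMethod: 'immediate'
        }
      ]
    })
      .promise()
      .then((res) => console.log('Updated parameter group:', res.DBParameterGroupName))
      .catch(console.error);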

But they did not help, so we checked some timeout variables:

  1. wait_timeout - this parameter refers to the number of seconds the server waits for activity on a non-interactive connection before closing it.
    • Allowed values: 1-31536000
    • The default value for MySQL 5.7: 28800
  2. connect_timeout - this parameter refers to the number of seconds that the server waits for a connect packet before responding with Bad handshake.
    • Allowed values: 2-31536000
    • The default value for MySQL 5.7: 10
  3. interactive_timeout - this parameter refers to the number of seconds the server waits for activity on an interactive connection before closing it.
    • Allowed values: 1-31536000
    • The default value for MySQL 5.7: 28800

We have the default values for all of them, so it is unlikely the cause is "The client had been sleeping more than wait_timeout or interactive_timeout seconds without issuing any requests to the server."
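For reference, here is a small sketch of how the live values can be confirmed. It assumes the Knex dbContext shown in the next section, and the require path is hypothetical:

    // check-timeouts.js - quick sketch; assumes the Knex dbContext from the next section
    const dbContext = require('./dbContext'); // hypothetical module path

    async function showTimeouts() {
      // With the 'mysql' client, knex.raw() typically resolves to [rows, fields]
      const [rows] = await dbContext.raw(
        "SHOW VARIABLES WHERE Variable_name IN ('wait_timeout', 'interactive_timeout', 'connect_timeout')"
      );
      rows.forEach((row) => console.log(`${row.Variable_name} = ${row.Value}`));
      await dbContext.destroy();
    }

    showTimeouts().catch(console.error);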

Application Troubleshooting

That's enough MySQL tuning; we looked into the application side. We use Apollo Server on AWS Lambda (Serverless), and the database connection uses the KnexJS library. Our DB connection was something like this:

    require('mysql');

    const dbContext = require('knex')({
      client: 'mysql',
      connection: async () => {
        return {
          host: await getSecret(`${process.env.ENV}_DB_HOST`),
          user: await getSecret(`${process.env.ENV}_DB_USER`),
          password: await getSecret(`${process.env.ENV}_DB_PASSWORD`),
          database: await getSecret(`${process.env.ENV}_DB_NAME`),
          ssl: 'Amazon RDS'
        };
      },
      pool: {
        min: 2,
        max: 10,
        createTimeoutMillis: 30000,
        acquireTimeoutMillis: 30000,
        idleTimeoutMillis: 30000,
        reapIntervalMillis: 1000,
        createRetryIntervalMillis: 100
      },
      debug: false
    });

    export default dbContext;
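The getSecret helper is not shown in the original code; one possible (hypothetical) implementation backed by AWS Secrets Manager could look like this:

    // getSecret.js - hypothetical helper, not part of the original code
    const AWS = require('aws-sdk');
    const secretsManager = new AWS.SecretsManager({ region: 'us-east-1' }); // placeholder region

    // Fetches a plain-text secret by name, e.g. `${process.env.ENV}_DB_HOST`
    async function getSecret(secretId) {
      const result = await secretsManager.getSecretValue({ SecretId: secretId }).promise();
      return result.SecretString;
    }

    module.exports = { getSecret };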

Updated (10/19/2020): We removed the pool setting propagateCreateError: true. Per Mikael (knex/tarn maintainer):

the setting, propagateCreateError, should never be touched.

There are many people having this issue in the KnexJS GitHub issues (see the reference links below). The key point here is that we are using serverless:

Lambda functions are stateless, so there is no way to share a connection pool between functions. - rusher

And our DB connection had the pool settings min: 2 / max: 10. The good thing about connection pooling, according to Wikipedia:

In software engineering, a connection pool is a cache of database connections maintained so that the connections can be reused when future requests to the database are required. Connection pools are used to enhance the performance of executing commands on a database.

Our first impression was that since Lambda functions do not share a connection pool, maintaining a connection pool becomes a waste here. Those connections sit there sleeping, doing nothing, and they end up taking all the available connections. So, as many people suggested, we changed the pool settings to:

    pool: {
      min: 1,
      max: 1,
      ...

Okay, after that, we saw some improvement. And we also did one more thing:

    const APIGatewayProxyHandler = (event, context, callback) => {
      context.callbackWaitsForEmptyEventLoop = false;

By specifying context.callbackWaitsForEmptyEventLoop = false, we allow a DB connection to be maintained in a global variable in the Lambda container, resulting in faster connections.
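Put together, the pattern looks roughly like this (a sketch; the module path and table name are hypothetical): the Knex instance lives at module scope so a warm Lambda container can reuse it, and the handler only flips callbackWaitsForEmptyEventLoop.

    // handler.js - rough sketch of the pattern described above
    const dbContext = require('./dbContext'); // hypothetical path; created once per container, outside the handler

    export const APIGatewayProxyHandler = async (event, context) => {
      // Don't keep the Lambda alive waiting for the pool's open sockets
      context.callbackWaitsForEmptyEventLoop = false;

      // 'custom_pages' is a hypothetical table used only for illustration
      const pages = await dbContext('custom_pages').select('*').limit(10);
      return {
        statusCode: 200,
        body: JSON.stringify(pages)
      };
    };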

However, we noticed a performance problem after changing the connection pool to min: 1 / max: 1. Why? Didn't we just say, "Lambda functions do not share a connection pool"? Let's be more precise: although Lambda functions do not share a connection pool with each other, the queries within the same function invocation do share a connection pool.

For example, we have a GraphQL call, getCustomPages. When it's called, a DB connection pool is created, and all queries in this call can share that pool, especially since those queries are written in an async way (i.e., with Promises). To demonstrate this, we can use MySQL Workbench under Management / Client Connections. When the pool size is 1, we see only one connection created for one GraphQL call, but when the pool size is set to 2/10, we see more connections (about 10) created for one call, and all queries share those connections perfectly. We saw page load time drop from ~10 seconds to half a second by changing the connection pool from 1/1 to 2/10. That's because without sharing the pool, every query has to wait until the previous query finishes before it can be executed. If a Lambda function runs thousands of queries in a loop, the larger connection pool will truly boost performance.
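To make this concrete, here is a small sketch (the table name and ids are made up): with pool max: 10, the queries inside Promise.all can run on separate pooled connections in parallel, while with min: 1 / max: 1 they are forced to run one after another.

    // parallel-queries.js - sketch of queries inside one Lambda invocation
    const dbContext = require('./dbContext'); // hypothetical path

    async function loadPages() {
      const ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; // made-up ids

      // With pool max: 10 these queries can run on up to 10 pooled connections concurrently;
      // with min: 1 / max: 1 each query waits for the previous one to release the single connection.
      return Promise.all(
        ids.map((id) => dbContext('custom_pages').where({ id }).first())
      );
    }

    module.exports = { loadPages };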

So for better performance, we should still use the default pool of connections. But because AWS Lambda functions do not share the pool, a pool of connections will be opened for each Lambda function call and end up using all of the available MySQL connections.

After looking into tarn.js, the library KnexJS uses to manage the connection pool, we found out that the setting idleTimeoutMillis controls how many milliseconds pass before free resources are destroyed. For example, if we set it to 30000, i.e., 30 seconds, then idle connections will be destroyed after 30 seconds. However, it will not destroy all sleeping connections; it keeps the minimum number of connections set in the pool configuration. Since we cannot reuse the pool connections in the next Lambda function call, why not set the minimum to 0?
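The behavior is easy to see with tarn.js directly. Below is a toy sketch (fake resources, not real DB connections, and based on our reading of the tarn docs): with min: 0, every released resource is destroyed once it has been idle longer than idleTimeoutMillis.

    // tarn-idle-demo.js - toy sketch with fake resources
    const { Pool } = require('tarn');

    let counter = 0;
    const pool = new Pool({
      create: () => Promise.resolve({ id: ++counter }),
      destroy: (resource) => console.log(`destroyed resource ${resource.id}`),
      min: 0,                  // nothing is kept alive "just in case"
      max: 10,
      idleTimeoutMillis: 1000, // idle resources become eligible for destruction after 1s
      reapIntervalMillis: 1000 // the reaper checks for idle resources every 1s
    });

    async function main() {
      const pending = pool.acquire();
      const resource = await pending.promise;
      pool.release(resource);

      // After idleTimeoutMillis (plus up to one reap interval), the released resource is
      // destroyed, because min: 0 means the pool holds no spare resources.
      setTimeout(async () => {
        console.log(`free resources left: ${pool.numFree()}`);
        await pool.destroy(); // shut down the pool and its reaper
      }, 2500);
    }

    main();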

Brilliant! The final solution we have is:

  • Use the pool settings as follows. (Updated 8/26/2020: since the connections are not shared by the next Lambda function call, we could set idleTimeoutMillis to be much shorter, so we changed it to 1000 from 30000.)
      pool: {
        min: 0,
        max: 10,
        createTimeoutMillis: 30000,
        acquireTimeoutMillis: 30000,
        idleTimeoutMillis: 1000, // changed it from 30000 to 1000
        reapIntervalMillis: 1000,
        createRetryIntervalMillis: 100,
        propagateCreateError: true
      },
  • Set context.callbackWaitsForEmptyEventLoop to false (Note: keep the DB connection outside of the Lambda handler function)
      const APIGatewayProxyHandler = (event, context, callback) => {
        context.callbackWaitsForEmptyEventLoop = false;

The Ultimate Solution

As we repeated many times above, the bottleneck with AWS Lambda functions is that they don't share resources. To better understand this, please check the following images:

Many concurrent connections kill Amazon RDS - Photo by Thundra

Since each Lambda function is an individual process, it has to establish its own connection to the DB instance. This pattern becomes very resource-intensive.

Luckily, Amazon launched the preview of Amazon RDS Proxy in December 2019 and then made it generally available for both MySQL and PostgreSQL engines in June 2020, which addresses this issue. We call it "the Ultimate Solution" here. :-)

Amazon RDS Proxy allows applications to pool and share connections established with the database, improving database efficiency, application scalability, and security. RDS Proxy reduces client recovery time after failover by up to 79% for Amazon Aurora MySQL and by up to 32% for Amazon RDS for MySQL. In addition, its authentication and access can be managed through integration with AWS Secrets Manager and AWS Identity and Access Management (IAM).
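From the application's point of view, adopting RDS Proxy is mostly a connection-string change: we point Knex at the proxy endpoint instead of the database endpoint, and the proxy maintains the shared pool on its side. A sketch (the proxy endpoint is a placeholder, and it reuses the hypothetical getSecret helper from above):

    // dbContext with RDS Proxy - sketch; endpoint and secret names are placeholders
    const { getSecret } = require('./getSecret'); // hypothetical helper shown earlier

    const dbContext = require('knex')({
      client: 'mysql',
      connection: async () => ({
        host: 'my-app-proxy.proxy-xxxxxxxx.us-east-1.rds.amazonaws.com', // RDS Proxy endpoint, not the DB instance
        user: await getSecret(`${process.env.ENV}_DB_USER`),
        password: await getSecret(`${process.env.ENV}_DB_PASSWORD`),
        database: await getSecret(`${process.env.ENV}_DB_NAME`),
        ssl: 'Amazon RDS'
      }),
      // The proxy multiplexes connections server-side, so the in-function pool can stay small
      pool: { min: 0, max: 10, idleTimeoutMillis: 1000 }
    });

    export default dbContext;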

Again, we borrow a nice diagram from Thundra:

Many concurrent connections connect to Amazon RDS via RDS Proxy - Photo by Thundra

Please follow the links to see how to set up Amazon RDS Proxy if you are interested. We are at the end of this long, long post. Thank you for reading.

Updated (08/27/2020): We added AWS RDS Proxy.

References:

  • https://github.com/knex/knex/issues/1875 (How do I use Knex with AWS Lambda?)
  • https://github.com/vincit/tarn.js (Another Resource Pool)
  • https://blog.thundra.io/can-lambda-and-rds-play-nicely-together (Can Lambda and RDS Play Nicely Together?)
  • https://aws.amazon.com/blogs/aws/amazon-rds-proxy-now-generally-available/ (AWS RDS Proxy)


Source: https://deniapps.com/blog/mysql-rds-aborted-connection-error-aws-lambda
