Found 2 posts tagged "mysql"
2023-03-04
How I Optimized an API
Background

During system development we focus mainly on iterating features, and only do targeted optimization once a performance bottleneck shows up in real use. The problem this time: importing 3,000+ FAQs took roughly 500 seconds, the client reported the import as failed, yet the data was actually still being imported. The code in this article has been sanitized and contains no private information; it only reproduces the scenario.

- FAQ: frequently-asked questions, i.e. one question mapped to one answer. Structurally a FAQ is a group of related questions plus one answer. The question "Who are you?" may have several phrasings ("What's your name?", "What are you called?") that all map to the same answer: "I am xxx".
- Intent: the "Who are you?" part of a FAQ.

Goals

1. With 3,000 records, bring the import time to within 5 seconds (the longest wait a user will tolerate is about 6-8 seconds).
2. Optimize the queries so they return within 1 second.

Steps

1. Find the bottleneck.
2. Decide on the optimization.
3. Implement it.

These three steps look simple, but the playbook and experience behind them are built up gradually during development.

FAQ batch import

The import reads intents and answers from the uploaded CSV file, saves the data to the database, then runs a few checks and tags the records that contain errors: parse the data -> save -> validate and update.

Finding the bottleneck

When an API is slow, the first thing to know is which steps are slow. Here we use Google Guava's Stopwatch to instrument the code and time each operation. For example:

```java
public static void main(String[] args) {
    // once started, a Stopwatch can report elapsed time multiple times
    Stopwatch stopwatch = Stopwatch.createStarted();
    ThreadUtil.sleep(1000);
    System.out.println("elapsed: " + stopwatch.elapsed(TimeUnit.MILLISECONDS));
    ThreadUtil.sleep(2000);
    System.out.println("elapsed: " + stopwatch.elapsed(TimeUnit.MILLISECONDS));
}
```

```text
elapsed: 1001
elapsed: 3009
```

Back to the FAQ import code:

```java
public void save(String faqId, MultipartFile file) throws Exception {
    Stopwatch stopwatch = Stopwatch.createStarted();

    // 1. parse the rows out of the FAQ file
    List<List<String>> rows = _readRows(file);
    LOG.info("[import faq - finish check used:{}]", stopwatch.elapsed(TimeUnit.MILLISECONDS));

    // get a connection
    Connection conn = getConn();

    // one FAQ consists of one User node and one Bot node;
    // convert the rows into FAQ pairs
    List<Pair<User, Bot>> details = _convertRows(rows, faqId);
    LOG.info("[import faq - convert:{}]", stopwatch.elapsed(TimeUnit.MILLISECONDS));

    // save to the database
    for (Pair<User, Bot> p : details) {
        ComponentDao.add(conn, p.getKey().toComponent());
        ComponentDao.add(conn, p.getRight().toComponent());
    }
    LOG.info("[import faq - finish save 2 db:{}]", stopwatch.elapsed(TimeUnit.MILLISECONDS));

    conn.commit();
    LOG.info("[import faq - finish commit:{}]", stopwatch.elapsed(TimeUnit.MILLISECONDS));

    _validate(faqId, conn);
    conn.commit();
}

private void _validate(String faqId, Connection conn) throws Exception {
    Stopwatch stopwatch = Stopwatch.createStarted();

    // 1. build a context carrying the data the validators need
    ComponentValidatorContext context = _buildContext(faqId, conn);
    LOG.debug("[validate faq - build context:{}]", stopwatch.elapsed(TimeUnit.MILLISECONDS));

    // 2. query the nodes under this FAQ and wrap the User/Bot nodes into validators
    Pair<ComponentValidator /* FAQ validator */, List<ComponentValidator> /* children validators */> validatorPair =
            _validators(context);
    List<ComponentValidator> validators = validatorPair.getRight();
    LOG.debug("[validate faq - build validate pair:{}]", stopwatch.elapsed(TimeUnit.MILLISECONDS));

    // 3. walk the nodes and update their error info
    LOG.debug("[validate faq - start validate with length:{}, spend:{}]",
            validators.size(), stopwatch.elapsed(TimeUnit.MILLISECONDS));
    int writeDbCount = 0;
    boolean hasError = false;
    for (ComponentValidator validator : validators) {
        StatusCodes statusCodes = validator.validate(context);
        if (statusCodes != null) {
            hasError = true;
        }
        // write the node back if its error info changed
        boolean updated = updateComponentError(validator);
        if (updated) {
            ComponentDao.update(conn, validator.getComponent());
            writeDbCount++;
        }
    }
    LOG.debug("[validate faq - finish validate with write 2 db count:{}, spend:{}]",
            writeDbCount, stopwatch.elapsed(TimeUnit.MILLISECONDS));

    // 4. update the error flag on the FAQ itself
    _updateError(faqId, hasError);
    LOG.debug("[validate used in millis:{}]", stopwatch.elapsed(TimeUnit.MILLISECONDS));
}
```

Run results:

```text
[import faq - finish check used:147]
[import faq - convert:241]
[import faq - finish save 2 db:50870]
[import faq - finish commit:50878]
[start validate]
[validate faq - build context:33486]
[validate faq - build validate pair:35203]
[validate faq - start validate with length:6795, spend:35203]
[validate faq - finish validate with write 2 db count:6795, spend:361809]
[validate used in millis:361818]
```

From this instrumentation the bottlenecks are clear:

- saving to the database: 50 s+
- building the context: 33 s+
- validating all nodes: 326 s+

Deciding on the optimizations

1. Saving to the database: 50 s+

```java
// save to the database
for (Pair<User, Bot> p : details) {
    ComponentDao.add(conn, p.getKey().toComponent());
    ComponentDao.add(conn, p.getRight().toComponent());
}
```

There are 3,397 FAQs here, i.e. 6,794 User/Bot nodes, so this runs 6,794 individual save operations.

Approach: based on past experience, switch to batched saves: write in batches to reduce the number of db writes, and enable MySQL's batch-write optimization.

1) Batched writes: the number of db writes drops to 14.

```java
private void _saveBatch(List<Pair<User, Bot>> data, Connection conn) throws SQLException {
    String sql = "INSERT INTO tableName (id, xx, xx) VALUES (?, ?, ?)";
    PreparedStatement ps = conn.prepareStatement(sql);
    for (int i = 0; i < data.size(); i++) {
        Pair<User, Bot> item = data.get(i);
        Component user = item.getKey().toComponent();
        Component bot = item.getValue().toComponent();
        _add2Batch(ps, user);
        _add2Batch(ps, bot);
        // flush every 500 pairs
        if (i % 500 == 0) {
            ps.executeBatch();
            conn.commit();
            ps.clearBatch();
        }
    }
    ps.executeBatch();
    conn.commit();
    ps.clearBatch();
}

private void _add2Batch(PreparedStatement ps, Component component) throws SQLException {
    ps.setString(1, component.getId());
    ps.setString(2, component.getXX());
    ps.setString(3, component.getXx());
    ps.addBatch();
}
```

2) MySQL batch-write optimization: change the connection string so the driver rewrites the batched inserts.

```text
jdbc:mysql://127.0.0.1:3305?rewriteBatchedStatements=true
```

Verification:

```text
# time drops from 50 s+ to about 1 s
[import faq - finish save 2 db:1586]
```

Summary: previously every record was written to the database individually, which is bound to be slow at this volume. Batched saves plus the rewritten batch INSERT both reduce the number of db writes and make each write more efficient:

```sql
-- before: 6,794 single-row inserts
INSERT INTO xx (xx, xx) VALUES (xx, xx);
-- after: 14 multi-row inserts
INSERT INTO xx (xx, xx) VALUES ('xxx', 'xxx'), ('xxx', 'xxx'), ...;
```

2. Building the context: 33 s+

To find the bottleneck inside context building we instrument again:

```text
[build - context start build validate context]
[build - convert2Rest - start]
[build - convert2Rest - find root & query all projects spend:652]
[build - convert2Rest - load children:10113]
[build - convert2Rest - covert children:22067]
[build - context convert2Rest pair spend:22072]
[build - context spend:32382]
```

The rough logic here is:

- query the root node and all nodes of the current project;
- restore the parent/child relationships;
- convert the db objects into Rest objects;
- filter out some nodes.

Points to optimize (a sketch of the reworked traversal follows this list):

- The project's full node list is queried several times, and that is the large data set; query it once and pass it as a parameter to the other methods that need it.
- Restoring the relationships uses recursion, which is simple to write but slower than a while/for loop on a large data set; rewrite it as while + for.
- Converting db objects to Rest objects queries the whole project inside the loop; reuse the data already queried instead of hitting the db repeatedly.
- The loops use streams to filter/convert small collections; in this scenario a plain for loop is faster.
- Filtering the nodes also hits the db; reduce those queries.
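The post lists the reworked traversal but does not show it. Below is a minimal sketch of the idea under stated assumptions: the node list is loaded once up front, children are resolved from an in-memory index instead of per-node db queries, and the recursion is replaced by an explicit stack. The Node shape and method names are illustrative, not the project's real types.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: restore the parent/child tree from one upfront query
// instead of recursive per-node lookups. All names here are illustrative.
public class TreeAssembler {

    public static class Node {
        final String id;
        final String parentId;
        final List<Node> children = new ArrayList<>();

        Node(String id, String parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    /** Rebuild parent/child links iteratively from a flat list queried once. */
    public static Node assemble(List<Node> all, String rootId) {
        // index the nodes once: no db access and no stream inside the loops below
        Map<String, Node> byId = new HashMap<>();
        Map<String, List<Node>> byParent = new HashMap<>();
        for (Node n : all) {
            byId.put(n.id, n);
            byParent.computeIfAbsent(n.parentId, k -> new ArrayList<>()).add(n);
        }

        Node root = byId.get(rootId);
        // explicit stack instead of recursion: same result, no deep call stack
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            for (Node child : byParent.getOrDefault(current.id, new ArrayList<>())) {
                current.children.add(child);
                stack.push(child);
            }
        }
        return root;
    }
}
```

The same prefetched maps can then be passed to the Rest-conversion and filtering steps, so none of the later loops needs to query the database again.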
Verification:

```text
[build - context start build validate context]
[build - convert2Rest - start]
[build - convert2Rest - find root & query all projects spend:3]
[build - convert2Rest - load children:15]
[build - convert2Rest - covert children:124]
[build - context convert2Rest pair spend:128]
[build - context spend:214]
```

Summary:

- fewer db queries;
- streams only where they make sense;
- recursion rewritten as while/for loops.

The improvement is obvious: from 32 s+ down to about 0.2 s.

3. Validating all nodes: 326 s+

This is the most expensive part. Looking at the code first: it loops over every node and runs the validation logic (an Intent, i.e. a User node's input, must be unique and not duplicated); if a node has an error, the error info is updated and written back to the database.

```text
finish validate with write 2 db count:6795, spend:361809
```

The log says 6,795 db writes in total, which clearly cannot be right: the test data is brand new, so nothing should be flagged as duplicated.

```java
int writeDbCount = 0;
boolean hasError = false;
for (ComponentValidator validator : validators) {
    StatusCodes statusCodes = validator.validate(context);
    if (statusCodes != null) {
        hasError = true;
    }
    // write the node back if its error info changed
    boolean updated = updateComponentError(validator);
    if (updated) {
        ComponentDao.update(conn, validator.getComponent());
        writeDbCount++;
    }
}
```

Applying the same playbook as above: drop the streams inside the loop and fix the bug that made updateComponentError report every node as changed (a sketch of such a check follows).

```text
finish validate with write 2 db count:0, spend:108427
```

The total drops from 361 s+ to 108 s+, a big improvement.
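The post only says that the update bug was fixed; the fix itself is not shown. Below is a minimal sketch of one way a "write only on change" check could look, assuming the component keeps its previously stored error; the accessors used here are hypothetical, not the project's real API.

```java
// Hypothetical sketch (relies on java.util.Objects): the node is reported as
// updated only when its freshly computed error differs from the error already
// stored on the component, so unchanged nodes never trigger a db write.
private boolean updateComponentError(ComponentValidator validator) {
    Component component = validator.getComponent();
    String previousError = component.getError();     // hypothetical accessor
    String currentError = validator.currentError();  // hypothetical accessor
    if (Objects.equals(previousError, currentError)) {
        return false; // nothing changed: skip the db update
    }
    component.setError(currentError);                // hypothetical mutator
    return true;
}
```

With 6,795 freshly imported nodes and no duplicates, such a check keeps the write count at 0 instead of writing every node back.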
The remaining code no longer touches the db, yet it still takes 108 s, which does not add up: 108427 / 6794 is roughly 16 ms per node, far too long for a single validation. So we use Stopwatch once more, this time inside the loop, and log any node whose validation takes unusually long:

```java
for (ComponentValidator validator : validators) {
    Stopwatch sw = Stopwatch.createStarted();
    StatusCodes statusCodes = validator.validate(context);
    if (statusCodes != null) {
        hasError = true;
    }
    long usedInMillis = sw.elapsed(TimeUnit.MILLISECONDS);
    if (usedInMillis > 20) {
        LOG.debug("[validate usedInMillis:{}]", usedInMillis);
    }
    // ...
}
```

```text
...
[validate component usedInMillis:199]
...
```

That exposes the problem: some nodes are abnormally slow. Digging in, these nodes have a very large number of intents, and the more intents, the longer the validation takes.

```java
// Investigation shows that validating a User's intent checks whether it
// duplicates an intent that already exists in the database, which is a lot of
// work: the duplicate check costs example.size * db.example.size comparisons.
private boolean intentRepeated() {
    List<String> exampleTexts = getIntents();
    // a User's examples must be unique; duplicates are only allowed when a
    // template is referenced
    List<User> users = components.stream()
            // filter
            .collect(Collectors.toList());
    for (User item : users) {
        List<String> examples = item.getIntents();
        for (String text : examples) {
            if (exampleTexts.contains(text)) {
                return true;
            }
        }
    }
    return false;
}
```

The problems here:

- The loop contains a stream that filters the Users out of the db data. The result is the same for every node, so it should not be recomputed each time; cache it in the context, compute it once and reuse it afterwards.
- The nested for * for comparison is a lot of computation.

Reworked implementation

1) Take the Users from the context instead of computing them per node, which removes 6,000+ recomputations:

```java
private List<User> users = null;

public List<User> getUsers() {
    if (users == null) {
        synchronized (this) {
            if (users == null) {
                users = components.stream()
                        // ... filter
                        .collect(Collectors.toList());
            }
        }
    }
    return users;
}
```

2) Reduce the amount of computation. This takes some accumulated experience: in the second for loop we switch to map.containsKey, which removes an entire loop.

Tip: in the best case, map.containsKey(key) is O(1).

The context builds a map of the intents that already exist in the database, computes it only the first time it is needed, and every later node simply reuses it:

```java
private boolean intentRepeated() {
    List<String> exampleTexts = getIntents();
    Map<String, String> existingIntents = context.intentsMap();
    for (String text : exampleTexts) {
        if (existingIntents.containsKey(text)) {
            return true;
        }
    }
    return false;
}
```

With this, the cost per node drops from example.size * db.example.size comparisons to example.size * 1.

Verification:

```text
[start validate with length:6795]
[finish validate with write 2 db count:0, spend:227]
```

Summary:

- use streams only where they make sense;
- cache results to avoid recomputation;
- reduce the amount of computation.

Conclusion

With that, this round of optimization is done: importing the same data now takes about 5 seconds.

- Reduce db reads and writes; batch the operations.
- Use streams judiciously; when looping over small collections, a plain loop is faster.
- Use recursion judiciously.
- Fix the bug.
- Use a context where appropriate to cache results and avoid repeated computation.
- Reduce the amount of computation.

The most important part of this exercise is finding the problem; there can be many ways to solve it. You also need to know how much improvement after a change counts as meeting expectations, rather than chasing tiny optimization points with little headroom, which costs a lot of time for little effect.

- How to find the bottleneck.
- What improvement to expect.

During development the priority is still getting the feature working; keep the code reasonably efficient when time allows, and deal with bottlenecks separately when you actually hit them.

A quick plug:

If you are a Rasa developer: PromptAI lets you edit dialogue flows as a mind map; once you are done you can download a Rasa project file with one click and train it directly in native Rasa (some advanced features require the companion application).

- Features: FAQ, multi-turn dialogue, variable extraction (single/multiple), forms, webhooks, custom Actions and other advanced features.
- Editing: drag and drop, recycle bin, favorites, undo/redo, copy and other conveniences.
- Debugging: one-click debug and publish after editing, minimizing development time. We provide high-performance GPUs (3090) to speed you up.
- Other: private deployment is supported; contact us if you need it.

If you are a regular user who does not code but wants conversational AI: PromptAI is extremely friendly to ordinary users and businesses. No programming or AI knowledge is required, it is easier to pick up than Excel, and there are no complicated operations or formulas. You can have your own ChatBot in minutes.

Free to use: https://www.promptai.cn
Sign-up: free trial -> fill in the trial information -> approval -> start using.
2022-05-13
Notes on a MySQL Optimization
Optimization 1: batch inserts (in the actual business case, the time dropped from about 400 s to about 7 s)

Pseudocode:

```java
@Cleanup Connection conn = DaoUtils.getConnection(false);
String sql = "INSERT INTO xx (column1, column2) VALUES (?, ?)";
PreparedStatement ps = conn.prepareStatement(sql);
for (int i = 0; i < 1000000; i++) {
    ps.setString(1, "column1");
    ps.setString(2, "column2");
    ps.addBatch();
    // flush every 1000 records
    if (i % 1000 == 0) {
        ps.executeBatch();
        conn.commit();
        ps.clearBatch();
    }
}
// flush the remaining records (fewer than 1000)
ps.executeBatch();
conn.commit();
ps.clearBatch();
```

By default, the driver executes one SQL statement per row, however many rows there are. Adding rewriteBatchedStatements=true to the MySQL connection URL makes the driver rewrite a batch of inserts into a single multi-value INSERT, which effectively reduces the number of executed statements: with the code above, the number of statements equals the number of executeBatch() calls.

```text
jdbc:mysql://127.0.0.1:3306?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true
```

Official documentation:

```text
rewriteBatchedStatements

Should the driver use multi-queries (regardless of the setting of "allowMultiQueries") as well as
rewriting of prepared statements for INSERT into multi-value inserts when executeBatch() is called?
Notice that this has the potential for SQL injection if using plain java.sql.Statements and your code
doesn't sanitize input correctly. Notice that for prepared statements, server-side prepared statements
can not currently take advantage of this rewrite option, and that if you don't specify stream lengths
when using PreparedStatement.set*Stream(), the driver won't be able to determine the optimum number of
parameters per batch and you might receive an error from the driver that the resultant packet is too
large. Statement.getGeneratedKeys() for these rewritten statements only works when the entire batch
includes INSERT statements. Please be aware using rewriteBatchedStatements=true with
INSERT .. ON DUPLICATE KEY UPDATE that for rewritten statement server returns only one value as sum of
all affected (or found) rows in batch and it isn't possible to map it correctly to initial statements;
in this case driver returns 0 as a result of each batch statement if total count was 0, and the
Statement.SUCCESS_NO_INFO as a result of each batch statement if total count was > 0.

Default Value: false
Since Version: 3.1.13
```

https://dev.mysql.com/doc/connectors/en/connector-j-connp-props-performance-extensions.html
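The pseudocode above relies on Lombok's @Cleanup and an unspecified DaoUtils helper. Putting the two pieces together, here is a self-contained sketch with plain JDBC and try-with-resources; the URL, credentials, table and column names are placeholders, not the original schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertDemo {

    // rewriteBatchedStatements=true lets Connector/J rewrite the batch into
    // multi-value INSERTs; host, schema and credentials are placeholders
    private static final String URL =
            "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true";

    public static void main(String[] args) throws SQLException {
        String sql = "INSERT INTO xx (column1, column2) VALUES (?, ?)";
        try (Connection conn = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            conn.setAutoCommit(false);
            for (int i = 0; i < 1_000_000; i++) {
                ps.setString(1, "column1-" + i);
                ps.setString(2, "column2-" + i);
                ps.addBatch();
                // flush every 1000 rows so each batch and transaction stays small
                if (i % 1000 == 0) {
                    ps.executeBatch();
                    conn.commit();
                    ps.clearBatch();
                }
            }
            // flush whatever is left (fewer than 1000 rows)
            ps.executeBatch();
            conn.commit();
        }
    }
}
```

Without the URL flag the same code still runs, but the driver sends one INSERT per row when executeBatch() is called, which is the slow path described above.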
Optimization 2: add an index (analyze the SQL: rows scanned, index hits)

Table structure:

```text
mysql> desc detection_plains;
+-----------+------------+------+-----+---------+----------------+
| Field     | Type       | Null | Key | Default | Extra          |
+-----------+------------+------+-----+---------+----------------+
| id        | bigint(20) | NO   | PRI | NULL    | auto_increment |
| ..        | .......    | NO   |     | NULL    |                |
| peopleId  | bigint(20) | NO   |     | NULL    |                |
| plainType | int(2)     | NO   |     | 0       |                |
+-----------+------------+------+-----+---------+----------------+
xx rows in set (0.02 sec)
```

Existing indexes (the table currently has only the primary key):

```text
mysql> show index from t_name;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_name |          0 | PRIMARY  |            1 | id          | A         |      169267 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
```

The query:

```text
mysql> select no from t_name where peopleId=9379 and plainType=1 order by schedule DESC limit 1;
+----+
| no |
+----+
| 10 |
+----+
1 row in set (2.23 sec)
```

This takes 2 s+, which is not what we expect.

EXPLAIN shows how the query is executed: how many rows are scanned and whether an index is used.

```text
mysql> explain select no from t_name where peopleId=9379 and plainType=1 order by schedule DESC limit 1;
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table            | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                       |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
|  1 | SIMPLE      | detection_plains | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 169301 |     1.00 | Using where; Using filesort |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)
```

Current row count:

```text
mysql> select count(id) from t_name;
+-----------+
| count(id) |
+-----------+
|    175575 |
+-----------+
1 row in set (5.45 sec)
```

The EXPLAIN output shows the query uses no index and scans almost the whole table, so we add an index on the peopleId and plainType columns:

```text
mysql> alter table t_name add index peopleIdPlainType(`peopleId`,`plainType`);
Query OK, 0 rows affected (8.29 sec)
Records: 0  Duplicates: 0  Warnings: 0
```

Running EXPLAIN again, the scan is down to 10 rows and the new peopleIdPlainType index is hit:

```text
mysql> explain select no from t_name where peopleId=9379 and plainType=1 order by schedule DESC limit 1;
+----+-------------+------------------+------------+------+-------------------+-------------------+---------+-------------+------+----------+---------------------------------------+
| id | select_type | table            | partitions | type | possible_keys     | key               | key_len | ref         | rows | filtered | Extra                                 |
+----+-------------+------------------+------------+------+-------------------+-------------------+---------+-------------+------+----------+---------------------------------------+
|  1 | SIMPLE      | detection_plains | NULL       | ref  | peopleIdPlainType | peopleIdPlainType | 12      | const,const |   10 |   100.00 | Using index condition; Using filesort |
+----+-------------+------------------+------------+------+-------------------+-------------------+---------+-------------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.01 sec)

mysql> select no from t_name where peopleId=9379 and plainType=1 order by schedule DESC limit 1;
+----+
| no |
+----+
| 10 |
+----+
1 row in set (0.00 sec)

mysql> show profiles;
+----------+------------+-------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                     |
+----------+------------+-------------------------------------------------------------------------------------------+
|        3 | 0.00723975 | select no from t_name where peopleId=9379 and plainType=1 order by schedule DESC limit 1 |
+----------+------------+-------------------------------------------------------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
```

The query time drops from 2 s+ to about 7 ms.
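To run the same check from application code instead of the mysql client, EXPLAIN can be executed over JDBC like any other query. A small sketch (connection details are placeholders) that prints the access type, the chosen key and the estimated row count:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ExplainCheck {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://127.0.0.1:3306/test"; // placeholder connection
        String query = "select no from t_name where peopleId=9379 and plainType=1 "
                + "order by schedule DESC limit 1";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("EXPLAIN " + query)) {
            while (rs.next()) {
                // type = access type (ALL means a full scan), key = index actually
                // used, rows = estimated rows examined
                System.out.printf("type=%s key=%s rows=%s%n",
                        rs.getString("type"), rs.getString("key"), rs.getString("rows"));
            }
        }
    }
}
```

Before the index is added this prints type=ALL with roughly 170k rows; afterwards it shows the peopleIdPlainType key and about 10 rows, matching the EXPLAIN output above.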
Other

- Avoid "big transactions": if an operation is slow and runs inside an open transaction, it easily leads to `Lock wait timeout exceeded; try restarting transaction` errors.
- `SELECT * FROM information_schema.INNODB_TRX;` shows the currently running transactions, their state and the SQL they are executing (see the sketch after this list).
- Try not to operate on the db in parallel; it increases the time a single transaction stays open.
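As a companion to the INNODB_TRX tip above, here is a small sketch of inspecting the currently open transactions from Java; the connection details are placeholders, and the columns queried (trx_id, trx_state, trx_started, trx_query) are standard information_schema.INNODB_TRX columns.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TrxMonitor {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://127.0.0.1:3306/information_schema"; // placeholder
        String sql = "SELECT trx_id, trx_state, trx_started, trx_query "
                + "FROM information_schema.INNODB_TRX";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // trx_started shows how long the transaction has been open; a
                // long-lived one is a likely source of the lock-wait timeouts above
                System.out.printf("id=%s state=%s started=%s query=%s%n",
                        rs.getString("trx_id"), rs.getString("trx_state"),
                        rs.getString("trx_started"), rs.getString("trx_query"));
            }
        }
    }
}
```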