Bug3412人工智能停止运行分析报告(初版)

  • 简述

  1. 人工智能的应用是系统级应用,adj值为-15, 在内存不足的情况下,系统也会尽可能不杀死我们的进程,之前我们着手于进程保护的这一方法,后来经过查看aji为-15后,我认为我们的进程退出原因主要是自身程序出错,其调用nlu与msc两个动态库,出错退出是正常的事情.
  2. 无限crash原因: 当dialog应用挂掉后,系统有一个监视器,检测到其挂了会立即重新启动. 但是我们的crash发生在启动过程中,因此无限循环下去.
  • 问题分析

1.一开始我认为是科大讯飞原因造成的. 其destory方法的确会造成crash, 但这就是造成我们第一次crash的主要原因,在此之后,我们自身的bug引起了无限crash.

2.log日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
E/AndroidRuntime( 6192): FATAL EXCEPTION: main
E/AndroidRuntime( 6192): Process: com.avatar.dialog, PID: 6192
E/AndroidRuntime( 6192): java.lang.RuntimeException: Unable to create service com.avatar.speech.AvatarSpeechService: java.lang.SecurityException: Unable to find app for caller android.app.ApplicationThreadProxy@41992d50 (pid=6192) when getting content provider settings
E/AndroidRuntime( 6192): at android.app.ActivityThread.handleCreateService(ActivityThread.java:2633)
E/AndroidRuntime( 6192): at android.app.ActivityThread.access$1900(ActivityThread.java:138)
E/AndroidRuntime( 6192): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1338)
E/AndroidRuntime( 6192): at android.os.Handler.dispatchMessage(Handler.java:102)
E/AndroidRuntime( 6192): at android.os.Looper.loop(Looper.java:136)
E/AndroidRuntime( 6192): at android.app.ActivityThread.main(ActivityThread.java:5101)
E/AndroidRuntime( 6192): at java.lang.reflect.Method.invokeNative(Native Method)
E/AndroidRuntime( 6192): at java.lang.reflect.Method.invoke(Method.java:515)
E/AndroidRuntime( 6192): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:811)
E/AndroidRuntime( 6192): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:627)
E/AndroidRuntime( 6192): at dalvik.system.NativeStart.main(Native Method)
E/AndroidRuntime( 6192): Caused by: java.lang.SecurityException: Unable to find app for caller android.app.ApplicationThreadProxy@41992d50 (pid=6192) when getting content provider settings
E/AndroidRuntime( 6192): at android.os.Parcel.readException(Parcel.java:1465)
E/AndroidRuntime( 6192): at android.os.Parcel.readException(Parcel.java:1419)
E/AndroidRuntime( 6192): at android.app.ActivityManagerProxy.getContentProvider(ActivityManagerNative.java:2848)
E/AndroidRuntime( 6192): at android.app.ActivityThread.acquireProvider(ActivityThread.java:4499)
E/AndroidRuntime( 6192): at android.app.ContextImpl$ApplicationContentResolver.acquireProvider(ContextImpl.java:2210)
E/AndroidRuntime( 6192): at android.content.ContentResolver.acquireProvider(ContentResolver.java:1409)
E/AndroidRuntime( 6192): at android.provider.Settings$NameValueCache.lazyGetProvider(Settings.java:890)
E/AndroidRuntime( 6192): at android.provider.Settings$NameValueCache.getStringForUser(Settings.java:937)
E/AndroidRuntime( 6192): at android.provider.Settings$System.getStringForUser(Settings.java:1142)
E/AndroidRuntime( 6192): at android.provider.Settings$System.getString(Settings.java:1126)
E/AndroidRuntime( 6192): at com.avatar.speech.setting.RobotSettings.readRobotNickName(RobotSettings.java:236)
E/AndroidRuntime( 6192): at com.avatar.speech.setting.RobotSettings.init(RobotSettings.java:106)
E/AndroidRuntime( 6192): at com.avatar.speech.setting.RobotSettings.<init>(RobotSettings.java:45)
E/AndroidRuntime( 6192): at com.avatar.speech.AvatarSpeechService.onCreateSpeech(AvatarSpeechService.java:109)
E/AndroidRuntime( 6192): at android.robot.speech.SpeechService.onCreate(SpeechService.java:864)
E/AndroidRuntime( 6192): at com.avatar.speech.AvatarSpeechService.onCreate(AvatarSpeechService.java:53)
E/AndroidRuntime( 6192): at android.app.ActivityThread.handleCreateService(ActivityThread.java:2623)
E/AndroidRuntime( 6192): ... 10 more

所有的crash都是这个SecurityException, 便去寻找引起SecurityException的原因.

于Android核心类ActivityManagerService中找到

原因在于 在获取app记录的时候没有获取到

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
private final ContentProviderHolder getContentProviderImpl(IApplicationThread caller,
String name, IBinder token, boolean stable, int userId) {
ContentProviderRecord cpr;
ContentProviderConnection conn = null;
ProviderInfo cpi = null;
synchronized(this) {
ProcessRecord r = null;
if (caller != null) {
r = getRecordForAppLocked(caller); //在获取app记录的时候没有获取到
if (r == null) {
throw new SecurityException(
"Unable to find app for caller " + caller
+ " (pid=" + Binder.getCallingPid()
+ ") when getting content provider " + name);
}
}
private final int getLRURecordIndexForAppLocked(IApplicationThread thread) {
IBinder threadBinder = thread.asBinder();
// Find the application record.
for (int i=mLruProcesses.size()-1; i>=0; i--) {
ProcessRecord rec = mLruProcesses.get(i);
if (rec.thread != null && rec.thread.asBinder() == threadBinder) {
return i;
}
}
return -1;
}

而记录是在mLruProcesses中, 通过搜索知道 mLruProcesses 是android用来存放最近每个打开的应用的信息的cache.

即使被关闭了,也会存在,以此保证下次启动更加迅速.

此时,困扰很久的问题在于,为什么mLruProcesses会没有我们的记录.


此时,我认为是我们的应用创建出现了问题, 后来在Application的onCreate中与Service的OnCreate中都加了log,发现一切正常.

1
2
3
4
5
6
D/DialogApplication( 6192): onCreate start
D/DialogApplication( 6192): onCreate end
D/FakeAIService( 6192): onBind
I/AISupport( 939): mConntction
W/SpeechManager( 939): getAsrEnable failed: not connected to speech service
D/AvatarSpeechService( 6192): onCreate start

这个时候没有目标了,一切都正常, 在于为什么没有我们的应用信息呢.前几天我便打开ActivityManagerService的debug,开始看应用挂掉后重启的各项操作.

同时根据AMS的代码发现,

在下面方法中会创建app, 获取ProcessRecord app; 初始化app的记录

4891行 private final boolean attachApplicationLocked(IApplicationThread thread,

创建后通过调用下面的方法来添加进lruProcess中

updateLruProcessLocked(app, false, null);

很显然,是在我们添加(刷新)的过程中出现了问题, 我在该方法中加了很多打印,最后发现,在一开始进入的时候

Android系统会进行判断,若你的进程有activitys,且activity没有变化活动,或者说你有ClientActivities,你都不会被添加进进程记录表中.

1
2
3
4
5
6
7
8
9
10
11
12
final void updateLruProcessLocked(ProcessRecord app, boolean activityChange,
ProcessRecord client) {
Slog.i(TAG, "################updateLruProcessLocked: begin");
final boolean hasActivity = app.activities.size() > 0 || app.hasClientActivities;
final boolean hasService = false; // not impl yet. app.services.size() > 0;
if (!activityChange && hasActivity) {
// The process has activties, so we are only going to allow activity-based
// adjustments move it. It should be kept in the front of the list with other
// processes that have activities, and we don't want those to change their
// order except due to activity operations.
return;
}

经过log打印, 发现我们的进程在重新创建后ClientActivities 为true, 竟然没有被清空.

1
2
I/ActivityManager( 519): ####new App:ProcessRecord{41b985d0 6192:com.avatar.dialog/1000} Activity [] size 0
I/ActivityManager( 519): ####new App:ProcessRecord{41b985d0 6192:com.avatar.dialog/1000} App hasClientActivities: true

问题来了,为什么没有被清空???

到Android4.4的源码树下搜索对app.hasClientActivities的操作,发现只有置true, 没有被置false的情况.

那么为什么没有被置false. 在去Android5.0的源码中搜索,发现竟然有. 同一个方法中, 5.0的系统比4.4特地多添加了一行把该标志位置false的代码

这个方法就是android用来处理应用被干掉后的进程信息清理的,可见,4.4中没有对这个标志位清空.那就标志着

若你的应用死掉后再起起来,但同时还没有activity的情况下, 你将无法操作content provider

当然这个问题应该很少人会遇到.一个没有activity的应用.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/**
* Main code for cleaning up a process when it has gone away. This is
* called both as a result of the process dying, or directly when stopping
* a process when running in single process mode.
*/
private final void cleanUpApplicationRecordLocked(ProcessRecord app,
boolean restarting, boolean allowRestart, int index) {
if (index >= 0) {
removeLruProcessLocked(app);
}
mProcessesToGc.remove(app);
mPendingPssProcesses.remove(app);
// Dismiss any open dialogs.
if (app.crashDialog != null && !app.forceCrashReport) {
app.crashDialog.dismiss();
app.crashDialog = null;
}
if (app.anrDialog != null) {
app.anrDialog.dismiss();
app.anrDialog = null;
}
if (app.waitDialog != null) {
app.waitDialog.dismiss();
app.waitDialog = null;
}
app.crashing = false;
app.notResponding = false;
app.resetPackageList(mProcessStats);
app.unlinkDeathRecipient();
app.makeInactive(mProcessStats);
app.forcingToForeground = null;
app.foregroundServices = false;
app.foregroundActivities = false;
app.hasShownUi = false;
app.hasAboveClient = false;
/**
* 当你的应用这个标志不至false的情况下,若你的应用死掉后再起起来,但同时还没有activity的情况下,
* 你将无法操作content provider
*/
Slog.v(TAG, "执行死亡后 app 记录清理 + hasClientActivities " + app.hasClientActivities);
app.hasClientActivities = false;
mServices.killServicesLocked(app, allowRestart);