Statistics Toolbox presentation


Slide 1. Statistics Toolbox


Slide 2. Decision tree functions


Slide 3. Function treefit: fits a tree-based model for classification or regression.

Syntax: t = treefit(X,y)

Example:

load fisheriris;
t = treefit(meas,species);
treedisp(t,'names',{'SL' 'SW' 'PL' 'PW'});
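Once fitted, the tree can classify new observations with treeval. A minimal sketch, assuming a Statistics Toolbox release that still provides the older treefit/treeval interface used in these slides; the measurement vector newFlower is a made-up example, not data from the slides.

```matlab
% Fit a classification tree on all four iris measurements
load fisheriris;
t = treefit(meas, species);

% Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm)
newFlower = [5.9 3.0 5.1 1.8];

% For a classification tree, the third output of treeval holds the predicted class name
[yfit, node, cname] = treeval(t, newFlower);
disp(cname)
```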


Slide 4. Cluster analysis functions


Slide 5. Function kmeans
IDX = kmeans(X,k)
[IDX,C] = kmeans(X,k)
[IDX,C,sumd] = kmeans(X,k)
[IDX,C,sumd,D] = kmeans(X,k)
[...] = kmeans(...,'param1',val1,'param2',val2,...)

IDX = kmeans(X, k) partitions the points in the n-by-p data matrix X into k clusters. This iterative partitioning minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. Rows of X correspond to points, columns correspond to variables. By default, kmeans uses squared Euclidean distances.
IDX - n-by-1 vector containing the cluster indices of each point.
C - k-by-p matrix of cluster centroid locations.
sumd - 1-by-k vector of within-cluster sums of point-to-centroid distances.
D - n-by-k matrix of distances from each point to every centroid.
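The four outputs are linked: with the default squared Euclidean distance, sumd(j) should equal the sum of column j of D over the points that IDX assigns to cluster j. A minimal sketch checking this on the iris data, assuming the fisheriris sample data set used elsewhere in these slides; results vary between runs because the seeds are random.

```matlab
% Cluster the four iris measurements into three groups
load fisheriris;
X = meas;            % n-by-p data: 150 points, 4 variables
k = 3;
[IDX, C, sumd, D] = kmeans(X, k);

% With squared Euclidean distance, sumd(j) is the sum of D(i,j)
% over the points i assigned to cluster j
check = zeros(k, 1);
for j = 1:k
    check(j) = sum(D(IDX == j, j));
end
disp([sumd(:) check])   % the two columns should agree up to rounding
```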

Slide 6. The 'distance' parameter
'sqEuclidean' - Squared Euclidean distance (default).
'cityblock' - Sum of absolute differences, i.e., the L1 distance.
'cosine' - One minus the cosine of the included angle between points (treated as vectors).
'correlation' - One minus the sample correlation between points (treated as sequences of values).
'Hamming' - Percentage of bits that differ (only suitable for binary data).
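To see what a non-default metric changes, the sketch below runs kmeans with 'cityblock' and recomputes D by hand as sums of absolute differences. It assumes the fisheriris data; with 'cityblock', each centroid is the component-wise median of its cluster rather than the mean.

```matlab
load fisheriris;
% L1 (city block) k-means on the four iris measurements
[IDX, C, sumd, D] = kmeans(meas, 3, 'Distance', 'cityblock');

% Recompute point-to-centroid distances as sums of absolute differences
n = size(meas, 1);
Dmanual = zeros(n, 3);
for j = 1:3
    Dmanual(:, j) = sum(abs(meas - repmat(C(j, :), n, 1)), 2);
end
disp(max(abs(D(:) - Dmanual(:))))   % should be (close to) zero
```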

Slide 7. The 'start' parameter
Method used to choose the initial cluster centroid positions, sometimes known as "seeds". Valid starting values are:
'sample' - Select k observations from X at random (default).
'uniform' - Select k points uniformly at random from the range of X. Not valid with Hamming distance.
'cluster' - Perform a preliminary clustering phase on a random 10% subsample of X. This preliminary phase is itself initialized using 'sample'.
Matrix - a k-by-p matrix of centroid starting locations. In this case, you can pass [] for k, and kmeans infers k from the first dimension of the matrix. You can also supply a 3-dimensional array, in which case kmeans takes the value of the 'replicates' parameter from the array's third dimension.
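The matrix form of 'start' can be sketched as follows. The choice of seeds (rows 1, 51 and 101, i.e. the first flower of each species in fisheriris) is an arbitrary illustration, not something the slides prescribe.

```matlab
load fisheriris;
% One flower per species as starting centroids; k is passed as []
% and inferred from the 3 rows of the seed matrix
seeds = meas([1 51 101], :);
[IDX, C] = kmeans(meas, [], 'Start', seeds);

% A 3-D seed array: each page is one set of starting centroids, so
% kmeans runs 2 replicates and keeps the one with the lowest total sumd
seeds3 = cat(3, meas([1 51 101], :), meas([2 52 102], :));
[IDX2, C2] = kmeans(meas, [], 'Start', seeds3);
```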

Slide 8. Classification
load fisheriris;
gscatter(meas(:,1), meas(:,2), species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');
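The scatter plot shows where each species sits in the sepal plane. The 'linear' discriminant used on the next slide models each group as a multivariate normal with a pooled covariance, so the group means are its main ingredient. A small sketch computing them with grpstats, assuming the same two sepal variables.

```matlab
load fisheriris;
% Per-species means of sepal length and sepal width (the two plotted variables)
[groupMeans, groupNames] = grpstats(meas(:, 1:2), species, {'mean', 'gname'});
disp(groupNames)
disp(groupMeans)
```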


Slide 9. Linear and quadratic discriminant analysis
linclass = classify(meas(:,1:2), meas(:,1:2),species);
bad = ~strcmp(linclass,species);
numobs = size(meas,1);
pbad = sum(bad) / numobs;

hold on;
plot(meas(bad,1), meas(bad,2), 'kx');
hold off;
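The slide title mentions both linear and quadratic analysis, but the code only uses the default linear rule. The sketch below compares the two resubstitution error rates and tabulates the linear rule's mistakes with crosstab; the variable names pbadLin, pbadQuad and confLin are illustrative.

```matlab
load fisheriris;
X = meas(:, 1:2);

% Linear (pooled covariance) vs quadratic (per-class covariance) discriminant
linclass  = classify(X, X, species);               % 'linear' is the default type
quadclass = classify(X, X, species, 'quadratic');

% Resubstitution error rates of both rules
numobs = size(meas, 1);
pbadLin  = sum(~strcmp(linclass,  species)) / numobs;
pbadQuad = sum(~strcmp(quadclass, species)) / numobs;

% Cross-tabulation of true species (rows) against predicted class (columns);
% the fourth output of crosstab gives the label order if needed
confLin = crosstab(species, linclass);
disp(confLin)
```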

Slide 10. Visualization: partitioning the plane into class regions
[x,y] = meshgrid(4:.1:8,2:.1:4.5);
x = x(:);
y = y(:);
j = classify([x y],meas(:,1:2), species);
gscatter(x,y,j,'grb','sod')
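The same grid can be classified with the quadratic rule. Because each species then gets its own covariance matrix, the region boundaries become curves instead of straight lines. A sketch on the same grid as above.

```matlab
load fisheriris;
% Same grid as above: sepal length 4..8, sepal width 2..4.5, step 0.1
[x, y] = meshgrid(4:.1:8, 2:.1:4.5);
x = x(:);
y = y(:);

% Quadratic discriminant: per-species covariances give curved boundaries
jq = classify([x y], meas(:, 1:2), species, 'quadratic');
figure;
gscatter(x, y, jq, 'grb', 'sod')
xlabel('Sepal length');
ylabel('Sepal width');
```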

Slide 11. Decision trees
tree = treefit(meas(:,1:2), species);
[dtnum,dtnode,dtclass] = treeval(tree, meas(:,1:2));
bad = ~strcmp(dtclass,species);
sum(bad) / numobs

Slide 12. Iris classification tree


Slide 13. Evaluating classification quality
resubcost = treetest(tree,'resub');
[cost,secost,ntermnodes,bestlevel] = treetest(tree,'cross',meas(:,1:2),species);
plot(ntermnodes,cost,'b-', ntermnodes,resubcost,'r--')
figure(gcf);
xlabel('Number of terminal nodes');
ylabel('Cost (misclassification error)')
legend('Cross-validation','Resubstitution')
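treetest(tree,'cross',...) estimates cost by repeatedly holding out part of the data. The same idea can be written out by hand. The sketch below swaps in cvpartition (not used in the slides) and applies 10-fold cross-validation to the linear discriminant from slide 9 instead of the tree, so each observation is predicted exactly once by a model that never saw it.

```matlab
load fisheriris;
X = meas(:, 1:2);

% 10-fold partition, stratified by species
c = cvpartition(species, 'kfold', 10);
nErr = 0;
nTested = 0;
for i = 1:c.NumTestSets
    tr = training(c, i);   % logical index of training rows for fold i
    te = test(c, i);       % logical index of held-out rows for fold i
    pred = classify(X(te, :), X(tr, :), species(tr));
    nErr = nErr + sum(~strcmp(pred, species(te)));
    nTested = nTested + sum(te);
end
cvErr = nErr / numel(species);   % cross-validated misclassification rate
```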

Slide 14. Choosing the pruning level
[mincost,minloc] = min(cost);
cutoff = mincost + secost(minloc);
hold on
plot([0 20], [cutoff cutoff], 'k:')
plot(ntermnodes(bestlevel+1), cost(bestlevel+1), 'mo')
legend('Cross-validation', 'Resubstitution', 'Min + 1 std. err.','Best choice')
hold off
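The dotted cutoff line encodes the "minimum plus one standard error" rule: among all pruning levels whose cross-validated cost is at or below mincost + secost(minloc), pick the most heavily pruned one. A sketch of the rule computed by hand; it is expected to agree with bestlevel from treetest, though that is an assumption here, and the cross-validated costs change from run to run.

```matlab
load fisheriris;
tree = treefit(meas(:, 1:2), species);
[cost, secost, ntermnodes, bestlevel] = treetest(tree, 'cross', meas(:, 1:2), species);

% One-standard-error rule
[mincost, minloc] = min(cost);
cutoff = mincost + secost(minloc);

% Level L is stored at index L+1, and higher levels mean heavier pruning
% (fewer leaves), so take the largest index whose cost is within the cutoff
withinCutoff = find(cost <= cutoff);
myLevel = max(withinCutoff) - 1;
fprintf('Hand-computed level %d (%d leaves); treetest bestlevel %d\n', ...
        myLevel, ntermnodes(myLevel + 1), bestlevel);
```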

Slide 15. Optimal classification tree
prunedtree = treeprune(tree,bestlevel);
treedisp(prunedtree)

cost(bestlevel+1)

ans = 0.22


Slide 16. Iris clustering dendrogram
eucD = pdist(meas,'euclidean');
clustTreeEuc = linkage(eucD,'average');
[h,nodes] = dendrogram(clustTreeEuc,0);
set(gca,'TickDir','out', 'TickLength', [.002 0],'XTickLabel',[]);
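The dendrogram can also be checked numerically. The sketch below computes the cophenetic correlation (how well the tree's merge heights preserve the original pairwise distances; values near 1 are better) and cuts the tree into at most three clusters to compare with the species labels.

```matlab
load fisheriris;
eucD = pdist(meas, 'euclidean');
clustTreeEuc = linkage(eucD, 'average');

% Cophenetic correlation between tree heights and original distances
copheneticCorr = cophenet(clustTreeEuc, eucD);

% Cut the tree into at most 3 clusters and cross-tabulate against species
T = cluster(clustTreeEuc, 'maxclust', 3);
disp(crosstab(T, species))
```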

