Shaokang's Blog

Note: This is a group project for a course. I was responsible for coding the actual model and optimizing its structure, which made the model run over 200 times faster. I was also responsible for scraping data from the official site and setting up a GitHub Action to regularly update the data source using Puppeteer, which simulates real user behavior and avoids being blocked. You can find the relevant repository at ShaokangJiang/CSE-203B-crapping (github.com). My code can be seen at the end of this post.

Nowadays, new computer components are introduced into the retail market annually. Customers may find that the abundant component choices make it difficult to choose the optimum combination of hardware components while building a new computer. This project aims to take in the main components of a computer hardware build and provide several of the highest-performing options that can meet a consumer’s budget.

Extracted text:

Team Assignments

Team members

Maxim Edelson, Shaokang Jiang, Junhua Yang, Jinghong Luo

Data Curation

Maxim scraped data using Python and performed completeness and correctness
checking of all the data. Shaokang scraped data using Node.js with an
auto-scraping feature. Both discussed how to formulate better, more
meaningful data.

Linear Programming Formulation and Analysis

Maxim, Junhua, and Jinghong used Python to formulate and analyze the
problem. All of us worked to test and revise the Python solution code and
make it run more efficiently. Shaokang used GAMS to formulate and analyze
the problem.

Paper Writing

Maxim did the majority of the introduction, problem setup, and primal
formulation. Shaokang did the majority of the intended approaches, results,
and conclusion. Jinghong did the intended contributions and the contribution
vs. previous works part. Everyone worked on the primal, dual, and KKT
formulations and on reviewing and editing the paper.

Introduction

Motivation

Nowadays, new computer components are introduced into the retail market
annually. Customers may find that the abundant component choices make it
difficult to choose the optimum combination of hardware components while
building a new computer. This project aims to take in the main
components of a computer hardware build and provide several of the
highest-performing options that can meet a consumer’s budget.

Previous Works

Linear programming is a modern, powerful mathematical technique that is
commonly used to solve optimization problems with constraints
[@bradley1971transformation],[@dantzig2002linear],[@sultan2014linear].
In particular, there are already many research publications that focus
on using linear programming to solve linearly constrained optimization
problems. By leveraging linear programming techniques, researchers and
practitioners can efficiently solve complex problems with multiple
constraints and objectives, enabling them to make more informed
decisions.

One example of using linear programming is the “diet problem”, which
seeks to find the optimal combination of foods to meet nutritional
requirements at the lowest cost. This problem can be formulated as a
linear programming problem, where the objective is to minimize the total
cost of the food, subject to constraints on the required nutrients.
The authors of [@paper1] aim to solve the diet problem for McDonald's set
menus, seeking the minimum cost while satisfying a person's daily calorie
and nutritional requirements.

Intended Contributions

In this paper, we explore state-of-the-art methods of applying linear
programming with constraints to formulate our problem and analyze it using
the primal, dual, and KKT conditions. We devise our own hardware component
performance measurement formula in order to formulate the optimization
problem, and we set appropriate constraints based on real-world scenarios.
The novelty and main attraction of this paper is applying integer
programming techniques to solve the computer-building problem, a relatively
new and under-explored area in the computer industry. We also compare the
efficiency of commercial and open-source solvers.

Organization of the Paper

The rest of the paper is divided into different sections illustrating
our work. First, we go in-depth to discuss the problem statement in
section 3, demonstrating the reasoning behind our objective function and
constraints, as well as computing the dual formulation and KKT
conditions of the primal problem. In section 4, we talk about the method
we use to build the linear programming problem. Then we present the
results and conclude our work in section 5.

Contribution vs Previous Works

Even though linear programming has been widely applied in various fields to
solve complex optimization problems, its application in the computer
industry has received little attention. With this paper, we hope to make a
significant impact on the computer community by offering a new way of
identifying a computer build for customers who are looking for
cost-effective builds with optimal performance. We believe that exploring
the novel field of computer building with linear programming makes our work
unique compared to previous works.

Statement of the Problem

Problem Setup

Before formulating the primal problem, it is necessary to discuss how we
arrived at the problem itself. In this paper, we use data from the website
UserBenchmark.com, obtained by scraping component names, benchmarks, prices,
samples per component, and value for GPUs, CPUs, SSDs, HDDs, and RAM
(updated automatically). Using the datasets we collected, we crafted g(s)
(equation (1)), which is used as part of our objective function
(equation (3)). This objective function allows us to estimate the
performance of a hardware component based on its price, benchmark score,
value, and number of benchmark samples.

g(s) = \frac{\text{benchmark * value * samples}}{\text{price}}

Formula (1) computes a ratio of performance to cost, adjusted by the
hardware’s overall value and the sample size used to obtain the
benchmark score. A higher score indicates better performance for the
price. It’s important to note that this formula is just one possible way
to measure hardware performance, and that there are many other factors
that may be relevant depending on the context.
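
As a concrete illustration, g(s) can be computed directly from the scraped CSV columns (name, price, sample, valuable, bench). The following is a minimal sketch assuming pandas and the column names produced by the scraper at the end of this post; the project's actual Python code may differ.

import pandas as pd

def g_scores(df: pd.DataFrame) -> pd.Series:
    # g(s) = benchmark * value * samples / price, with the sample counts
    # min-max normalized to [0, 1] as in the primal formulation below.
    samples = (df["sample"] - df["sample"].min()) / (df["sample"].max() - df["sample"].min())
    return df["bench"] * df["valuable"] * samples / df["price"]

cpu = pd.read_csv("data/cpu.csv")   # columns: name, price, sample, valuable, bench, ...
print(g_scores(cpu).nlargest(5))    # five best performance-per-dollar CPUs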

Next, to define our constraints, we look back to our motivation of the
paper, which is to find highest-performing hardware build options that
can meet the budget. To do this, we need to 1) constrain the total
budget of all hardware combinations to be less than the fixed budget we
specified, and 2) limit the count of each hardware component in a
combination to 1 (i.e., one GPU, one CPU, etc…) because we are
looking for the optimum combination of hardware components in terms of
performance per cost.

Primal Formulation

\text{From (1), let } g(s) = \frac{s_1*s_2*s_3}{s_4} \text{ where } s \in \mathbb{R}^{4}

In equation (1), s_1 is the component benchmark (with a range of 0-100),
s_2 is the number of recorded benchmark samples, min-max normalized between
0 and 1, s_3 describes whether this component is valuable from the user's
perspective (with a range of 0-100), and s_4 is the current component price,
with no upper bound.

Let n \in \mathbb{N} be the number of components in each category
(CPU, GPU, SSD, RAM).

\text{max } \sum_{i\in n}g(c_i)*x_{1i} + \sum_{i\in n}g(d_i)*x_{2i} + \sum_{i\in n}g(e_i)*x_{3i} + \sum_{i\in n}g(f_i)*x_{4i}

c, d, e, f \in \mathbb{S}^{n \times 4}
x_1, x_2, x_3, x_4 \in \{0,1\}^{n}

subject to

\sum_{i\in n}(c_{i,4}*x_{1i}) + \sum_{i\in n}(d_{i,4}*x_{2i}) + \sum_{i\in n}(e_{i,4}*x_{3i}) + \sum_{i\in n}(f_{i,4}*x_{4i}) \leq b \text{, where } b \in \mathbb{R}_{++} \text{ is the budget}

\textbf{1}^Tx_1=1 \text{, } \textbf{1}^Tx_2=1 \text{, } \textbf{1}^Tx_3=1 \text{, } \textbf{1}^Tx_4=1
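
For concreteness, the primal program above maps almost one-to-one onto an off-the-shelf modeling library. Below is a minimal sketch using pandas and the open-source PuLP package; the file names are illustrative, B = 2000 matches the run described later, and our actual implementations used GAMS and a separate Python script.

import pandas as pd
import pulp

B = 2000  # budget b

categories = ["cpu", "gpu", "ssd", "ram"]
data = {c: pd.read_csv(f"data/{c}.csv") for c in categories}  # scraped component tables

prob = pulp.LpProblem("computer_build", pulp.LpMaximize)
objective, budget, choice = [], [], {}

for c, df in data.items():
    # g(s) score of every component in this category (sample counts min-max normalized)
    samples = (df["sample"] - df["sample"].min()) / (df["sample"].max() - df["sample"].min())
    score = df["bench"] * df["valuable"] * samples / df["price"]
    # binary selection vector x for this category
    x = [pulp.LpVariable(f"{c}_{i}", cat=pulp.LpBinary) for i in df.index]
    choice[c] = x
    prob += pulp.lpSum(x) == 1                     # 1^T x = 1: pick exactly one model
    objective += [score[i] * x[i] for i in df.index]
    budget += [df["price"][i] * x[i] for i in df.index]

prob += pulp.lpSum(objective)                      # maximize the total g(s) score
prob += pulp.lpSum(budget) <= B                    # stay within the budget
prob.solve()

for c in categories:
    picked = next(i for i, v in enumerate(choice[c]) if v.value() == 1)
    print(c, data[c]["name"][picked])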

Dual Formulations

The Lagrangian function is $$\begin{split}
\mathcal{L}(x_1,x_2,x_3,x_4,\lambda,v_1,v_2,v_3,v_4)&=
\sum_{i\in n}g(c_i)*x_{1i} + \sum_{i\in n}g(d_i)*x_{2i} + \sum_{i\in n}g(e_i)*x_{3i} + \sum_{i\in n}g(f_i)*x_{4i} \\
&+ \lambda (\sum_{i\in n}(c_{i,4}*x_{1i}) + \sum_{i\in n}(d_{i,4}*x_{2i}) + \sum_{i\in n}(e_{i,4}*x_{3i}) + \sum_{i\in n}(f_{i,4}*x_{4i}) - b) \\
&+ v_1(\textbf{1}^T x_{1} -1) + v_2(\textbf{1}^T x_{2} -1) + v_3( \textbf{1}^T x_{3} -1) + v_4(\textbf{1}^T x_{4} -1), \text{ where } \lambda \geq 0
\end{split}$$

To compute
\max_{x_1,x_2,x_3,x_4} \mathcal{L}(x_1,x_2,x_3,x_4,\lambda,v_1,v_2,v_3,v_4),
let us take $$\begin{pmatrix}
\frac{\partial\mathcal{L}}{\partial x_1} \\
\frac{\partial\mathcal{L}}{\partial x_2} \\
\frac{\partial\mathcal{L}}{\partial x_3} \\
\frac{\partial\mathcal{L}}{\partial x_4}
\end{pmatrix} =
\begin{pmatrix}
0 \\ 0 \\ 0 \\ 0
\end{pmatrix} \implies
\begin{cases}
\sum_{i\in n}g(c_i) + \lambda \sum_{i\in n}c_{i,4} + v_{1}\textbf{1}^T = 0\\
\sum_{i\in n}g(d_i) + \lambda \sum_{i\in n}d_{i,4} + v_{2}\textbf{1}^T = 0\\
\sum_{i\in n}g(e_i) + \lambda \sum_{i\in n}e_{i,4} + v_{3}\textbf{1}^T = 0\\
\sum_{i\in n}g(f_i) + \lambda \sum_{i\in n}f_{i,4} + v_{4}\textbf{1}^T = 0
\end{cases}$$
When \lambda and v_1,v_2,v_3,v_4 satisfy these four equations, we get

\begin{split} \text{max}_{x_1,x_2,x_3,x_4} \mathcal{L}(x_1,x_2,x_3,x_4,\lambda,v_1,v_2,v_3,v_4) = &\text{ }(\sum_{i\in n}g(c_i) + \lambda \sum_{i\in n}c_{i,4} + v_{1}\textbf{1}^T)x_1 + (\sum_{i\in n}g(d_i) + \lambda \sum_{i\in n}d_{i,4} + v_{2}\textbf{1}^T)x_2\\ & +(\sum_{i\in n}g(e_i) + \lambda \sum_{i\in n}e_{i,4} + v_{3}\textbf{1}^T)x_3 +(\sum_{i\in n}g(f_i) + \lambda \sum_{i\in n}f_{i,4} + v_{4}\textbf{1}^T)x_4\\ & -\lambda b - v_1 - v_2 - v_3 - v_4\\ & = 0x_1 + 0x_2 + 0x_3 + 0x_4 -\lambda b - v_1 - v_2 - v_3 - v_4\\ & = -\lambda b - v_1 - v_2 - v_3 - v_4 \end{split}

The dual function is $$\begin{aligned}
\text{min } -\lambda b - v_1 - v_2 - v_3 - v_4\\
\text{subject to }
\begin{pmatrix}
\sum_{i\in n}g(c_i) + \lambda \sum_{i\in n}c_{i,4} + v_{1}\textbf{1}^T = 0\\
\sum_{i\in n}g(d_i) + \lambda \sum_{i\in n}d_{i,4} + v_{2}\textbf{1}^T = 0\\
\sum_{i\in n}g(e_i) + \lambda \sum_{i\in n}e_{i,4} + v_{3}\textbf{1}^T = 0\\
\sum_{i\in n}g(f_i) + \lambda \sum_{i\in n}f_{i,4} + v_{4}\textbf{1}^T = 0\\
\lambda \geq 0
\end{pmatrix}
\end{aligned}
\text{ where } c, d, e, f \in \mathbb{S}^{n \times 4}$$

KKT Conditions

Primal Constraints

\textbf{1}^Tx_1 - 1=0
\textbf{1}^Tx_2 - 1=0
\textbf{1}^Tx_3 - 1=0
\textbf{1}^Tx_4 - 1=0
\sum_{i\in n}(c_{i,4}*x_{1i}) + \sum_{i\in n}(d_{i,4}*x_{2i}) + \sum_{i\in n}(e_{i,4}*x_{3i}) + \sum_{i\in n}(f_{i,4}*x_{4i}) - b \leq 0

Dual Constraints

\lambda \geq 0

Complementary Slackness

\lambda\left[\sum_{i\in n}(c_{i,4}*x_{1i}) + \sum_{i\in n}(d_{i,4}*x_{2i}) + \sum_{i\in n}(e_{i,4}*x_{3i}) + \sum_{i\in n}(f_{i,4}*x_{4i}) - b\right] = 0

Gradient of Lagrangian With Respect to x Variables

\nabla_{x_1} = \sum_{i\in n}g(c_i)+\lambda\sum_{i\in n}c_{i,4}+v_1\textbf{1}^T=0
\nabla_{x_2} = \sum_{i\in n}g(d_i)+\lambda\sum_{i\in n}d_{i,4}+v_2\textbf{1}^T=0
\nabla_{x_3} = \sum_{i\in n}g(e_i)+\lambda\sum_{i\in n}e_{i,4}+v_3\textbf{1}^T=0
\nabla_{x_4} = \sum_{i\in n}g(f_i)+\lambda\sum_{i\in n}f_{i,4}+v_4\textbf{1}^T=0

Intended Approaches

In a real-world use case, the user should be able to choose the budget
limit B. In our runs, we simply set B to 2000. Initially, we set all
variables to be binary, since a computer can only contain one component
from each category.

Quality of choice

Why we made this choice

There are no previous academic works discussing how to optimally select a
computer component combination using linear programming. As such, we chose
the benchmark score, sample count, value score, and price to craft what we
felt was a reasonable objective function. This function reflects the user's
perspective by taking multiple pieces of data about a hardware component
into account when making a selection.

Why this is good

Under normal conditions, we ran the model with the strict constraints of
equation (4): equality to 1 and binary variables. To check the tightness of
the relaxation, we relaxed these constraints to \leq 1 with nonnegative
variables. After rerunning the experiment with the relaxed constraints and
obtaining the same solution distribution (all solution vectors contained
only zeros and ones, i.e., they remained binary), we can confidently say
that the relaxation is tight in our context.
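
To make the relaxation check reproducible, here is a self-contained sketch under the same assumptions as the PuLP sketch in the primal formulation section (illustrative file and column names): the relaxation only swaps the binary variables for continuous variables in [0, 1] and turns the equality selection constraints into inequalities, after which we verify that the optimum is still integral.

import pandas as pd
import pulp

def solve_build(data, B, relaxed=False):
    # With relaxed=True, drop integrality and use "at most one per category"
    # instead of "exactly one"; otherwise build the strict binary program.
    prob = pulp.LpProblem("computer_build", pulp.LpMaximize)
    objective, budget, choice = [], [], {}
    for c, df in data.items():
        samples = (df["sample"] - df["sample"].min()) / (df["sample"].max() - df["sample"].min())
        score = df["bench"] * df["valuable"] * samples / df["price"]
        if relaxed:
            x = [pulp.LpVariable(f"{c}_{i}", lowBound=0, upBound=1) for i in df.index]
            prob += pulp.lpSum(x) <= 1             # relaxed selection constraint
        else:
            x = [pulp.LpVariable(f"{c}_{i}", cat=pulp.LpBinary) for i in df.index]
            prob += pulp.lpSum(x) == 1             # strict selection constraint
        objective += [score[i] * x[i] for i in df.index]
        budget += [df["price"][i] * x[i] for i in df.index]
        choice[c] = x
    prob += pulp.lpSum(objective)
    prob += pulp.lpSum(budget) <= B
    prob.solve()
    return choice

data = {c: pd.read_csv(f"data/{c}.csv") for c in ["cpu", "gpu", "ssd", "ram"]}
relaxed_choice = solve_build(data, B=2000, relaxed=True)
# Tightness check: the relaxed optimum should still be a 0/1 vector.
assert all(abs(v.value() - round(v.value())) < 1e-6 for xs in relaxed_choice.values() for v in xs)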

Since our problem is a linear programming problem, an optimal solution
always exists as long as the feasible region is nonempty. Because we
maximize the objective function over this feasible set, we are guaranteed to
find a valid solution under the given budget B. We also checked optimality
against the KKT conditions described in the previous section, and we tried
several different values of B; in all cases the solver returned a proven
optimal solution, which supports the optimality of our model.

Novelty

Difference from previous work

To our knowledge, no previous work discusses how to select computer
components using linear programming.

Previously, consumers had to use search engines and compare each product by
hand, which could take days of work [@james2006buying]. We introduce a new,
more efficient way of deciding what to purchase and how to put a computer
together: a user can simply run our script on the latest data and get a
recommendation in seconds. In addition to traditional scraping with static
web fetching, we introduce a way to automatically scrape data from the
website using a GitHub Action; the script is scheduled to run once every
three days. We also use a contemporary scraping approach that simulates
user behavior with Puppeteer to avoid IP-based blocking. In this way, the
user can always run the optimization against the latest data. See here for
the auto-scraping repository.

Our setup, which pairs the binary selection vectors with the associated
price vectors in a single budget constraint, also improves computation time
considerably compared with the intuitive setup of checking the budget for
every possible combination of component choices. This reduces the number of
constraints from (dimension of CPU × dimension of GPU × dimension of HDD ×
dimension of RAM) down to a single equation. In our experiments, this
reduced the computation time from many minutes to 0.016 seconds.
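
As a rough back-of-the-envelope illustration of that difference (the catalogue sizes below are hypothetical; the point is how the number of budget checks scales):

# Hypothetical catalogue sizes, for illustration only.
n_cpu, n_gpu, n_hdd, n_mem = 100, 80, 60, 50

# Naive setup: check the budget once for every full combination of components.
naive_checks = n_cpu * n_gpu * n_hdd * n_mem
print(naive_checks)    # 24000000 combinations to enumerate

# Our setup: one linear constraint sum(price_i * x_i) <= B over the binary
# selection vectors, with just one term per catalogued component.
linear_terms = n_cpu + n_gpu + n_hdd + n_mem
print(linear_terms)    # 290 terms in a single constraint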

Computational complexity

Since our problem is a linear program, previous studies
[@karmarkar1984new] have shown that linear programming can be solved in
polynomial time, so our model should run in polynomial time as well.
Moreover, since each of our constraints involves only a single sum over the
components, the practical runtime should be close to O(n). It is harder to
bound the overhead if a mixed-integer solver has to be used. Fortunately,
our model can be converted to a problem solvable by a plain linear
programming solver [@raidl2008combining], which keeps the runtime close to
O(n). In practice it is difficult, and not really necessary, to measure
such a runtime precisely: for a problem of this size, far more time is
spent on context switches and on reading data from memory than on the
actual computation. This may also explain why our real tests consistently
show a runtime of about 0.016 seconds even with more data and more
dimensions.

The memory complexity is also O(n), where n is the sum of the numbers of
CPU, GPU, HDD, and RAM components.

Conjectured Results, Conclusion, and Future Works

Results

All models were run on an old laptop with a 2.5 GHz CPU and 8 GB of memory.
With the budget limit set to 2000 and the dataset taken from the
auto-scraping results (available here), we obtained the following build:

CPU: Intel Core i7-6700K
GPU: Nvidia GTX 1060-3GB
SSD: Samsung 970 Evo Plus NVMe PCIe M.2 1TB
RAM: Corsair Vengeance LPX DDR4 3000 C15 2x8GB

The solve time using the MIP solver in GAMS is 0.016 seconds. The LP
solvers in GAMS reach the same result with a solve time of 0.01 seconds.
For the Python setup, the average time over 10 runs is 0.065 seconds.
In a run with a more realistic budget limit of 400, we obtained the
following build:

CPU: Intel Core i5-10600K
GPU: Nvidia GTX 1060-3GB
SSD: Samsung 970 Evo Plus NVMe PCIe M.2 1TB
RAM: Corsair Vengeance LPX DDR4 3000 C15 2x8GB

The GAMS solve times stay the same as before. For the Python setup, the
average time over 10 runs is 0.05 seconds.

With our approach, a user can get a recommendation in a fraction of a
second instead of manually comparing the various components. We also
observed that commercial solvers generally perform better than the
open-source solvers available in Python.

Conclusion

In this work, we developed a novel and efficient way of finding the best
possible combination of computer components for a user to select under a
given budget. By introducing the benchmark score, popularity (sample
count), value, and price, we formulate the problem as a convex optimization
problem that maximizes a customized objective function taking all four
factors into account. Through experiments under different settings, we show
that the proposed optimization approach runs fast and efficiently generates
good results. Compared with the previous situation, where a user had to
spend many hours selecting and comparing components when putting together a
new computer, our approach is much more efficient, and our model is light
enough to run on any computer.

Future Works

Based on our current solution, future work may search for more meaningful
objective functions. This would require additional real-world analysis, and
user studies would have to be conducted to learn which factors among price,
benchmark, popularity, and value are most important.

As part of improving this work, we would like to add more features to
improve the user interaction experience. Considering the size of this
problem, another possible route is to build a static website using
jsLPSolver and host it on GitHub. This would be a low-cost way to give the
general public access to our results and benefit a wider audience.

As our current system automatically updates the data every three days, we
use the most up-to-date data when searching for the optimal combination.
That said, combining data across updates in a more meaningful way may be in
the user's best interest, as it could potentially lead to better and more
meaningful combination results.

References

[1] Gordon H Bradley. “Transformation of integer programs to knapsack problems”. In: Discrete mathematics 1.1 (1971), pp. 29–45.
[2] George B Dantzig. “Linear programming”. In: Operations research 50.1 (2002), pp. 42–47.
[3] Richard James. “Buying a computer?” In: ITNOW 48.3 (2006), pp. 24–24.
[4] Narendra Karmarkar. “A new polynomial-time algorithm for linear programming”. In: Proceedings of the sixteenth annual ACM symposium on Theory of computing. 1984, pp. 302–311.
[5] Nurul Farihan Mohamed et al. “An Integer Linear Programming Model For A Diet Problem Of McDonald’s Sets Menu In Malaysia”. In: 2021.
[6] Günther R Raidl and Jakob Puchinger. “Combining (integer) linear programming techniques and metaheuristics for combinatorial optimization”. In: Hybrid metaheuristics: An emerging approach to optimization (2008), pp. 31–62.
[7] Alan Sultan. Linear programming: An introduction with applications. Elsevier, 2014.

Original GAMS code

option limrow=0, limcol=0;

set cpu_model,cpu_para,gpu_model,gpu_para,ssd_model,ssd_para,ram_model,ram_para;

parameter
cpu(cpu_model,cpu_para),
gpu(gpu_model,gpu_para),
ssd(ssd_model,ssd_para),
ram(ram_model,ram_para);


$gdxin data\cpu.gdx
$load cpu_model = Dim1
$load cpu_para = Dim2
$load cpu = d
$gdxin

$gdxin data\gpu.gdx
$load gpu_model = Dim1
$load gpu_para = Dim2
$load gpu = d
$gdxin

$gdxin data\ssd.gdx
$load ssd_model = Dim1
$load ssd_para = Dim2
$load ssd = d
$gdxin

$gdxin data\ram.gdx
$load ram_model = Dim1
$load ram_para = Dim2
$load ram = d
$gdxin

scalar B /400/;
binary variables cpu_i(cpu_model),gpu_i(gpu_model),ssd_i(ssd_model),ram_i(ram_model);
variables obj;

equations max, cpu_to1, gpu_to1, ram_to1, ssd_to1, limit;
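* max:   objective, the total g(s) score of the selected components
* limit: total price of the selected components must not exceed the budget B
* *_to1: pick exactly one model in each category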

max..
obj =e= sum(cpu_model, (cpu_i(cpu_model)*cpu(cpu_model, "bench")*cpu(cpu_model, "sample")*cpu(cpu_model, "valuable"))/cpu(cpu_model, "price"))
+sum(gpu_model, (gpu_i(gpu_model)*gpu(gpu_model, "bench")*gpu(gpu_model, "sample")*gpu(gpu_model, "valuable"))/gpu(gpu_model, "price"))
+sum(ssd_model, (ssd_i(ssd_model)*ssd(ssd_model, "bench")*ssd(ssd_model, "sample")*ssd(ssd_model, "valuable"))/ssd(ssd_model, "price"))
+sum(ram_model, (ram_i(ram_model)*ram(ram_model, "bench")*ram(ram_model, "sample")*ram(ram_model, "valuable"))/ram(ram_model, "price"));

limit..
sum(cpu_model,cpu_i(cpu_model)*cpu(cpu_model, "price")) + sum(gpu_model,gpu_i(gpu_model)*gpu(gpu_model, "price")) + sum(ssd_model,ssd_i(ssd_model)*ssd(ssd_model, "price")) + sum(ram_model,ram_i(ram_model)*ram(ram_model, "price")) =l= B;

cpu_to1..
sum(cpu_model,cpu_i(cpu_model)) =e= 1;

gpu_to1..
sum(gpu_model,gpu_i(gpu_model)) =e= 1;

ram_to1..
sum(ram_model,ram_i(ram_model)) =e= 1;

ssd_to1..
sum(ssd_model,ssd_i(ssd_model)) =e= 1;

model hw1_1 /all/;

solve hw1_1 using mip maximizing obj;
display cpu_i.l, gpu_i.l, ram_i.l, ssd_i.l, obj.l;

Browser scraping code

A sample usage of Puppeteer to scrape some simple information. Because it drives a real browser and paces its requests like a human user, it avoids being blocked based on request frequency from a single IP address.

const puppeteer = require('puppeteer');
const urlencode = require('urlencode');
const HTMLParser = require('node-html-parser');
const core = require('@actions/core');
const UIDGenerator = require('uid-generator');
const uidgen = new UIDGenerator();
const fs = require('fs');

let browser;
var url = undefined;

function replaceAll(originalString, find, replace) {
return originalString.replace(new RegExp(find, 'g'), replace);
}

async function mainFunction1(name) {
// use try catch without timeout at here
// const browser = await puppeteer.launch({ headless: true , args: [`--no-sandbox`, `--disable-setuid-sandbox`]});\
let screenshot = false;
let hrstart = process.hrtime();
browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
// await page.setDefaultNavigationTimeout(500000);
await page.setViewport({ width: 1920, height: 1080 });
await page.setRequestInterception(true);

page.on('request', (req) => {
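// Skip image and media requests to speed up page loads.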
if (req.resourceType() === 'image' || req.resourceType() === 'media') {
req.abort();
}
else {
req.continue();
}
});

let hrend = process.hrtime(hrstart);

core.info("Start the first page, browser initialized in " + hrend[0] + "s")

hrstart = process.hrtime();
try {// wait for 60 seconds
await page.goto('https://' + name + '.userbenchmark.com/', { waitUntil: 'networkidle2', timeout: 60000 }); // wait until page load
} catch (e) {
core.error("Wait too long for login page, terminating the program")
await browser.close();
core.setFailed(`Action failed with error ${e}`);
process.exit(1);
}
// await page.waitForTimeout(2000000);
hrend = process.hrtime(hrstart);
core.info("Finish loading the first page, first page loaded in " + hrend[0] + "s")
core.info("Start to get " + name + " info");
let message = await page.evaluate(async () => {
function handleNumber(num) {
if (num.indexOf("k") != -1) {
return (parseFloat(num) * 1000).toFixed(0);
} else if (num.indexOf("M") != -1) {
return (parseFloat(num) * 1000 * 1000).toFixed(0);
} else {
return (parseFloat(num)).toFixed(0);
}
}
let toRe = "";
document.querySelector("[data-mhth='MC_PRICE']").click()
let counter = 0;
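// Scroll through the listing slowly like a real user, collect one CSV row per
// component from the table, then click to the next page; give up after 60 pages.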
while (true) {
counter++;
if (counter >= 60) return "Error";
for (let i = 0; i < 7; i++) {
await new Promise(resolve => setTimeout(resolve, 1000 + Math.random() * 1500));
window.scrollTo(window.scrollX, window.scrollY + 500 + Math.random() * 300);
}
for (let i of document.getElementsByClassName("hovertarget")) {
let lists = i.getElementsByTagName("td");
let tempStr = lists[1].innerText.split("\n");
//name,price,sample,valuable,bench,bench_low,bench_high
if (lists[lists.length - 1].innerText.trim().length == 0)
return toRe;

if (tempStr[1].indexOf("$") == -1) {
toRe += tempStr[1].trim()
} else {
toRe += tempStr[1].substring(0, tempStr[1].indexOf("$")).trim()
}
toRe += "," + lists[lists.length - 1].innerText.split("\n")[0].replace("$", "").trim().replace(",", "") + "," + handleNumber(tempStr[2].replace("Samples", "").trim()) + "," + lists[3].innerText.split("\n")[0].trim();
tempStr = lists[4].innerText.split("\n");
toRe += "," + tempStr[0] + "," + tempStr[1].split("-")[0].trim() + "," + tempStr[1].split("-")[1].trim() + "\n";
}
document.getElementsByClassName("pagination pagination-lg")[0].lastElementChild.firstChild.click()
}
});
if (message.localeCompare("Error") == 0) throw new Error("Error happened");

await browser.close();
return message;
}

async function main() {
try {
let hrstart = process.hrtime();
let requests = ["cpu", "gpu", "ssd", "hdd", "ram"];
for (let i of requests) {
let mainMessage;
let count = 1;
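// Retry the scrape up to two more times if an attempt fails.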
try {
mainMessage = await mainFunction1(i);
} catch (e) {
core.error("Error happened: " + e + ", try again");
count = count + 1;
await browser.close();
try {
mainMessage = await mainFunction1(i);
} catch (e) {
core.error("Error happened: " + e + ", try again");
count = count + 1;
await browser.close();
mainMessage = await mainFunction1(i);
}
}
let hrend = process.hrtime(hrstart);
core.info("Done " + i + " in " + hrend)
// console.log(mainMessage);
if (!fs.existsSync('./data')) {
await fs.mkdirSync('./data');
}
await fs.writeFile("./data/" + i + ".csv", "name,price,sample,valuable,bench,bench_low,bench_high\n" + mainMessage, function (err) {
if (err) {
return core.info(err);
}
core.info(i + ".csv was saved!");
});
}
} catch (e) {
await browser.close();
core.setFailed(`Action failed with error ${e}`);
process.exit(1);
}
}

main();
