Conventional resampling methods
Jack-knife
Bootstrap
Suppose $x$ is directly calculated on the lattice, and $f(x)$ is what we are interested in;
usually $f$ is a nonlinear function.
Usually, $\overline{f(x)} \neq f(\bar{x})$.
Error estimation of $f(\bar{x})$:
Original datasets: $\{x_i\},\ i = 1,\dots,N$
time slice: $t = 0,\dots,T-1$
$i$-th jack-knife sample: $x^{J}_{i} = \frac{1}{N-1}\sum_{j\neq i} x_j$
Estimator: $\bar{f} = \frac{1}{N}\sum_{i=1}^{N} f(x^{J}_{i})$
Error: $\sigma_f^2 = \frac{N-1}{N}\sum_{i=1}^{N}\big[f(x^{J}_{i}) - \bar{f}\,\big]^2$, $\quad C^{J}_{tt'} = (N-1)\,\mathrm{cov}\big[f(x^{J})_t,\, f(x^{J})_{t'}\big]$: covariance matrix of the jack-knife samples
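A minimal numeric sketch of these formulas (toy data and a hypothetical nonlinear $f$; the $\sqrt{N-1}$ rescaling matches the error formula above):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x = rng.normal(1.0, 0.1, N)      # toy per-configuration measurements

def f(x):                        # hypothetical nonlinear function
    return np.exp(x)

x_jcknf = (np.sum(x) - x) / (N - 1)       # i-th sample: mean with config i removed
f_jcknf = f(x_jcknf)

f_cntrl = np.mean(f_jcknf)                # jack-knife central value
f_err = np.sqrt(N - 1) * np.std(f_jcknf)  # jack-knife error
print(f_cntrl, "+-", f_err)
```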
Vectorize your code instead of using loops
#!/usr/bin/env python3
# encoding: utf-8
import numpy as np
from os import listdir
from iog_reader import iog_read
from re import match
import cProfile
from functools import reduce

iog_path = "./Data/"
T = 72
files = listdir(iog_path)
iogs = [file for file in files if match(r".*2pt.*\.dat\.iog", file) is not None]
Ncnfg = len(iogs)

c2pt = np.zeros((Ncnfg, T))
for indx in range(0, Ncnfg):
    dat = iog_read(iog_path + iogs[indx], ["cnfg", "hdrn", "t"])
    c2pt[indx] = dat.loc[(dat["hdrn"] == "0")]["Re"].to_numpy()

c2pt_jcknf_1 = np.zeros((Ncnfg, T))
c2pt_jcknf_2 = np.zeros((Ncnfg, T))
c2pt_jcknf_3 = np.zeros((Ncnfg, T))

def c2pt_jcknf_loop_1():  # double loop over configurations and time slices
    for i in range(0, Ncnfg):
        for j in range(0, T):
            c2pt_jcknf_1[i, j] = np.mean(np.delete(c2pt, [i], axis=0)[:, j])

def c2pt_jcknf_loop_2():  # single loop: one np.mean call per configuration
    for i in range(0, Ncnfg):
        c2pt_jcknf_2[i] = np.mean(np.delete(c2pt, [i], axis=0), axis=0)

def c2pt_jcknf():  # fully vectorized: subtract each row from the total sum
    c2pt_sum = np.sum(c2pt, axis=0)
    global c2pt_jcknf_3
    c2pt_jcknf_3 = (c2pt_sum - c2pt) / (Ncnfg - 1)

cProfile.run('c2pt_jcknf_loop_1()')
print(" ------ \n")
cProfile.run('c2pt_jcknf_loop_2()')
print(" ------ \n")
cProfile.run('c2pt_jcknf()')
print(" ------ \n")

# check that the three implementations agree to relative precision 1e-8
print(reduce(lambda x, y: x & y, np.fabs((c2pt_jcknf_1 - c2pt_jcknf_2) / c2pt_jcknf_2) < 1e-8))
print(reduce(lambda x, y: x & y, np.fabs((c2pt_jcknf_2 - c2pt_jcknf_3) / c2pt_jcknf_2) < 1e-8))
Output:
115780 function calls in 0.082 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
4824 0.003 0.000 0.039 0.000 <__array_function__ internals>:177(delete)
4824 0.002 0.000 0.036 0.000 <__array_function__ internals>:177(mean)
......
------
1612 function calls in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
67 0.000 0.000 0.001 0.000 <__array_function__ internals>:177(delete)
67 0.000 0.000 0.001 0.000 <__array_function__ internals>:177(mean)
......
------
13 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(sum)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
......
------
[ True True True True True True True True True True True True ......
Original datasets: $\{x_i\},\ i = 1,\dots,N$
time slice: $t = 0,\dots,T-1$
$k$-th bootstrap sample: draw $N$ configurations from the original dataset randomly, with replacement
Repeat the resampling $N_{bs}$ times; usually $N_{bs}$ is a few hundred to a few thousand
Let $x^{B}_{k}$ denote the resampled data
Calculate $f(x^{B}_{k})$
Central value: $\bar{f} = \frac{1}{N_{bs}}\sum_{k} f(x^{B}_{k})$
Error estimation: $\sigma_f = \mathrm{std}\big[f(x^{B})\big]$ (plain standard deviation, no $\sqrt{N-1}$ rescaling, unlike jack-knife)
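A minimal sketch of the bootstrap resampling (toy data; sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
Ncnfg, T, Nbs = 50, 72, 1000                   # hypothetical sizes
c2pt = rng.normal(1.0, 0.1, (Ncnfg, T))        # stand-in for the real data

# each bootstrap sample draws Ncnfg configurations with replacement
idx = rng.integers(0, Ncnfg, size=(Nbs, Ncnfg))
c2pt_btstrp = np.mean(c2pt[idx], axis=1)       # shape (Nbs, T)

c2pt_cntrl = np.mean(c2pt_btstrp, axis=0)      # central value
c2pt_err = np.std(c2pt_btstrp, axis=0)         # plain std, no sqrt(N-1) factor
```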
Jack-knife resample 🢩
Bootstrap resample 🢩
Parallelize your code over:
- time slice indices
- cnfg indices
$t$: time slices (w.r.t. the source)
$p$: fit parameters 🢩 e.g. for the two-point function (two-state fit): $C_2(t) = c_0\, e^{-E_0 t} + c_1\, e^{-E_1 t}$
- $\bar{C}_2(t)$ 🢩 mean value of the $t$-th time slice
- $\sigma_t$ 🢩 standard deviation of the $t$-th time slice
python: array indices start from 0
the numpy package has built-in mean, std, cov functions
numpy.mean, numpy.std, numpy.cov support high-rank tensors via the axis parameter
e.g. if R, R' are numpy arrays: S = np.mean(R, axis=1) averages over the second axis; R*R' multiplies element-wise; R.shape, S.shape give the array dimensions (see the sketch below)
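A minimal shape-bookkeeping sketch (array sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(size=(50, 72))    # e.g. (Ncnfg, T)

S = np.mean(R, axis=1)           # average over axis 1: shape (50,)
print(R.shape, S.shape)          # (50, 72) (50,)
print(np.std(R, axis=0).shape)   # std over configurations: (72,)
print(np.cov(R.T).shape)         # covariance between time slices: (72, 72)
```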
$\chi^2$ 🢩 measures the deviation between data and model
least-squares fit 🢩 optimize the parameters for a minimal BUT reasonable $\chi^2$
lsqfit 🢩 a python package performing least-squares fits
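Concretely, with data $y_t$, model $f(t;p)$ and covariance matrix $C$ (as used in the example below), the correlated $\chi^2$ reads:

$$
\chi^2(p) \;=\; \sum_{t,t'} \big[\,y_t - f(t;p)\,\big]\,\big(C^{-1}\big)_{tt'}\,\big[\,y_{t'} - f(t';p)\,\big]
$$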
fit = lsqfit.nonlinear_fit(data=(var_val, dat_val),
                           fcn=model, prior=ini_val, debug=False)

var_val: values of the variables 🢩 x-axis
dat_val: values of the data (with errors) 🢩 y-axis (with err/cov)
model: fit model function
ini_val: initial values of (also constraints on) the fit parameters in model
lsqfit - an example

#!/usr/bin/env python3
# encoding: utf-8
import gvar as gv
import lsqfit
import numpy as np
import matplotlib.pyplot as plt
c2pt_raw = np.load("./data_2pt.npy")
c2pt = c2pt_raw[:, 0, :, -2]
Ncnfg = c2pt.shape[0]
T = c2pt.shape[1]
T_hlf = T//2
#? jack-knife resample
#! only used in the plot, to compare the original data with the fit
c2pt_sum = np.sum(c2pt,axis=0)
c2pt_jcknf = ((c2pt_sum) - c2pt)/(Ncnfg-1) # jack-knife resample
c2pt_cntrl = np.mean(c2pt_jcknf,axis=0) # jack-knife mean
c2pt_err = np.sqrt(Ncnfg-1)*np.std(c2pt_jcknf,axis=0) # jack-knife std
#? drop the first data point; normalize so that c2[1] ~ 1
t_ary = np.array(range(1,T))
c2pt_jcknf = c2pt_jcknf/c2pt_cntrl[1]
c2pt_cntrl = np.mean(c2pt_jcknf,axis=0) # jack-knife mean
c2pt_err = np.sqrt(Ncnfg-1)*np.std(c2pt_jcknf,axis=0) # jack-knife std
#? average forward/backward propagation, for better symmetry
c2pt_jcknf_fwrd = c2pt_jcknf[:,1:T_hlf+1] #? forward propagation part
c2pt_jcknf_bwrd = c2pt_jcknf[:,T_hlf:][:,::-1] #? backward propagation part
c2pt_jcknf_avg = (c2pt_jcknf_fwrd+c2pt_jcknf_bwrd)/2 #? average forward/backward propagation
c2pt_avg_cntrl = np.mean(c2pt_jcknf_avg,axis=0) # jack-knife mean
c2pt_avg_cov = (Ncnfg-1)*np.cov(np.transpose(c2pt_jcknf_avg,axes=(1,0))) # jack-knife covariance
T_strt = 10 #? starting point of the fit
T_end = 30 #? ending point of the fit
tsep_dctnry = {'c2pt': t_ary[T_strt:T_end]}
c2pt_dctnry = {'c2pt': gv.gvar(c2pt_avg_cntrl[T_strt:T_end], c2pt_avg_cov[T_strt:T_end, T_strt:T_end])}
def ft_mdls(t_dctnry, p):
    mdls = {}
    ts = t_dctnry['c2pt']
    mdls['c2pt'] = p['c0']*np.exp(-p['E0']*T_hlf/2)*np.cosh(p['E0']*(ts-T_hlf))
    return mdls
ini_prr = gv.gvar({'c0': '0(5)','E0': '0.2(0.5)'}) #? initial value
fit = lsqfit.nonlinear_fit(data=(tsep_dctnry, c2pt_dctnry), fcn=ft_mdls, prior=ini_prr, debug=True) #? execute the fit
print(fit.format(True)) #? print out fit results
#? time slices used in fit
t_ary = fit.data[0]['c2pt']
#? data to be fitted
c2pt_avg_dat_gvar = fit.data[1]['c2pt']
c2pt_avg_dat_cntrl = np.array([c2.mean for c2 in c2pt_avg_dat_gvar])
c2pt_avg_dat_err = np.array([c2.sdev for c2 in c2pt_avg_dat_gvar])
#? fitted function values
t_lst = np.linspace(10, T-10, 50)
c2pt_fit_fcn_gvar = fit.fcn({'c2pt':t_lst}, fit.p)['c2pt']
c2pt_fit_fcn_cntrl = np.array([c2.mean for c2 in c2pt_fit_fcn_gvar])
c2pt_fit_fcn_err = np.array([c2.sdev for c2 in c2pt_fit_fcn_gvar])
#? plots
fig, ax = plt.subplots(1,1, figsize=(10, 7*0.5))
ax.errorbar(np.array(range(0,T)),c2pt_cntrl,yerr=c2pt_err,fmt='bo',alpha=0.4,label="$C_2$") #? plot original data
ax.errorbar(t_ary,c2pt_avg_dat_cntrl,yerr=c2pt_avg_dat_err,fmt='go',alpha=0.5,label="frwrd/bckwrd avg. $C_2$") #? plot forward/backward averaged data
ax.plot(t_lst,c2pt_fit_fcn_cntrl,color="b",label="best fit") #? plot fitted function
ax.fill_between(t_lst,c2pt_fit_fcn_cntrl-c2pt_fit_fcn_err,c2pt_fit_fcn_cntrl+c2pt_fit_fcn_err,alpha=0.3) #? plot fitted function errorband
plt.yscale("log")
plt.xlabel('t/a')
plt.ylabel('$C_2$')
ax.legend(loc='upper center', fontsize=10, frameon=True, fancybox=True, framealpha=0.8, borderpad=0.3, \
ncol=1, markerfirst=True, markerscale=1, numpoints=1, handlelength=1.5)
fig.subplots_adjust(left=0.15, right=0.9, top=0.9, bottom=0.15)
fig.savefig("c2pt_fit.png")
var_val, dat_val, model (its return value), ini_val: dictionaries
dictionary: {key_1: value_1, key_2: value_2, ... } 🢩 each entry pairs a key with a value
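A one-line reminder of the syntax (keys and values are hypothetical):

```python
d = {'label1': [1.0, 2.0], 'label2': [3.0, 4.0]}  # key: value pairs
print(d['label1'])  # [1.0, 2.0]
```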
Why wrap data in dictionaries?
Combined fit: e.g. 2 data sets, 2 models (with shared parameters $p$)

| label | label1 | label2 |
|---|---|---|
| data | $y^{(1)}$ | $y^{(2)}$ |
| model | $f_1(x;\,p,\,q)$ | $f_2(x;\,p,\,r)$ |
| ini_val | $p,\ q$ | $p,\ r$ |
Wrapping data for combined fit

var_val = {'label1': [x1_1, x1_2, ...], 'label2': [x2_1, x2_2, ...]}
dat_val = {'label1': [y1_1, y1_2, ...], 'label2': [y2_1, y2_2, ...]}
prior = {'p_1': p_1, 'p_2': p_2, ..., 'q_1': q_1, 'q_2': q_2, ..., 'r_1': r_1, 'r_2': r_2}
y_1,2,...: gvar class 🢩 y_1 = gvar(y_1_cntrl, y_1_cov), ...
p_1,2,...: gvar class 🢩 p_1 = gvar(p_1_cntrl, p_1_err), ...
Feed the var_val, dat_val and prior to lsqfit.nonlinear_fit
Wrapping the model for combined fit

def fit_mdls(t_dctnry, p):
    mdls = {}
    mdls['label_1'] = ...
    mdls['label_2'] = ...
    return mdls
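Putting the pieces together, a minimal runnable combined-fit sketch: two toy linear data sets sharing the slope p, with independent intercepts q and r (all names and numbers are illustrative):

```python
import numpy as np
import gvar as gv
import lsqfit

x = {'label1': np.array([1., 2., 3., 4.]),
     'label2': np.array([1., 2., 3., 4.])}
y = {'label1': gv.gvar(['2.1(1)', '4.0(1)', '6.1(1)', '7.9(1)']),
     'label2': gv.gvar(['3.1(1)', '5.1(1)', '6.9(1)', '9.0(1)'])}

def mdls(x_dctnry, p):
    mdls = {}
    mdls['label1'] = p['p']*x_dctnry['label1'] + p['q']  # shares slope 'p'
    mdls['label2'] = p['p']*x_dctnry['label2'] + p['r']  # with 'label1'
    return mdls

prior = gv.gvar({'p': '0(10)', 'q': '0(10)', 'r': '0(10)'})
fit = lsqfit.nonlinear_fit(data=(x, y), fcn=mdls, prior=prior)
print(fit.format())
```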
Put multiple sources in the time direction to increase the statistics
Average over the sources within each config (iog file); see the sketch below
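A minimal sketch of the source average (array sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
Ncnfg, Nsrc, T = 50, 2, 72                            # hypothetical sizes
c2pt_multi = rng.normal(1.0, 0.1, (Ncnfg, Nsrc, T))   # stand-in for 2 sources

# average over the source axis within each configuration before resampling
c2pt = np.mean(c2pt_multi, axis=1)                    # shape (Ncnfg, T)
```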
Example: open boundary condition, two sources
Least Square Fit:
chi2/dof [dof] = 0.96 [20] Q = 0.5 logGBF = 116.43
Parameters:
c0 0.252 (10) [ 0.0 (5.0) ]
E0 0.1090 (27) [ 0.20 (50) ]
Errors have been magnified by a factor of 5 in the figure
Fitted effective mass:
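The effective mass can also be read off directly from the correlator; a standalone toy sketch, assuming the simple log-ratio definition (which bends near T/2 where the backward propagation sets in):

```python
import numpy as np

E0, T = 0.11, 72
t = np.arange(1, T//2)
c2 = np.exp(-E0*t) + np.exp(-E0*(T - t))  # noiseless cosh-like correlator

m_eff = np.log(c2[:-1]/c2[1:])            # ln C(t)/C(t+1)
print(m_eff[:5])                          # ~ E0 at small t
```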
Least Square Fit:
chi2/dof [dof] = 0.96 [20] Q = 0.5 logGBF = 116.43
Parameters:
c0 0.252 (10) [ 0.0 (5.0) ]
E0 0.1090 (27) [ 0.20 (50) ]
Fit:
key y[key] f(p)[key]
----------------------------------------
c2pt 0 0.277 (12) 0.2712 (79)
1 0.248 (11) 0.2435 (74)
2 0.223 (11) 0.2186 (69)
3 0.199 (10) 0.1964 (66)
4 0.1790 (94) 0.1765 (62)
5 0.1610 (87) 0.1586 (59)
6 0.1446 (79) 0.1427 (56)
7 0.1304 (75) 0.1284 (54)
8 0.1180 (71) 0.1157 (51)
9 0.1073 (68) 0.1044 (49)
10 0.0981 (68) 0.0943 (47)
11 0.0900 (69) 0.0853 (44)
12 0.0825 (68) 0.0773 (42)
13 0.0758 (66) 0.0703 (41)
14 0.0701 (64) 0.0641 (39)
15 0.0649 (62) 0.0586 (37) *
16 0.0603 (61) 0.0539 (36) *
17 0.0562 (58) 0.0498 (35) *
18 0.0523 (55) 0.0462 (33) *
19 0.0490 (52) 0.0433 (32) *
Settings:
svdcut/n = 1e-12/0 tol = (1e-08,1e-10,1e-10*) (itns/time = 9/0.1)
fitter = scipy_least_squares method = trf
Print this detailed log by calling print(fit.format(True))
fit = lsqfit.nonlinear_fit(data=(tsep_dctnry, c2pt_dctnry),
fcn=ft_mdls, prior=ini_prr, debug=True)
c2pt_fit_gvar = fit.fcn({'c2pt':t_lst}, fit.p)['c2pt']
tsep_dctnry = {'c2pt':np.array(range(T_strt,T_end))}
fit.p 🢩 the fit parameters, a dictionary, e.g. fit.p['E0']
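A short sketch of reading out the result (assuming fit is the lsqfit.nonlinear_fit object from the example above):

```python
import gvar as gv

E0 = fit.p['E0']        # a gvar: central value and error
print(E0.mean, E0.sdev)
print(gv.exp(-E0))      # gvar arithmetic propagates the error automatically
```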
Case 1
Least Square Fit:
chi2/dof [dof] = 13 [10]
Q = 5.2e-24
logGBF = -74.281
Underfit: the model fails to describe the data
Case 2
Least Square Fit:
chi2/dof [dof] = 0.07 [10]
Q = 1
logGBF = -46.654
Overfit: errors likely overestimated, or the model has too much freedom
Case 3
Least Square Fit:
chi2/dof [dof] = 1 [10]
Q = 0.41
logGBF = -15.05
A reasonable fit: chi2/dof ≈ 1
lsqfit log

Least Square Fit:
chi2/dof [dof] = 1 [10] Q = 0.41 logGBF = -15.05

chi2/dof: better when around 1
Q: the probability that the $\chi^2$ from the fit could have been larger, by chance, assuming the best-fit model is correct
logGBF: log of the Gaussian Bayes Factor 🢩 (the log of) the probability of obtaining the data by randomly sampling the fitted model
Comparing fits of the same data, the one with the larger logGBF is the one preferred by the data.

c2pt_cntrl, c2pt_cov, t_ary v.s. a, b, t

Exercise:
- home directory: /dssg/home/acct-phyww/phyww/qazhang/training_camp/class1_xiong/Exercise_1
- copy the data to the created directory
- use lsqfit to fit the pion's 2pt data and determine the ground state energy
- useful functions: iog_reader, pandas.DataFrame.loc, pandas.DataFrame.to_numpy, numpy.mean, numpy.cov, numpy.std, lsqfit.nonlinear_fit
- Data contains the 50 two-point function iog data files, from 50 configurations; 72 lattice spacings in the time direction
- iog_reader is provided in iog_read_demo.py
x = gv.gvar('2(0.5)'): x.mean 🢩 2.00, x.sdev 🢩 0.50
Construct a gvar from a tuple: x = gv.gvar((2, 0.5))
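A minimal gvar sketch:

```python
import gvar as gv

x = gv.gvar('2(0.5)')   # from a string
y = gv.gvar(2.0, 0.5)   # from mean and sdev; gv.gvar((2.0, 0.5)) also works
print(x.mean, x.sdev)   # 2.0 0.5
print(x + y)            # errors propagate automatically (x, y uncorrelated)
```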
Python indices start from 0; Mathematica and Julia start from 1
numpy.ndarray slices and range do not include the last index, e.g. a[2:5] 🢩 [a2, a3, a4], range(2,5) 🢩 [2, 3, 4]
iog_read.py returns a pandas.DataFrame; the columns in this data frame are strings (not integers)
command --help and man command
man for "manual": man --help, man man
pwd (present working directory), e.g. /home/chimaoshu/Nutstore_Files
/ : the root directory (and path separator)
relative paths: ./demo/c2pt_fit.png, ../LPC_trnsvrsty
~ 🢩 /home/USR
. : current directory; .. : parent directory, also in combination ../..
cd path
ls path
touch file_name 🢩 create under PWD; touch path/file_name 🢩 create under path
mkdir dir_name 🢩 create under PWD; mkdir path/dir_name 🢩 create under path
rm file_name, rm path/file_name
rm -r dir_name, rm -r path/dir_name
rm -rf: force removal, no prompt
copy cp and move (rename) mv
cat: show the contents of a file, e.g. cat ./demo/lsqfit_2pt_iog.py
grep: pick out specified contents of a file, e.g. cat ./demo/lsqfit_2pt_iog.py | grep gvar
'|' : pipe
> : redirect the output stream, e.g. ./lsqfit_2pt.py > lsq.log 2>&1
2>&1: also send error messages (stderr) to the same output
tar: pack/unpack archives; see tar --help
ssh usr@host, e.g. ssh phyzjun-user3@sylogin.hpc.sjtu.edu.cn
usr: username; host: remote machine
Use a key instead of a password:
ssh-keygen: create an RSA key pair (public and private); give id_rsa.pub to others, keep id_rsa private
ssh-copy-id usr@host: install your public key on the host
~/.ssh/config:

Host sysjtu
    HostName sylogin.hpc.sjtu.edu.cn
    User phyzjun-user3
    IdentityFile /home/chimaoshu/.ssh/id_rsa
    AddKeysToAgent yes

Then connect with: ssh sysjtu
powershell has built-in ssh.exe, ssh-keygen.exe, scp.exe
ssh-copy-id equivalent: type path_to_id_rsa.pub | ssh sysjtu "cat >> ~/.ssh/authorized_keys"
scp source target (scp = s (ssh) + cp), e.g.
scp ./demo/lsqfit_2pt.py sysjtu:~/demo
scp -r sysjtu:~/demo /home/chimaoshu
sysjtu can be replaced by usr@host if .ssh/config is not set
vi, vim
Q: How do you generate a random string?
A: Put a Windows user in front of vi, and tell them to exit
Download to local 🢩 edit locally 🢩 upload to remote 🢩
compile/execute 🢩 download to local, ...
Recommended: vscode + remote development extension
https://code.visualstudio.com/docs/remote/ssh-tutorial
better to set up .ssh/config first
Extensions
module avail: list available modules
module load mod: load mod
module list: list currently loaded modules
module purge: unload all modules

h5glance: installed via pip: pip3 install h5glance
h5glance h5_file [path]
path: unix-like path to the data (string)
h5glance --attrs h5_file [path]
h5glance -s slice --attrs h5_file path
slice: numpy-like slice, e.g. 2,10:15
HDF5: Hierarchical Data Format version 5
High-performance data management and storage suite
"Utilize the HDF5 high performance data software library and file format to manage, process, and store your heterogeneous data. HDF5 is built for fast I/O processing and storage."
Open-source library
Supported by C/C++, Julia, Mathematica, Python ...
In[1]:= c = 1 + 0.2*I;
In[2]:= Export["cmplx.h5", "/cmplx" -> c];
In[3]:= Import["cmplx.h5", {"Datasets", "/cmplx"}]
Out[3]= <|"Re" -> 1., "Im" -> 0.2|>
gzip compression: supported in h5py by setting a filter
h5serv
Reference: Python and HDF5, Andrew Collette, ISBN: 9781449367831 (2013)
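A minimal h5py sketch of writing and reading back a dataset with the gzip filter (file and dataset names are illustrative):

```python
import numpy as np
import h5py

with h5py.File("demo.h5", "w") as f:
    f.create_dataset("/c2pt", data=np.zeros((50, 72)), compression="gzip")

with h5py.File("demo.h5", "r") as f:
    print(f["/c2pt"].shape)   # (50, 72)
```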
Based on iog.so and iog_reader.py, requiring the h5py python package
iog2h5_2pt.py: convert 2pt iog files to h5
iog2h5_3pt.py: convert 3pt iog files to h5
provided on demand