A Study on Privacy Preserving Big Data Mining: Techniques and Challenges

Anuradha Dahiya

doi:10.26483/ijarcs.v13i5.6906

Published: Oct 21, 2022

Keywords:

Big data, Data mining, PPDM, Anonymization, Cryptography, Perturbation.

Anuradha Dahiya

Kanya Mahavidyalaya, Kharkhoda

Abstract

The basic goal of data mining algorithms is to extract previously undiscovered patterns from data. While the data is being mined, sensitive and confidential information must also be protected to preserve privacy. With the widespread use of information technology, organisations such as hospitals, insurance providers, banks, e-commerce companies, and stock exchanges are producing enormous amounts of data at an exponential rate, which makes privacy a crucial concern in data mining. Anonymization, perturbation, generalization, and cryptography are among the privacy-preserving data mining (PPDM) techniques proposed in the literature. This study reviews these state-of-the-art techniques, presents a tabular comparison of work by different authors, and discusses the challenges of privacy-preserving data mining.

Issue

Vol. 13 No. 5 (2022): September-October 2022

Section

Articles

COPYRIGHT

Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.
